26 Apr, 2020

3 commits


24 Apr, 2020

5 commits

  • When setting the meter rate to 4+Gbps, there is an
    overflow, the meters don't work as expected.

    Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Before invoking the ovs_meter_cmd_reply_stats, "meter"
    was checked, so don't check it agin in that function.

    Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Don't allow user to create meter unlimitedly, which may cause
    to consume a large amount of kernel memory. The max number
    supported is decided by physical memory and 20K meters as default.

    Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • In kernel datapath of Open vSwitch, there are only 1024
    buckets of meter in one datapath. If installing more than
    1024 (e.g. 8192) meters, it may lead to the performance drop.
    But in some case, for example, Open vSwitch used as edge
    gateway, there should be 20K at least, where meters used for
    IP address bandwidth limitation.

    [Open vSwitch userspace datapath has this issue too.]

    For more scalable meter, this patch use meter array instead of
    hash tables, and expand/shrink the array when necessary. So we
    can install more meters than before in the datapath.
    Introducing the struct *dp_meter_instance, it's easy to
    expand meter though changing the *ti point in the struct
    *dp_meter_table.

    Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

21 Apr, 2020

1 commit

  • syzbot wrote:
    | =============================
    | WARNING: suspicious RCU usage
    | 5.7.0-rc1+ #45 Not tainted
    | -----------------------------
    | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
    |
    | other info that might help us debug this:
    | rcu_scheduler_active = 2, debug_locks = 1
    | ...
    |
    | stack backtrace:
    | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
    | Workqueue: netns cleanup_net
    | Call Trace:
    | ...
    | ovs_ct_exit
    | ovs_exit_net
    | ops_exit_list.isra.7
    | cleanup_net
    | process_one_work
    | worker_thread

    To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
    lockdep_ovsl_is_held as optional lockdep expression.

    Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
    Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
    Cc: Pravin B Shelar
    Cc: Yi-Hung Wei
    Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

03 Apr, 2020

1 commit


30 Mar, 2020

1 commit

  • The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
    offset of the GSO in skb cb. This patch fixes the typo.

    Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation")
    Signed-off-by: Cambda Zhu
    Signed-off-by: David S. Miller

    Cambda Zhu
     

13 Mar, 2020

1 commit


04 Mar, 2020

1 commit


22 Feb, 2020

1 commit


21 Feb, 2020

1 commit

  • Variables declared in a switch statement before any case statements
    cannot be automatically initialized with compiler instrumentation (as
    they are not part of any execution flow). With GCC's proposed automatic
    stack variable initialization feature, this triggers a warning (and they
    don't get initialized). Clang's automatic stack variable initialization
    (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
    doesn't initialize such variables[1]. Note that these warnings (or silent
    skipping) happen before the dead-store elimination optimization phase,
    so even when the automatic initializations are later elided in favor of
    direct initializations, the warnings remain.

    To avoid these problems, move such variables into the "case" where
    they're used or lift them up into the main function body.

    net/openvswitch/flow_netlink.c: In function ‘validate_set’:
    net/openvswitch/flow_netlink.c:2711:29: warning: statement will never be executed [-Wswitch-unreachable]
    2711 | const struct ovs_key_ipv4 *ipv4_key;
    | ^~~~~~~~

    [1] https://bugs.llvm.org/show_bug.cgi?id=44916

    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     

19 Feb, 2020

4 commits


17 Feb, 2020

1 commit

  • New action to decrement TTL instead of setting it to a fixed value.
    This action will decrement the TTL and, in case of expired TTL, drop it
    or execute an action passed via a nested attribute.
    The default TTL expired action is to drop the packet.

    Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.

    Tested with a corresponding change in the userspace:

    # ovs-dpctl dump-flows
    in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
    # ping -c1 192.168.0.2 -t 120
    IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
    # ping -c1 192.168.0.2 -t 1
    #

    Co-developed-by: Bindiya Kurle
    Signed-off-by: Bindiya Kurle
    Signed-off-by: Matteo Croce
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Matteo Croce
     

15 Jan, 2020

1 commit


25 Dec, 2019

1 commit

  • The existing PUSH MPLS action inserts MPLS header between ethernet header
    and the IP header. Though this behaviour is fine for L3 VPN where an IP
    packet is encapsulated inside a MPLS tunnel, it does not suffice the L2
    VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should
    encapsulate the ethernet packet.

    The new mpls action ADD_MPLS inserts MPLS header at the start of the
    packet or at the start of the l3 header depending on the value of l3 tunnel
    flag in the ADD_MPLS arguments.

    POP_MPLS action is extended to support ethertype 0x6558.

    Signed-off-by: Martin Varghese
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Martin Varghese
     

10 Dec, 2019

1 commit

  • Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do

    if [[ "$file" =~ $EXCLUDE_FILES ]]; then
    continue
    fi
    sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
     

05 Dec, 2019

2 commits

  • The skb_mpls_push was not updating ethertype of an ethernet packet if
    the packet was originally received from a non ARPHRD_ETHER device.

    In the below OVS data path flow, since the device corresponding to
    port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
    not update the ethertype of the packet even though the previous
    push_eth action had added an ethernet header to the packet.

    recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
    actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
    push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4

    Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper")
    Signed-off-by: Martin Varghese
    Signed-off-by: David S. Miller

    Martin Varghese
     
  • The openvswitch module shares a common conntrack and NAT infrastructure
    exposed via netfilter. It's possible that a packet needs both SNAT and
    DNAT manipulation, due to e.g. tuple collision. Netfilter can support
    this because it runs through the NAT table twice - once on ingress and
    again after egress. The openvswitch module doesn't have such capability.

    Like netfilter hook infrastructure, we should run through NAT twice to
    keep the symmetry.

    Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
    Signed-off-by: Aaron Conole
    Signed-off-by: David S. Miller

    Aaron Conole
     

03 Dec, 2019

1 commit

  • The skb_mpls_pop was not updating ethertype of an ethernet packet if the
    packet was originally received from a non ARPHRD_ETHER device.

    In the below OVS data path flow, since the device corresponding to port 7
    is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
    the ethertype of the packet even though the previous push_eth action had
    added an ethernet header to the packet.

    recirc_id(0),in_port(7),eth_type(0x8847),
    mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
    actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
    pop_mpls(eth_type=0x800),4

    Fixes: ed246cee09b9 ("net: core: move pop MPLS functionality from OvS to core helper")
    Signed-off-by: Martin Varghese
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Martin Varghese
     

02 Dec, 2019

2 commits

  • If we can't build the flow del notification, we can simply delete
    the flow, no need to crash the kernel. Still keep a WARN_ON to
    preserve debuggability.

    Note: the BUG_ON() predates the Fixes tag, but this change
    can be applied only after the mentioned commit.

    v1 -> v2:
    - do not leak an skb on error

    Fixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical section.")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • All the callers of ovs_flow_cmd_build_info() already deal with
    error return code correctly, so we can handle the error condition
    in a more gracefull way. Still dump a warning to preserve
    debuggability.

    v1 -> v2:
    - clarify the commit message
    - clean the skb and report the error (DaveM)

    Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

27 Nov, 2019

1 commit

  • When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant
    flow has no UFID, we can exceed the computed size, as
    ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY
    attribute.
    Take the above in account when computing the flow command message
    size.

    Fixes: 74ed7ab9264c ("openvswitch: Add support for unique flow IDs.")
    Reported-by: Qi Jun Ding
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

16 Nov, 2019

1 commit

  • The nla_put_u16/nla_put_u32 makes sure that
    *attrlen is align. The call tree is that:

    nla_put_u16/nla_put_u32
    -> nla_put attrlen = sizeof(u16) or sizeof(u32)
    -> __nla_put attrlen
    -> __nla_reserve attrlen
    -> skb_put(skb, nla_total_size(attrlen))

    nla_total_size returns the total length of attribute
    including padding.

    Cc: Joe Stringer
    Cc: William Tu
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

15 Nov, 2019

1 commit

  • When using the kernel datapath, the upcall don't
    include skb hash info relatived. That will introduce
    some problem, because the hash of skb is important
    in kernel stack. For example, VXLAN module uses
    it to select UDP src port. The tx queue selection
    may also use the hash in stack.

    Hash is computed in different ways. Hash is random
    for a TCP socket, and hash may be computed in hardware,
    or software stack. Recalculation hash is not easy.

    Hash of TCP socket is computed:
    tcp_v4_connect
    -> sk_set_txhash (is random)

    __tcp_transmit_skb
    -> skb_set_hash_from_sk

    There will be one upcall, without information of skb
    hash, to ovs-vswitchd, for the first packet of a TCP
    session. The rest packets will be processed in Open vSwitch
    modules, hash kept. If this tcp session is forward to
    VXLAN module, then the UDP src port of first tcp packet
    is different from rest packets.

    TCP packets may come from the host or dockers, to Open vSwitch.
    To fix it, we store the hash info to upcall, and restore hash
    when packets sent back.

    +---------------+ +-------------------------+
    | Docker/VMs | | ovs-vswitchd |
    +----+----------+ +-+--------------------+--+
    | ^ |
    | | |
    | | upcall v restore packet hash (not recalculate)
    | +-+--------------------+--+
    | tap netdev | | vxlan module
    +---------------> +--> Open vSwitch ko +-->
    or internal type | |
    +-------------------------+

    Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

07 Nov, 2019

1 commit

  • The commit 69c51582ff786 ("dpif-netlink: don't allocate per
    thread netlink sockets"), in Open vSwitch ovs-vswitchd, has
    changed the number of allocated sockets to just one per port
    by moving the socket array from a per handler structure to
    a per datapath one. In the kernel datapath, a vport will have
    only one socket in most case, if so select it directly in
    fast-path.

    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

06 Nov, 2019

1 commit


04 Nov, 2019

7 commits

  • use the specified functions to init resource.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Unlocking of a not locked mutex is not allowed.
    Other kernel thread may be in critical section while
    we unlock it because of setting user_feature fail.

    Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
    Cc: Paul Blakey
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • When we destroy the flow tables which may contain the flow_mask,
    so release the flow mask struct.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • The most case *index < ma->max, and flow-mask is not NULL.
    We add un/likely for performance.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Simplify the code and remove the unnecessary BUILD_BUG_ON.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • The full looking up on flow table traverses all mask array.
    If mask-array is too large, the number of invalid flow-mask
    increase, performance will be drop.

    One bad case, for example: M means flow-mask is valid and NULL
    of flow-mask means deleted.

    +-------------------------------------------+
    | M | NULL | ... | NULL | M|
    +-------------------------------------------+

    In that case, without this patch, openvswitch will traverses all
    mask array, because there will be one flow-mask in the tail. This
    patch changes the way of flow-mask inserting and deleting, and the
    mask array will be keep as below: there is not a NULL hole. In the
    fast path, we can "break" "for" (not "continue") in flow_lookup
    when we get a NULL flow-mask.

    "break"
    v
    +-------------------------------------------+
    | M | M | NULL |... | NULL | NULL|
    +-------------------------------------------+

    This patch don't optimize slow or control path, still using ma->max
    to traverse. Slow path:
    * tbl_mask_array_realloc
    * ovs_flow_tbl_lookup_exact
    * flow_mask_find

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Port the codes to linux upstream and with little changes.

    Pravin B Shelar, says:
    | In case hash collision on mask cache, OVS does extra flow
    | lookup. Following patch avoid it.

    Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang