04 Nov, 2020

1 commit

  • Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
    by replacing rcu_dereference() with rcu_dereference_ovsl().

    In addition, when creating a new datapath, make sure it's configured under
    the ovs_lock.

    Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
    Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com
    Signed-off-by: Eelco Chaudron
    Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuild
    Signed-off-by: Jakub Kicinski

    Eelco Chaudron
     

19 Oct, 2020

1 commit

  • The flow_lookup() function uses per CPU variables, which must be called
    with BH disabled. However, this is fine in the general NAPI use case
    where the local BH is disabled. But, it's also called from the netlink
    context. The below patch makes sure that even in the netlink path, the
    BH is disabled.

    In addition, u64_stats_update_begin() requires a lock to ensure one writer
    which is not ensured here. Making it per-CPU and disabling NAPI (softirq)
    ensures that there is always only one writer.

    Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage")
    Reported-by: Juri Lelli
    Signed-off-by: Eelco Chaudron
    Link: https://lore.kernel.org/r/160295903253.7789.826736662555102345.stgit@ebuild
    Signed-off-by: Jakub Kicinski

    Eelco Chaudron
     

02 Sep, 2020

3 commits

  • keep_flows was introduced by [1], which used as flag to delete flows or not.
    When rehashing or expanding the table instance, we will not flush the flows.
    Now don't use it anymore, remove it.

    [1] - https://github.com/openvswitch/ovs/commit/acd051f1761569205827dc9b037e15568a8d59f8
    Cc: Pravin B Shelar
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Decrease table->count and ufid_count unconditionally,
    because we only don't use count or ufid_count to count
    when flushing the flows. To simplify the codes, we
    remove the "count" argument of table_instance_flow_free.

    To avoid a bug when deleting flows in the future, add
    WARN_ON in flush flows function.

    Cc: Pravin B Shelar
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Not change the logic, just improve the coding style.

    Cc: Pravin B Shelar
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

14 Aug, 2020

1 commit

  • To avoid some issues, for example RCU usage warning and double free,
    we should flush the flows under ovs_lock. This patch refactors
    table_instance_destroy and introduces table_instance_flow_flush
    which can be invoked by __dp_destroy or ovs_flow_tbl_flush.

    Fixes: 50b0e61b32ee ("net: openvswitch: fix possible memleak on destroy flow-table")
    Reported-by: Johan Knöös
    Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
    Signed-off-by: Tonghao Zhang
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

06 Aug, 2020

1 commit

  • ovs_flow_tbl_destroy always is called from RCU callback
    or error path. It is no need to check if rcu_read_lock
    or lockdep_ovsl_is_held was held.

    ovs_dp_cmd_fill_info always is called with ovs_mutex,
    So use the rcu_dereference_ovsl instead of rcu_dereference
    in ovs_flow_tbl_masks_cache_size.

    Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
    Cc: Eelco Chaudron
    Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com
    Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com
    Signed-off-by: Tonghao Zhang
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

04 Aug, 2020

2 commits


18 Jul, 2020

1 commit

  • This patch reorders the masks array every 4 seconds based on their
    usage count. This greatly reduces the masks per packet hit, and
    hence the overall performance. Especially in the OVS/OVN case for
    OpenShift.

    Here are some results from the OVS/OVN OpenShift test, which use
    8 pods, each pod having 512 uperf connections, each connection
    sends a 64-byte request and gets a 1024-byte response (TCP).
    All uperf clients are on 1 worker node while all uperf servers are
    on the other worker node.

    Kernel without this patch : 7.71 Gbps
    Kernel with this patch applied: 14.52 Gbps

    We also run some tests to verify the rebalance activity does not
    lower the flow insertion rate, which does not.

    Signed-off-by: Eelco Chaudron
    Tested-by: Andrew Theurer
    Reviewed-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Eelco Chaudron
     

03 Apr, 2020

1 commit


19 Feb, 2020

1 commit


04 Nov, 2019

8 commits

  • When we destroy the flow tables which may contain the flow_mask,
    so release the flow mask struct.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • The most case *index < ma->max, and flow-mask is not NULL.
    We add un/likely for performance.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Simplify the code and remove the unnecessary BUILD_BUG_ON.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • The full looking up on flow table traverses all mask array.
    If mask-array is too large, the number of invalid flow-mask
    increase, performance will be drop.

    One bad case, for example: M means flow-mask is valid and NULL
    of flow-mask means deleted.

    +-------------------------------------------+
    | M | NULL | ... | NULL | M|
    +-------------------------------------------+

    In that case, without this patch, openvswitch will traverses all
    mask array, because there will be one flow-mask in the tail. This
    patch changes the way of flow-mask inserting and deleting, and the
    mask array will be keep as below: there is not a NULL hole. In the
    fast path, we can "break" "for" (not "continue") in flow_lookup
    when we get a NULL flow-mask.

    "break"
    v
    +-------------------------------------------+
    | M | M | NULL |... | NULL | NULL|
    +-------------------------------------------+

    This patch don't optimize slow or control path, still using ma->max
    to traverse. Slow path:
    * tbl_mask_array_realloc
    * ovs_flow_tbl_lookup_exact
    * flow_mask_find

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Port the codes to linux upstream and with little changes.

    Pravin B Shelar, says:
    | In case hash collision on mask cache, OVS does extra flow
    | lookup. Following patch avoid it.

    Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • When creating and inserting flow-mask, if there is no available
    flow-mask, we realloc the mask array. When removing flow-mask,
    if necessary, we shrink mask array.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Port the codes to linux upstream and with little changes.

    Pravin B Shelar, says:
    | mask caches index of mask in mask_list. On packet recv OVS
    | need to traverse mask-list to get cached mask. Therefore array
    | is better for retrieving cached mask. This also allows better
    | cache replacement algorithm by directly checking mask's existence.

    Link: https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • The idea of this optimization comes from a patch which
    is committed in 2014, openvswitch community. The author
    is Pravin B Shelar. In order to get high performance, I
    implement it again. Later patches will use it.

    Pravin B Shelar, says:
    | On every packet OVS needs to lookup flow-table with every
    | mask until it finds a match. The packet flow-key is first
    | masked with mask in the list and then the masked key is
    | looked up in flow-table. Therefore number of masks can
    | affect packet processing performance.

    Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

20 Jul, 2019

1 commit

  • There is a flow_stats structure defined in include/net/flow_offload.h
    and a follow up patch adds #include to
    net/sch_generic.h.

    This breaks compilation since OVS codebase includes net/sock.h which
    pulls in linux/filter.h which includes net/sch_generic.h.

    In file included from ./include/net/sch_generic.h:18:0,
    from ./include/linux/filter.h:25,
    from ./include/net/sock.h:59,
    from ./include/linux/tcp.h:19,
    from net/openvswitch/datapath.c:24

    This definition takes precedence on OVS since it is placed in the
    networking core, so rename flow_stats in OVS to sw_flow_stats since
    this structure is contained in sw_flow.

    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not write to the free
    software foundation inc 51 franklin street fifth floor boston ma
    02110 1301 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 21 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141334.228102212@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

13 Mar, 2019

1 commit

  • Patch series "generic radix trees; drop flex arrays".

    This patch (of 7):

    There was no real need for this code to be using flexarrays, it's just
    implementing a hash table - ideally it would be using rhashtables, but
    that conversion would be significantly more complicated.

    Link: http://lkml.kernel.org/r/20181217131929.11727-2-kent.overstreet@gmail.com
    Signed-off-by: Kent Overstreet
    Reviewed-by: Matthew Wilcox
    Cc: Pravin B Shelar
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Paris
    Cc: Marcelo Ricardo Leitner
    Cc: Neil Horman
    Cc: Paul Moore
    Cc: Shaohua Li
    Cc: Stephen Smalley
    Cc: Vlad Yasevich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

20 Jul, 2017

1 commit

  • When calling the flow_free() to free the flow, we call many times
    (cpu_possible_mask, eg. 128 as default) cpumask_next(). That will
    take up our CPU usage if we call the flow_free() frequently.
    When we put all packets to userspace via upcall, and OvS will send
    them back via netlink to ovs_packet_cmd_execute(will call flow_free).

    The test topo is shown as below. VM01 sends TCP packets to VM02,
    and OvS forward packtets. When testing, we use perf to report the
    system performance.

    VM01 --- OvS-VM --- VM02

    Without this patch, perf-top show as below: The flow_free() is
    3.02% CPU usage.

    4.23% [kernel] [k] _raw_spin_unlock_irqrestore
    3.62% [kernel] [k] __do_softirq
    3.16% [kernel] [k] __memcpy
    3.02% [kernel] [k] flow_free
    2.42% libc-2.17.so [.] __memcpy_ssse3_back
    2.18% [kernel] [k] copy_user_generic_unrolled
    2.17% [kernel] [k] find_next_bit

    When applied this patch, perf-top show as below: Not shown on
    the list anymore.

    4.11% [kernel] [k] _raw_spin_unlock_irqrestore
    3.79% [kernel] [k] __do_softirq
    3.46% [kernel] [k] __memcpy
    2.73% libc-2.17.so [.] __memcpy_ssse3_back
    2.25% [kernel] [k] copy_user_generic_unrolled
    1.89% libc-2.17.so [.] _int_malloc
    1.53% ovs-vswitchd [.] xlate_actions

    With this patch, the TCP throughput(we dont use Megaflow Cache
    + Microflow Cache) between VMs is 1.18Gbs/sec up to 1.30Gbs/sec
    (maybe ~10% performance imporve).

    This patch adds cpumask struct, the cpu_used_mask stores the cpu_id
    that the flow used. And we only check the flow_stats on the cpu we
    used, and it is unncessary to check all possible cpu when getting,
    cleaning, and updating the flow_stats. Adding the cpu_used_mask to
    sw_flow struct does’t increase the cacheline number.

    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

19 Sep, 2016

2 commits

  • Instead of using flow stats per NUMA node, use it per CPU. When using
    megaflows, the stats lock can be a bottleneck in scalability.

    On a E5-2690 12-core system, usual throughput went from ~4Mpps to
    ~15Mpps when forwarding between two 40GbE ports with a single flow
    configured on the datapath.

    This has been tested on a system with possible CPUs 0-7,16-23. After
    module removal, there were no corruption on the slab cache.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Cc: pravin shelar
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
     
  • On a system with only node 1 as possible, all statistics is going to be
    accounted on node 0 as it will have a single writer.

    However, when getting and clearing the statistics, node 0 is not going
    to be considered, as it's not a possible node.

    Tested that statistics are not zero on a system with only node 1
    possible. Also compile-tested with CONFIG_NUMA off.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
     

20 Oct, 2015

1 commit


07 Oct, 2015

1 commit

  • Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now
    also acts as an indicator whether the flow contains tunnel data (this was
    previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses
    in an union with IPv4 ones this won't work anymore).

    The new field was added to a hole in sw_flow_key.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
     

05 Oct, 2015

1 commit

  • When openvswitch tries allocate memory from offline numa node 0:
    stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
    It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
    [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
    This patch disables numa affinity in this case.

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
     

23 Sep, 2015

1 commit

  • When support for megaflows was introduced, OVS needed to start
    installing flows with a mask applied to them. Since masking is an
    expensive operation, OVS also had an optimization that would only
    take the parts of the flow keys that were covered by a non-zero
    mask. The values stored in the remaining pieces should not matter
    because they are masked out.

    While this works fine for the purposes of matching (which must always
    look at the mask), serialization to netlink can be problematic. Since
    the flow and the mask are serialized separately, the uninitialized
    portions of the flow can be encoded with whatever values happen to be
    present.

    In terms of functionality, this has little effect since these fields
    will be masked out by definition. However, it leaks kernel memory to
    userspace, which is a potential security vulnerability. It is also
    possible that other code paths could look at the masked key and get
    uninitialized data, although this does not currently appear to be an
    issue in practice.

    This removes the mask optimization for flows that are being installed.
    This was always intended to be the case as the mask optimizations were
    really targetting per-packet flow operations.

    Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
    Signed-off-by: Jesse Gross
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jesse Gross
     

21 Aug, 2015

1 commit


23 Jul, 2015

1 commit


22 Jul, 2015

2 commits

  • Some architectures like POWER can have a NUMA node_possible_map that
    contains sparse entries. This causes memory corruption with openvswitch
    since it allocates flow_cache with a multiple of num_possible_nodes() and
    assumes the node variable returned by for_each_node will index into
    flow->stats[node].

    Use nr_node_ids to allocate a maximal sparse array instead of
    num_possible_nodes().

    The crash was noticed after 3af229f2 was applied as it changed the
    node_possible_map to match node_online_map on boot.
    Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

    Signed-off-by: Chris J Arges
    Acked-by: Pravin B Shelar
    Acked-by: Nishanth Aravamudan
    Signed-off-by: David S. Miller

    Chris J Arges
     
  • Utilize the new metadata dst to attach encapsulation instructions to
    the skb. The existing egress_tun_info via the OVS_CB() is left in
    place until all tunnel vports have been converted to the new method.

    Signed-off-by: Thomas Graf
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thomas Graf
     

08 Feb, 2015

1 commit

  • Flow alloc needs to initialize unmasked key pointer. Otherwise
    it can crash kernel trying to free random unmasked-key pointer.

    general protection fault: 0000 [#1] SMP
    3.19.0-rc6-net-next+ #457
    Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.1 04/30/2008
    RIP: 0010:[] [] kfree+0xac/0x196
    Call Trace:
    [] flow_free+0x21/0x59 [openvswitch]
    [] ovs_flow_free+0x21/0x23 [openvswitch]
    [] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch]
    [] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch]
    [] ? nla_parse+0x4f/0xec
    [] genl_family_rcv_msg+0x26d/0x2c9
    [] ? __lock_acquire+0x90e/0x9aa
    [] genl_rcv_msg+0x66/0x89
    [] ? genl_family_rcv_msg+0x2c9/0x2c9
    [] netlink_rcv_skb+0x3e/0x95
    [] ? genl_rcv+0x18/0x37
    [] genl_rcv+0x27/0x37
    [] netlink_unicast+0x103/0x191
    [] netlink_sendmsg+0x2c1/0x310
    [] ? might_fault+0x50/0xa0
    [] do_sock_sendmsg+0x5f/0x7a
    [] sock_sendmsg+0xb/0xd
    [] ___sys_sendmsg+0x1a3/0x218
    [] ? get_close_on_exec+0x86/0x86
    [] ? fsnotify+0x32c/0x348
    [] ? fsnotify+0x7c/0x348
    [] ? __fget+0xaa/0xbf
    [] ? get_close_on_exec+0x86/0x86
    [] __sys_sendmsg+0x3d/0x5e
    [] SyS_sendmsg+0x14/0x16
    [] system_call_fastpath+0x12/0x17

    Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.")
    CC: Joe Stringer
    Reported-by: Or Gerlitz
    Signed-off-by: Pravin B Shelar
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

27 Jan, 2015

3 commits

  • Previously, flows were manipulated by userspace specifying a full,
    unmasked flow key. This adds significant burden onto flow
    serialization/deserialization, particularly when dumping flows.

    This patch adds an alternative way to refer to flows using a
    variable-length "unique flow identifier" (UFID). At flow setup time,
    userspace may specify a UFID for a flow, which is stored with the flow
    and inserted into a separate table for lookup, in addition to the
    standard flow table. Flows created using a UFID must be fetched or
    deleted using the UFID.

    All flow dump operations may now be made more terse with OVS_UFID_F_*
    flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
    omit the flow key from a datapath operation if the flow has a
    corresponding UFID. This significantly reduces the time spent assembling
    and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
    enabled, the datapath only returns the UFID and statistics for each flow
    during flow dump, increasing ovs-vswitchd revalidator performance by 40%
    or more.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • These minor tidyups make a future patch a little tidier.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert().
    This tidies up a future patch.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

11 Dec, 2014

1 commit

  • This patch effectively reverts commit 500f80872645 ("net: ovs: use CRC32
    accelerated flow hash if available"), and other remaining arch_fast_hash()
    users such as from nfsd via commit 6282cd565553 ("NFSD: Don't hand out
    delegations for 30 seconds after recalling them.") where it has been used
    as a hash function for bloom filtering.

    While we think that these users are actually not much of concern, it has
    been requested to remove the arch_fast_hash() library bits that arose
    from [1] entirely as per recent discussion [2]. The main argument is that
    using it as a hash may introduce bias due to its linearity (see avalanche
    criterion) and thus makes it less clear (though we tried to document that)
    when this security/performance trade-off is actually acceptable for a
    general purpose library function.

    Lets therefore avoid any further confusion on this matter and remove it to
    prevent any future accidental misuse of it. For the time being, this is
    going to make hashing of flow keys a bit more expensive in the ovs case,
    but future work could reevaluate a different hashing discipline.

    [1] https://patchwork.ozlabs.org/patch/299369/
    [2] https://patchwork.ozlabs.org/patch/418756/

    Cc: Neil Brown
    Cc: Francesco Fusco
    Cc: Jesse Gross
    Cc: Thomas Graf
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Nov, 2014

1 commit