04 Nov, 2020
1 commit
-
Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
by replacing rcu_dereference() with rcu_dereference_ovsl().In addition, when creating a new datapath, make sure it's configured under
the ovs_lock.Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com
Signed-off-by: Eelco Chaudron
Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuild
Signed-off-by: Jakub Kicinski
19 Oct, 2020
1 commit
-
The flow_lookup() function uses per CPU variables, which must be called
with BH disabled. However, this is fine in the general NAPI use case
where the local BH is disabled. But, it's also called from the netlink
context. The below patch makes sure that even in the netlink path, the
BH is disabled.In addition, u64_stats_update_begin() requires a lock to ensure one writer
which is not ensured here. Making it per-CPU and disabling NAPI (softirq)
ensures that there is always only one writer.Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage")
Reported-by: Juri Lelli
Signed-off-by: Eelco Chaudron
Link: https://lore.kernel.org/r/160295903253.7789.826736662555102345.stgit@ebuild
Signed-off-by: Jakub Kicinski
02 Sep, 2020
3 commits
-
keep_flows was introduced by [1], which used as flag to delete flows or not.
When rehashing or expanding the table instance, we will not flush the flows.
Now don't use it anymore, remove it.[1] - https://github.com/openvswitch/ovs/commit/acd051f1761569205827dc9b037e15568a8d59f8
Cc: Pravin B Shelar
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Decrease table->count and ufid_count unconditionally,
because we only don't use count or ufid_count to count
when flushing the flows. To simplify the codes, we
remove the "count" argument of table_instance_flow_free.To avoid a bug when deleting flows in the future, add
WARN_ON in flush flows function.Cc: Pravin B Shelar
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Not change the logic, just improve the coding style.
Cc: Pravin B Shelar
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
14 Aug, 2020
1 commit
-
To avoid some issues, for example RCU usage warning and double free,
we should flush the flows under ovs_lock. This patch refactors
table_instance_destroy and introduces table_instance_flow_flush
which can be invoked by __dp_destroy or ovs_flow_tbl_flush.Fixes: 50b0e61b32ee ("net: openvswitch: fix possible memleak on destroy flow-table")
Reported-by: Johan Knöös
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
Signed-off-by: Tonghao Zhang
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller
06 Aug, 2020
1 commit
-
ovs_flow_tbl_destroy always is called from RCU callback
or error path. It is no need to check if rcu_read_lock
or lockdep_ovsl_is_held was held.ovs_dp_cmd_fill_info always is called with ovs_mutex,
So use the rcu_dereference_ovsl instead of rcu_dereference
in ovs_flow_tbl_masks_cache_size.Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
Cc: Eelco Chaudron
Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com
Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang
Acked-by: Paolo Abeni
Signed-off-by: David S. Miller
04 Aug, 2020
2 commits
-
This patch makes the masks cache size configurable, or with
a size of 0, disable it.Reviewed-by: Paolo Abeni
Reviewed-by: Tonghao Zhang
Signed-off-by: Eelco Chaudron
Signed-off-by: David S. Miller -
Add a counter that counts the number of masks cache hits, and
export it through the megaflow netlink statistics.Reviewed-by: Paolo Abeni
Reviewed-by: Tonghao Zhang
Signed-off-by: Eelco Chaudron
Signed-off-by: David S. Miller
18 Jul, 2020
1 commit
-
This patch reorders the masks array every 4 seconds based on their
usage count. This greatly reduces the masks per packet hit, and
hence the overall performance. Especially in the OVS/OVN case for
OpenShift.Here are some results from the OVS/OVN OpenShift test, which use
8 pods, each pod having 512 uperf connections, each connection
sends a 64-byte request and gets a 1024-byte response (TCP).
All uperf clients are on 1 worker node while all uperf servers are
on the other worker node.Kernel without this patch : 7.71 Gbps
Kernel with this patch applied: 14.52 GbpsWe also run some tests to verify the rebalance activity does not
lower the flow insertion rate, which does not.Signed-off-by: Eelco Chaudron
Tested-by: Andrew Theurer
Reviewed-by: Paolo Abeni
Signed-off-by: David S. Miller
03 Apr, 2020
1 commit
-
The struct sw_flow is protected by RCU, when traversing them,
use hlist_for_each_entry_rcu.Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Reviewed-by: Greg Rose
Signed-off-by: David S. Miller
19 Feb, 2020
1 commit
-
hlist_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.Signed-off-by: Madhuparna Bhowmik
Signed-off-by: David S. Miller
04 Nov, 2019
8 commits
-
When we destroy the flow tables which may contain the flow_mask,
so release the flow mask struct.Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
The most case *index < ma->max, and flow-mask is not NULL.
We add un/likely for performance.Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: William Tu
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Simplify the code and remove the unnecessary BUILD_BUG_ON.
Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: William Tu
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
The full looking up on flow table traverses all mask array.
If mask-array is too large, the number of invalid flow-mask
increase, performance will be drop.One bad case, for example: M means flow-mask is valid and NULL
of flow-mask means deleted.+-------------------------------------------+
| M | NULL | ... | NULL | M|
+-------------------------------------------+In that case, without this patch, openvswitch will traverses all
mask array, because there will be one flow-mask in the tail. This
patch changes the way of flow-mask inserting and deleting, and the
mask array will be keep as below: there is not a NULL hole. In the
fast path, we can "break" "for" (not "continue") in flow_lookup
when we get a NULL flow-mask."break"
v
+-------------------------------------------+
| M | M | NULL |... | NULL | NULL|
+-------------------------------------------+This patch don't optimize slow or control path, still using ma->max
to traverse. Slow path:
* tbl_mask_array_realloc
* ovs_flow_tbl_lookup_exact
* flow_mask_findSigned-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Port the codes to linux upstream and with little changes.
Pravin B Shelar, says:
| In case hash collision on mask cache, OVS does extra flow
| lookup. Following patch avoid it.Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller -
When creating and inserting flow-mask, if there is no available
flow-mask, we realloc the mask array. When removing flow-mask,
if necessary, we shrink mask array.Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: William Tu
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Port the codes to linux upstream and with little changes.
Pravin B Shelar, says:
| mask caches index of mask in mask_list. On packet recv OVS
| need to traverse mask-list to get cached mask. Therefore array
| is better for retrieving cached mask. This also allows better
| cache replacement algorithm by directly checking mask's existence.Link: https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5
Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: William Tu
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller -
The idea of this optimization comes from a patch which
is committed in 2014, openvswitch community. The author
is Pravin B Shelar. In order to get high performance, I
implement it again. Later patches will use it.Pravin B Shelar, says:
| On every packet OVS needs to lookup flow-table with every
| mask until it finds a match. The packet flow-key is first
| masked with mask in the list and then the masked key is
| looked up in flow-table. Therefore number of masks can
| affect packet processing performance.Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5
Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: William Tu
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller
20 Jul, 2019
1 commit
-
There is a flow_stats structure defined in include/net/flow_offload.h
and a follow up patch adds #include to
net/sch_generic.h.This breaks compilation since OVS codebase includes net/sock.h which
pulls in linux/filter.h which includes net/sch_generic.h.In file included from ./include/net/sch_generic.h:18:0,
from ./include/linux/filter.h:25,
from ./include/net/sock.h:59,
from ./include/linux/tcp.h:19,
from net/openvswitch/datapath.c:24This definition takes precedence on OVS since it is placed in the
networking core, so rename flow_stats in OVS to sw_flow_stats since
this structure is contained in sw_flow.Signed-off-by: Pablo Neira Ayuso
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
05 Jun, 2019
1 commit
-
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of version 2 of the gnu general public license as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin street fifth floor boston ma
02110 1301 usaextracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 21 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Alexios Zavras
Reviewed-by: Allison Randal
Reviewed-by: Richard Fontana
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141334.228102212@linutronix.de
Signed-off-by: Greg Kroah-Hartman
13 Mar, 2019
1 commit
-
Patch series "generic radix trees; drop flex arrays".
This patch (of 7):
There was no real need for this code to be using flexarrays, it's just
implementing a hash table - ideally it would be using rhashtables, but
that conversion would be significantly more complicated.Link: http://lkml.kernel.org/r/20181217131929.11727-2-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet
Reviewed-by: Matthew Wilcox
Cc: Pravin B Shelar
Cc: Alexey Dobriyan
Cc: Al Viro
Cc: Dave Hansen
Cc: Eric Paris
Cc: Marcelo Ricardo Leitner
Cc: Neil Horman
Cc: Paul Moore
Cc: Shaohua Li
Cc: Stephen Smalley
Cc: Vlad Yasevich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jul, 2017
1 commit
-
When calling the flow_free() to free the flow, we call many times
(cpu_possible_mask, eg. 128 as default) cpumask_next(). That will
take up our CPU usage if we call the flow_free() frequently.
When we put all packets to userspace via upcall, and OvS will send
them back via netlink to ovs_packet_cmd_execute(will call flow_free).The test topo is shown as below. VM01 sends TCP packets to VM02,
and OvS forward packtets. When testing, we use perf to report the
system performance.VM01 --- OvS-VM --- VM02
Without this patch, perf-top show as below: The flow_free() is
3.02% CPU usage.4.23% [kernel] [k] _raw_spin_unlock_irqrestore
3.62% [kernel] [k] __do_softirq
3.16% [kernel] [k] __memcpy
3.02% [kernel] [k] flow_free
2.42% libc-2.17.so [.] __memcpy_ssse3_back
2.18% [kernel] [k] copy_user_generic_unrolled
2.17% [kernel] [k] find_next_bitWhen applied this patch, perf-top show as below: Not shown on
the list anymore.4.11% [kernel] [k] _raw_spin_unlock_irqrestore
3.79% [kernel] [k] __do_softirq
3.46% [kernel] [k] __memcpy
2.73% libc-2.17.so [.] __memcpy_ssse3_back
2.25% [kernel] [k] copy_user_generic_unrolled
1.89% libc-2.17.so [.] _int_malloc
1.53% ovs-vswitchd [.] xlate_actionsWith this patch, the TCP throughput(we dont use Megaflow Cache
+ Microflow Cache) between VMs is 1.18Gbs/sec up to 1.30Gbs/sec
(maybe ~10% performance imporve).This patch adds cpumask struct, the cpu_used_mask stores the cpu_id
that the flow used. And we only check the flow_stats on the cpu we
used, and it is unncessary to check all possible cpu when getting,
cleaning, and updating the flow_stats. Adding the cpu_used_mask to
sw_flow struct does’t increase the cacheline number.Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
19 Sep, 2016
2 commits
-
Instead of using flow stats per NUMA node, use it per CPU. When using
megaflows, the stats lock can be a bottleneck in scalability.On a E5-2690 12-core system, usual throughput went from ~4Mpps to
~15Mpps when forwarding between two 40GbE ports with a single flow
configured on the datapath.This has been tested on a system with possible CPUs 0-7,16-23. After
module removal, there were no corruption on the slab cache.Signed-off-by: Thadeu Lima de Souza Cascardo
Cc: pravin shelar
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
On a system with only node 1 as possible, all statistics is going to be
accounted on node 0 as it will have a single writer.However, when getting and clearing the statistics, node 0 is not going
to be considered, as it's not a possible node.Tested that statistics are not zero on a system with only node 1
possible. Also compile-tested with CONFIG_NUMA off.Signed-off-by: Thadeu Lima de Souza Cascardo
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
20 Oct, 2015
1 commit
-
Conflicts:
drivers/net/usb/asix_common.c
net/ipv4/inet_connection_sock.c
net/switchdev/switchdev.cIn the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.The other two conflicts were overlapping changes.
Signed-off-by: David S. Miller
07 Oct, 2015
1 commit
-
Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now
also acts as an indicator whether the flow contains tunnel data (this was
previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses
in an union with IPv4 ones this won't work anymore).The new field was added to a hole in sw_flow_key.
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
05 Oct, 2015
1 commit
-
When openvswitch tries allocate memory from offline numa node 0:
stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
[ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
This patch disables numa affinity in this case.Signed-off-by: Konstantin Khlebnikov
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
23 Sep, 2015
1 commit
-
When support for megaflows was introduced, OVS needed to start
installing flows with a mask applied to them. Since masking is an
expensive operation, OVS also had an optimization that would only
take the parts of the flow keys that were covered by a non-zero
mask. The values stored in the remaining pieces should not matter
because they are masked out.While this works fine for the purposes of matching (which must always
look at the mask), serialization to netlink can be problematic. Since
the flow and the mask are serialized separately, the uninitialized
portions of the flow can be encoded with whatever values happen to be
present.In terms of functionality, this has little effect since these fields
will be masked out by definition. However, it leaks kernel memory to
userspace, which is a potential security vulnerability. It is also
possible that other code paths could look at the masked key and get
uninitialized data, although this does not currently appear to be an
issue in practice.This removes the mask optimization for flows that are being installed.
This was always intended to be the case as the mask optimizations were
really targetting per-packet flow operations.Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
Signed-off-by: Jesse Gross
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
21 Aug, 2015
1 commit
-
Add the IPv6 addresses as an union with IPv4 ones. When using IPv4, the
newly introduced padding after the IPv4 addresses needs to be zeroed out.Signed-off-by: Jiri Benc
Acked-by: Thomas Graf
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
23 Jul, 2015
1 commit
-
Conflicts:
net/bridge/br_mdb.cbr_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.Signed-off-by: David S. Miller
22 Jul, 2015
2 commits
-
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861Signed-off-by: Chris J Arges
Acked-by: Pravin B Shelar
Acked-by: Nishanth Aravamudan
Signed-off-by: David S. Miller -
Utilize the new metadata dst to attach encapsulation instructions to
the skb. The existing egress_tun_info via the OVS_CB() is left in
place until all tunnel vports have been converted to the new method.Signed-off-by: Thomas Graf
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller
08 Feb, 2015
1 commit
-
Flow alloc needs to initialize unmasked key pointer. Otherwise
it can crash kernel trying to free random unmasked-key pointer.general protection fault: 0000 [#1] SMP
3.19.0-rc6-net-next+ #457
Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.1 04/30/2008
RIP: 0010:[] [] kfree+0xac/0x196
Call Trace:
[] flow_free+0x21/0x59 [openvswitch]
[] ovs_flow_free+0x21/0x23 [openvswitch]
[] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch]
[] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch]
[] ? nla_parse+0x4f/0xec
[] genl_family_rcv_msg+0x26d/0x2c9
[] ? __lock_acquire+0x90e/0x9aa
[] genl_rcv_msg+0x66/0x89
[] ? genl_family_rcv_msg+0x2c9/0x2c9
[] netlink_rcv_skb+0x3e/0x95
[] ? genl_rcv+0x18/0x37
[] genl_rcv+0x27/0x37
[] netlink_unicast+0x103/0x191
[] netlink_sendmsg+0x2c1/0x310
[] ? might_fault+0x50/0xa0
[] do_sock_sendmsg+0x5f/0x7a
[] sock_sendmsg+0xb/0xd
[] ___sys_sendmsg+0x1a3/0x218
[] ? get_close_on_exec+0x86/0x86
[] ? fsnotify+0x32c/0x348
[] ? fsnotify+0x7c/0x348
[] ? __fget+0xaa/0xbf
[] ? get_close_on_exec+0x86/0x86
[] __sys_sendmsg+0x3d/0x5e
[] SyS_sendmsg+0x14/0x16
[] system_call_fastpath+0x12/0x17Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.")
CC: Joe Stringer
Reported-by: Or Gerlitz
Signed-off-by: Pravin B Shelar
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
27 Jan, 2015
3 commits
-
Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
These minor tidyups make a future patch a little tidier.
Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert().
This tidies up a future patch.Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
11 Dec, 2014
1 commit
-
This patch effectively reverts commit 500f80872645 ("net: ovs: use CRC32
accelerated flow hash if available"), and other remaining arch_fast_hash()
users such as from nfsd via commit 6282cd565553 ("NFSD: Don't hand out
delegations for 30 seconds after recalling them.") where it has been used
as a hash function for bloom filtering.While we think that these users are actually not much of concern, it has
been requested to remove the arch_fast_hash() library bits that arose
from [1] entirely as per recent discussion [2]. The main argument is that
using it as a hash may introduce bias due to its linearity (see avalanche
criterion) and thus makes it less clear (though we tried to document that)
when this security/performance trade-off is actually acceptable for a
general purpose library function.Lets therefore avoid any further confusion on this matter and remove it to
prevent any future accidental misuse of it. For the time being, this is
going to make hashing of flow keys a bit more expensive in the ovs case,
but future work could reevaluate a different hashing discipline.[1] https://patchwork.ozlabs.org/patch/299369/
[2] https://patchwork.ozlabs.org/patch/418756/Cc: Neil Brown
Cc: Francesco Fusco
Cc: Jesse Gross
Cc: Thomas Graf
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller
10 Nov, 2014
1 commit
-
Help produce better optimized code.
Signed-off-by: Thomas Graf
Signed-off-by: Pravin B Shelar