26 Apr, 2020
3 commits
-
Compile the kernel for arm 32 platform, the build warning found.
To fix that, should use div_u64() for divisions.
| net/openvswitch/meter.c:396: undefined reference to `__udivdi3'[add more commit msg, change reported tag, and use div_u64 instead
of do_div by Tonghao]Fixes: e57358873bb5d6ca ("net: openvswitch: use u64 for meter bucket")
Reported-by: kbuild test robot
Signed-off-by: Tonghao Zhang
Tested-by: Tonghao Zhang
Signed-off-by: David S. Miller -
To fix the following sparse warning:
| net/openvswitch/meter.c:109:38: sparse: sparse: incorrect type
| in assignment (different address spaces) ...
| net/openvswitch/meter.c:720:45: sparse: sparse: incorrect type
| in argument 1 (different address spaces) ...Reported-by: kbuild test robot
Signed-off-by: Tonghao Zhang
Signed-off-by: David S. Miller -
Simple overlapping changes to linux/vermagic.h
Signed-off-by: David S. Miller
24 Apr, 2020
5 commits
-
When setting the meter rate to 4+Gbps, there is an
overflow, the meters don't work as expected.Cc: Pravin B Shelar
Cc: Andy Zhou
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Cc: Pravin B Shelar
Cc: Andy Zhou
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Before invoking the ovs_meter_cmd_reply_stats, "meter"
was checked, so don't check it agin in that function.Cc: Pravin B Shelar
Cc: Andy Zhou
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Don't allow user to create meter unlimitedly, which may cause
to consume a large amount of kernel memory. The max number
supported is decided by physical memory and 20K meters as default.Cc: Pravin B Shelar
Cc: Andy Zhou
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
In kernel datapath of Open vSwitch, there are only 1024
buckets of meter in one datapath. If installing more than
1024 (e.g. 8192) meters, it may lead to the performance drop.
But in some case, for example, Open vSwitch used as edge
gateway, there should be 20K at least, where meters used for
IP address bandwidth limitation.[Open vSwitch userspace datapath has this issue too.]
For more scalable meter, this patch use meter array instead of
hash tables, and expand/shrink the array when necessary. So we
can install more meters than before in the datapath.
Introducing the struct *dp_meter_instance, it's easy to
expand meter though changing the *ti point in the struct
*dp_meter_table.Cc: Pravin B Shelar
Cc: Andy Zhou
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
21 Apr, 2020
1 commit
-
syzbot wrote:
| =============================
| WARNING: suspicious RCU usage
| 5.7.0-rc1+ #45 Not tainted
| -----------------------------
| net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
|
| other info that might help us debug this:
| rcu_scheduler_active = 2, debug_locks = 1
| ...
|
| stack backtrace:
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| ...
| ovs_ct_exit
| ovs_exit_net
| ops_exit_list.isra.7
| cleanup_net
| process_one_work
| worker_threadTo avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
lockdep_ovsl_is_held as optional lockdep expression.Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar
Cc: Yi-Hung Wei
Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
03 Apr, 2020
1 commit
-
The struct sw_flow is protected by RCU, when traversing them,
use hlist_for_each_entry_rcu.Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Reviewed-by: Greg Rose
Signed-off-by: David S. Miller
30 Mar, 2020
1 commit
-
The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
offset of the GSO in skb cb. This patch fixes the typo.Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Cambda Zhu
Signed-off-by: David S. Miller
13 Mar, 2020
1 commit
-
Minor overlapping changes, nothing serious.
Signed-off-by: David S. Miller
04 Mar, 2020
1 commit
-
Add missing attribute validation for OVS_PACKET_ATTR_HASH
to the netlink policy.Fixes: bd1903b7c459 ("net: openvswitch: add hash info to upcall")
Signed-off-by: Jakub Kicinski
Reviewed-by: Greg Rose
Signed-off-by: David S. Miller
22 Feb, 2020
1 commit
-
Conflict resolution of ice_virtchnl_pf.c based upon work by
Stephen Rothwell.Signed-off-by: David S. Miller
21 Feb, 2020
1 commit
-
Variables declared in a switch statement before any case statements
cannot be automatically initialized with compiler instrumentation (as
they are not part of any execution flow). With GCC's proposed automatic
stack variable initialization feature, this triggers a warning (and they
don't get initialized). Clang's automatic stack variable initialization
(via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
doesn't initialize such variables[1]. Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase,
so even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.To avoid these problems, move such variables into the "case" where
they're used or lift them up into the main function body.net/openvswitch/flow_netlink.c: In function ‘validate_set’:
net/openvswitch/flow_netlink.c:2711:29: warning: statement will never be executed [-Wswitch-unreachable]
2711 | const struct ovs_key_ipv4 *ipv4_key;
| ^~~~~~~~[1] https://bugs.llvm.org/show_bug.cgi?id=44916
Signed-off-by: Kees Cook
Signed-off-by: David S. Miller
19 Feb, 2020
4 commits
-
hlist_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.Signed-off-by: Madhuparna Bhowmik
Signed-off-by: David S. Miller -
hlist_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.Signed-off-by: Madhuparna Bhowmik
Signed-off-by: David S. Miller -
hlist_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.Signed-off-by: Madhuparna Bhowmik
Signed-off-by: David S. Miller -
hlist_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.Signed-off-by: Madhuparna Bhowmik
Signed-off-by: David S. Miller
17 Feb, 2020
1 commit
-
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, drop it
or execute an action passed via a nested attribute.
The default TTL expired action is to drop the packet.Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
Tested with a corresponding change in the userspace:
# ovs-dpctl dump-flows
in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
# ping -c1 192.168.0.2 -t 120
IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
# ping -c1 192.168.0.2 -t 1
#Co-developed-by: Bindiya Kurle
Signed-off-by: Bindiya Kurle
Signed-off-by: Matteo Croce
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
15 Jan, 2020
1 commit
-
This is a straight-forward conversion case for the new function, keeping
the flow of the existing code as intact as possible.Signed-off-by: Jason A. Donenfeld
Signed-off-by: David S. Miller
25 Dec, 2019
1 commit
-
The existing PUSH MPLS action inserts MPLS header between ethernet header
and the IP header. Though this behaviour is fine for L3 VPN where an IP
packet is encapsulated inside a MPLS tunnel, it does not suffice the L2
VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should
encapsulate the ethernet packet.The new mpls action ADD_MPLS inserts MPLS header at the start of the
packet or at the start of the l3 header depending on the value of l3 tunnel
flag in the ADD_MPLS arguments.POP_MPLS action is extended to support ethertype 0x6558.
Signed-off-by: Martin Varghese
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
10 Dec, 2019
1 commit
-
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().This patch is generated using following script:
EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"
git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
doif [[ "$file" =~ $EXCLUDE_FILES ]]; then
continue
fi
sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
doneSigned-off-by: Pankaj Bharadiya
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook
Signed-off-by: Kees Cook
Acked-by: David Miller # for net
05 Dec, 2019
2 commits
-
The skb_mpls_push was not updating ethertype of an ethernet packet if
the packet was originally received from a non ARPHRD_ETHER device.In the below OVS data path flow, since the device corresponding to
port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
not update the ethertype of the packet even though the previous
push_eth action had added an ethernet header to the packet.recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese
Signed-off-by: David S. Miller -
The openvswitch module shares a common conntrack and NAT infrastructure
exposed via netfilter. It's possible that a packet needs both SNAT and
DNAT manipulation, due to e.g. tuple collision. Netfilter can support
this because it runs through the NAT table twice - once on ingress and
again after egress. The openvswitch module doesn't have such capability.Like netfilter hook infrastructure, we should run through NAT twice to
keep the symmetry.Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Aaron Conole
Signed-off-by: David S. Miller
03 Dec, 2019
1 commit
-
The skb_mpls_pop was not updating ethertype of an ethernet packet if the
packet was originally received from a non ARPHRD_ETHER device.In the below OVS data path flow, since the device corresponding to port 7
is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
the ethertype of the packet even though the previous push_eth action had
added an ethernet header to the packet.recirc_id(0),in_port(7),eth_type(0x8847),
mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
pop_mpls(eth_type=0x800),4Fixes: ed246cee09b9 ("net: core: move pop MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
02 Dec, 2019
2 commits
-
If we can't build the flow del notification, we can simply delete
the flow, no need to crash the kernel. Still keep a WARN_ON to
preserve debuggability.Note: the BUG_ON() predates the Fixes tag, but this change
can be applied only after the mentioned commit.v1 -> v2:
- do not leak an skb on errorFixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical section.")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller -
All the callers of ovs_flow_cmd_build_info() already deal with
error return code correctly, so we can handle the error condition
in a more gracefull way. Still dump a warning to preserve
debuggability.v1 -> v2:
- clarify the commit message
- clean the skb and report the error (DaveM)Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
27 Nov, 2019
1 commit
-
When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant
flow has no UFID, we can exceed the computed size, as
ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY
attribute.
Take the above in account when computing the flow command message
size.Fixes: 74ed7ab9264c ("openvswitch: Add support for unique flow IDs.")
Reported-by: Qi Jun Ding
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
16 Nov, 2019
1 commit
-
The nla_put_u16/nla_put_u32 makes sure that
*attrlen is align. The call tree is that:nla_put_u16/nla_put_u32
-> nla_put attrlen = sizeof(u16) or sizeof(u32)
-> __nla_put attrlen
-> __nla_reserve attrlen
-> skb_put(skb, nla_total_size(attrlen))nla_total_size returns the total length of attribute
including padding.Cc: Joe Stringer
Cc: William Tu
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
15 Nov, 2019
1 commit
-
When using the kernel datapath, the upcall don't
include skb hash info relatived. That will introduce
some problem, because the hash of skb is important
in kernel stack. For example, VXLAN module uses
it to select UDP src port. The tx queue selection
may also use the hash in stack.Hash is computed in different ways. Hash is random
for a TCP socket, and hash may be computed in hardware,
or software stack. Recalculation hash is not easy.Hash of TCP socket is computed:
tcp_v4_connect
-> sk_set_txhash (is random)__tcp_transmit_skb
-> skb_set_hash_from_skThere will be one upcall, without information of skb
hash, to ovs-vswitchd, for the first packet of a TCP
session. The rest packets will be processed in Open vSwitch
modules, hash kept. If this tcp session is forward to
VXLAN module, then the UDP src port of first tcp packet
is different from rest packets.TCP packets may come from the host or dockers, to Open vSwitch.
To fix it, we store the hash info to upcall, and restore hash
when packets sent back.+---------------+ +-------------------------+
| Docker/VMs | | ovs-vswitchd |
+----+----------+ +-+--------------------+--+
| ^ |
| | |
| | upcall v restore packet hash (not recalculate)
| +-+--------------------+--+
| tap netdev | | vxlan module
+---------------> +--> Open vSwitch ko +-->
or internal type | |
+-------------------------+Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
07 Nov, 2019
1 commit
-
The commit 69c51582ff786 ("dpif-netlink: don't allocate per
thread netlink sockets"), in Open vSwitch ovs-vswitchd, has
changed the number of allocated sockets to just one per port
by moving the socket array from a per handler structure to
a per datapath one. In the kernel datapath, a vport will have
only one socket in most case, if so select it directly in
fast-path.Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
06 Nov, 2019
1 commit
-
The openvswitch was supporting a MPLS label depth of 1 in the ingress
direction though the userspace OVS supports a max depth of 3 labels.
This change enables openvswitch module to support a max depth of
3 labels in the ingress.Signed-off-by: Martin Varghese
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
04 Nov, 2019
7 commits
-
use the specified functions to init resource.
Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Unlocking of a not locked mutex is not allowed.
Other kernel thread may be in critical section while
we unlock it because of setting user_feature fail.Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
Cc: Paul Blakey
Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: William Tu
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
When we destroy the flow tables which may contain the flow_mask,
so release the flow mask struct.Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
The most case *index < ma->max, and flow-mask is not NULL.
We add un/likely for performance.Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: William Tu
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Simplify the code and remove the unnecessary BUILD_BUG_ON.
Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: William Tu
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
The full looking up on flow table traverses all mask array.
If mask-array is too large, the number of invalid flow-mask
increase, performance will be drop.One bad case, for example: M means flow-mask is valid and NULL
of flow-mask means deleted.+-------------------------------------------+
| M | NULL | ... | NULL | M|
+-------------------------------------------+In that case, without this patch, openvswitch will traverses all
mask array, because there will be one flow-mask in the tail. This
patch changes the way of flow-mask inserting and deleting, and the
mask array will be keep as below: there is not a NULL hole. In the
fast path, we can "break" "for" (not "continue") in flow_lookup
when we get a NULL flow-mask."break"
v
+-------------------------------------------+
| M | M | NULL |... | NULL | NULL|
+-------------------------------------------+This patch don't optimize slow or control path, still using ma->max
to traverse. Slow path:
* tbl_mask_array_realloc
* ovs_flow_tbl_lookup_exact
* flow_mask_findSigned-off-by: Tonghao Zhang
Tested-by: Greg Rose
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
Port the codes to linux upstream and with little changes.
Pravin B Shelar, says:
| In case hash collision on mask cache, OVS does extra flow
| lookup. Following patch avoid it.Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
Signed-off-by: Tonghao Zhang
Tested-by: Greg Rose
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller