08 Sep, 2022
1 commit
-
[ Upstream commit a87406f4adee9c53b311d8a1ba2849c69e29a6d0 ]
ovs_dp_cmd_new()->ovs_dp_change()->ovs_dp_set_upcall_portids()
allocates array via kmalloc.
If for some reason new_vport() fails during ovs_dp_cmd_new()
dp->upcall_portids must be freed.
Add missing kfree.Kmemleak example:
unreferenced object 0xffff88800c382500 (size 64):
comm "dump_state", pid 323, jiffies 4294955418 (age 104.347s)
hex dump (first 32 bytes):
5e c2 79 e4 1f 7a 38 c7 09 21 38 0c 80 88 ff ff ^.y..z8..!8.....
03 00 00 00 0a 00 00 00 14 00 00 00 28 00 00 00 ............(...
backtrace:
[] ovs_dp_set_upcall_portids+0x38/0xa0
[] ovs_dp_change+0x63/0xe0
[] ovs_dp_cmd_new+0x1f0/0x380
[] genl_family_rcv_msg_doit+0xea/0x150
[] genl_rcv_msg+0xdc/0x1e0
[] netlink_rcv_skb+0x50/0x100
[] genl_rcv+0x24/0x40
[] netlink_unicast+0x23e/0x360
[] netlink_sendmsg+0x24e/0x4b0
[] sock_sendmsg+0x62/0x70
[] ____sys_sendmsg+0x230/0x270
[] ___sys_sendmsg+0x88/0xd0
[] __sys_sendmsg+0x59/0xa0
[] do_syscall_64+0x3b/0x90
[] entry_SYSCALL_64_after_hwframe+0x63/0xcdFixes: b83d23a2a38b ("openvswitch: Introduce per-cpu upcall dispatch")
Acked-by: Aaron Conole
Signed-off-by: Andrey Zhadchenko
Link: https://lore.kernel.org/r/20220825020326.664073-1-andrey.zhadchenko@virtuozzo.com
Signed-off-by: Jakub Kicinski
Signed-off-by: Sasha Levin
29 Jun, 2022
1 commit
-
commit 12378a5a75e33f34f8586706eb61cca9e6d4690c upstream.
When a packet enters the OVS datapath and does not match any existing
flows installed in the kernel flow cache, the packet will be sent to
userspace to be parsed, and a new flow will be created. The kernel and
OVS rely on each other to parse packet fields in the same way so that
packets will be handled properly.As per the design document linked below, OVS expects all later IPv6
fragments to have nw_proto=44 in the flow key, so they can be correctly
matched on OpenFlow rules. OpenFlow controllers create pipelines based
on this design.This behavior was changed by the commit in the Fixes tag so that
nw_proto equals the next_header field of the last extension header.
However, there is no counterpart for this change in OVS userspace,
meaning that this field is parsed differently between OVS and the
kernel. This is a problem because OVS creates actions based on what is
parsed in userspace, but the kernel-provided flow key is used as a match
criteria, as described in Documentation/networking/openvswitch.rst. This
leads to issues such as packets incorrectly matching on a flow and thus
the wrong list of actions being applied to the packet. Such changes in
packet parsing cannot be implemented without breaking the userspace.The offending commit is partially reverted to restore the expected
behavior.The change technically made sense and there is a good reason that it was
implemented, but it does not comply with the original design of OVS.
If in the future someone wants to implement such a change, then it must
be user-configurable and disabled by default to preserve backwards
compatibility with existing OVS versions.Cc: stable@vger.kernel.org
Fixes: fa642f08839b ("openvswitch: Derive IP protocol number for IPv6 later frags")
Link: https://docs.openvswitch.org/en/latest/topics/design/#fragments
Signed-off-by: Rosemarie O'Riorden
Acked-by: Eelco Chaudron
Link: https://lore.kernel.org/r/20220621204845.9721-1-roriorden@redhat.com
Signed-off-by: Paolo Abeni
Signed-off-by: Greg Kroah-Hartman
15 Jun, 2022
1 commit
-
commit 2061ecfdf2350994e5b61c43e50e98a7a70e95ee upstream.
If packet headers changed, the cached nfct is no longer relevant
for the packet and attempt to re-use it leads to the incorrect packet
classification.This issue is causing broken connectivity in OpenStack deployments
with OVS/OVN due to hairpin traffic being unexpectedly dropped.The setup has datapath flows with several conntrack actions and tuple
changes between them:actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
ct(zone=8),recirc(0x4)After the first ct() action the packet headers are almost fully
re-written. The next ct() tries to re-use the existing nfct entry
and marks the packet as invalid, so it gets dropped later in the
pipeline.Clearing the cached conntrack entry whenever packet tuple is changed
to avoid the issue.The flow key should not be cleared though, because we should still
be able to match on the ct_state if the recirculation happens after
the tuple change but before the next ct() action.Cc: stable@vger.kernel.org
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Reported-by: Frode Nordahl
Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html
Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
Signed-off-by: Ilya Maximets
Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman
27 Apr, 2022
2 commits
-
commit 719774377622bc4025d2a74f551b5dc2158c6c30 upstream.
Convert nf_conn reference counting from atomic_t to refcount_t based api.
refcount_t api provides more runtime sanity checks and will warn on
certain constructs, e.g. refcount_inc() on a zero reference count, which
usually indicates use-after-free.For this reason template allocation is changed to init the refcount to
1, the subsequenct add operations are removed.Likewise, init_conntrack() is changed to set the initial refcount to 1
instead refcount_inc().This is safe because the new entry is not (yet) visible to other cpus.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman -
commit cefa91b2332d7009bc0be5d951d6cbbf349f90f8 upstream.
Given a sufficiently large number of actions, while copying and
reserving memory for a new action of a new flow, if next_offset is
greater than MAX_ACTIONS_BUFSIZE, the function reserve_sfa_size() does
not return -EMSGSIZE as expected, but it allocates MAX_ACTIONS_BUFSIZE
bytes increasing actions_len by req_size. This can then lead to an OOB
write access, especially when further actions need to be copied.Fix it by rearranging the flow action size check.
KASAN splat below:
==================================================================
BUG: KASAN: slab-out-of-bounds in reserve_sfa_size+0x1ba/0x380 [openvswitch]
Write of size 65360 at addr ffff888147e4001c by task handler15/836CPU: 1 PID: 836 Comm: handler15 Not tainted 5.18.0-rc1+ #27
...
Call Trace:
dump_stack_lvl+0x45/0x5a
print_report.cold+0x5e/0x5db
? __lock_text_start+0x8/0x8
? reserve_sfa_size+0x1ba/0x380 [openvswitch]
kasan_report+0xb5/0x130
? reserve_sfa_size+0x1ba/0x380 [openvswitch]
kasan_check_range+0xf5/0x1d0
memcpy+0x39/0x60
reserve_sfa_size+0x1ba/0x380 [openvswitch]
__add_action+0x24/0x120 [openvswitch]
ovs_nla_add_action+0xe/0x20 [openvswitch]
ovs_ct_copy_action+0x29d/0x1130 [openvswitch]
? __kernel_text_address+0xe/0x30
? unwind_get_return_address+0x56/0xa0
? create_prof_cpu_mask+0x20/0x20
? ovs_ct_verify+0xf0/0xf0 [openvswitch]
? prep_compound_page+0x198/0x2a0
? __kasan_check_byte+0x10/0x40
? kasan_unpoison+0x40/0x70
? ksize+0x44/0x60
? reserve_sfa_size+0x75/0x380 [openvswitch]
__ovs_nla_copy_actions+0xc26/0x2070 [openvswitch]
? __zone_watermark_ok+0x420/0x420
? validate_set.constprop.0+0xc90/0xc90 [openvswitch]
? __alloc_pages+0x1a9/0x3e0
? __alloc_pages_slowpath.constprop.0+0x1da0/0x1da0
? unwind_next_frame+0x991/0x1e40
? __mod_node_page_state+0x99/0x120
? __mod_lruvec_page_state+0x2e3/0x470
? __kasan_kmalloc_large+0x90/0xe0
ovs_nla_copy_actions+0x1b4/0x2c0 [openvswitch]
ovs_flow_cmd_new+0x3cd/0xb10 [openvswitch]
...Cc: stable@vger.kernel.org
Fixes: f28cd2af22a0 ("openvswitch: fix flow actions reallocation")
Signed-off-by: Paolo Valerio
Acked-by: Eelco Chaudron
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
14 Apr, 2022
2 commits
-
[ Upstream commit 1f30fb9166d4f15a1aa19449b9da871fe0ed4796 ]
While parsing user-provided actions, openvswitch module may dynamically
allocate memory and store pointers in the internal copy of the actions.
So this memory has to be freed while destroying the actions.Currently there are only two such actions: ct() and set(). However,
there are many actions that can hold nested lists of actions and
ovs_nla_free_flow_actions() just jumps over them leaking the memory.For example, removal of the flow with the following actions will lead
to a leak of the memory allocated by nf_ct_tmpl_alloc():actions:clone(ct(commit),0)
Non-freed set() action may also leak the 'dst' structure for the
tunnel info including device references.Under certain conditions with a high rate of flow rotation that may
cause significant memory leak problem (2MB per second in reporter's
case). The problem is also hard to mitigate, because the user doesn't
have direct control over the datapath flows generated by OVS.Fix that by iterating over all the nested actions and freeing
everything that needs to be freed recursively.New build time assertion should protect us from this problem if new
actions will be added in the future.Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all
attributes has to be explicitly checked. sample() and clone() actions
are mixing extra attributes into the user-provided action list. That
prevents some code generalization too.Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst")
Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html
Reported-by: Stéphane Graber
Signed-off-by: Ilya Maximets
Acked-by: Aaron Conole
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 3f2a3050b4a3e7f32fc0ea3c9b0183090ae00522 ]
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for
performance optimization inside the kernel. It's added by the kernel
while parsing user-provided actions and should not be sent during the
flow dump as it's not part of the uAPI.The issue doesn't cause any significant problems to the ovs-vswitchd
process, because reported actions are not really used in the
application lifecycle and only supposed to be shown to a human via
ovs-dpctl flow dump. However, the action list is still incorrect
and causes the following error if the user wants to look at the
datapath flows:# ovs-dpctl add-dp system@ovs-system
# ovs-dpctl add-flow "" "clone(ct(commit),0)"
# ovs-dpctl dump-flows
, packets:0, bytes:0, used:never,
actions:clone(bad length 4, expected -1 for: action0(01 00 00 00),
ct(commit),0)With the fix:
# ovs-dpctl dump-flows
, packets:0, bytes:0, used:never,
actions:clone(ct(commit),0)Additionally fixed an incorrect attribute name in the comment.
Fixes: b233504033db ("openvswitch: kernel datapath clone action")
Signed-off-by: Ilya Maximets
Acked-by: Aaron Conole
Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski
Signed-off-by: Sasha Levin
08 Apr, 2022
3 commits
-
commit f19c44452b58a84d95e209b847f5495d91c9983a upstream.
IPv6 nd target mask was not getting populated in flow dump.
In the function __ovs_nla_put_key the icmp code mask field was checked
instead of icmp code key field to classify the flow as neighbour discovery.ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0),
skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0),
ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
eth(src=00:00:00:00:00:00/00:00:00:00:00:00,
dst=00:00:00:00:00:00/00:00:00:00:00:00),
eth_type(0x86dd),
ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no),
icmpv6(type=135,code=0),
nd(target=2001::2/::,
sll=00:00:00:00:00:00/00:00:00:00:00:00,
tll=00:00:00:00:00:00/00:00:00:00:00:00),
packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2Fixes: e64457191a25 (openvswitch: Restructure datapath.c and flow.c)
Signed-off-by: Martin Varghese
Link: https://lore.kernel.org/r/20220328054148.3057-1-martinvarghesenokia@gmail.com
Signed-off-by: Paolo Abeni
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 408bdcfce8dfd6902f75fbcd3b99d8b24b506597 ]
Its the same as nf_conntrack_put(), but without the
need for an indirect call. The downside is a module dependency on
nf_conntrack, but all of these already depend on conntrack anyway.Cc: Paul Blakey
Cc: dev@openvswitch.org
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin -
[ Upstream commit 60b44ca6bd7518dd38fa2719bc9240378b6172c3 ]
During NAT, a tuple collision may occur. When this happens, openvswitch
will make a second pass through NAT which will perform additional packet
modification. This will update the skb data, but not the flow key that
OVS uses. This means that future flow lookups, and packet matches will
have incorrect data. This has been supported since
5d50aa83e2c8 ("openvswitch: support asymmetric conntrack").That commit failed to properly update the sw_flow_key attributes, since
it only called the ovs_ct_nat_update_key once, rather than each time
ovs_ct_nat_execute was called. As these two operations are linked, the
ovs_ct_nat_execute() function should always make sure that the
sw_flow_key is updated after a successful call through NAT infrastructure.Fixes: 5d50aa83e2c8 ("openvswitch: support asymmetric conntrack")
Cc: Dumitru Ceara
Cc: Numan Siddique
Signed-off-by: Aaron Conole
Acked-by: Eelco Chaudron
Link: https://lore.kernel.org/r/20220318124319.3056455-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski
Signed-off-by: Sasha Levin
02 Mar, 2022
1 commit
-
commit d9b5ae5c1b241b91480aa30408be12fe91af834a upstream.
Ipv6 ttl, label and tos fields are modified without first
pulling/pushing the ipv6 header, which would have updated
the hw csum (if available). This might cause csum validation
when sending the packet to the stack, as can be seen in
the trace below.Fix this by updating skb->csum if available.
Trace resulted by ipv6 ttl dec and then sending packet
to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
[295241.900063] s_pf0vf2: hw csum failure
[295241.923191] Call Trace:
[295241.925728]
[295241.927836] dump_stack+0x5c/0x80
[295241.931240] __skb_checksum_complete+0xac/0xc0
[295241.935778] nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
[295241.953030] nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
[295241.958344] __ovs_ct_lookup+0xac/0x860 [openvswitch]
[295241.968532] ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
[295241.979167] do_execute_actions+0x54a/0xaa0 [openvswitch]
[295242.001482] ovs_execute_actions+0x48/0x100 [openvswitch]
[295242.006966] ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
[295242.012626] ovs_vport_receive+0x6c/0xc0 [openvswitch]
[295242.028763] netdev_frame_hook+0xc0/0x180 [openvswitch]
[295242.034074] __netif_receive_skb_core+0x2ca/0xcb0
[295242.047498] netif_receive_skb_internal+0x3e/0xc0
[295242.052291] napi_gro_receive+0xba/0xe0
[295242.056231] mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
[295242.062513] mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
[295242.067669] mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
[295242.077958] net_rx_action+0x149/0x3b0
[295242.086762] __do_softirq+0xd7/0x2d6
[295242.090427] irq_exit+0xf7/0x100
[295242.093748] do_IRQ+0x7f/0xd0
[295242.096806] common_interrupt+0xf/0xf
[295242.100559]
[295242.102750] RIP: 0033:0x7f9022e88cbd
[295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000
[295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30
[295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340
[295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000
[295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40Fixes: 3fdbd1ce11e5 ("openvswitch: add ipv6 'set' action")
Signed-off-by: Paul Blakey
Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman
27 Jan, 2022
2 commits
-
[ Upstream commit 6f022c2ddbcefaee79502ce5386dfe351d457070 ]
Netfilter conntrack maintains NAT flags per connection indicating
whether NAT was configured for the connection. Openvswitch maintains
NAT flags on the per packet flow key ct_state field, indicating
whether NAT was actually executed on the packet.When a packet misses from tc to ovs the conntrack NAT flags are set.
However, NAT was not necessarily executed on the packet because the
connection's state might still be in NEW state. As such, openvswitch
wrongly assumes that NAT was executed and sets an incorrect flow key
NAT flags.Fix this, by flagging to openvswitch which NAT was actually done in
act_ct via tc_skb_ext and tc_skb_cb to the openvswitch module, so
the packet flow key NAT flags will be correctly set.Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Paul Blakey
Acked-by: Jamal Hadi Salim
Link: https://lore.kernel.org/r/20220106153804.26451-1-paulb@nvidia.com
Signed-off-by: Jakub Kicinski
Signed-off-by: Sasha Levin -
[ Upstream commit 635d448a1cce4b4ebee52b351052c70434fa90ea ]
Zone id is not restored if we passed ct and ct rejected the connection,
as there is no ct info on the skb.Save the zone from tc skb cb to tc skb extension and pass it on to
ovs, use that info to restore the zone id for invalid connections.Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct")
Signed-off-by: Paul Blakey
Signed-off-by: Jakub Kicinski
Signed-off-by: Sasha Levin
20 Aug, 2021
1 commit
-
drivers/ptp/Kconfig:
55c8fca1dae1 ("ptp_pch: Restore dependency on PCI")
e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies")Signed-off-by: Jakub Kicinski
18 Aug, 2021
1 commit
-
fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs
doesn't clear skb->tstamp. We encountered a problem with linux
version 5.4.56 and ovs version 2.14.1, and packets failed to
dequeue from qdisc when fq qdisc was attached to ovs port.Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: kaixi.fan
Signed-off-by: xiexiaohui
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller
13 Aug, 2021
1 commit
-
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins")
099fdeda659d ("bnxt_en: Event handler for PPS events")kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer")MAINTAINERS
7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")Signed-off-by: Jakub Kicinski
10 Aug, 2021
1 commit
-
Repair kernel-doc notation in a few places to make it conform to
the expected format.Fixes the following kernel-doc warnings:
flow.c:296: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Parse vlan tag from vlan header.
flow.c:296: warning: missing initial short description on line:
* Parse vlan tag from vlan header.
flow.c:537: warning: No description found for return value of 'key_extract_l3l4'
flow.c:769: warning: No description found for return value of 'key_extract'Signed-off-by: Randy Dunlap
Cc: Pravin B Shelar
Cc: dev@openvswitch.org
Link: https://lore.kernel.org/r/20210808190834.23362-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski
27 Jul, 2021
2 commits
-
fix incorrect type in argument 1 (different address spaces)
../net/openvswitch/datapath.c:169:17: warning: incorrect type in argument 1 (different address spaces)
../net/openvswitch/datapath.c:169:17: expected void const *
../net/openvswitch/datapath.c:169:17: got struct dp_nlsk_pids [noderef] __rcu *upcall_portidsFound at: https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159
Signed-off-by: Mark Gray
Signed-off-by: David S. Miller -
Signed-off-by: Mark Gray
Signed-off-by: David S. Miller
17 Jul, 2021
1 commit
-
The Open vSwitch kernel module uses the upcall mechanism to send
packets from kernel space to user space when it misses in the kernel
space flow table. The upcall sends packets via a Netlink socket.
Currently, a Netlink socket is created for every vport. In this way,
there is a 1:1 mapping between a vport and a Netlink socket.
When a packet is received by a vport, if it needs to be sent to
user space, it is sent via the corresponding Netlink socket.This mechanism, with various iterations of the corresponding user
space code, has seen some limitations and issues:* On systems with a large number of vports, there is a correspondingly
large number of Netlink sockets which can limit scaling.
(https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
* Packet reordering on upcalls.
(https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
* A thundering herd issue.
(https://bugzilla.redhat.com/show_bug.cgi?id=1834444)This patch introduces an alternative, feature-negotiated, upcall
mode using a per-cpu dispatch rather than a per-vport dispatch.In this mode, the Netlink socket to be used for the upcall is
selected based on the CPU of the thread that is executing the upcall.
In this way, it resolves the issues above as:a) The number of Netlink sockets scales with the number of CPUs
rather than the number of vports.
b) Ordering per-flow is maintained as packets are distributed to
CPUs based on mechanisms such as RSS and flows are distributed
to a single user space thread.
c) Packets from a flow can only wake up one user space thread.The corresponding user space code can be found at:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.htmlBugzilla: https://bugzilla.redhat.com/1844576
Signed-off-by: Mark Gray
Acked-by: Flavio Leitner
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
02 Jul, 2021
1 commit
-
In the current implement when comparing two flow keys, we will return
result after comparing the whole key from start to end.In our optimization, we will return result in the first none-zero
comparison, then we will improve the flow table looking up efficiency.Signed-off-by: Baowen Zheng
Signed-off-by: Louis Peens
Signed-off-by: Simon Horman
Signed-off-by: David S. Miller
23 Jun, 2021
1 commit
-
This makes openvswitch module use the event tracing framework
to log the upcall interface and action execution pipeline. When
using openvswitch as the packet forwarding engine, some types of
debugging are made possible simply by using the ovs-vswitchd's
ofproto/trace command. However, such a command has some
limitations:1. When trying to trace packets that go through the CT action,
the state of the packet can't be determined, and probably
would be potentially wrong.2. Deducing problem packets can sometimes be difficult as well
even if many of the flows are known3. It's possible to use the openvswitch module even without
the ovs-vswitchd (although, not common use).Introduce the event tracing points here to make it possible for
working through these problems in kernel space. The style is
copied from the mac80211 driver-trace / trace code for
consistency - this creates some checkpatch splats, but the
official 'guide' for adding tracepoints, as well as the existing
examples all add the same splats so it seems acceptable.Signed-off-by: Aaron Conole
Signed-off-by: David S. Miller
28 May, 2021
1 commit
-
cdc-wdm: s/kill_urbs/poison_urbs/ to fix build
Signed-off-by: Jakub Kicinski
14 May, 2021
1 commit
-
We have observed meters working unexpected if traffic is 3+Gbit/s
with multiple connections.now_ms is not pretected by meter->lock, we may get a negative
long_delta_ms when another cpu updated meter->used, then:
delta_ms = (u32)long_delta_ms;
which will be a large value.band->bucket += delta_ms * band->rate;
then we get a wrong band->bucket.OpenVswitch userspace datapath has fixed the same issue[1] some
time ago, and we port the implementation to kernel datapath.[1] https://patchwork.ozlabs.org/project/openvswitch/patch/20191025114436.9746-1-i.maximets@ovn.org/
Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
Signed-off-by: Tao Liu
Suggested-by: Ilya Maximets
Reviewed-by: Ilya Maximets
Signed-off-by: David S. Miller
11 May, 2021
1 commit
-
There is no need add 'if (skb_nfct(skb))' assignment, the
nf_conntrack_put() would check it.Signed-off-by: Yejune Deng
Signed-off-by: David S. Miller
30 Apr, 2021
1 commit
-
running openvswitch on kernels built with KASAN, it's possible to see the
following splat while testing fragmentation of IPv4 packets:BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888112fc713c by task handler2/1367CPU: 0 PID: 1367 Comm: handler2 Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
ovs_fragment+0x5bf/0x840 [openvswitch]
do_execute_actions+0x1bd5/0x2400 [openvswitch]
ovs_execute_actions+0xc8/0x3d0 [openvswitch]
ovs_packet_cmd_execute+0xa39/0x1150 [openvswitch]
genl_family_rcv_msg_doit.isra.15+0x227/0x2d0
genl_rcv_msg+0x287/0x490
netlink_rcv_skb+0x120/0x380
genl_rcv+0x24/0x40
netlink_unicast+0x439/0x630
netlink_sendmsg+0x719/0xbf0
sock_sendmsg+0xe2/0x110
____sys_sendmsg+0x5ba/0x890
___sys_sendmsg+0xe9/0x160
__sys_sendmsg+0xd3/0x170
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f957079db07
Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 eb ec ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 24 ed ff ff 48
RSP: 002b:00007f956ce35a50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f957079db07
RDX: 0000000000000000 RSI: 00007f956ce35ae0 RDI: 0000000000000019
RBP: 00007f956ce35ae0 R08: 0000000000000000 R09: 00007f9558006730
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 00007f956ce37308 R14: 00007f956ce35f80 R15: 00007f956ce35ae0The buggy address belongs to the page:
page:00000000af2a1d93 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x112fc7
flags: 0x17ffffc0000000()
raw: 0017ffffc0000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detectedaddr ffff888112fc713c is located in stack of task handler2/1367 at offset 180 in frame:
ovs_fragment+0x0/0x840 [openvswitch]this frame has 2 objects:
[32, 144) 'ovs_dst'
[192, 424) 'ovs_rt'Memory state around the buggy address:
ffff888112fc7000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888112fc7080: 00 f1 f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00
>ffff888112fc7100: 00 00 00 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
^
ffff888112fc7180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888112fc7200: 00 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00for IPv4 packets, ovs_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in ovs_fragment(), similarly to what is done for IPv6 few lines below.Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmt")
Cc:
Acked-by: Eelco Chaudron
Signed-off-by: Davide Caratti
Signed-off-by: David S. Miller
27 Apr, 2021
1 commit
24 Apr, 2021
1 commit
-
Implementation of meters supposed to be a classic token bucket with 2
typical parameters: rate and burst size.Burst size in this schema is the maximum number of bytes/packets that
could pass without being rate limited.Recent changes to userspace datapath made meter implementation to be
in line with the kernel one, and this uncovered several issues.The main problem is that maximum bucket size for unknown reason
accounts not only burst size, but also the numerical value of rate.
This creates a lot of confusion around behavior of meters.For example, if rate is configured as 1000 pps and burst size set to 1,
this should mean that meter will tolerate bursts of 1 packet at most,
i.e. not a single packet above the rate should pass the meter.
However, current implementation calculates maximum bucket size as
(rate + burst size), so the effective bucket size will be 1001. This
means that first 1000 packets will not be rate limited and average
rate might be twice as high as the configured rate. This also makes
it practically impossible to configure meter that will have burst size
lower than the rate, which might be a desirable configuration if the
rate is high.Inability to configure low values of a burst size and overall inability
for a user to predict what will be a maximum and average rate from the
configured parameters of a meter without looking at the OVS and kernel
code might be also classified as a security issue, because drop meters
are frequently used as a way of protection from DoS attacks.This change removes rate from the calculation of a bucket size, making
it in line with the classic token bucket algorithm and essentially
making the rate and burst tolerance being predictable from a users'
perspective.Same change proposed for the userspace implementation.
Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
Signed-off-by: Ilya Maximets
Signed-off-by: David S. Miller
10 Apr, 2021
1 commit
-
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivialSigned-off-by: Jakub Kicinski
06 Apr, 2021
1 commit
-
'struct ovs_zone_limit' has more members than initialized in
ovs_ct_limit_get_default_limit(). The rest of the memory is a random
kernel stack content that ends up being sent to userspace.Fix that by using designated initializer that will clear all
non-specified fields.Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Ilya Maximets
Acked-by: Tonghao Zhang
Signed-off-by: David S. Miller
04 Apr, 2021
1 commit
-
'skb_push()'/'skb_postpush_rcsum()' can be replaced by an equivalent
'skb_push_rcsum()' which is less verbose.Signed-off-by: Christophe JAILLET
Signed-off-by: David S. Miller
26 Mar, 2021
1 commit
-
Signed-off-by: David S. Miller
23 Mar, 2021
1 commit
-
s/subsytem/subsystem/
Signed-off-by: Bhaskar Chowdhury
Acked-by: Randy Dunlap
Signed-off-by: David S. Miller
17 Mar, 2021
2 commits
-
It is not unusual to have the bridge port down. Sometimes
it has the old MTU, which is fine since it's not being used.However, the kernel spams the log with a warning message
when a packet is going to be sent over such port. Fix that
by warning only if the interface is UP.Signed-off-by: Flavio Leitner
Signed-off-by: David S. Miller -
When openvswitch conntrack offload with act_ct action. The first rule
do conntrack in the act_ct in tc subsystem. And miss the next rule in
the tc and fallback to the ovs datapath but miss set post_ct flag
which will lead the ct_state_key with -trk flag.Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
Signed-off-by: wenxu
Reviewed-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
15 Jan, 2021
1 commit
-
As requested by upstream OVS, added some error messages in the
validate_and_copy_dec_ttl function.Includes a small cleanup, which removes an unnecessary parameter
from the dec_ttl_exception_handler() function.Reported-by: Flavio Leitner
Signed-off-by: Eelco Chaudron
Acked-by: Flavio Leitner
Link: https://lore.kernel.org/r/161054576573.26637.18396634650212670580.stgit@ebuild
Signed-off-by: Jakub Kicinski
15 Dec, 2020
1 commit
-
Currently, the exception actions are not processed correctly as the wrong
dataset is passed. This change fixes this, including the misleading
comment.In addition, a check was added to make sure we work on an IPv4 packet,
and not just assume if it's not IPv6 it's IPv4.This was all tested using OVS with patch,
https://patchwork.ozlabs.org/project/openvswitch/list/?series=21639,
applied and sending packets with a TTL of 1 (and 0), both with IPv4
and IPv6.Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink message format")
Signed-off-by: Eelco Chaudron
Link: https://lore.kernel.org/r/160733569860.3007.12938188180387116741.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski
12 Dec, 2020
1 commit
-
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.Conflicts:
net/netfilter/nf_tables_api.cSigned-off-by: Jakub Kicinski
09 Dec, 2020
1 commit
-
Simplify the return expression.
Signed-off-by: Zheng Yongjun
Reviewed-by: Eelco Chaudron
Signed-off-by: David S. Miller
05 Dec, 2020
1 commit
-
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.Changing 'return start' to 'return action_start' can fix this bug.
Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink message format")
Reported-by: Hulk Robot
Signed-off-by: Wang Hai
Reviewed-by: Eelco Chaudron
Link: https://lore.kernel.org/r/20201204114314.1596-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski