Eric Lee / smarc-fsl-linux-kernel

06 Sep, 2019

1 commit

95a7233c4 net: openvswitch: Set OvS recirc_id from tc chain index

Offloaded OvS datapath rules are translated one to one to tc rules,
for example the following simplified OvS rule:

recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)

Will be translated to the following tc rule:

$ tc filter add dev dev1 ingress \
prio 1 chain 0 proto ip \
flower tcp ct_state -trk \
action ct pipe \
action goto chain 2

Received packets will first travel through tc, and if they aren't stolen
by it, like in the above rule, they will continue to OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the processing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.

To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.
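
The hand-off described above can be sketched in userspace C. This is only an illustration, not the kernel code: `struct pkt`, `tc_goto_chain()` and `ovs_initial_recirc_id()` are hypothetical names, and the `has_tc_ext`/`tc_chain` pair stands in for the new skb extension.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet carrying an optional tc extension. */
struct pkt {
    bool has_tc_ext;   /* true once tc has recorded a chain index */
    uint32_t tc_chain; /* last chain index set by "goto chain" */
};

/* Model of the tc "goto chain" action: remember the target chain. */
static void tc_goto_chain(struct pkt *p, uint32_t chain)
{
    p->has_tc_ext = true;
    p->tc_chain = chain;
}

/* Model of the OvS side: resume at the recorded chain, else recirc_id 0. */
static uint32_t ovs_initial_recirc_id(const struct pkt *p)
{
    return p->has_tc_ext ? p->tc_chain : 0;
}
```

A packet that missed in tc after `goto chain 2` would then start OvS processing at recirc_id(2), while a packet that never passed through a goto action starts at recirc_id(0).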

Signed-off-by: Paul Blakey
Signed-off-by: Vlad Buslov
Acked-by: Jiri Pirko
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Paul Blakey
2019-09-06 20:59:18 +0800

29 Aug, 2019

2 commits

0754b4e8c openvswitch: Clear the L4 portion of the key for "later" fragments.

Only the first fragment in a datagram contains the L4 headers. When the
Open vSwitch module parses a packet, it always sets the IP protocol
field in the key, but can only set the L4 fields on the first fragment.
The original behavior would not clear the L4 portion of the key, so
garbage values would be sent in the key for "later" fragments. This
patch clears the L4 fields in that circumstance to prevent sending those
garbage values as part of the upcall.
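
A minimal sketch of the behavior described above, assuming a simplified key layout. `struct flow_key`, `enum frag_type` and `key_fill_l4()` are hypothetical names for illustration and do not mirror the kernel's sw_flow_key.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative fragment classification; names are hypothetical. */
enum frag_type { FRAG_NONE, FRAG_FIRST, FRAG_LATER };

struct flow_key {
    uint8_t ip_proto;        /* always set, for every fragment */
    enum frag_type frag;
    struct {
        uint16_t src;
        uint16_t dst;
    } tp;                    /* L4 portion of the key */
};

static void key_fill_l4(struct flow_key *key, uint8_t ip_proto,
                        enum frag_type frag, uint16_t sport, uint16_t dport)
{
    key->ip_proto = ip_proto;
    key->frag = frag;

    if (frag == FRAG_LATER) {
        /* No L4 header in this fragment: zero the fields instead of
         * leaving whatever was in the key before. */
        memset(&key->tp, 0, sizeof(key->tp));
        return;
    }
    key->tp.src = sport;
    key->tp.dst = dport;
}
```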

Signed-off-by: Justin Pettit
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Justin Pettit
2019-08-29 05:53:51 +0800
ad06a566e openvswitch: Properly set L4 keys on "later" IP fragments

When IP fragments are reassembled before being sent to conntrack, the
key from the last fragment is used. Unless there are reordering
issues, the last fragment received will not contain the L4 ports, so the
key for the reassembled datagram won't contain them. This patch updates
the key once we have a reassembled datagram.

The handle_fragments() function works on L3 headers so we pull the L3/L4
flow key update code from key_extract into a new function
'key_extract_l3l4'. Then we add another new function
ovs_flow_key_update_l3l4() and export it so that it is accessible by
handle_fragments() for conntrack packet reassembly.

Co-authored-by: Justin Pettit
Signed-off-by: Greg Rose
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Greg Rose
2019-08-29 05:53:51 +0800

20 Jul, 2019

1 commit

aef833c58 net: openvswitch: rename flow_stats to sw_flow_stats

There is a flow_stats structure defined in include/net/flow_offload.h
and a follow up patch adds an #include of that header to
net/sch_generic.h.

This breaks compilation since OVS codebase includes net/sock.h which
pulls in linux/filter.h which includes net/sch_generic.h.

In file included from ./include/net/sch_generic.h:18:0,
from ./include/linux/filter.h:25,
from ./include/net/sock.h:59,
from ./include/linux/tcp.h:19,
from net/openvswitch/datapath.c:24

This definition takes precedence on OVS since it is placed in the
networking core, so rename flow_stats in OVS to sw_flow_stats since
this structure is contained in sw_flow.

Signed-off-by: Pablo Neira Ayuso
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Pablo Neira Ayuso
2019-07-20 12:27:45 +0800

05 Jun, 2019

1 commit

c94229992 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 269

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of version 2 of the gnu general public license as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin street fifth floor boston ma
02110 1301 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 21 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Alexios Zavras
Reviewed-by: Allison Randal
Reviewed-by: Richard Fontana
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141334.228102212@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-06-05 23:30:29 +0800

05 Jan, 2019

1 commit

41e4e2cd7 openvswitch: Fix IPv6 later frags parsing

The previous commit fa642f08839b
("openvswitch: Derive IP protocol number for IPv6 later frags")
introduces IP protocol number parsing for IPv6 later frags that can mess
up the network header length calculation logic, i.e. nh_len < 0.
However, the network header length calculation is mainly for deriving
the transport layer header in the key extraction process, which does not
apply to later fragments.

Therefore, this commit skips the network header length calculation to
fix the issue.

Reported-by: Chris Mi
Reported-by: Greg Rose
Fixes: fa642f08839b ("openvswitch: Derive IP protocol number for IPv6 later frags")
Signed-off-by: Yi-Hung Wei
Signed-off-by: David S. Miller

Yi-Hung Wei
2019-01-05 05:00:02 +0800

11 Nov, 2018

1 commit

6083e28aa OVS: remove VLAN_TAG_PRESENT - fixup

It turns out I missed one VLAN_TAG_PRESENT in OVS code while rebasing.
This fixes it.

Fixes: 9df46aefafa6 ("OVS: remove use of VLAN_TAG_PRESENT")
Signed-off-by: Michał Mirosław
Signed-off-by: David S. Miller

Michał Mirosław
2018-11-11 05:42:16 +0800

09 Nov, 2018

1 commit

9df46aefa OVS: remove use of VLAN_TAG_PRESENT

This is a minimal change to allow removing of VLAN_TAG_PRESENT.
It leaves OVS unable to use CFI bit, as fixing this would need
a deeper surgery involving userspace interface.

Signed-off-by: Michał Mirosław
Signed-off-by: David S. Miller

Michał Mirosław
2018-11-09 11:49:31 +0800

07 Sep, 2018

1 commit

fa642f088 openvswitch: Derive IP protocol number for IPv6 later frags

Currently, OVS only parses the IP protocol number for the first
IPv6 fragment, but sets the IP protocol number for the later fragments
to be NEXTHDR_FRAGMENT. This patch tries to derive the IP protocol
number for the IPv6 later frags so that we can match that.

Signed-off-by: Yi-Hung Wei
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Yi-Hung Wei
2018-09-07 12:47:49 +0800

23 Dec, 2017

1 commit

fba961ab2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Lots of overlapping changes. Also on the net-next side
the XDP state management is handled more in the generic
layers so undo the 'net' nfp fix which isn't applicable
in net-next.

Include a necessary change by Jakub Kicinski, with log message:

====================
cls_bpf no longer takes care of offload tracking. Make sure
netdevsim performs necessary checks. This fixes a warning
caused by TC trying to remove a filter it has not added.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
====================

Signed-off-by: David S. Miller

David S. Miller
2017-12-23 00:16:31 +0800

22 Dec, 2017

1 commit

c48e74736 openvswitch: Fix pop_vlan action for double tagged frames

skb_vlan_pop() expects skb->protocol to be a valid TPID for double
tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
shift the true ethertype into position for us.

Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
Signed-off-by: Eric Garver
Reviewed-by: Jiri Benc
Signed-off-by: David S. Miller

Eric Garver
2017-12-22 02:02:08 +0800

30 Nov, 2017

1 commit

311af51dc openvswitch: use ktime_get_ts64() instead of ktime_get_ts()

timespec is deprecated because of the y2038 overflow, so let's convert
this one to ktime_get_ts64(). The code is already safe even on 32-bit
architectures, since it uses monotonic times. On 64-bit architectures,
nothing changes, while on 32-bit architectures this avoids one
type conversion.

Signed-off-by: Arnd Bergmann
Signed-off-by: David S. Miller

Arnd Bergmann
2017-11-30 22:26:32 +0800

24 Nov, 2017

1 commit

0c19f846d net: accept UFO datagrams from tuntap and packet

Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
Booted a v4.13 guest kernel with QEMU. On a host kernel before this
patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
enabled, same as on a v4.13 host kernel.

A UFO packet sent from the guest appears on the tap device:
host:
nc -l -u -p 8000 &
tcpdump -n -i tap0

guest:
dd if=/dev/zero of=payload.txt bs=1 count=2000
nc -u 192.16.1.1 8000 < payload.txt

Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
packets arriving fragmented:

./with_tap_pair.sh ./tap_send_ufo tap0 tap1
(from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
v1 -> v2
- simplified set_offload change (review comment)
- documented test procedure

Link: http://lkml.kernel.org/r/
Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek
Signed-off-by: Willem de Bruijn
Acked-by: Jason Wang
Signed-off-by: David S. Miller

Willem de Bruijn
2017-11-24 00:37:35 +0800

08 Nov, 2017

1 commit

b2d0f5d5d openvswitch: enable NSH support

v16->17
- Fixed disputed check code: keep them in nsh_push and nsh_pop
but also add them in __ovs_nla_copy_actions

v15->v16
- Add csum recalculation for nsh_push, nsh_pop and set_nsh
pointed out by Pravin
- Move nsh key into the union with ipv4 and ipv6 and add
check for nsh key in match_validate pointed out by Pravin
- Add nsh check in validate_set and __ovs_nla_copy_actions

v14->v15
- Check size in nsh_hdr_from_nlattr
- Fixed four small issues pointed out by Jiri and Eric

v13->v14
- Rename skb_push_nsh to nsh_push per Dave's comment
- Rename skb_pop_nsh to nsh_pop per Dave's comment

v12->v13
- Fix NSH header length check in set_nsh

v11->v12
- Fix the missing changes that old comments pointed out
- Fix new comments for v11

v10->v11
- Fix the remaining three disputable comments for v9
that were not fixed in v10.

v9->v10
- Change struct ovs_key_nsh to
struct ovs_nsh_key_base base;
__be32 context[NSH_MD1_CONTEXT_SIZE];
- Fix new comments for v9

v8->v9
- Fix build error reported by daily intel build
because nsh module isn't selected by openvswitch

v7->v8
- Rework nested value and mask for OVS_KEY_ATTR_NSH
- Change pop_nsh to adapt to nsh kernel module
- Fix many issues per comments from Jiri Benc

v6->v7
- Remove NSH GSO patches in v6 because Jiri Benc
reworked it as another patch series and they have
been merged.
- Change it to adapt to nsh kernel module added by NSH
GSO patch series

v5->v6
- Fix the remaining comments for v4.
- Add NSH GSO support for VxLAN-gpe + NSH and
Eth + NSH.

v4->v5
- Fix many comments by Jiri Benc and Eric Garver
for v4.

v3->v4
- Add new NSH match field ttl
- Update NSH header to the latest format
which will be final format and won't change
per its author's confirmation.
- Fix comments for v3.

v2->v3
- Change OVS_KEY_ATTR_NSH to nested key to handle
length-fixed attributes and length-variable
attributes more flexibly.
- Remove struct ovs_action_push_nsh completely
- Add code to handle nested attribute for SET_MASKED
- Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
to transfer NSH header data.
- Fix comments and coding style issues by Jiri and Eric

v1->v2
- Change encap_nsh and decap_nsh to push_nsh and pop_nsh
- Dynamically allocate struct ovs_action_push_nsh for
length-variable metadata.

The OVS master and 2.8 branches have merged the NSH userspace
patch series. This patch enables NSH support
in the kernel datapath so that OVS can support
NSH in compat mode by porting this.

Signed-off-by: Yi Yang
Acked-by: Jiri Benc
Acked-by: Eric Garver
Acked-by: Pravin Shelar
Signed-off-by: David S. Miller

Yi Yang
2017-11-08 15:12:33 +0800

20 Jul, 2017

2 commits

c4b2bf6b4 openvswitch: Optimize operations for OvS flow_stats.

When calling flow_free() to free a flow, we call cpumask_next() many
times (once per CPU in cpu_possible_mask, e.g. 128 by default). That
consumes noticeable CPU time if flow_free() is called frequently, for
example when all packets are sent to userspace via upcall and OvS sends
them back via netlink to ovs_packet_cmd_execute() (which calls flow_free()).

The test topology is shown below. VM01 sends TCP packets to VM02,
and OvS forwards the packets. When testing, we use perf to report the
system performance.

VM01 --- OvS-VM --- VM02

Without this patch, perf-top shows the following. flow_free()
accounts for 3.02% of CPU usage.

4.23% [kernel] [k] _raw_spin_unlock_irqrestore
3.62% [kernel] [k] __do_softirq
3.16% [kernel] [k] __memcpy
3.02% [kernel] [k] flow_free
2.42% libc-2.17.so [.] __memcpy_ssse3_back
2.18% [kernel] [k] copy_user_generic_unrolled
2.17% [kernel] [k] find_next_bit

With this patch applied, perf-top shows the following. flow_free()
is no longer on the list.

4.11% [kernel] [k] _raw_spin_unlock_irqrestore
3.79% [kernel] [k] __do_softirq
3.46% [kernel] [k] __memcpy
2.73% libc-2.17.so [.] __memcpy_ssse3_back
2.25% [kernel] [k] copy_user_generic_unrolled
1.89% libc-2.17.so [.] _int_malloc
1.53% ovs-vswitchd [.] xlate_actions

With this patch, the TCP throughput between VMs (without the Megaflow
Cache and the Microflow Cache) goes from 1.18 Gb/s up to 1.30 Gb/s
(roughly a 10% performance improvement).

This patch adds a cpumask struct: cpu_used_mask stores the ids of the
CPUs that the flow has used. We then only check the flow_stats on the
CPUs that were used, so it is unnecessary to check all possible CPUs
when getting, clearing, and updating the flow_stats. Adding
cpu_used_mask to the sw_flow struct doesn't increase the cacheline number.
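
The idea can be sketched in userspace C as follows. This is a simplified model under stated assumptions, not the kernel implementation: a 64-bit word stands in for the cpumask, `MAX_CPUS` is an illustrative limit, and `struct flow`, `flow_stats_update()` and `flow_stats_get()` are hypothetical names.

```c
#include <stdint.h>

#define MAX_CPUS 64  /* illustrative limit, not the kernel's */

struct flow {
    uint64_t cpu_used_mask;      /* bit n set: CPU n has touched the flow */
    uint64_t packets[MAX_CPUS];  /* per-CPU packet counters */
};

/* Called on the CPU that processed a packet; cpu must be < MAX_CPUS. */
static void flow_stats_update(struct flow *f, unsigned int cpu)
{
    f->cpu_used_mask |= UINT64_C(1) << cpu;
    f->packets[cpu]++;
}

/* Sum the counters, visiting only CPUs recorded in cpu_used_mask.
 * *visited reports how many per-CPU slots were actually read. */
static uint64_t flow_stats_get(const struct flow *f, unsigned int *visited)
{
    uint64_t total = 0;
    uint64_t mask = f->cpu_used_mask;

    *visited = 0;
    for (unsigned int cpu = 0; mask != 0; cpu++, mask >>= 1) {
        if (mask & 1) {
            total += f->packets[cpu];
            (*visited)++;
        }
    }
    return total;
}
```

A flow that was only ever handled on two CPUs reads two counters when its stats are collected, regardless of how many CPUs are possible on the system.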

Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Tonghao Zhang
2017-07-20 04:49:39 +0800
c57c054eb openvswitch: Optimize updating for OvS flow_stats.

In ovs_flow_stats_update(), we only use the node
variable to allocate the flow_stats struct. But this is not
the common case, so it is unnecessary to call numa_node_id()
every time. This patch is not a bugfix, but there may be
a small performance increase.

Signed-off-by: Tonghao Zhang
Signed-off-by: David S. Miller

Tonghao Zhang
2017-07-20 04:49:39 +0800

18 Jul, 2017

1 commit

880388aa3 net: Remove all references to SKB_GSO_UDP.

Such packets are no longer possible.

Signed-off-by: David S. Miller

David S. Miller
2017-07-18 00:52:58 +0800

02 Apr, 2017

1 commit

6f56f6186 openvswitch: Fix ovs_flow_key_update()

ovs_flow_key_update() is called when the flow key is invalid, and it is
used to update and revalidate the flow key. Commit 329f45bc4f19
("openvswitch: add mac_proto field to the flow key") introduces the mac_proto
field to the flow key and uses it to determine whether the flow key is valid.
However, the commit does not update the code path in ovs_flow_key_update()
to revalidate the flow key which may cause BUG_ON() on execute_recirc().
This patch addresses the aforementioned issue.

Fixes: 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key")
Signed-off-by: Yi-Hung Wei
Acked-by: Jiri Benc
Signed-off-by: David S. Miller

Yi-Hung Wei
2017-04-02 03:16:46 +0800

10 Feb, 2017

1 commit

9dd7f8907 openvswitch: Add original direction conntrack tuple to sw_flow_key.

Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jarno Rajahalme
2017-02-10 11:59:34 +0800

28 Dec, 2016

1 commit

df30f7408 openvswitch: upcall: Fix vlan handling.

The networking stack accelerates vlan tag handling by
keeping the topmost vlan header in the skb. This works as
long as the packet remains in the OVS datapath. But during
an OVS upcall the vlan header is pushed onto the packet.
When such a packet is sent back to the OVS datapath, the core
networking stack might not handle it correctly. This
patch avoids the issue by accelerating the vlan tag
during flow key extraction. This simplifies the datapath by
bringing uniform packet processing for packets from
all code paths.

Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
CC: Jarno Rajahalme
CC: Jiri Benc
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

pravin shelar
2016-12-28 01:28:07 +0800

13 Nov, 2016

2 commits

5108bbadd openvswitch: add processing of L3 packets

Support receiving, extracting flow key and sending of L3 packets (packets
without an Ethernet header).

Note that even after this patch, non-Ethernet interfaces are still not
allowed to be added to bridges. Similarly, netlink interface for sending and
receiving L3 packets to/from user space is not in place yet.

Based on previous versions by Lorand Jakab and Simon Horman.

Signed-off-by: Lorand Jakab
Signed-off-by: Simon Horman
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800
329f45bc4 openvswitch: add mac_proto field to the flow key

Use a hole in the structure. We support only Ethernet so far and will add
support for L2-less packets shortly. We could use a bool to indicate
whether the Ethernet header is present or not but the approach with the
mac_proto field is more generic and occupies the same number of bytes in the
struct, while allowing later extensibility. It also makes the code in the
next patches more self explaining.

It would be nice to use ARPHRD_ constants but those are u16 which would be
waste. Thus define our own constants.

Another upside of this is that we can overload this new field to also denote
whether the flow key is valid. This has the advantage that on
refragmentation, we don't have to reparse the packet but can rely on the
stored eth.type. This is especially important for the next patches in this
series - instead of adding another branch for L2-less packets before calling
ovs_fragment, we can just remove all those branches completely.
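
A minimal sketch of how one byte can carry both the MAC protocol and a validity flag, as described above. The constant and function names here (`KEY_INVALID`, `key_mac_proto()` and so on) are illustrative assumptions, not necessarily the identifiers used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative MAC protocol constants (small values, fit in a u8). */
enum { MAC_PROTO_NONE = 0, MAC_PROTO_ETHERNET = 1 };

/* Assumed flag bit: set when the key must be re-extracted before use. */
#define KEY_INVALID 0x80u

/* Protocol part of the field, with the validity bit masked off. */
static uint8_t key_mac_proto(uint8_t field)
{
    return (uint8_t)(field & ~KEY_INVALID);
}

static bool key_is_valid(uint8_t field)
{
    return !(field & KEY_INVALID);
}

/* Mark the key stale while remembering which MAC protocol it had. */
static uint8_t key_invalidate(uint8_t field)
{
    return (uint8_t)(field | KEY_INVALID);
}
```

Because the protocol survives invalidation, code that only needs to know whether an Ethernet header is present does not have to reparse the packet.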

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800

13 Oct, 2016

1 commit

20ecf1e4e openvswitch: vlan: remove wrong likely statement

This code is called whenever flow key is being extracted from the packet.
The packet may be as likely vlan tagged as not.

Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Acked-by: Eric Garver
Signed-off-by: David S. Miller

Jiri Benc
2016-10-13 22:03:23 +0800

03 Oct, 2016

1 commit

f7d49bce8 openvswitch: mpls: set network header correctly on key extract

After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in
openvswitch was changed to have network header pointing to the start of the
MPLS headers and inner_network_header pointing after the MPLS headers.

However, key_extract was missed by the mentioned commit, causing incorrect
headers to be set when a MPLS packet just enters the bridge or after it is
recirculated.

Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-10-03 14:00:21 +0800

21 Sep, 2016

1 commit

2279994d0 openvswitch: avoid resetting flow key while installing new flow.

Since commit db74a3335e0f6 ("openvswitch: use percpu
flow stats") flow alloc resets the flow-key. So there is no need
to reset the flow-key again if OVS is using a newly allocated
flow-key.

Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

pravin shelar
2016-09-21 10:54:35 +0800

19 Sep, 2016

2 commits

db74a3335 openvswitch: use percpu flow stats

Instead of using flow stats per NUMA node, use it per CPU. When using
megaflows, the stats lock can be a bottleneck in scalability.

On a E5-2690 12-core system, usual throughput went from ~4Mpps to
~15Mpps when forwarding between two 40GbE ports with a single flow
configured on the datapath.

This has been tested on a system with possible CPUs 0-7,16-23. After
module removal, there was no corruption on the slab cache.

Signed-off-by: Thadeu Lima de Souza Cascardo
Cc: pravin shelar
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thadeu Lima de Souza Cascardo
2016-09-19 10:14:01 +0800
40773966c openvswitch: fix flow stats accounting when node 0 is not possible

On a system with only node 1 as possible, all statistics are going to be
accounted on node 0 as it will have a single writer.

However, when getting and clearing the statistics, node 0 is not going
to be considered, as it's not a possible node.

Tested that statistics are not zero on a system with only node 1
possible. Also compile-tested with CONFIG_NUMA off.

Signed-off-by: Thadeu Lima de Souza Cascardo
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thadeu Lima de Souza Cascardo
2016-09-19 10:14:01 +0800

09 Sep, 2016

1 commit

018c1dda5 openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes

Add support for 802.1ad including the ability to push and pop double
tagged vlans. Add support for 802.1ad to netlink parsing and flow
conversion. Uses double nested encap attributes to represent double
tagged vlan. Inner TPID encoded along with ctci in nested attributes.

This is based on Thomas F Herbert's original v20 patch. I made some
small clean ups and bug fixes.

Signed-off-by: Thomas F Herbert
Signed-off-by: Eric Garver
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Eric Garver
2016-09-09 08:10:28 +0800

07 Oct, 2015

1 commit

00a93babd openvswitch: add tunnel protocol to sw_flow_key

Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now
also acts as an indicator whether the flow contains tunnel data (this was
previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses
in a union with IPv4 ones this won't work anymore).

The new field was added to a hole in sw_flow_key.

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Jiri Benc
2015-10-07 19:17:59 +0800

01 Sep, 2015

1 commit

4c2227984 ip-tunnel: Use API to access tunnel metadata options.

Currently the tun-info options pointer is used in a few cases to
pass options around. But tunnel options can be accessed using
the ip_tunnel_info_opts() API without using the pointer. This
patch removes the redundant pointer and consistently makes use
of the API.

Signed-off-by: Pravin B Shelar
Acked-by: Thomas Graf
Reviewed-by: Jesse Gross
Signed-off-by: David S. Miller

Pravin B Shelar
2015-09-01 03:28:56 +0800

30 Aug, 2015

3 commits

a581b96db openvswitch: Remove vport-net

This structure is not used anymore.

Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

Pravin B Shelar
2015-08-30 10:07:15 +0800
c30da4978 openvswitch: retain parsed IPv6 header fields in flow on error skipping extension headers

When an error occurs while skipping IPv6 extension headers, retain the
already parsed IP protocol and IPv6 addresses in the flow. Also assume that
the packet is not a fragment in the absence of information to the contrary;
that is, always use the frag_off value set by ipv6_skip_exthdr().

This allows matching on the IP protocol and IPv6 addresses of packets
with malformed extension headers.

Signed-off-by: Simon Horman
Signed-off-by: David S. Miller

Simon Horman
2015-08-30 04:39:59 +0800
7f9562a1f ip_tunnels: record IP version in tunnel info

There's currently nothing preventing directing packets with IPv6
encapsulation data to IPv4 tunnels (and vice versa). If this happens,
IPv6 addresses are incorrectly interpreted as IPv4 ones.

Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this
in ip_tunnel_info. Reject packets at appropriate places if they are supposed
to be encapsulated into an incompatible protocol.
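
The check described above might look roughly like this in userspace C. It is a sketch under assumptions: `struct tunnel_info`, `enum tun_family` and `tunnel_check_family()` are hypothetical names, and the way the kernel actually records the IP version may differ.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical tag recording which address family the key holds. */
enum tun_family { TUN_IPV4 = 4, TUN_IPV6 = 6 };

struct tunnel_info {
    enum tun_family family;  /* which member of the union below is valid */
    union {
        uint32_t v4_dst;
        uint8_t v6_dst[16];
    } u;
};

/* Return 0 if the metadata suits the tunnel device, -EINVAL otherwise,
 * so an IPv6 key is never interpreted as an IPv4 one (or vice versa). */
static int tunnel_check_family(const struct tunnel_info *info,
                               enum tun_family dev_family)
{
    return info->family == dev_family ? 0 : -EINVAL;
}
```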

Signed-off-by: Jiri Benc
Acked-by: Alexei Starovoitov
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2015-08-30 04:07:54 +0800

28 Aug, 2015

2 commits

c2ac66735 openvswitch: Allow matching on conntrack label

Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.

Signed-off-by: Joe Stringer
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-08-28 02:40:43 +0800
7f8a436ea openvswitch: Add conntrack action

Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.
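
As an illustration of how these flag bits combine and how a flow might match on them, here is a small userspace sketch. The bit values are the ones listed above; the `CS_F_*` macro names and the two helper functions are hypothetical and only model the described behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as listed in this commit message. */
#define CS_F_NEW         0x01
#define CS_F_ESTABLISHED 0x02
#define CS_F_RELATED     0x04
#define CS_F_INVALID     0x20
#define CS_F_REPLY_DIR   0x40
#define CS_F_TRACKED     0x80

/* Model of the CT action: whatever conntrack reports, TRACKED is set. */
static uint8_t ct_action_result(uint8_t conn_flags)
{
    return (uint8_t)(conn_flags | CS_F_TRACKED);
}

/* Masked match, e.g. "+trk+est" is value 0x82 with mask 0x82,
 * and "-trk" is value 0x00 with mask 0x80. */
static bool ct_state_matches(uint8_t state, uint8_t value, uint8_t mask)
{
    return (state & mask) == (value & mask);
}
```

A packet that has not yet been through the CT action has ct_state 0, so it matches "-trk" and nothing that requires TRACKED.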

When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.

IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.

IP frag handling contributed by Andy Zhou.

Based on original design by Justin Pettit.

Signed-off-by: Joe Stringer
Signed-off-by: Justin Pettit
Signed-off-by: Andy Zhou
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-08-28 02:40:43 +0800

22 Jul, 2015

1 commit

1d8fff907 ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic

Rename the tunnel metadata data structures currently internal to
OVS and make them generic for use by all IP tunnels.

Both structures are kernel internal and will stay that way. Their
members are exposed to user space through individual Netlink
attributes by OVS. It will therefore be possible to extend/modify
these structures without affecting user ABI.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-07-22 01:39:05 +0800

06 May, 2015

1 commit

6713fc9b8 openvswitch: Use eth_proto_is_802_3

Replace "ntohs(proto) >= ETH_P_802_3_MIN" w/ eth_proto_is_802_3(proto).
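
For reference, the open-coded comparison being replaced can be written as a standalone userspace function. This sketch only mirrors the expression quoted above; `proto_at_least_802_3_min()` is a hypothetical name, and the kernel helper's actual implementation may be written differently.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_802_3_MIN 0x0600 /* smallest value treated as an EtherType */

/* proto is in network byte order, as it appears in the frame. */
static bool proto_at_least_802_3_min(uint16_t proto)
{
    return ntohs(proto) >= ETH_P_802_3_MIN;
}
```

With this definition, 0x0800 (IPv4) passes the check, while a value such as 0x05DC, which is a frame length rather than an EtherType, does not.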

Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2015-05-06 07:24:42 +0800

15 Apr, 2015

1 commit

4167e9b2c mm: remove GFP_THISNODE

NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.

GFP_THISNODE is a secret combination of gfp bits that have different
behavior than expected. It is a combination of __GFP_THISNODE,
__GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
allocator slowpath to fail without trying reclaim even though it may be
used in combination with __GFP_WAIT.

An example of the problem this creates: commit e97ca8e5b864 ("mm: fix
GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
that really just wanted __GFP_THISNODE. The problem doesn't end there,
however, because even it was a no-op for alloc_misplaced_dst_page(),
which also sets __GFP_NORETRY and __GFP_NOWARN, and
migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
is set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is a
no-op in these cases since the page allocator special-cases
__GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.

It's time to just remove GFP_THISNODE entirely. We leave __GFP_THISNODE
to restrict an allocation to a local node, but remove GFP_THISNODE and
its obscurity. Instead, we require that a caller clear __GFP_WAIT if it
wants to avoid reclaim.

This allows the aforementioned functions to actually reclaim as they
should. It also enables any future callers that want to do
__GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The
rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.

Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
it is unchanged.

Signed-off-by: David Rientjes
Acked-by: Vlastimil Babka
Cc: Christoph Lameter
Acked-by: Pekka Enberg
Cc: Joonsoo Kim
Acked-by: Johannes Weiner
Cc: Mel Gorman
Cc: Pravin Shelar
Cc: Jarno Rajahalme
Cc: Li Zefan
Cc: Greg Thelen
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2015-04-15 07:49:03 +0800

12 Feb, 2015

1 commit

b35725a28 openvswitch: Reset key metadata for packet execution.

The userspace packet execute command passes down the flow key for a given
packet. But userspace can skip some parameters with zero value.
Therefore the kernel needs to initialize the key metadata to zero.

Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.")
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

Pravin B Shelar
2015-02-12 06:40:15 +0800

15 Jan, 2015

1 commit

d91641d9b openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()

Also factors out Geneve validation code into a new separate function
validate_and_copy_geneve_opts().

A subsequent patch will introduce VXLAN options. Rename the existing
GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
tunnel metadata options.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-01-15 14:11:41 +0800