Eric Lee / smarc-fsl-linux-kernel

13 Mar, 2019

1 commit

ee9c5e675 openvswitch: convert to kvmalloc ... Browse Code »

Patch series "generic radix trees; drop flex arrays".

This patch (of 7):

There was no real need for this code to be using flexarrays, it's just
implementing a hash table - ideally it would be using rhashtables, but
that conversion would be significantly more complicated.

Link: http://lkml.kernel.org/r/20181217131929.11727-2-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet
Reviewed-by: Matthew Wilcox
Cc: Pravin B Shelar
Cc: Alexey Dobriyan
Cc: Al Viro
Cc: Dave Hansen
Cc: Eric Paris
Cc: Marcelo Ricardo Leitner
Cc: Neil Horman
Cc: Paul Moore
Cc: Shaohua Li
Cc: Stephen Smalley
Cc: Vlad Yasevich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2019-03-13 01:04:02 +0800

09 Nov, 2018

1 commit

9df46aefa OVS: remove use of VLAN_TAG_PRESENT ... Browse Code »

This is a minimal change to allow removing of VLAN_TAG_PRESENT.
It leaves OVS unable to use CFI bit, as fixing this would need
a deeper surgery involving userspace interface.

Signed-off-by: Michał Mirosław
Signed-off-by: David S. Miller

Michał Mirosław
2018-11-09 11:49:31 +0800

08 Nov, 2017

1 commit

b2d0f5d5d openvswitch: enable NSH support ... Browse Code »

v16->17
- Fixed disputed check code: keep them in nsh_push and nsh_pop
but also add them in __ovs_nla_copy_actions

v15->v16
- Add csum recalculation for nsh_push, nsh_pop and set_nsh
pointed out by Pravin
- Move nsh key into the union with ipv4 and ipv6 and add
check for nsh key in match_validate pointed out by Pravin
- Add nsh check in validate_set and __ovs_nla_copy_actions

v14->v15
- Check size in nsh_hdr_from_nlattr
- Fixed four small issues pointed out By Jiri and Eric

v13->v14
- Rename skb_push_nsh to nsh_push per Dave's comment
- Rename skb_pop_nsh to nsh_pop per Dave's comment

v12->v13
- Fix NSH header length check in set_nsh

v11->v12
- Fix missing changes old comments pointed out
- Fix new comments for v11

v10->v11
- Fix the left three disputable comments for v9
but not fixed in v10.

v9->v10
- Change struct ovs_key_nsh to
struct ovs_nsh_key_base base;
__be32 context[NSH_MD1_CONTEXT_SIZE];
- Fix new comments for v9

v8->v9
- Fix build error reported by daily intel build
because nsh module isn't selected by openvswitch

v7->v8
- Rework nested value and mask for OVS_KEY_ATTR_NSH
- Change pop_nsh to adapt to nsh kernel module
- Fix many issues per comments from Jiri Benc

v6->v7
- Remove NSH GSO patches in v6 because Jiri Benc
reworked it as another patch series and they have
been merged.
- Change it to adapt to nsh kernel module added by NSH
GSO patch series

v5->v6
- Fix the rest comments for v4.
- Add NSH GSO support for VxLAN-gpe + NSH and
Eth + NSH.

v4->v5
- Fix many comments by Jiri Benc and Eric Garver
for v4.

v3->v4
- Add new NSH match field ttl
- Update NSH header to the latest format
which will be final format and won't change
per its author's confirmation.
- Fix comments for v3.

v2->v3
- Change OVS_KEY_ATTR_NSH to nested key to handle
length-fixed attributes and length-variable
attriubte more flexibly.
- Remove struct ovs_action_push_nsh completely
- Add code to handle nested attribute for SET_MASKED
- Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
to transfer NSH header data.
- Fix comments and coding style issues by Jiri and Eric

v1->v2
- Change encap_nsh and decap_nsh to push_nsh and pop_nsh
- Dynamically allocate struct ovs_action_push_nsh for
length-variable metadata.

OVS master and 2.8 branch has merged NSH userspace
patch series, this patch is to enable NSH support
in kernel data path in order that OVS can support
NSH in compat mode by porting this.

Signed-off-by: Yi Yang
Acked-by: Jiri Benc
Acked-by: Eric Garver
Acked-by: Pravin Shelar
Signed-off-by: David S. Miller

Yi Yang
2017-11-08 15:12:33 +0800

20 Jul, 2017

1 commit

c4b2bf6b4 openvswitch: Optimize operations for OvS flow_stats. ... Browse Code »

When calling the flow_free() to free the flow, we call many times
(cpu_possible_mask, eg. 128 as default) cpumask_next(). That will
take up our CPU usage if we call the flow_free() frequently.
When we put all packets to userspace via upcall, and OvS will send
them back via netlink to ovs_packet_cmd_execute(will call flow_free).

The test topo is shown as below. VM01 sends TCP packets to VM02,
and OvS forward packtets. When testing, we use perf to report the
system performance.

VM01 --- OvS-VM --- VM02

Without this patch, perf-top show as below: The flow_free() is
3.02% CPU usage.

4.23% [kernel] [k] _raw_spin_unlock_irqrestore
3.62% [kernel] [k] __do_softirq
3.16% [kernel] [k] __memcpy
3.02% [kernel] [k] flow_free
2.42% libc-2.17.so [.] __memcpy_ssse3_back
2.18% [kernel] [k] copy_user_generic_unrolled
2.17% [kernel] [k] find_next_bit

When applied this patch, perf-top show as below: Not shown on
the list anymore.

4.11% [kernel] [k] _raw_spin_unlock_irqrestore
3.79% [kernel] [k] __do_softirq
3.46% [kernel] [k] __memcpy
2.73% libc-2.17.so [.] __memcpy_ssse3_back
2.25% [kernel] [k] copy_user_generic_unrolled
1.89% libc-2.17.so [.] _int_malloc
1.53% ovs-vswitchd [.] xlate_actions

With this patch, the TCP throughput(we dont use Megaflow Cache
+ Microflow Cache) between VMs is 1.18Gbs/sec up to 1.30Gbs/sec
(maybe ~10% performance imporve).

This patch adds cpumask struct, the cpu_used_mask stores the cpu_id
that the flow used. And we only check the flow_stats on the cpu we
used, and it is unncessary to check all possible cpu when getting,
cleaning, and updating the flow_stats. Adding the cpu_used_mask to
sw_flow struct does’t increase the cacheline number.

Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Tonghao Zhang
2017-07-20 04:49:39 +0800

10 Feb, 2017

2 commits

316d4d78c openvswitch: Pack struct sw_flow_key. ... Browse Code »

struct sw_flow_key has two 16-bit holes. Move the most matched
conntrack match fields there. In some typical cases this reduces the
size of the key that needs to be hashed into half and into one cache
line.

Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jarno Rajahalme
2017-02-10 11:59:34 +0800
9dd7f8907 openvswitch: Add original direction conntrack tuple to sw_flow_key. ... Browse Code »

Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jarno Rajahalme
2017-02-10 11:59:34 +0800

13 Nov, 2016

1 commit

329f45bc4 openvswitch: add mac_proto field to the flow key ... Browse Code »

Use a hole in the structure. We support only Ethernet so far and will add
a support for L2-less packets shortly. We could use a bool to indicate
whether the Ethernet header is present or not but the approach with the
mac_proto field is more generic and occupies the same number of bytes in the
struct, while allowing later extensibility. It also makes the code in the
next patches more self explaining.

It would be nice to use ARPHRD_ constants but those are u16 which would be
waste. Thus define our own constants.

Another upside of this is that we can overload this new field to also denote
whether the flow key is valid. This has the advantage that on
refragmentation, we don't have to reparse the packet but can rely on the
stored eth.type. This is especially important for the next patches in this
series - instead of adding another branch for L2-less packets before calling
ovs_fragment, we can just remove all those branches completely.

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800

19 Sep, 2016

1 commit

db74a3335 openvswitch: use percpu flow stats ... Browse Code »

Instead of using flow stats per NUMA node, use it per CPU. When using
megaflows, the stats lock can be a bottleneck in scalability.

On a E5-2690 12-core system, usual throughput went from ~4Mpps to
~15Mpps when forwarding between two 40GbE ports with a single flow
configured on the datapath.

This has been tested on a system with possible CPUs 0-7,16-23. After
module removal, there were no corruption on the slab cache.

Signed-off-by: Thadeu Lima de Souza Cascardo
Cc: pravin shelar
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thadeu Lima de Souza Cascardo
2016-09-19 10:14:01 +0800

09 Sep, 2016

1 commit

018c1dda5 openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes ... Browse Code »

Add support for 802.1ad including the ability to push and pop double
tagged vlans. Add support for 802.1ad to netlink parsing and flow
conversion. Uses double nested encap attributes to represent double
tagged vlan. Inner TPID encoded along with ctci in nested attributes.

This is based on Thomas F Herbert's original v20 patch. I made some
small clean ups and bug fixes.

Signed-off-by: Thomas F Herbert
Signed-off-by: Eric Garver
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Eric Garver
2016-09-09 08:10:28 +0800

19 Mar, 2016

1 commit

fca5fdf67 ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it ... Browse Code »

eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded
value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX
for this, which makes the code a bit more generic and allows to remove
BPF_TUNLEN_MAX from eBPF code.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2016-03-19 07:38:46 +0800

20 Oct, 2015

1 commit

26440c835 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/usb/asix_common.c
net/ipv4/inet_connection_sock.c
net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2015-10-20 21:08:27 +0800

07 Oct, 2015

1 commit

00a93babd openvswitch: add tunnel protocol to sw_flow_key ... Browse Code »

Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now
also acts as an indicator whether the flow contains tunnel data (this was
previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses
in an union with IPv4 ones this won't work anymore).

The new field was added to a hole in sw_flow_key.

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Jiri Benc
2015-10-07 19:17:59 +0800

05 Oct, 2015

1 commit

33db4125e openvswitch: Rename LABEL->LABELS ... Browse Code »

Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name
for these to be consistent with conntrack.

Fixes: c2ac667 "openvswitch: Allow matching on conntrack label"
Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-10-05 21:34:28 +0800

28 Aug, 2015

4 commits

c2ac66735 openvswitch: Allow matching on conntrack label ... Browse Code »

Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.

Signed-off-by: Joe Stringer
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-08-28 02:40:43 +0800
182e3042e openvswitch: Allow matching on conntrack mark ... Browse Code »

Allow matching and setting the ct_mark field. As with ct_state and
ct_zone, these fields are populated when the CT action is executed. To
write to this field, a value and mask can be specified as a nested
attribute under the CT action. This data is stored with the conntrack
entry, and is executed after the lookup occurs for the CT action. The
conntrack entry itself must be committed using the COMMIT flag in the CT
action flags for this change to persist.

Signed-off-by: Justin Pettit
Signed-off-by: Joe Stringer
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-08-28 02:40:43 +0800
7f8a436ea openvswitch: Add conntrack action ... Browse Code »

Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.

When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.

IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.

IP frag handling contributed by Andy Zhou.

Based on original design by Justin Pettit.

Signed-off-by: Joe Stringer
Signed-off-by: Justin Pettit
Signed-off-by: Andy Zhou
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-08-28 02:40:43 +0800
8e2fed1c0 openvswitch: Serialize acts with original netlink len ... Browse Code »

Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However,the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Joe Stringer
2015-08-28 02:40:42 +0800

22 Jul, 2015

2 commits

34ae932a4 openvswitch: Make tunnel set action attach a metadata dst ... Browse Code »

Utilize the new metadata dst to attach encapsulation instructions to
the skb. The existing egress_tun_info via the OVS_CB() is left in
place until all tunnel vports have been converted to the new method.

Signed-off-by: Thomas Graf
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thomas Graf
2015-07-22 01:39:06 +0800
1d8fff907 ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic ... Browse Code »

Rename the tunnel metadata data structures currently internal to
OVS and make them generic for use by all IP tunnels.

Both structures are kernel internal and will stay that way. Their
members are exposed to user space through individual Netlink
attributes by OVS. It will therefore be possible to extend/modify
these structures without affecting user ABI.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-07-22 01:39:05 +0800

27 Jan, 2015

1 commit

74ed7ab92 openvswitch: Add support for unique flow IDs. ... Browse Code »

Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.

This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.

All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-01-27 07:45:50 +0800

15 Jan, 2015

1 commit

d91641d9b openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS() ... Browse Code »

Also factors out Geneve validation code into a new separate function
validate_and_copy_geneve_opts().

A subsequent patch will introduce VXLAN options. Rename the existing
GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
tunnel metadata options.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-01-15 14:11:41 +0800

10 Nov, 2014

3 commits

05da5898a openvswitch: Add support for OVS_FLOW_ATTR_PROBE. ... Browse Code »

This new flag is useful for suppressing error logging while probing
for datapath features using flow commands. For backwards
compatibility reasons the commands are executed normally, but error
logging is suppressed.

Signed-off-by: Jarno Rajahalme
Signed-off-by: Pravin B Shelar

Jarno Rajahalme
2014-11-10 10:58:44 +0800
12eb18f71 openvswitch: Constify various function arguments ... Browse Code »

Help produce better optimized code.

Signed-off-by: Thomas Graf
Signed-off-by: Pravin B Shelar

Thomas Graf
2014-11-10 10:58:44 +0800
8f0aad6f3 openvswitch: Extend packet attribute for egress tunnel info ... Browse Code »

OVS vswitch has extended IPFIX exporter to export tunnel headers
to improve network visibility.
To export this information userspace needs to know egress tunnel
for given packet. By extending packet attributes datapath can
export egress tunnel info for given packet. So that userspace
can ask for egress tunnel info in userspace action. This
information is used to build IPFIX data for given flow.

Signed-off-by: Wenyu Zhang
Acked-by: Romain Lenglet
Acked-by: Ben Pfaff
Signed-off-by: Pravin B Shelar

Wenyu Zhang
2014-11-10 10:58:44 +0800

06 Nov, 2014

1 commit

25cd9ba0a openvswitch: Add basic MPLS support to kernel ... Browse Code »

Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K
Cc: Leo Alterman
Cc: Isaku Yamahata
Cc: Joe Stringer
Signed-off-by: Simon Horman
Signed-off-by: Jesse Gross
Signed-off-by: Pravin B Shelar

Simon Horman
2014-11-06 15:52:33 +0800

06 Oct, 2014

2 commits

f57966840 openvswitch: Add support for Geneve tunneling. ... Browse Code »

The Openvswitch implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.

Signed-off-by: Jesse Gross
Singed-off-by: Ansis Atteka
Signed-off-by: Andy Zhou
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jesse Gross
2014-10-06 12:32:21 +0800
f0b128c1e openvswitch: Wrap struct ovs_key_ipv4_tunnel in a new structure. ... Browse Code »

Currently, the flow information that is matched for tunnels and
the tunnel data passed around with packets is the same. However,
as additional information is added this is not necessarily desirable,
as in the case of pointers.

This adds a new structure for tunnel metadata which currently contains
only the existing struct. This change is purely internal to the kernel
since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version
of OVS_KEY_ATTR_TUNNEL that is translated at flow setup.

Signed-off-by: Jesse Gross
Signed-off-by: Andy Zhou
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jesse Gross
2014-10-06 12:32:20 +0800

16 Sep, 2014

3 commits

971427f35 openvswitch: Add recirc and hash action. ... Browse Code »

Recirc action allows a packet to reenter openvswitch processing.
currently openvswitch lookup flow for packet received and execute
set of actions on that packet, with help of recirc action we can
process/modify the packet and recirculate it back in openvswitch
for another pass.

OVS hash action calculates 5-tupple hash and set hash in flow-key
hash. This can be used along with recirculation for distributing
packets among different ports for bond devices.
For example:
OVS bonding can use following actions:
Match on: bond flow; Action: hash, recirc(id)
Match on: recirc-id == id and hash lower bits == a;
Action: output port_bond_a

Signed-off-by: Andy Zhou
Acked-by: Jesse Gross
Signed-off-by: Pravin B Shelar

Andy Zhou
2014-09-16 14:28:14 +0800
8c8b1b83f openvswitch: Use tun_key only for egress tunnel path. ... Browse Code »

Currently tun_key is used for passing tunnel information
on ingress and egress path, this cause confusion. Following
patch removes its use on ingress path make it egress only parameter.

Signed-off-by: Pravin B Shelar
Acked-by: Andy Zhou

Pravin B Shelar
2014-09-16 14:28:13 +0800
83c8df26a openvswitch: refactor ovs flow extract API. ... Browse Code »

OVS flow extract is called on packet receive or packet
execute code path. Following patch defines separate API
for extracting flow-key in packet execute code path.

Signed-off-by: Pravin B Shelar
Acked-by: Andy Zhou

Pravin B Shelar
2014-09-16 14:28:13 +0800

30 Jun, 2014

1 commit

ad5520073 openvswitch: Fix tracking of flags seen in TCP flows. ... Browse Code »

Flow statistics need to take into account the TCP flags from the packet
currently being processed (in 'key'), not the TCP flags matched by the
flow found in the kernel flow table (in 'flow').

This bug made the Open vSwitch userspace fin_timeout action have no effect
in many cases.
This bug is introduced by commit 88d73f6c411ac2f0578 (openvswitch: Use
TCP flags in the flow key for stats.)

Reported-by: Len Gao
Signed-off-by: Ben Pfaff
Acked-by: Jarno Rajahalme
Acked-by: Jesse Gross
Signed-off-by: Pravin B Shelar

Ben Pfaff
2014-06-30 05:10:51 +0800

23 May, 2014

2 commits

86ec8dbae openvswitch: Fix ovs_flow_stats_get/clear RCU dereference. ... Browse Code »

For ovs_flow_stats_get() using ovsl_dereference() was wrong, since
flow dumps call this with RCU read lock.

ovs_flow_stats_clear() is always called with ovs_mutex, so can use
ovsl_dereference().

Also, make the ovs_flow_stats_get() 'flow' argument const to make
later patches cleaner.

Signed-off-by: Jarno Rajahalme
Signed-off-by: Pravin B Shelar

Jarno Rajahalme
2014-05-23 07:27:35 +0800
1139e241e openvswitch: Compact sw_flow_key. ... Browse Code »

Minimize padding in sw_flow_key and move 'tp' top the main struct.
These changes simplify code when accessing the transport port numbers
and the tcp flags, and makes the sw_flow_key 8 bytes smaller on 64-bit
systems (128->120 bytes). These changes also make the keys for IPv4
packets to fit in one cache line.

There is a valid concern for safety of packing the struct
ovs_key_ipv4_tunnel, as it would be possible to take the address of
the tun_id member as a __be64 * which could result in unaligned access
in some systems. However:

- sw_flow_key itself is 64-bit aligned, so the tun_id within is
always
64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force
every
second tun_key to be misaligned).
- We never take the address of the tun_id in to a __be64 *.
- Whereever we use struct ovs_key_ipv4_tunnel outside the
sw_flow_key,
it is in stack (on tunnel input functions), where compiler has full
control of the alignment.

Signed-off-by: Jarno Rajahalme
Signed-off-by: Pravin B Shelar

Jarno Rajahalme
2014-05-23 07:27:34 +0800

17 May, 2014

2 commits

63e7959c4 openvswitch: Per NUMA node flow stats. ... Browse Code »

Keep kernel flow stats for each NUMA node rather than each (logical)
CPU. This avoids using the per-CPU allocator and removes most of the
kernel-side OVS locking overhead otherwise on the top of perf reports
and allows OVS to scale better with higher number of threads.

With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
rate doubles on a server with two hyper-threaded physical CPUs (16
logical cores each) compared to the current OVS master. Tested with
non-trivial flow table with a TCP port match rule forcing all new
connections with unique port numbers to OVS userspace. The IP
addresses are still wildcarded, so the kernel flows are not considered
as exact match 5-tuple flows. This type of flows can be expected to
appear in large numbers as the result of more effective wildcarding
made possible by improvements in OVS userspace flow classifier.

Perf results for this test (master):

Events: 305K cycles
+ 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
+ 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
+ 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc
+ 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
+ 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area
+ 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
+ 2.03% swapper [kernel.kallsyms] [k] intel_idle
+ 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
+ 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
+ 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6
+ 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset
+ 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock
+ 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock
...

And after this patch:

Events: 356K cycles
+ 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc
+ 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
+ 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
+ 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
+ 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
+ 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
+ 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f
+ 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
+ 1.47% swapper [kernel.kallsyms] [k] intel_idle
+ 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask
+ 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref
+ 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash
+ 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions
+ 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref
+ 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock
...

There is a small increase in kernel spinlock overhead due to the same
spinlock being shared between multiple cores of the same physical CPU,
but that is barely visible in the netperf TCP_CRR test performance
(maybe ~1% performance drop, hard to tell exactly due to variance in
the test results), when testing for kernel module throughput (with no
userspace activity, handful of kernel flows).

On flow setup, a single stats instance is allocated (for the NUMA node
0). As CPUs from multiple NUMA nodes start updating stats, new
NUMA-node specific stats instances are allocated. This allocation on
the packet processing code path is made to never block or look for
emergency memory pools, minimizing the allocation latency. If the
allocation fails, the existing preallocated stats instance is used.
Also, if only CPUs from one NUMA-node are updating the preallocated
stats instance, no additional stats instances are allocated. This
eliminates the need to pre-allocate stats instances that will not be
used, also relieving the stats reader from the burden of reading stats
that are never used.

Signed-off-by: Jarno Rajahalme
Acked-by: Pravin B Shelar
Signed-off-by: Jesse Gross

Jarno Rajahalme
2014-05-17 04:40:29 +0800
23dabf88a openvswitch: Remove 5-tuple optimization. ... Browse Code »

The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch. Remove it first to make the changes easier to
grasp.

Signed-off-by: Jarno Rajahalme
Signed-off-by: Jesse Gross

Jarno Rajahalme
2014-05-17 04:40:29 +0800

07 Jan, 2014

2 commits

e298e5057 openvswitch: Per cpu flow stats. ... Browse Code »

With mega flow implementation ovs flow can be shared between
multiple CPUs which makes stats updates highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.

Signed-off-by: Pravin B Shelar
Signed-off-by: Jesse Gross

Pravin B Shelar
2014-01-07 07:52:24 +0800
8f49ce113 openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit). ... Browse Code »

We won't normally have a ton of flow masks but using a size_t to store
values no bigger than sizeof(struct sw_flow_key) seems excessive.

This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit
systems. On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but
sw_flow_mask only by 8 bytes due to padding.

Compile tested only.

Signed-off-by: Ben Pfaff
Acked-by: Andy Zhou
Signed-off-by: Jesse Gross

Ben Pfaff
2014-01-07 07:51:27 +0800

02 Nov, 2013

2 commits

5eb26b156 openvswitch: TCP flags matching support. ... Browse Code »

tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit num‐
bers written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in port must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.

TCP protocol currently defines 9 flag bits, and additional 3
bits are reserved (must be transmitted as zero), see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:

0: FIN No more data from sender.

1: SYN Synchronize sequence numbers.

2: RST Reset the connection.

3: PSH Push function.

4: ACK Acknowledgement field significant.

5: URG Urgent pointer field significant.

6: ECE ECN Echo.

7: CWR Congestion Windows Reduced.

8: NS Nonce Sum.

9-11: Reserved.

12-15: Not matchable, must be zero.

Signed-off-by: Jarno Rajahalme
Signed-off-by: Jesse Gross

Jarno Rajahalme
2013-11-02 09:43:45 +0800
df23e9f64 openvswitch: Widen TCP flags handling. ... Browse Code »

Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.

Signed-off-by: Jarno Rajahalme
Signed-off-by: Jesse Gross

Jarno Rajahalme
2013-11-02 09:43:45 +0800

04 Oct, 2013

1 commit

e64457191 openvswitch: Restructure datapath.c and flow.c ... Browse Code »

Over the time datapath.c and flow.c has became pretty large files.
Following patch restructures functionality of component into three
different components:

flow.c: contains flow extract.
flow_netlink.c: netlink flow api.
flow_table.c: flow table api.

This patch restructures code without changing logic.

Signed-off-by: Pravin B Shelar
Signed-off-by: Jesse Gross

Pravin B Shelar
2013-10-04 09:16:47 +0800