23 Sep, 2015

1 commit

  • When support for megaflows was introduced, OVS needed to start
    installing flows with a mask applied to them. Since masking is an
    expensive operation, OVS also had an optimization that would only
    take the parts of the flow keys that were covered by a non-zero
    mask. The values stored in the remaining pieces should not matter
    because they are masked out.

    While this works fine for the purposes of matching (which must always
    look at the mask), serialization to netlink can be problematic. Since
    the flow and the mask are serialized separately, the uninitialized
    portions of the flow can be encoded with whatever values happen to be
    present.

    In terms of functionality, this has little effect since these fields
    will be masked out by definition. However, it leaks kernel memory to
    userspace, which is a potential security vulnerability. It is also
    possible that other code paths could look at the masked key and get
    uninitialized data, although this does not currently appear to be an
    issue in practice.

    This removes the mask optimization for flows that are being installed.
    This was always intended to be the case, as the mask optimizations were
    really targeting per-packet flow operations (see the sketch after this
    entry).

    Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
    Signed-off-by: Jesse Gross
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jesse Gross
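
    A minimal userspace sketch, not the kernel code, of why masking only
    the ranges covered by a non-zero mask is fine for matching but unsafe
    for serialization; the 16-byte key layout and the mask_key_range()
    helper are illustrative assumptions:

    #include <stdio.h>
    #include <string.h>

    #define KEY_LEN 16

    /* Copy only bytes in [start, end); bytes outside that range keep
     * whatever value the destination buffer already held. */
    static void mask_key_range(unsigned char *dst, const unsigned char *src,
                               const unsigned char *mask,
                               size_t start, size_t end)
    {
        for (size_t i = start; i < end; i++)
            dst[i] = src[i] & mask[i];
    }

    int main(void)
    {
        unsigned char key[KEY_LEN], mask[KEY_LEN], masked[KEY_LEN];

        memset(key, 0xAB, sizeof(key));
        memset(mask, 0x00, sizeof(mask));
        memset(mask + 4, 0xFF, 4);            /* match only bytes 4..7 */
        memset(masked, 0x5A, sizeof(masked)); /* stands in for stale memory */

        /* Optimized path: apply the mask only where it is non-zero. */
        mask_key_range(masked, key, mask, 4, 8);

        /* Bytes outside 4..7 still hold 0x5A: harmless for matching, but
         * exposed verbatim if this buffer is serialized on its own. */
        for (size_t i = 0; i < KEY_LEN; i++)
            printf("%02x ", masked[i]);
        printf("\n");

        /* What installed flows need instead: cover the whole key. */
        mask_key_range(masked, key, mask, 0, KEY_LEN);
        return 0;
    }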
     

27 Jan, 2015

1 commit

  • Previously, flows were manipulated by userspace specifying a full,
    unmasked flow key. This places a significant burden on flow
    serialization and deserialization, particularly when dumping flows.

    This patch adds an alternative way to refer to flows using a
    variable-length "unique flow identifier" (UFID). At flow setup time,
    userspace may specify a UFID for a flow, which is stored with the flow
    and inserted into a separate table for lookup, in addition to the
    standard flow table. Flows created using a UFID must be fetched or
    deleted using the UFID.

    All flow dump operations may now be made more terse with OVS_UFID_F_*
    flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
    omit the flow key from a datapath operation if the flow has a
    corresponding UFID. This significantly reduces the time spent assembling
    and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
    enabled, the datapath returns only the UFID and statistics for each flow
    during a flow dump, increasing ovs-vswitchd revalidator performance by
    40% or more (see the sketch after this entry).

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
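
    A minimal userspace sketch of the idea, assuming nothing about the
    kernel's actual data structures: a flow is indexed both by its masked
    key (for packet lookup, omitted here) and by its UFID, so a later get
    or delete needs no flow key at all. The struct layout, ufid_insert()
    and ufid_lookup() are illustrative names, not the kernel API:

    #include <stddef.h>
    #include <string.h>

    #define UFID_MAX 16                /* variable-length UFID, 128 bits max */
    #define BUCKETS  256

    struct flow {
        unsigned char key[64];         /* masked flow key (used elsewhere)   */
        unsigned char ufid[UFID_MAX];
        size_t ufid_len;               /* 0 would mean "no UFID, use the key" */
        struct flow *ufid_next;        /* chain in the UFID hash table        */
    };

    static struct flow *ufid_table[BUCKETS];

    static size_t ufid_hash(const unsigned char *ufid, size_t len)
    {
        size_t h = 5381;

        for (size_t i = 0; i < len; i++)
            h = h * 33 + ufid[i];
        return h % BUCKETS;
    }

    /* Insert into the UFID index; the flow also goes into the normal
     * masked-key table (not shown) so the datapath can still match packets. */
    static void ufid_insert(struct flow *f)
    {
        size_t b = ufid_hash(f->ufid, f->ufid_len);

        f->ufid_next = ufid_table[b];
        ufid_table[b] = f;
    }

    /* Get/delete by UFID: no flow key has to be serialized or compared. */
    static struct flow *ufid_lookup(const unsigned char *ufid, size_t len)
    {
        struct flow *f = ufid_table[ufid_hash(ufid, len)];

        for (; f; f = f->ufid_next)
            if (f->ufid_len == len && !memcmp(f->ufid, ufid, len))
                return f;
        return NULL;
    }

    int main(void)
    {
        static struct flow f = {
            .ufid = { 0xde, 0xad, 0xbe, 0xef }, .ufid_len = 4,
        };

        ufid_insert(&f);
        return ufid_lookup(f.ufid, f.ufid_len) == &f ? 0 : 1;
    }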
     

01 Jul, 2014

1 commit

  • Due to a race condition in userspace, there is a chance that two
    overlapping megaflows could be installed in the datapath. This leaves
    userspace unable to delete the less inclusive megaflow even after it
    times out, since the flow_del logic stops at the first match of a
    masked flow.

    This commit fixes the bug by making the kernel flow_del and flow_get
    logic check all masks in that case (see the sketch after this entry).

    Introduced by 03f0d916a (openvswitch: Mega flow implementation).

    Signed-off-by: Alex Wang
    Acked-by: Andy Zhou
    Signed-off-by: Pravin B Shelar

    Alex Wang
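
    A compact sketch of the fixed lookup logic, under the assumption of a
    simple linked-list flow table (the real code uses per-mask hash
    tables); masked_lookup() and lookup_exact() are illustrative names,
    not the kernel API:

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define KEY_LEN 64

    struct mask  { unsigned char bits[KEY_LEN]; struct mask *next; };
    struct flow  { unsigned char key[KEY_LEN]; const struct mask *mask;
                   struct flow *next; };
    struct table { struct mask *masks; struct flow *flows; };

    /* First flow installed with this mask whose masked key matches. */
    static struct flow *masked_lookup(const struct table *tbl,
                                      const unsigned char *key,
                                      const struct mask *mask)
    {
        for (struct flow *f = tbl->flows; f; f = f->next) {
            bool match = (f->mask == mask);

            for (size_t i = 0; match && i < KEY_LEN; i++)
                match = (f->key[i] & mask->bits[i]) == (key[i] & mask->bits[i]);
            if (match)
                return f;
        }
        return NULL;
    }

    /* flow_get/flow_del path after the fix: probe every mask and insist on
     * an exact key match, so an overlapping, more inclusive megaflow can no
     * longer shadow the flow the caller actually asked for. */
    static struct flow *lookup_exact(const struct table *tbl,
                                     const unsigned char *key)
    {
        for (const struct mask *m = tbl->masks; m; m = m->next) {
            struct flow *f = masked_lookup(tbl, key, m);

            if (f && !memcmp(f->key, key, KEY_LEN))
                return f;
        }
        return NULL;
    }

    int main(void)
    {
        struct mask wide = { .bits = { 0 } };             /* matches anything  */
        struct mask narrow = { .bits = { [0] = 0xFF } };  /* matches on byte 0 */
        struct flow broad = { .key = { 0 }, .mask = &wide };
        struct flow specific = { .key = { [0] = 7 }, .mask = &narrow };
        struct table tbl = { .masks = &wide, .flows = &specific };

        wide.next = &narrow;
        specific.next = &broad;

        /* A masked lookup alone would stop at "broad"; lookup_exact still
         * finds the intended flow. */
        return lookup_exact(&tbl, specific.key) == &specific ? 0 : 1;
    }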
     

17 May, 2014

2 commits

  • Keep kernel flow stats for each NUMA node rather than for each
    (logical) CPU. This avoids using the per-CPU allocator, removes most of
    the kernel-side OVS locking overhead that otherwise sits at the top of
    perf reports, and allows OVS to scale better with a higher number of
    threads.

    With 9 handlers and 4 revalidators, the netperf TCP_CRR flow setup rate
    doubles on a server with two hyper-threaded physical CPUs (16 logical
    cores each) compared to the current OVS master. Tested with a
    non-trivial flow table containing a TCP port match rule that forces all
    new connections with unique port numbers to OVS userspace. The IP
    addresses are still wildcarded, so the kernel flows are not considered
    exact-match 5-tuple flows. Flows of this type can be expected to appear
    in large numbers as the result of the more effective wildcarding made
    possible by improvements in the OVS userspace flow classifier.

    Perf results for this test (master):

    Events: 305K cycles
    + 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
    + 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
    + 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc
    + 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
    + 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area
    + 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
    + 2.03% swapper [kernel.kallsyms] [k] intel_idle
    + 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
    + 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
    + 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6
    + 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset
    + 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock
    + 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock
    ...

    And after this patch:

    Events: 356K cycles
    + 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc
    + 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
    + 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
    + 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
    + 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
    + 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
    + 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f
    + 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
    + 1.47% swapper [kernel.kallsyms] [k] intel_idle
    + 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask
    + 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref
    + 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash
    + 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions
    + 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref
    + 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock
    ...

    There is a small increase in kernel spinlock overhead because the same
    spinlock is now shared between multiple cores of the same physical CPU,
    but it is barely visible in the netperf TCP_CRR results (maybe a ~1%
    performance drop, hard to tell exactly due to variance in the test
    results) when testing kernel module throughput (no userspace activity,
    only a handful of kernel flows).

    On flow setup, a single stats instance is allocated (for NUMA node 0).
    As CPUs from multiple NUMA nodes start updating stats, new node-specific
    stats instances are allocated. This allocation on the packet-processing
    code path never blocks and never dips into emergency memory pools,
    minimizing allocation latency. If the allocation fails, the existing
    preallocated stats instance is used. Also, if only CPUs from one NUMA
    node update the preallocated stats instance, no additional instances
    are allocated. This eliminates the need to preallocate stats instances
    that will not be used, and relieves the stats reader from the burden of
    reading stats that are never used (a sketch follows the entries for
    this date).

    Signed-off-by: Jarno Rajahalme
    Acked-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Jarno Rajahalme
     
  • The 5-tuple optimization becomes unnecessary with a later per-NUMA
    node stats patch. Remove it first to make the changes easier to
    grasp.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Jesse Gross

    Jarno Rajahalme
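
    A sketch of the lazy per-node allocation described in the first commit
    of this date, in plain userspace C; MAX_NUMA_NODES, the explicit node
    argument and alloc_stats_nonblocking() are illustrative stand-ins, not
    the kernel interfaces:

    #include <stdint.h>
    #include <stdlib.h>

    #define MAX_NUMA_NODES 4              /* assumption for the sketch */

    struct flow_stats {
        uint64_t packets;
        uint64_t bytes;
        /* the real structure also carries a lock, tcp_flags and "used" */
    };

    struct flow {
        struct flow_stats *stats[MAX_NUMA_NODES];   /* [0] is preallocated */
    };

    /* Stand-in for an allocation that never blocks and never dips into
     * emergency pools, keeping packet-processing latency low. */
    static struct flow_stats *alloc_stats_nonblocking(int node)
    {
        (void)node;                  /* a real one would allocate on "node" */
        return calloc(1, sizeof(struct flow_stats));
    }

    static void flow_stats_update(struct flow *f, int node, uint64_t bytes)
    {
        struct flow_stats *s = f->stats[node];

        if (!s) {
            /* First update from this node: try to allocate a local
             * instance; on failure, fall back to the node-0 instance. */
            s = alloc_stats_nonblocking(node);
            if (s)
                f->stats[node] = s;
            else
                s = f->stats[0];
        }

        s->packets++;
        s->bytes += bytes;
    }

    int main(void)
    {
        struct flow f = {
            .stats = { [0] = calloc(1, sizeof(struct flow_stats)) },
        };

        flow_stats_update(&f, 1, 1500);   /* first packet seen from node 1 */
        flow_stats_update(&f, 1, 1500);
        return (f.stats[1] && f.stats[1]->packets == 2) ? 0 : 1;
    }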
     

05 Feb, 2014

1 commit

  • Both the megaflow mask's reference counter and the per-flow-table mask
    list should only be accessed while holding the ovs_mutex() lock.
    However, this was not the case in ovs_flow_table_flush(). This patch
    fixes the bug (see the sketch after this entry).

    Reported-by: Joe Stringer
    Signed-off-by: Andy Zhou
    Signed-off-by: Jesse Gross

    Andy Zhou
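
    A small sketch of the stated invariant, with a pthread mutex standing
    in for ovs_mutex and purely illustrative data structures: the mask-list
    walk and the reference-count drop happen only with the lock held, which
    the flush path must also respect.

    #include <pthread.h>
    #include <stdlib.h>

    struct mask { int ref_count; struct mask *next; };

    static pthread_mutex_t ovs_mutex = PTHREAD_MUTEX_INITIALIZER;
    static struct mask *mask_list;        /* protected by ovs_mutex */

    /* Caller must hold ovs_mutex. */
    static void mask_put(struct mask *m)
    {
        if (--m->ref_count == 0)
            free(m);
    }

    /* Flush drops the table's reference to every mask; doing this without
     * taking the lock is the kind of access the patch closes off. */
    static void flow_table_flush(void)
    {
        pthread_mutex_lock(&ovs_mutex);
        while (mask_list) {
            struct mask *m = mask_list;

            mask_list = m->next;
            mask_put(m);
        }
        pthread_mutex_unlock(&ovs_mutex);
    }

    int main(void)
    {
        struct mask *m = calloc(1, sizeof(*m));

        if (!m)
            return 1;
        m->ref_count = 1;
        mask_list = m;
        flow_table_flush();
        return mask_list == NULL ? 0 : 1;
    }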
     

07 Jan, 2014

2 commits

  • With the megaflow implementation, an OVS flow can be shared between
    multiple CPUs, which makes stats updates a highly contended operation.
    This patch uses per-CPU stats in cases where a flow is likely to be
    shared (if there is a wildcard in the 5-tuple, it is likely to be
    spread across CPUs by RSS). In other situations, it uses the current
    strategy, saving memory and allocation time (a sketch follows the
    entries for this date).

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     
  • API changes only, for code readability. No functional changes.

    This patch removes the underscored version and adds a new API,
    ovs_flow_tbl_lookup_stats(), that returns n_mask_hits.

    Reported-by: Ben Pfaff
    Reviewed-by: Thomas Graf
    Signed-off-by: Andy Zhou
    Signed-off-by: Jesse Gross

    Andy Zhou
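
    For the first commit of this date, a sketch of the allocation-time
    decision in plain userspace C; the structure layout,
    five_tuple_is_wildcarded() and flow_alloc_stats() are illustrative
    assumptions, not the kernel code:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct flow_stats { uint64_t packets, bytes; };

    struct flow {
        bool per_cpu_stats;
        union {
            struct flow_stats *shared;  /* exact-match 5-tuple: one instance */
            struct flow_stats *percpu;  /* wildcarded 5-tuple: one per CPU   */
        } stats;
    };

    /* True if any byte of the 5-tuple portion of the mask is not fully set,
     * i.e. addresses, ports or protocol are wildcarded and RSS is likely to
     * spread the flow's packets across CPUs. */
    static bool five_tuple_is_wildcarded(const unsigned char *mask, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            if (mask[i] != 0xFF)
                return true;
        return false;
    }

    static void flow_alloc_stats(struct flow *f, const unsigned char *mask,
                                 size_t len, int nr_cpus)
    {
        f->per_cpu_stats = five_tuple_is_wildcarded(mask, len);
        if (f->per_cpu_stats)
            f->stats.percpu = calloc(nr_cpus, sizeof(struct flow_stats));
        else
            f->stats.shared = calloc(1, sizeof(struct flow_stats));
    }

    int main(void)
    {
        unsigned char mask[13];           /* IPv4 5-tuple: 4+4+2+2+1 bytes */
        struct flow f;

        memset(mask, 0xFF, sizeof(mask));
        mask[8] = 0x00;                   /* wildcard one source-port byte */
        flow_alloc_stats(&f, mask, sizeof(mask), 8);
        return f.per_cpu_stats ? 0 : 1;
    }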
     

04 Oct, 2013

3 commits