20 Oct, 2015

1 commit


07 Oct, 2015

1 commit

  • Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now
    also acts as an indicator whether the flow contains tunnel data (this was
    previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses
    in an union with IPv4 ones this won't work anymore).

    The new field was added to a hole in sw_flow_key.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
     

05 Oct, 2015

1 commit

  • When openvswitch tries allocate memory from offline numa node 0:
    stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
    It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
    [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
    This patch disables numa affinity in this case.

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
     

23 Sep, 2015

1 commit

  • When support for megaflows was introduced, OVS needed to start
    installing flows with a mask applied to them. Since masking is an
    expensive operation, OVS also had an optimization that would only
    take the parts of the flow keys that were covered by a non-zero
    mask. The values stored in the remaining pieces should not matter
    because they are masked out.

    While this works fine for the purposes of matching (which must always
    look at the mask), serialization to netlink can be problematic. Since
    the flow and the mask are serialized separately, the uninitialized
    portions of the flow can be encoded with whatever values happen to be
    present.

    In terms of functionality, this has little effect since these fields
    will be masked out by definition. However, it leaks kernel memory to
    userspace, which is a potential security vulnerability. It is also
    possible that other code paths could look at the masked key and get
    uninitialized data, although this does not currently appear to be an
    issue in practice.

    This removes the mask optimization for flows that are being installed.
    This was always intended to be the case as the mask optimizations were
    really targetting per-packet flow operations.

    Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
    Signed-off-by: Jesse Gross
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jesse Gross
     

21 Aug, 2015

1 commit


23 Jul, 2015

1 commit


22 Jul, 2015

2 commits

  • Some architectures like POWER can have a NUMA node_possible_map that
    contains sparse entries. This causes memory corruption with openvswitch
    since it allocates flow_cache with a multiple of num_possible_nodes() and
    assumes the node variable returned by for_each_node will index into
    flow->stats[node].

    Use nr_node_ids to allocate a maximal sparse array instead of
    num_possible_nodes().

    The crash was noticed after 3af229f2 was applied as it changed the
    node_possible_map to match node_online_map on boot.
    Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

    Signed-off-by: Chris J Arges
    Acked-by: Pravin B Shelar
    Acked-by: Nishanth Aravamudan
    Signed-off-by: David S. Miller

    Chris J Arges
     
  • Utilize the new metadata dst to attach encapsulation instructions to
    the skb. The existing egress_tun_info via the OVS_CB() is left in
    place until all tunnel vports have been converted to the new method.

    Signed-off-by: Thomas Graf
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thomas Graf
     

08 Feb, 2015

1 commit

  • Flow alloc needs to initialize unmasked key pointer. Otherwise
    it can crash kernel trying to free random unmasked-key pointer.

    general protection fault: 0000 [#1] SMP
    3.19.0-rc6-net-next+ #457
    Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.1 04/30/2008
    RIP: 0010:[] [] kfree+0xac/0x196
    Call Trace:
    [] flow_free+0x21/0x59 [openvswitch]
    [] ovs_flow_free+0x21/0x23 [openvswitch]
    [] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch]
    [] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch]
    [] ? nla_parse+0x4f/0xec
    [] genl_family_rcv_msg+0x26d/0x2c9
    [] ? __lock_acquire+0x90e/0x9aa
    [] genl_rcv_msg+0x66/0x89
    [] ? genl_family_rcv_msg+0x2c9/0x2c9
    [] netlink_rcv_skb+0x3e/0x95
    [] ? genl_rcv+0x18/0x37
    [] genl_rcv+0x27/0x37
    [] netlink_unicast+0x103/0x191
    [] netlink_sendmsg+0x2c1/0x310
    [] ? might_fault+0x50/0xa0
    [] do_sock_sendmsg+0x5f/0x7a
    [] sock_sendmsg+0xb/0xd
    [] ___sys_sendmsg+0x1a3/0x218
    [] ? get_close_on_exec+0x86/0x86
    [] ? fsnotify+0x32c/0x348
    [] ? fsnotify+0x7c/0x348
    [] ? __fget+0xaa/0xbf
    [] ? get_close_on_exec+0x86/0x86
    [] __sys_sendmsg+0x3d/0x5e
    [] SyS_sendmsg+0x14/0x16
    [] system_call_fastpath+0x12/0x17

    Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.")
    CC: Joe Stringer
    Reported-by: Or Gerlitz
    Signed-off-by: Pravin B Shelar
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

27 Jan, 2015

3 commits

  • Previously, flows were manipulated by userspace specifying a full,
    unmasked flow key. This adds significant burden onto flow
    serialization/deserialization, particularly when dumping flows.

    This patch adds an alternative way to refer to flows using a
    variable-length "unique flow identifier" (UFID). At flow setup time,
    userspace may specify a UFID for a flow, which is stored with the flow
    and inserted into a separate table for lookup, in addition to the
    standard flow table. Flows created using a UFID must be fetched or
    deleted using the UFID.

    All flow dump operations may now be made more terse with OVS_UFID_F_*
    flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
    omit the flow key from a datapath operation if the flow has a
    corresponding UFID. This significantly reduces the time spent assembling
    and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
    enabled, the datapath only returns the UFID and statistics for each flow
    during flow dump, increasing ovs-vswitchd revalidator performance by 40%
    or more.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • These minor tidyups make a future patch a little tidier.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert().
    This tidies up a future patch.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

11 Dec, 2014

1 commit

  • This patch effectively reverts commit 500f80872645 ("net: ovs: use CRC32
    accelerated flow hash if available"), and other remaining arch_fast_hash()
    users such as from nfsd via commit 6282cd565553 ("NFSD: Don't hand out
    delegations for 30 seconds after recalling them.") where it has been used
    as a hash function for bloom filtering.

    While we think that these users are actually not much of concern, it has
    been requested to remove the arch_fast_hash() library bits that arose
    from [1] entirely as per recent discussion [2]. The main argument is that
    using it as a hash may introduce bias due to its linearity (see avalanche
    criterion) and thus makes it less clear (though we tried to document that)
    when this security/performance trade-off is actually acceptable for a
    general purpose library function.

    Lets therefore avoid any further confusion on this matter and remove it to
    prevent any future accidental misuse of it. For the time being, this is
    going to make hashing of flow keys a bit more expensive in the ovs case,
    but future work could reevaluate a different hashing discipline.

    [1] https://patchwork.ozlabs.org/patch/299369/
    [2] https://patchwork.ozlabs.org/patch/418756/

    Cc: Neil Brown
    Cc: Francesco Fusco
    Cc: Jesse Gross
    Cc: Thomas Graf
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Nov, 2014

1 commit


06 Nov, 2014

1 commit


01 Jul, 2014

1 commit

  • Due to the race condition in userspace, there is chance that two
    overlapping megaflows could be installed in datapath. And this
    causes userspace unable to delete the less inclusive megaflow flow
    even after it timeout, since the flow_del logic will stop at the
    first match of masked flow.

    This commit fixes the bug by making the kernel flow_del and flow_get
    logic check all masks in that case.

    Introduced by 03f0d916a (openvswitch: Mega flow implementation).

    Signed-off-by: Alex Wang
    Acked-by: Andy Zhou
    Signed-off-by: Pravin B Shelar

    Alex Wang
     

23 May, 2014

2 commits

  • Incorrect struct name was confusing, even though otherwise
    inconsequental.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • Masks are inserted when flows are inserted to the table, so it is
    logical to correspondingly remove masks when flows are removed from
    the table, in ovs_flow_table_remove().

    This allows ovs_flow_free() to be called without locking, which will
    be used by later patches.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     

17 May, 2014

3 commits

  • Keep kernel flow stats for each NUMA node rather than each (logical)
    CPU. This avoids using the per-CPU allocator and removes most of the
    kernel-side OVS locking overhead otherwise on the top of perf reports
    and allows OVS to scale better with higher number of threads.

    With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
    rate doubles on a server with two hyper-threaded physical CPUs (16
    logical cores each) compared to the current OVS master. Tested with
    non-trivial flow table with a TCP port match rule forcing all new
    connections with unique port numbers to OVS userspace. The IP
    addresses are still wildcarded, so the kernel flows are not considered
    as exact match 5-tuple flows. This type of flows can be expected to
    appear in large numbers as the result of more effective wildcarding
    made possible by improvements in OVS userspace flow classifier.

    Perf results for this test (master):

    Events: 305K cycles
    + 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
    + 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
    + 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc
    + 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
    + 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area
    + 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
    + 2.03% swapper [kernel.kallsyms] [k] intel_idle
    + 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
    + 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
    + 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6
    + 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset
    + 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock
    + 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock
    ...

    And after this patch:

    Events: 356K cycles
    + 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc
    + 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
    + 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
    + 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
    + 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
    + 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
    + 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f
    + 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
    + 1.47% swapper [kernel.kallsyms] [k] intel_idle
    + 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask
    + 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref
    + 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash
    + 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions
    + 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref
    + 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock
    ...

    There is a small increase in kernel spinlock overhead due to the same
    spinlock being shared between multiple cores of the same physical CPU,
    but that is barely visible in the netperf TCP_CRR test performance
    (maybe ~1% performance drop, hard to tell exactly due to variance in
    the test results), when testing for kernel module throughput (with no
    userspace activity, handful of kernel flows).

    On flow setup, a single stats instance is allocated (for the NUMA node
    0). As CPUs from multiple NUMA nodes start updating stats, new
    NUMA-node specific stats instances are allocated. This allocation on
    the packet processing code path is made to never block or look for
    emergency memory pools, minimizing the allocation latency. If the
    allocation fails, the existing preallocated stats instance is used.
    Also, if only CPUs from one NUMA-node are updating the preallocated
    stats instance, no additional stats instances are allocated. This
    eliminates the need to pre-allocate stats instances that will not be
    used, also relieving the stats reader from the burden of reading stats
    that are never used.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Jarno Rajahalme
     
  • The 5-tuple optimization becomes unnecessary with a later per-NUMA
    node stats patch. Remove it first to make the changes easier to
    grasp.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Jesse Gross

    Jarno Rajahalme
     
  • In few functions, const formal parameters are assigned or cast to
    non-const.
    These changes suppress warnings if compiled with -Wcast-qual.

    Signed-off-by: Daniele Di Proietto
    Signed-off-by: Jesse Gross

    Daniele Di Proietto
     

05 Feb, 2014

2 commits

  • ovs_flow_free() is not called under ovs-lock during packet
    execute path (ovs_packet_cmd_execute()). Since packet execute
    does not touch flow->mask, there is no need to take that
    lock either. So move assert in case where flow->mask is checked.

    Found by code inspection.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     
  • Both mega flow mask's reference counter and per flow table mask list
    should only be accessed when holding ovs_mutex() lock. However
    this is not true with ovs_flow_table_flush(). The patch fixes this bug.

    Reported-by: Joe Stringer
    Signed-off-by: Andy Zhou
    Signed-off-by: Jesse Gross

    Andy Zhou
     

10 Jan, 2014

1 commit


07 Jan, 2014

7 commits


18 Dec, 2013

1 commit

  • Currently OVS uses jhash2() for calculating flow hashes in its
    internal flow_hash() function. The performance of the flow_hash()
    function is critical, as the input data can be hundreds of bytes
    long.

    OVS is largely deployed in x86_64 based datacenters. Therefore,
    we argue that the performance critical fast path of OVS should
    exploit underlying CPU features in order to reduce the per packet
    processing costs. We replace jhash2 with the hash implementation
    provided by the kernel hash lib, which exploits the crc32l
    instruction to achieve high performance

    Our patch greatly reduces the hash footprint from ~200 cycles of
    jhash2() to around ~90 cycles in case of ovs_flow_hash_crc()
    (measured with rdtsc over maximum length flow keys on an i7 Intel
    CPU).

    Additionally, we wrote a microbenchmark to stress the flow table
    performance. The benchmark inserts random flows into the flow
    hash and then performs lookups. Our hash deployed on a CRC32
    capable CPU reduces the lookup for 1000 flows, 100 masks from
    ~10,100us to ~6,700us, for example.

    Thus, simply use the newly introduced arch_fast_hash2() as a
    drop-in replacement.

    Signed-off-by: Francesco Fusco
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Thomas Graf
    Acked-by: Jesse Gross
    Signed-off-by: David S. Miller

    Francesco Fusco
     

02 Nov, 2013

1 commit


23 Oct, 2013

1 commit


04 Oct, 2013

3 commits