23 Aug, 2014

1 commit

  • When there are multiple vlan headers present in a received frame, the first
    one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
    skb beyond the VLAN TPID may still be non-linear, including the inner TCI
    and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
    does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
    executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.

    This leads to two things:

    1. Part of the resulting ethernet header is in the non-linear part of the
    skb. When eth_type_trans is called later as the result of
    OVS_ACTION_ATTR_OUTPUT, the kernel BUGs in __skb_pull. Also, __pop_vlan_tci
    is in fact accessing random data when it reads past the TPID.

    2. network_header points into the ethernet header instead of behind it.
    mac_len is set to a wrong value (10), too.
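
    A minimal sketch of the general idea (not necessarily the exact
    upstream fix): pull the inner tag into the linear part of the skb
    before anything past the outer TPID is read.

    #include <linux/skbuff.h>
    #include <linux/if_vlan.h>

    static int pull_inner_vlan(struct sk_buff *skb)
    {
            /* Ethernet header plus one 802.1Q tag (VLAN_ETH_HLEN bytes)
             * must be linear before the inner TCI is parsed. */
            if (unlikely(!pskb_may_pull(skb, VLAN_ETH_HLEN)))
                    return -ENOMEM;
            return 0;
    }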

    Reported-by: Yulong Pei
    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

14 Aug, 2014

1 commit


08 Aug, 2014

1 commit


31 Jul, 2014

1 commit


30 Jul, 2014

1 commit

  • This patch introduces the use of the macro IS_ERR_OR_NULL in place of
    tests for NULL and IS_ERR.

    The following Coccinelle semantic patch was used for making the change:

    @@
    expression e;
    @@

    - e == NULL || IS_ERR(e)
    + IS_ERR_OR_NULL(e)
    || ...
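
    For illustration, a hedged before/after of the same conversion in
    plain C (the 'acts' variable is made up):

    /* Before: two separate checks. */
    if (acts == NULL || IS_ERR(acts))
            goto err;

    /* After: single helper from <linux/err.h>. */
    if (IS_ERR_OR_NULL(acts))
            goto err;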

    Signed-off-by: Himangi Saraogi
    Acked-by: Julia Lawall
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Himangi Saraogi
     

25 Jul, 2014

3 commits

  • Fix a bug where skb_clone() NULL check is missing in sample action
    implementation.
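
    A minimal sketch of the kind of check involved (variable names are
    illustrative, not the exact datapath code):

    struct sk_buff *sample_skb;

    sample_skb = skb_clone(skb, GFP_ATOMIC);
    if (unlikely(!sample_skb))
            return 0;       /* drop this sample, keep the original packet */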

    Signed-off-by: Andy Zhou
    Signed-off-by: Pravin B Shelar

    Andy Zhou
     
  • The sample action is rather generic, allowing arbitrary actions to be
    executed based on a probability. However, its use within the Open vSwitch
    code-base is limited: only a single user-space action is ever nested.

    A consequence of the current implementation of sample actions is that,
    depending on whether the sample action executed (due to its probability),
    any side effects of nested actions may or may not be present before
    executing subsequent actions. This has the potential to complicate
    verification of valid actions by the (kernel) datapath, and indeed
    adding support for push and pop MPLS actions inside sample actions
    is one such case.

    In order to allow all supported actions to continue to be nested
    inside sample actions without the potential need for complex
    verification code this patch changes the implementation of the sample
    action in the kernel datapath so that sample actions are more like
    a function call and any side effects of nested actions are not
    present when executing subsequent actions.

    With the above in mind the motivation for this change is twofold:

    * To contain the side effects of the sample action, in the hope of making
    it easier to deal with in the future; and
    * To avoid some rather complex verification code introduced in the MPLS
    datapath patch.
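
    A hedged sketch of these "function call" semantics (names are
    illustrative, not the exact datapath code): run the nested actions
    on a clone, so their side effects cannot leak into the actions that
    follow the sample.

    static int sample_like(struct datapath *dp, struct sk_buff *skb,
                           const struct nlattr *nested_actions)
    {
            struct sk_buff *sample_skb;

            sample_skb = skb_clone(skb, GFP_ATOMIC);
            if (unlikely(!sample_skb))
                    return 0;       /* nothing sampled, original untouched */

            /* Nested actions consume (and eventually free) the clone. */
            return do_execute_actions_on(dp, sample_skb, nested_actions);
    }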

    Signed-off-by: Simon Horman
    Signed-off-by: Jesse Gross
    Signed-off-by: Pravin B Shelar

    Simon Horman
     
  • In queue_userspace_packet(), the ovs_nla_put_flow return value is
    not checked. This is fine as long as key_attr_size() returns the
    correct value. In case it does not, the current code may corrupt buffer
    memory. Add a run-time assertion to catch this case and avoid silent
    failure.
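
    A minimal sketch of such an assertion, assuming it is a simple BUG_ON
    on the return value (the call shown is illustrative):

    err = ovs_nla_put_flow(key, key, user_skb);
    BUG_ON(err);    /* key_attr_size() disagreed with the attributes */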

    Reported-by: Ben Pfaff
    Signed-off-by: Andy Zhou
    Signed-off-by: Pravin B Shelar

    Andy Zhou
     

24 Jul, 2014

2 commits


17 Jul, 2014

2 commits


16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )
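
    A hedged example of a converted caller (the private struct, name
    template and setup function are made up):

    dev = alloc_netdev(sizeof(struct my_priv), "demo%d",
                       NET_NAME_UNKNOWN, my_setup);
    if (!dev)
            return -ENOMEM;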

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

08 Jul, 2014

1 commit

  • In vxlan and OVS vport-vxlan, call a common function to get the source
    port for a UDP tunnel. vxlan_src_port is removed since the functionality
    is now in udp_flow_src_port.
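
    A hedged example of calling the common helper (the surrounding
    variables are illustrative; passing 0, 0 for the range selects the
    kernel's default local port range):

    __be16 src_port;

    src_port = udp_flow_src_port(net, skb, 0, 0, true);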

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

02 Jul, 2014

1 commit


01 Jul, 2014

1 commit

  • Due to a race condition in userspace, there is a chance that two
    overlapping megaflows could be installed in the datapath. This leaves
    userspace unable to delete the less inclusive megaflow even after it
    times out, since the flow_del logic stops at the first match of a
    masked flow.

    This commit fixes the bug by making the kernel flow_del and flow_get
    logic check all masks in that case.

    Introduced by 03f0d916a (openvswitch: Mega flow implementation).
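
    A minimal sketch of the idea (helper names are hypothetical, not the
    exact flow table functions):

    static struct sw_flow *lookup_exact(struct flow_table *tbl,
                                        const struct sw_flow_key *key)
    {
            struct sw_flow_mask *mask;
            struct sw_flow *flow;

            /* Check every mask; return only an exact unmasked-key match
             * instead of stopping at the first masked hit. */
            list_for_each_entry(mask, &tbl->mask_list, list) {
                    flow = masked_flow_lookup(tbl, key, mask);
                    if (flow && unmasked_key_equal(flow, key))
                            return flow;
            }
            return NULL;
    }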

    Signed-off-by: Alex Wang
    Acked-by: Andy Zhou
    Signed-off-by: Pravin B Shelar

    Alex Wang
     

30 Jun, 2014

3 commits

  • Flow statistics need to take into account the TCP flags from the packet
    currently being processed (in 'key'), not the TCP flags matched by the
    flow found in the kernel flow table (in 'flow').

    This bug made the Open vSwitch userspace fin_timeout action have no effect
    in many cases.
    The bug was introduced by commit 88d73f6c411ac2f0578 (openvswitch: Use
    TCP flags in the flow key for stats.)
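
    A hedged sketch of the difference (argument names follow the
    description above, not necessarily the exact code):

    /* Wrong: flags of the flow found in the kernel flow table. */
    ovs_flow_stats_update(flow, flow->key.tp.flags, skb);

    /* Right: flags of the packet currently being processed. */
    ovs_flow_stats_update(flow, key->tp.flags, skb);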

    Reported-by: Len Gao
    Signed-off-by: Ben Pfaff
    Acked-by: Jarno Rajahalme
    Acked-by: Jesse Gross
    Signed-off-by: Pravin B Shelar

    Ben Pfaff
     
  • When using a gre vport, openvswitch registers a gre_cisco_protocol but
    does not supply an err_handler with it. gre_cisco_err() in
    net/ipv4/gre_demux.c expects an err_handler to be provided with the
    gre_cisco_protocol implementation and calls ->err_handler() without an
    existence check, causing a kernel crash.

    This patch provides an err_handler to fix this bug.
    The bug was introduced by commit aa310701e787087d (openvswitch: Add gre
    tunnel support.)
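
    A hedged sketch of such a handler and its registration (the empty
    implementation and the field values shown are illustrative, not the
    exact upstream fix):

    static int gre_err(struct sk_buff *skb, u32 info,
                       const struct tnl_ptk_info *tpi)
    {
            /* Nothing useful to do with the ICMP error yet. */
            return PACKET_RCVD;
    }

    static struct gre_cisco_protocol gre_protocol = {
            .handler        = gre_rcv,
            .err_handler    = gre_err,
            .priority       = 1,
    };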

    Signed-off-by: Wei Zhang
    Signed-off-by: Jesse Gross
    Signed-off-by: Pravin B Shelar

    Wei Zhang
     
  • When the sample action returns with an error, the skb has already been
    freed. This patch fixes a bug to make sure we don't free it again.
    The bug was introduced by commit ccb1352e76cff05 (net: Add Open vSwitch
    kernel components.)

    Signed-off-by: Andy Zhou
    Signed-off-by: Pravin B Shelar

    Andy Zhou
     

05 Jun, 2014

1 commit


23 May, 2014

13 commits

  • This patch gets rid of struct genl_family_and_ops, which is redundant
    due to changes to struct genl_family.

    Signed-off-by: Kyle Mestery
    Acked-by: Kyle Mestery
    Signed-off-by: Pravin B Shelar

    Pravin B Shelar
     
  • Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • The following patch will be easier to reason about with separate
    ovs_flow_cmd_new() and ovs_flow_cmd_set() functions.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • ovs_flow_cmd_del() now allocates reply (if needed) after the flow has
    already been removed from the flow table. If the reply allocation
    fails, a netlink error is signaled with netlink_set_err(), as is
    already done in ovs_flow_cmd_new_or_set() in the similar situation.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • Reduce and clarify locking requirements for ovs_flow_cmd_alloc_info(),
    ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info().

    A datapath pointer is available only when holding a lock. Change
    ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info() to take a
    dp_ifindex directly, rather than a datapath pointer that is then
    (only) used to get the dp_ifindex. This is useful, since the
    dp_ifindex is available even when the datapath pointer is not, both
    before and after taking a lock, which makes further critical section
    reduction possible.

    Make ovs_flow_cmd_alloc_info() take an 'acts' argument instead of a
    'flow' pointer. This allows some future patches to do the allocation
    before acquiring the flow pointer.

    The locking requirements after this patch are:

    ovs_flow_cmd_alloc_info(): May be called without locking, must not be
    called while holding the RCU read lock (due to memory allocation).
    If 'acts' belongs to a flow in the flow table, however, then the
    caller must hold ovs_mutex.

    ovs_flow_cmd_fill_info(): Either ovs_mutex or RCU read lock must be held.

    ovs_flow_cmd_build_info(): This calls both of the above, so the caller
    must hold ovs_mutex.
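
    For example, the ovs_flow_cmd_fill_info() requirement could be
    expressed with an assertion along these lines (a hedged sketch, not
    necessarily what the patch adds):

    /* Either ovs_mutex or the RCU read lock must be held here. */
    WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_ovsl_is_held());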

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • For ovs_flow_stats_get(), using ovsl_dereference() was wrong, since
    flow dumps call this with only the RCU read lock held.

    ovs_flow_stats_clear() is always called with ovs_mutex, so can use
    ovsl_dereference().
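
    A hedged illustration of the distinction (the dereferenced pointer
    is made up):

    /* Flow dump path: may hold only the RCU read lock, so an
     * RCU-checked accessor is required. */
    p = rcu_dereference(flow->stats_ptr);

    /* Clear path: always called with ovs_mutex held, so the
     * lockdep-checked ovsl_dereference() is fine. */
    p = ovsl_dereference(flow->stats_ptr);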

    Also, make the ovs_flow_stats_get() 'flow' argument const to make
    later patches cleaner.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • The incorrect struct name was confusing, even though otherwise
    inconsequential.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • Move most memory allocations away from the ovs_mutex critical
    sections. vport allocations still happen while the lock is taken, as
    changing that would require major refactoring. Also, vports are
    created very rarely so it should not matter.

    ovs_dp_cmd_get() now only takes rcu_read_lock(), rather than
    ovs_lock(), as nothing needs to be changed. ovs_vport_cmd_get()
    already did this.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • Masks are inserted when flows are inserted to the table, so it is
    logical to correspondingly remove masks when flows are removed from
    the table, in ovs_flow_table_remove().

    This allows ovs_flow_free() to be called without locking, which will
    be used by later patches.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • Use netlink_has_listeners() and NLM_F_ECHO flag to determine if a
    reply is needed or not for OVS_FLOW_CMD_NEW, OVS_FLOW_CMD_SET, or
    OVS_FLOW_CMD_DEL. Currently, OVS userspace does not request a reply
    for OVS_FLOW_CMD_NEW, but usually does for OVS_FLOW_CMD_DEL, as stats
    may have changed.
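
    A hedged sketch of the check (helper and argument names are
    illustrative):

    static bool reply_needed(struct genl_info *info, struct sock *sk,
                             unsigned int mcgroup)
    {
            /* Reply if the requester asked for an echo, or if anyone is
             * listening on the notification group. */
            return (info->nlhdr->nlmsg_flags & NLM_F_ECHO) ||
                   netlink_has_listeners(sk, mcgroup);
    }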

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • Remove unnecessary locking from functions that are always called with
    appropriate locking.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Thomas Graf
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • Flow SET can accept an empty set of actions, with the intended
    semantics of leaving existing actions unmodified. This seems to have
    been broken after OVS 1.7, as we have assigned the flow's actions
    pointer to NULL in this case, but we never check for the NULL pointer
    later on. This patch restores the intended behavior and documents it
    in include/linux/openvswitch.h.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     
  • Minimize padding in sw_flow_key and move 'tp' to the main struct.
    These changes simplify code when accessing the transport port numbers
    and the TCP flags, and make the sw_flow_key 8 bytes smaller on 64-bit
    systems (128->120 bytes). These changes also make the keys for IPv4
    packets fit in one cache line.
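
    A sketch of the resulting layout, showing only the moved member
    (the rest of the key is elided):

    struct sw_flow_key {
            /* ... metadata and L2/L3 fields ... */
            struct {
                    __be16 src;     /* TCP/UDP/SCTP source port. */
                    __be16 dst;     /* TCP/UDP/SCTP destination port. */
                    __be16 flags;   /* TCP flags. */
            } tp;
            /* ... */
    };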

    There is a valid concern for safety of packing the struct
    ovs_key_ipv4_tunnel, as it would be possible to take the address of
    the tun_id member as a __be64 * which could result in unaligned access
    in some systems. However:

    - sw_flow_key itself is 64-bit aligned, so the tun_id within is
      always 64-bit aligned.
    - We never make arrays of ovs_key_ipv4_tunnel (which would force
      every second tun_key to be misaligned).
    - We never take the address of the tun_id into a __be64 *.
    - Wherever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key,
      it is on the stack (in tunnel input functions), where the compiler
      has full control of the alignment.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pravin B Shelar

    Jarno Rajahalme
     

17 May, 2014

7 commits

  • This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

    rcu_assign_pointer() ensures that the initialization of a structure
    is carried out before storing a pointer to that structure. In the
    case of a NULL pointer, there is no structure to initialize, so
    rcu_assign_pointer(p, NULL) can be safely converted to
    RCU_INIT_POINTER(p, NULL).
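
    A hedged example of the conversion (the pointer shown is
    illustrative):

    - rcu_assign_pointer(ovs_net->vport_net.gre_vport, NULL);
    + RCU_INIT_POINTER(ovs_net->vport_net.gre_vport, NULL);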

    Signed-off-by: Monam Agarwal
    Signed-off-by: Jesse Gross

    Monam Agarwal
     
  • We already extract the TCP flags for the key; we might as well use
    them for stats.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Jarno Rajahalme
     
  • The 'output' argument of ovs_nla_put_flow() is the one from which
    the bits are written to the netlink attributes. For SCTP we
    accidentally used the bits from the 'swkey' instead. This caused the
    mask attributes to include the bits from the actual flow key instead
    of the mask.
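
    A hedged sketch of the difference for the SCTP case (field names
    follow the flow key layout described elsewhere in this log):

    /* Wrong: always writes the real key's ports, even for the mask. */
    sctp_key->sctp_src = swkey->tp.src;
    sctp_key->sctp_dst = swkey->tp.dst;

    /* Right: write whatever 'output' points at (key or mask). */
    sctp_key->sctp_src = output->tp.src;
    sctp_key->sctp_dst = output->tp.dst;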

    Signed-off-by: Jarno Rajahalme
    Acked-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Jarno Rajahalme
     
  • Keep kernel flow stats for each NUMA node rather than each (logical)
    CPU. This avoids using the per-CPU allocator and removes most of the
    kernel-side OVS locking overhead otherwise at the top of perf reports
    and allows OVS to scale better with higher number of threads.

    With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
    rate doubles on a server with two hyper-threaded physical CPUs (16
    logical cores each) compared to the current OVS master. Tested with
    non-trivial flow table with a TCP port match rule forcing all new
    connections with unique port numbers to OVS userspace. The IP
    addresses are still wildcarded, so the kernel flows are not considered
    exact-match 5-tuple flows. This type of flow can be expected to
    appear in large numbers as the result of more effective wildcarding
    made possible by improvements in the OVS userspace flow classifier.

    Perf results for this test (master):

    Events: 305K cycles
    + 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
    + 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
    + 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc
    + 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
    + 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area
    + 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
    + 2.03% swapper [kernel.kallsyms] [k] intel_idle
    + 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
    + 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
    + 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6
    + 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset
    + 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock
    + 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock
    ...

    And after this patch:

    Events: 356K cycles
    + 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc
    + 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock
    + 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock
    + 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range
    + 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock
    + 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup
    + 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f
    + 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner
    + 1.47% swapper [kernel.kallsyms] [k] intel_idle
    + 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask
    + 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref
    + 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash
    + 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions
    + 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref
    + 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock
    ...

    There is a small increase in kernel spinlock overhead due to the same
    spinlock being shared between multiple cores of the same physical CPU,
    but that is barely visible in the netperf TCP_CRR test performance
    (maybe ~1% performance drop, hard to tell exactly due to variance in
    the test results), when testing for kernel module throughput (with no
    userspace activity, handful of kernel flows).

    On flow setup, a single stats instance is allocated (for the NUMA node
    0). As CPUs from multiple NUMA nodes start updating stats, new
    NUMA-node specific stats instances are allocated. This allocation on
    the packet processing code path is made to never block or look for
    emergency memory pools, minimizing the allocation latency. If the
    allocation fails, the existing preallocated stats instance is used.
    Also, if only CPUs from one NUMA-node are updating the preallocated
    stats instance, no additional stats instances are allocated. This
    eliminates the need to pre-allocate stats instances that will not be
    used, also relieving the stats reader from the burden of reading stats
    that are never used.
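
    A hedged sketch of the lazy, never-blocking allocation described
    above (field and helper names are illustrative, and the real code
    serializes concurrent allocations, which is omitted here):

    static struct flow_stats *node_stats(struct sw_flow *flow)
    {
            int node = numa_node_id();
            struct flow_stats *stats = rcu_dereference(flow->stats[node]);

            if (likely(stats))
                    return stats;

            /* Never block or dip into emergency pools on the packet path. */
            stats = kmalloc_node(sizeof(*stats),
                                 GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN,
                                 node);
            if (!stats)
                    return rcu_dereference(flow->stats[0]); /* preallocated */

            spin_lock_init(&stats->lock);
            rcu_assign_pointer(flow->stats[node], stats);
            return stats;
    }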

    Signed-off-by: Jarno Rajahalme
    Acked-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Jarno Rajahalme
     
  • The 5-tuple optimization becomes unnecessary with a later per-NUMA
    node stats patch. Remove it first to make the changes easier to
    grasp.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Jesse Gross

    Jarno Rajahalme
     
  • It's slightly smaller/faster for some architectures.

    Signed-off-by: Joe Perches
    Signed-off-by: Jesse Gross

    Joe Perches
     
  • Add "openvswitch: " prefix to OVS_NLERR output
    to match the other OVS_NLERR output of datapath.c

    Signed-off-by: Joe Perches
    Signed-off-by: Jesse Gross

    Joe Perches