09 Nov, 2018

1 commit


08 Jul, 2018

1 commit

  • Add 'clone' action to kernel datapath by using existing functions.
    When actions within clone don't modify the current flow, the flow
    key is not cloned before executing clone actions.

    This is a follow up patch for this incomplete work:
    https://patchwork.ozlabs.org/patch/722096/

    v1 -> v2:
    Refactor as advised by reviewer.

    Signed-off-by: Yifeng Sun
    Signed-off-by: Andy Zhou
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Yifeng Sun
     

13 Nov, 2017

1 commit


08 Nov, 2017

1 commit

  • v16->17
    - Fixed disputed check code: keep them in nsh_push and nsh_pop
    but also add them in __ovs_nla_copy_actions

    v15->v16
    - Add csum recalculation for nsh_push, nsh_pop and set_nsh
    pointed out by Pravin
    - Move nsh key into the union with ipv4 and ipv6 and add
    check for nsh key in match_validate pointed out by Pravin
    - Add nsh check in validate_set and __ovs_nla_copy_actions

    v14->v15
    - Check size in nsh_hdr_from_nlattr
    - Fixed four small issues pointed out by Jiri and Eric

    v13->v14
    - Rename skb_push_nsh to nsh_push per Dave's comment
    - Rename skb_pop_nsh to nsh_pop per Dave's comment

    v12->v13
    - Fix NSH header length check in set_nsh

    v11->v12
    - Fix missing changes old comments pointed out
    - Fix new comments for v11

    v10->v11
    - Fix the three remaining disputed comments for v9
    that were not fixed in v10.

    v9->v10
    - Change struct ovs_key_nsh to
    struct ovs_nsh_key_base base;
    __be32 context[NSH_MD1_CONTEXT_SIZE];
    - Fix new comments for v9

    v8->v9
    - Fix build error reported by daily intel build
    because nsh module isn't selected by openvswitch

    v7->v8
    - Rework nested value and mask for OVS_KEY_ATTR_NSH
    - Change pop_nsh to adapt to nsh kernel module
    - Fix many issues per comments from Jiri Benc

    v6->v7
    - Remove NSH GSO patches in v6 because Jiri Benc
    reworked it as another patch series and they have
    been merged.
    - Change it to adapt to nsh kernel module added by NSH
    GSO patch series

    v5->v6
    - Fix the rest of the comments for v4.
    - Add NSH GSO support for VxLAN-gpe + NSH and
    Eth + NSH.

    v4->v5
    - Fix many comments by Jiri Benc and Eric Garver
    for v4.

    v3->v4
    - Add new NSH match field ttl
    - Update NSH header to the latest format
    which will be final format and won't change
    per its author's confirmation.
    - Fix comments for v3.

    v2->v3
    - Change OVS_KEY_ATTR_NSH to nested key to handle
    length-fixed attributes and length-variable
    attribute more flexibly.
    - Remove struct ovs_action_push_nsh completely
    - Add code to handle nested attribute for SET_MASKED
    - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
    to transfer NSH header data.
    - Fix comments and coding style issues by Jiri and Eric

    v1->v2
    - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
    - Dynamically allocate struct ovs_action_push_nsh for
    length-variable metadata.

    OVS master and the 2.8 branch have merged the NSH userspace
    patch series; this patch enables NSH support
    in the kernel datapath so that OVS can support
    NSH in compat mode by porting this.

    Signed-off-by: Yi Yang
    Acked-by: Jiri Benc
    Acked-by: Eric Garver
    Acked-by: Pravin Shelar
    Signed-off-by: David S. Miller

    Yi Yang
     

11 Oct, 2017

1 commit

  • This adds a ct_clear action for clearing conntrack state. ct_clear is
    currently implemented in OVS userspace, but is not backed by an action
    in the kernel datapath. This is useful for flows that may modify a
    packet tuple after a ct lookup has already occurred.

    Signed-off-by: Eric Garver
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Eric Garver
     

17 Aug, 2017

1 commit

  • For sw_flow_actions, the actions_len only represents the kernel part's
    size, and when we dump the actions to userspace, we do the
    conversions, so its true size may become bigger than the actions_len.

    But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
    to alloc the skbuff, so the user_skb's size may become insufficient and
    oops will happen like this:
    skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
    ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:129!
    [...]
    Call Trace:

    skb_put+0x43/0x44
    skb_zerocopy+0x6c/0x1f4
    queue_userspace_packet+0x3a3/0x448 [openvswitch]
    ovs_dp_upcall+0x30/0x5c [openvswitch]
    output_userspace+0x132/0x158 [openvswitch]
    ? ip6_rcv_finish+0x74/0x77 [ipv6]
    do_execute_actions+0xcc1/0xdc8 [openvswitch]
    ovs_execute_actions+0x74/0x106 [openvswitch]
    ovs_dp_process_packet+0xe1/0xfd [openvswitch]
    ? key_extract+0x63c/0x8d5 [openvswitch]
    ovs_vport_receive+0xa1/0xc3 [openvswitch]
    [...]

    Also we can find that the actions_len is much smaller than the orig_len:
    crash> struct sw_flow_actions 0xffff8812f539d000
    struct sw_flow_actions {
    rcu = {
    next = 0xffff8812f5398800,
    func = 0xffffe3b00035db32
    },
    orig_len = 1384,
    actions_len = 592,
    actions = 0xffff8812f539d01c
    }

    So as a quick fix, use the orig_len instead of the actions_len to alloc
    the user_skb.

    Last, this oops happened on our system running a relatively old kernel, but
    the same risk still exists on the mainline, since we use the wrong
    actions_len from the beginning.

    Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
    Cc: Neil McKee
    Signed-off-by: Liping Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Liping Zhang
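
The sizing problem described in this commit can be illustrated with a small userspace model. This is only a sketch: `struct flow_actions_model` and both helper functions are hypothetical names, and only the two length fields mirror the `sw_flow_actions` dump quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two length fields discussed above. */
struct flow_actions_model {
    size_t orig_len;     /* size of the actions as supplied by userspace */
    size_t actions_len;  /* size of the kernel-internal representation */
};

/* Size of the buffer reserved for dumping the actions back to userspace.
 * Using actions_len here would under-allocate whenever the conversion
 * back to the userspace form expands the actions, so orig_len is used. */
static size_t upcall_actions_size(const struct flow_actions_model *acts)
{
    return acts->orig_len;
}

/* Returns 1 if a buffer of 'capacity' bytes can hold the dumped actions,
 * under the assumption that the dump is at most orig_len bytes long. */
static int dump_fits(const struct flow_actions_model *acts, size_t capacity)
{
    return acts->orig_len <= capacity;
}
```

With the values from the crash dump (orig_len = 1384, actions_len = 592), a buffer sized from actions_len is too small, while one sized from orig_len is sufficient.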
     

23 Mar, 2017

4 commits

  • Added clone_execute() that both the sample and the recirc
    action implementation can use.

    Signed-off-by: Andy Zhou
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    andy zhou
     
  • With the introduction of the OpenFlow 'clone' action, the OVS user space
    can now translate the 'clone' action into kernel datapath 'sample'
    action, with 100% probability, to ensure that the clone semantics,
    which is that the packet seen by the clone action is the same as the
    packet seen by the action after clone, is faithfully carried out
    in the datapath.

    While the sample action in the datapath has the matching semantics,
    its implementation is only optimized for its original use.
    Specifically, there are two limitations: First, there is a 3-level
    nesting restriction, enforced at flow downloading time. This
    limit turns out to be too restrictive for the 'clone' use case.
    Second, the implementation avoids recursive calls only if the sample
    action list has a single userspace action.

    The main optimization implemented in this series removes the static
    nesting limit check and instead implements a run-time recursion limit
    check and recursion avoidance similar to that of the 'recirc' action.
    This optimization solves both issues #1 and #2 above.

    One related optimization attempts to avoid copying the flow key as
    long as the enclosed actions do not change the flow key. The
    detection is performed only once, at flow downloading time.

    Another related optimization is to rewrite the action list
    at flow downloading time in order to save the fast path from parsing
    the sample action list in its original form repeatedly.

    Signed-off-by: Andy Zhou
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    andy zhou
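
The "detect once at flow downloading time" idea for avoiding flow key copies might look roughly like the following userspace sketch. The action enum and the classification of which actions touch the key are assumptions made for illustration, not the kernel's actual list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of action types. */
enum act_model { ACT_OUTPUT, ACT_USERSPACE, ACT_SET, ACT_PUSH_VLAN };

/* Assumed classification: output and userspace leave the key untouched. */
static bool action_modifies_key(enum act_model a)
{
    return a == ACT_SET || a == ACT_PUSH_VLAN;
}

/* Run once when the flow is installed. The result can be stored next to
 * the nested action list so that the fast path only tests a flag instead
 * of rescanning the list for every packet. */
static bool list_needs_key_copy(const enum act_model *acts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (action_modifies_key(acts[i]))
            return true;
    return false;
}
```

A nested list containing only output and userspace actions would then run against the caller's key directly, while a list containing a set action would get its own copy.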
     
  • The logic of allocating and copying the key for each 'exec_actions_level'
    was specific to execute_recirc(). However, future patches will reuse
    it as well. Refactor the logic into its own function, clone_key().

    Signed-off-by: Andy Zhou
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    andy zhou
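
A minimal sketch of what a per-level key clone helper could look like, assuming one preallocated key slot per nesting level. The slot count, the struct layout and the name `clone_key_model()` are all hypothetical; in the kernel the storage would be per-CPU rather than a plain static array.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_KEY_LEVELS 3   /* assumed number of preallocated slots */

struct flow_key_model { int fields[4]; };

static struct flow_key_model key_slots[MODEL_KEY_LEVELS];

/* Copy 'src' into the slot reserved for nesting level 'level' (1-based).
 * Returns NULL when no slot exists for this level, in which case the
 * caller has to fall back to another strategy (for example deferring). */
static struct flow_key_model *clone_key_model(const struct flow_key_model *src,
                                              int level)
{
    struct flow_key_model *dst;

    if (level < 1 || level > MODEL_KEY_LEVELS)
        return NULL;
    dst = &key_slots[level - 1];
    *dst = *src;
    return dst;
}
```

Because each level writes to its own slot, nested actions can modify their copy without disturbing the key that the outer level is still using.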
     
  • add_deferred_actions() API currently requires actions to be passed in
    as a fully encoded netlink message. So far both 'sample' and 'recirc'
    actions happen to carry actions as fully encoded netlink messages.
    However, this requirement is more restrictive than necessary; future
    patches will need to pass in action lists that are not fully encoded
    by themselves.

    Signed-off-by: Andy Zhou
    Acked-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    andy zhou
     

03 Mar, 2017

1 commit


10 Feb, 2017

1 commit

  • Add the fields of the conntrack original direction 5-tuple to struct
    sw_flow_key. The new fields are initially marked as non-existent, and
    are populated whenever a conntrack action is executed and either finds
    or generates a conntrack entry. This means that these fields exist
    for all packets that were not rejected by conntrack as untrackable.

    The original tuple fields in the sw_flow_key are filled from the
    original direction tuple of the conntrack entry relating to the
    current packet, or from the original direction tuple of the master
    conntrack entry, if the current conntrack entry has a master.
    Generally, expected connections of connections having an assigned
    helper (e.g., FTP), have a master conntrack entry.

    The main purpose of the new conntrack original tuple fields is to
    allow matching on them for policy decision purposes, with the premise
    that the admissibility of tracked connections reply packets (as well
    as original direction packets), and both direction packets of any
    related connections may be based on ACL rules applying to the master
    connection's original direction 5-tuple. This also makes it easier to
    make policy decisions when the actual packet headers might have been
    transformed by NAT, as the original direction 5-tuple represents the
    packet headers before any such transformation.

    When using the original direction 5-tuple the admissibility of return
    and/or related packets need not be based on the mere existence of a
    conntrack entry, allowing separation of admission policy from the
    established conntrack state. While existence of a conntrack entry is
    required for admission of the return or related packets, policy
    changes can render connections that were initially admitted to be
    rejected or dropped afterwards. If the admission of the return and
    related packets was based on mere conntrack state (e.g., connection
    being in an established state), a policy change that would make the
    connection rejected or dropped would need to find and delete all
    conntrack entries affected by such a change. When using the original
    direction 5-tuple matching the affected conntrack entries can be
    allowed to time out instead, as the established state of the
    connection would not need to be the basis for packet admission any
    more.

    It should be noted that the directionality of related connections may
    be the same or different than that of the master connection, and
    neither the original direction 5-tuple nor the conntrack state bits
    carry this information. If needed, the directionality of the master
    connection can be stored in master's conntrack mark or labels, which
    are automatically inherited by the expected related connections.

    The fact that neither ARP nor ND packets are trackable by conntrack
    allows mutual exclusion between ARP/ND and the new conntrack original
    tuple fields. Hence, the IP addresses are overlaid in union with ARP
    and ND fields. This allows the sw_flow_key to not grow much due to
    this patch, but it also means that we must be careful to never use the
    new key fields with ARP or ND packets. ARP is easy to distinguish and
    keep mutually exclusive based on the ethernet type, but ND being an
    ICMPv6 protocol requires a bit more attention.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jarno Rajahalme
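
The overlay of the conntrack original tuple with fields that can never be valid at the same time can be sketched as a C union. The layout below is a simplified, hypothetical model (it is not the real `sw_flow_key`), and it only covers the ARP case; as the message notes, ND needs an additional ICMPv6 check that is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_ETH_P_ARP 0x0806

/* Simplified, hypothetical flow key: the two members of the union are
 * mutually exclusive because ARP packets are never tracked by conntrack. */
struct key_model {
    uint16_t eth_type;
    union {
        struct {                /* valid only for ARP packets */
            uint32_t sip, tip;
            uint8_t  sha[6], tha[6];
        } arp;
        struct {                /* valid only for tracked IP packets */
            uint32_t orig_src, orig_dst;
            uint16_t orig_tp_src, orig_tp_dst;
            uint8_t  orig_proto;
        } ct;
    } u;
};

/* The conntrack fields must never be read or written for ARP packets,
 * since they share storage with the ARP fields. */
static int ct_orig_fields_usable(const struct key_model *k)
{
    return k->eth_type != MODEL_ETH_P_ARP;
}
```

In this model the union is no larger than its ARP member, which is the point of the overlay: the key does not grow, at the cost of having to guard every access by packet type.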
     

30 Jan, 2017

1 commit

  • do_execute_actions() implements a worthwhile optimization: in case
    an output action is the last action in an action list, skb_clone()
    can be avoided by outputting the current skb. However, the
    implementation is more complicated than necessary. This patch
    simplifies this logic.

    Signed-off-by: Andy Zhou
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    andy zhou
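
The optimization can be modelled by counting how many clones a walk over an action list needs. This is a userspace sketch with hypothetical types; it only illustrates the rule that an output action in the last position may consume the original packet.

```c
#include <assert.h>
#include <stddef.h>

enum out_act_model { A_OUTPUT, A_OTHER };

struct exec_stats { int clones; int outputs; };

/* Walk an action list. Every output needs its own packet, except that an
 * output in the final position may send the original instead of a clone. */
static struct exec_stats run_actions(const enum out_act_model *acts, size_t n)
{
    struct exec_stats st = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        if (acts[i] != A_OUTPUT)
            continue;
        if (i + 1 < n)
            st.clones++;   /* later actions still need the packet */
        st.outputs++;
    }
    return st;
}
```

For a list of output, other, output, the model performs two outputs but only one clone; a list consisting of a single output needs no clone at all.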
     

13 Nov, 2016

5 commits

  • It's not allowed to push an Ethernet header in front of another Ethernet
    header.

    It's not allowed to pop the Ethernet header if there's a vlan tag. This
    preserves the invariant that an L3 packet never has a vlan tag.

    Based on previous versions by Lorand Jakab and Simon Horman.

    Signed-off-by: Lorand Jakab
    Signed-off-by: Simon Horman
    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • Update Ethernet header only if there is one.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • We'll need it to alter packets sent to ARPHRD_NONE interfaces.

    Change do_output() to use the actual L2 header size of the packet when
    deciding on the minimum cutlen. The assumption here is that what matters is
    not the output interface hard_header_len but rather the L2 header of the
    particular packet. For example, ARPHRD_NONE tunnels that encapsulate
    Ethernet should get at least the Ethernet header.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • Use a hole in the structure. We support only Ethernet so far and will add
    support for L2-less packets shortly. We could use a bool to indicate
    whether the Ethernet header is present or not but the approach with the
    mac_proto field is more generic and occupies the same number of bytes in the
    struct, while allowing later extensibility. It also makes the code in the
    next patches more self-explanatory.

    It would be nice to use ARPHRD_ constants but those are u16, which would be
    a waste. Thus define our own constants.

    Another upside of this is that we can overload this new field to also denote
    whether the flow key is valid. This has the advantage that on
    refragmentation, we don't have to reparse the packet but can rely on the
    stored eth.type. This is especially important for the next patches in this
    series - instead of adding another branch for L2-less packets before calling
    ovs_fragment, we can just remove all those branches completely.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • On tx, use hard_header_len while deciding whether to refragment or drop the
    packet. That way, all combinations are calculated correctly:

    * L2 packet going to L2 interface (the L2 header len is subtracted),
    * L2 packet going to L3 interface (the L2 header is included in the packet
    length),
    * L3 packet going to L3 interface.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
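
The three combinations listed above can be checked with a small model of the size test. This is a sketch under assumptions: the parameter name `mru` (the size limit recorded when the packet was defragmented, 0 if it never was) and the function name are illustrative, and the 14-byte Ethernet header is used for the L2 cases.

```c
#include <assert.h>
#include <stdbool.h>

/* 'mru' is the size limit remembered from defragmentation (0 means the
 * packet was never defragmented); 'hard_header_len' is the L2 header size
 * of the output device (0 for an L3 device). */
static bool fits_without_refragmenting(unsigned int pkt_len,
                                       unsigned int mru,
                                       unsigned int hard_header_len)
{
    return mru == 0 || pkt_len <= mru + hard_header_len;
}
```

With an MRU of 1500, a 1514-byte L2 packet fits an L2 interface (the 14-byte header is discounted), the same packet does not fit an L3 interface (the header counts as payload), and a 1500-byte L3 packet fits an L3 interface.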
     

12 Oct, 2016

1 commit

  • If mpls headers were pushed to a defragmented packet, the refragmentation no
    longer works correctly after 48d2ab609b6b ("net: mpls: Fixups for GSO"). The
    network header has to be shifted after the mpls headers for the
    fragmentation and restored afterwards.

    Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

03 Oct, 2016

1 commit


16 Sep, 2016

1 commit

  • The ovs kernel data path currently defers the execution of all
    recirc actions until stack utilization is at a minimum.
    This is too limiting for some packet forwarding scenarios due to
    the small size of the deferred action FIFO (10 entries). For
    example, broadcast traffic sent out more than 10 ports with
    recirculation results in packet drops when the deferred action
    FIFO becomes full, as reported here:

    http://openvswitch.org/pipermail/dev/2016-March/067672.html

    Since the current recursion depth is available (it is already tracked
    by the exec_actions_level pcpu variable), we can use it to determine
    whether to execute recirculation actions immediately (safe when
    recursion depth is low) or defer execution until more stack space is
    available.

    With this change, the deferred action fifo size becomes a non-issue
    for currently failing scenarios because it is no longer used when
    there are three or fewer recursions through ovs_execute_actions().

    Suggested-by: Pravin Shelar
    Signed-off-by: Lance Richardson
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Lance Richardson
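
The decision described in this commit can be sketched as follows. The FIFO size of 10 and the "three or fewer recursions" threshold come from the message; the type and function names are hypothetical, and the FIFO is reduced to a counter.

```c
#include <assert.h>

#define MODEL_FIFO_SIZE        10  /* deferred action FIFO size, per the text */
#define MODEL_DEFER_THRESHOLD   3  /* three or fewer recursions run inline */

enum recirc_path { RECIRC_INLINE, RECIRC_DEFERRED, RECIRC_DROPPED };

struct fifo_model { int used; };

/* Decide how a recirc action at recursion depth 'level' is handled. */
static enum recirc_path handle_recirc(struct fifo_model *fifo, int level)
{
    assert(level >= 1);
    if (level <= MODEL_DEFER_THRESHOLD)
        return RECIRC_INLINE;          /* enough stack left, run now */
    if (fifo->used >= MODEL_FIFO_SIZE)
        return RECIRC_DROPPED;         /* FIFO full, packet is lost */
    fifo->used++;
    return RECIRC_DEFERRED;
}
```

In this model a broadcast to 12 ports at depth 1 never touches the FIFO, whereas at depth 4 the eleventh recirculation would be dropped.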
     

09 Sep, 2016

1 commit

  • Add support for 802.1ad including the ability to push and pop double
    tagged vlans. Add support for 802.1ad to netlink parsing and flow
    conversion. Uses double nested encap attributes to represent double
    tagged vlan. Inner TPID encoded along with ctci in nested attributes.

    This is based on Thomas F Herbert's original v20 patch. I made some
    small clean ups and bug fixes.

    Signed-off-by: Thomas F Herbert
    Signed-off-by: Eric Garver
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Eric Garver
     

31 Aug, 2016

1 commit

  • As reported by Lennert the MPLS GSO code is failing to properly segment
    large packets. There are a couple of problems:

    1. the inner protocol is not set so the gso segment functions for inner
    protocol layers are not getting run, and

    2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
    are not properly accounted for in mpls_gso_segment.

    The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
    to call the gso segment functions for the higher layer protocols. That
    means skb_mac_gso_segment is called twice -- once with the network
    protocol set to MPLS and again with the network protocol set to the
    inner protocol.

    This patch sets the inner skb protocol addressing item 1 above and sets
    the network_header and inner_network_header to mark where the MPLS labels
    start and end. The MPLS code in OVS is also updated to set the two
    network markers.

    From there the MPLS GSO code uses the difference between the network
    header and the inner network header to know the size of the MPLS header
    that was pushed. It then pulls the MPLS header, resets the mac_len and
    protocol for the inner protocol and then calls skb_mac_gso_segment
    to segment the skb.

    After the inner protocol segmentation is done the skb protocol
    is set to mpls for each segment and the network and mac headers
    restored.

    Reported-by: Lennert Buytenhek
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
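
The header-size computation described above reduces to a subtraction of two offsets. The sketch below uses plain integers in place of skb header offsets; the function names are hypothetical, and the only outside fact relied on is that one MPLS label stack entry is 4 bytes.

```c
#include <assert.h>

#define MODEL_MPLS_ENTRY_LEN 4   /* one MPLS label stack entry is 4 bytes */

/* Size of the pushed MPLS header, given the offsets that mark where the
 * labels start (network header) and end (inner network header). */
static unsigned int mpls_stack_len(unsigned int network_header,
                                   unsigned int inner_network_header)
{
    assert(inner_network_header >= network_header);
    return inner_network_header - network_header;
}

/* Number of labels in the stack. */
static unsigned int mpls_label_count(unsigned int network_header,
                                     unsigned int inner_network_header)
{
    return mpls_stack_len(network_header, inner_network_header)
           / MODEL_MPLS_ENTRY_LEN;
}
```

For example, with the network header at offset 14 and the inner network header at offset 22, the stack is 8 bytes long and holds two labels.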
     

11 Jun, 2016

1 commit

  • The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
    truncate packets. A 'max_len' is added for setting up the maximum
    packet size, and a 'cutlen' field records the number of bytes
    to trim from the packet when the packet is output to a port or
    sent to userspace.

    Signed-off-by: William Tu
    Cc: Pravin Shelar
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    William Tu
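
The relationship between 'max_len' and 'cutlen' can be written out as two small helpers. This is a sketch with hypothetical function names; it models only the arithmetic, not any minimum-length handling the datapath may apply when trimming.

```c
#include <assert.h>

/* Truncate action: remember how many bytes exceed max_len. */
static unsigned int compute_cutlen(unsigned int pkt_len, unsigned int max_len)
{
    return pkt_len > max_len ? pkt_len - max_len : 0;
}

/* Applied later, when the packet is output or sent to userspace. */
static unsigned int len_after_trim(unsigned int pkt_len, unsigned int cutlen)
{
    return cutlen < pkt_len ? pkt_len - cutlen : 0;
}
```

A 1500-byte packet with max_len = 100 gets cutlen = 1400 and leaves as 100 bytes; a 64-byte packet is below the limit, so its cutlen stays 0 and it is sent unchanged.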
     

01 Jun, 2016

1 commit

  • In the case of CHECKSUM_COMPLETE the skb checksum should be updated in
    {push,pop}_mpls() as they change the type in the Ethernet header.

    As suggested by Pravin Shelar.

    Cc: Pravin Shelar
    Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel")
    Signed-off-by: Simon Horman
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Simon Horman
     

22 Apr, 2016

1 commit

  • When using masked actions the ipv6_proto field of an action
    to set IPv6 fields may be zero rather than the prevailing protocol
    which will result in skipping checksum recalculation.

    This patch resolves the problem by relying on the protocol
    in the flow key rather than that in the set field action.

    Fixes: 83d2b9ba1abc ("net: openvswitch: Support masked set actions.")
    Cc: Jarno Rajahalme
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     

20 Feb, 2016

1 commit


19 Jan, 2016

1 commit

  • It was seen that defective configurations of openvswitch could overwrite
    the STACK_END_MAGIC and cause a hard crash of the kernel because of too
    many recursions within ovs.

    This problem arises due to the high stack usage of openvswitch. The rest
    of the kernel is fine with the current limit of 10 (RECURSION_LIMIT).

    We use the already existing recursion counter in ovs_execute_actions to
    implement an upper bound of 5 recursions.

    Cc: Pravin Shelar
    Cc: Simon Horman
    Cc: Eric Dumazet
    Cc: Simon Horman
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
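
A recursion counter with an upper bound of 5 can be sketched as below. The counter is a plain global here, where the kernel would use a per-CPU variable, and both the error code and the function name are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_RECURSION_LIMIT 5

static int exec_level;   /* per-CPU in the kernel; a plain global here */

/* Recursively "execute" 'nested' further levels of actions and fail once
 * the nesting depth exceeds the limit. */
static int execute_model(int nested)
{
    int err = 0;

    if (++exec_level > MODEL_RECURSION_LIMIT)
        err = -ENETDOWN;               /* error code is an assumption */
    else if (nested > 0)
        err = execute_model(nested - 1);

    exec_level--;
    return err;
}
```

Calling `execute_model(4)` reaches depth 5 and succeeds, while `execute_model(5)` would need a sixth level and is rejected; in both cases the counter unwinds back to zero.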
     

01 Nov, 2015

1 commit


28 Oct, 2015

1 commit

  • If ip_defrag() returns an error other than -EINPROGRESS, then the skb is
    freed. When handle_fragments() passes this back up to
    do_execute_actions(), it will be freed again. Prevent this double free
    by never freeing the skb in do_execute_actions() for errors returned by
    ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead.

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    Reported-by: Florian Westphal
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
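
The ownership rule that prevents the double free (the callee frees on every error path, the caller never frees after an error) can be modelled in userspace. All names are hypothetical, and a counter stands in for the actual free so the behaviour can be observed.

```c
#include <assert.h>
#include <stdlib.h>

struct pkt_model { int *free_count; };

static void pkt_free(struct pkt_model *p)
{
    (*p->free_count)++;
    free(p);
}

/* Callee owns the packet on error: it frees before returning nonzero. */
static int ct_execute_model(struct pkt_model *p, int fail)
{
    if (fail) {
        pkt_free(p);
        return -1;
    }
    return 0;
}

/* Caller frees only on the success path. */
static int do_actions_model(struct pkt_model *p, int fail)
{
    int err = ct_execute_model(p, fail);

    if (err)
        return err;        /* already freed by the callee */
    pkt_free(p);           /* consumed normally */
    return 0;
}
```

Whether the call fails or succeeds, the packet is released exactly once.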
     

24 Oct, 2015

1 commit

  • Conflicts:
    net/ipv6/xfrm6_output.c
    net/openvswitch/flow_netlink.c
    net/openvswitch/vport-gre.c
    net/openvswitch/vport-vxlan.c
    net/openvswitch/vport.c
    net/openvswitch/vport.h

    The openvswitch conflicts were overlapping changes. One was
    the egress tunnel info fix in 'net' and the other was the
    vport ->send() op simplification in 'net-next'.

    The xfrm6_output.c conflict was also a simplification
    overlapping a bug fix.

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Oct, 2015

1 commit

  • While transitioning to netdev based vport we broke the OVS
    feature which allows the user to retrieve tunnel packet egress
    information for lwtunnel devices. The following patch fixes it
    by introducing an ndo operation to get the tunnel egress info.
    The same ndo operation can be used for lwtunnel devices and compat
    ovs-tnl-vport devices. So after adding such a device operation
    we can remove the similar operation from ovs-vport.

    Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device").
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

20 Oct, 2015

1 commit


07 Oct, 2015

2 commits


05 Oct, 2015

1 commit

  • Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name
    for these to be consistent with conntrack.

    Fixes: c2ac667 "openvswitch: Allow matching on conntrack label"
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

30 Sep, 2015

4 commits