17 Sep, 2016

1 commit

  • Similar to gre, vxlan and geneve tunnels, allow IPIP tunnels to
    operate in 'collect metadata' mode.
    bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away.
    ovs can use it as well in the future (once appropriate ovs-vport
    abstractions and user apis are added).
    Note that just like in other tunnels we cannot cache the dst,
    since tunnel_info metadata can be different for every packet.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Thomas Graf
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

11 Sep, 2016

1 commit

  • Add utility functions to convert a 32-bit key into a 64-bit tunnel id and
    vice versa.
    These functions will be used instead of cloning code in GRE and VXLAN,
    and in tc act_iptunnel which will be introduced in a following patch in
    this patchset.

    Signed-off-by: Amir Vadai
    Signed-off-by: Hadar Hen Zion
    Reviewed-by: Shmulik Ladkani
    Acked-by: Jiri Benc
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Amir Vadai
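
As a rough user-space illustration of the conversion described above, the sketch below keeps both values as network-order byte arrays and assumes the 32-bit key occupies the low-order four bytes of the 64-bit tunnel id. The type and function names are hypothetical stand-ins, not the kernel's helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: both values are raw network-order (big-endian)
 * byte arrays, so the host's endianness does not matter here. */
typedef struct { uint8_t b[4]; } key32_sketch_t;     /* e.g. a GRE key */
typedef struct { uint8_t b[8]; } tunnel_id_sketch_t; /* 64-bit tunnel id */

/* Assumption: the key lands in the low-order four bytes of the id. */
static tunnel_id_sketch_t key32_to_tunnel_id_sketch(key32_sketch_t key)
{
    tunnel_id_sketch_t id;

    memset(id.b, 0, sizeof(id.b));
    memcpy(id.b + 4, key.b, sizeof(key.b));
    return id;
}

/* Inverse: take the low-order four bytes back out of the id. */
static key32_sketch_t tunnel_id_to_key32_sketch(tunnel_id_sketch_t id)
{
    key32_sketch_t key;

    memcpy(key.b, id.b + 4, sizeof(key.b));
    return key;
}
```

Working on bytes rather than integers sidesteps the per-endianness shifts that an integer-based version would need.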
     

19 Jun, 2016

1 commit


16 Jun, 2016

1 commit

  • In the presence of firewalls which improperly block ICMP Unreachable
    (including Fragmentation Required) messages, Path MTU Discovery is
    prevented from working.

    A workaround is to handle IPv4 payloads opaquely, ignoring the DF bit--as
    is done for other payloads like AppleTalk--and doing transparent
    fragmentation and reassembly.

    Redux includes the enforcement of mutual exclusion between this feature
    and Path MTU Discovery as suggested by Alexander Duyck.

    Cc: Alexander Duyck
    Reviewed-by: Stephen Hemminger
    Signed-off-by: Philip Prindeville

    Signed-off-by: David S. Miller

    Philip Prindeville
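
The mutual exclusion mentioned above could be enforced by a configuration check along these lines. This is a minimal sketch; the struct, its field names, and the choice of -EINVAL are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical tunnel configuration; field names are illustrative only. */
struct tunnel_cfg_sketch {
    bool pmtudisc;   /* Path MTU Discovery enabled */
    bool ignore_df;  /* fragment IPv4 payloads even when DF is set */
};

/* Returns 0 if the combination is acceptable, -EINVAL if both the
 * DF-ignoring workaround and Path MTU Discovery are requested. */
static int tunnel_cfg_validate_sketch(const struct tunnel_cfg_sketch *cfg)
{
    if (cfg->pmtudisc && cfg->ignore_df)
        return -EINVAL;
    return 0;
}
```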
     

21 May, 2016

1 commit

  • Consolidate all the ip_tunnel_encap definitions in one spot in the
    header file. Also, move ip_encap_hlen and ip_tunnel_encap from
    ip_tunnel.c to ip_tunnels.h so they can be called without a dependency
    on ip_tunnel module. Similarly, move iptun_encaps to ip_tunnel_core.c.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

05 May, 2016

1 commit

  • For ipgre interfaces in collect metadata mode, also receive traffic with
    encapsulated Ethernet headers. The lwtunnel users are supposed to sort this
    out correctly. This allows mixed Ethernet + L3-only traffic on the
    same lwtunnel interface. This is the same way VXLAN-GPE behaves.

    To keep backwards compatibility and prevent any surprises, gretap interfaces
    have priority in receiving packets with Ethernet headers.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

17 Apr, 2016

1 commit

  • This patch updates the IP tunnel core function iptunnel_handle_offloads so
    that we return an int and do not free the skb inside the function. This
    actually allows us to clean up several paths in several tunnels so that we
    can free the skb at one point in the path without having to have a
    secondary path if we are supporting tunnel offloads.

    In addition it should resolve some double-free issues I have found in the
    tunnels paths as I believe it is possible for us to end up triggering such
    an event in the case of fou or gue.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
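
The ownership change described above can be modelled in user space as follows: the helper only reports an error, and the caller frees the packet at a single point. All names here are hypothetical, and the packet is a plain heap allocation rather than an skb.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical packet; offload_ok stands in for "offload setup succeeds". */
struct pkt_sketch { bool offload_ok; };

static int free_count; /* counts frees so double frees would be visible */

static void pkt_free_sketch(struct pkt_sketch *p)
{
    free(p);
    free_count++;
}

/* Reports failure through the return value and never frees the packet. */
static int handle_offloads_sketch(const struct pkt_sketch *p)
{
    return p->offload_ok ? 0 : -EINVAL;
}

/* The caller owns the packet throughout, so every error funnels into one
 * drop label instead of relying on the helper to have freed it. */
static int tunnel_xmit_sketch(struct pkt_sketch *p)
{
    int err = handle_offloads_sketch(p);

    if (err)
        goto drop;

    /* Stand-in for a successful transmit consuming the packet. */
    pkt_free_sketch(p);
    return 0;

drop:
    pkt_free_sketch(p);
    return err;
}
```

Because the helper never frees, a failing call cannot leave the caller holding a dangling pointer, which is the double-free hazard the commit message refers to.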
     

14 Apr, 2016

1 commit


07 Apr, 2016

1 commit


21 Mar, 2016

1 commit

  • If a packet is either locally encapsulated or processed through GRO
    it is marked with the offloads that it requires. However, when it is
    decapsulated these tunnel offload indications are not removed. This
    means that if we receive an encapsulated TCP packet, aggregate it with
    GRO, decapsulate, and retransmit the resulting frame on a NIC that does
    not support encapsulation, we won't be able to take advantage of hardware
    offloads even though it is just a simple TCP packet at this point.

    This fixes the problem by stripping off encapsulation offload indications
    when packets are decapsulated.

    The performance impacts of this bug are significant. In a test where a
    Geneve encapsulated TCP stream is sent to a hypervisor, GRO'ed, decapsulated,
    and bridged to a VM performance is improved by 60% (5Gbps->8Gbps) as a
    result of avoiding unnecessary segmentation at the VM tap interface.

    Reported-by: Ramu Ramamurthy
    Fixes: 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE")
    Signed-off-by: Jesse Gross
    Signed-off-by: David S. Miller

    Jesse Gross
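
A minimal sketch of the fix described above: on decapsulation, the tunnel-related offload bits are masked out while the inner-protocol bits are kept. The flag names and values below are invented for illustration and do not match the kernel's actual GSO flag definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this example. */
#define GSO_TCPV4_SKETCH      (1u << 0)
#define GSO_GRE_SKETCH        (1u << 1)
#define GSO_UDP_TUNNEL_SKETCH (1u << 2)
#define GSO_ENCAP_ALL_SKETCH  (GSO_GRE_SKETCH | GSO_UDP_TUNNEL_SKETCH)

struct pkt_meta_sketch {
    uint32_t gso_type;   /* offloads the packet is marked as requiring */
    int encapsulated;    /* non-zero while a tunnel header is present */
};

/* After the tunnel header is removed, drop the tunnel offload indications
 * so the packet looks like plain inner traffic to the next device. */
static void decap_clear_offloads_sketch(struct pkt_meta_sketch *m)
{
    m->gso_type &= ~GSO_ENCAP_ALL_SKETCH;
    m->encapsulated = 0;
}
```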
     

19 Mar, 2016

1 commit


12 Mar, 2016

1 commit

  • This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label
    from call sites. Currently, there's no such option and it's always set to
    zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so
    that flow-based tunnels via collect metadata frontends can make use of it.
    vxlan and geneve will be converted to add flow label support separately.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
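
To show where a flow label ends up, the sketch below builds the first 32-bit word of an IPv6 header (4-bit version, 8-bit traffic class, 20-bit flow label) in host byte order. The function and macro names are hypothetical; a real sender would also convert the word to network byte order.

```c
#include <assert.h>
#include <stdint.h>

#define FLOWLABEL_MASK_SKETCH 0x000FFFFFu /* the flow label is 20 bits wide */

/* First word of the IPv6 header, host byte order:
 * version (4 bits) | traffic class (8 bits) | flow label (20 bits). */
static uint32_t ip6_first_word_sketch(uint8_t tclass, uint32_t label)
{
    return (6u << 28) |
           ((uint32_t)tclass << 20) |
           (label & FLOWLABEL_MASK_SKETCH);
}
```

Passing a label of zero reproduces the behaviour the commit message describes as the previous default.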
     

09 Mar, 2016

2 commits

  • Helpers like ip_tunnel_info_opts_{get,set}() are only available if
    CONFIG_INET is set, thus add an empty definition into the header for
    the !CONFIG_INET case, where already other empty inline helpers are
    defined.

    This avoids an ifdef kludge inside filter.c; moreover, vxlan and geneve
    themselves, the only tunnels this facility can be used with, depend on
    INET being set. For the !INET case TUNNEL_OPTIONS_PRESENT would never be
    set in flags.

    Fixes: 14ca0751c96f ("bpf: support for access to tunnel options")
    Reported-by: Fengguang Wu
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • The assumptions from commit 0c1d70af924b ("net: use dst_cache for vxlan
    device"), 468dfffcd762 ("geneve: add dst caching support") and 3c1cb4d2604c
    ("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage
    when ip_tunnel_info is used are unfortunately not always valid.

    While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF
    however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre
    with different remote dsts, tos, etc, therefore they cannot be assumed as
    packet independent.

    Right now vxlan, geneve, gre would cache the dst for eBPF and every packet
    would reuse the same entry that was first created on the initial route
    lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have
    a different one.

    Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test
    in vxlan needs to be handled differently in this context as it is currently
    inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable()
    helper is added for the three tunnel cases, which checks if we can use dst
    cache.

    Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device")
    Fixes: 468dfffcd762 ("geneve: add dst caching support")
    Fixes: 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels")
    Signed-off-by: Daniel Borkmann
    Acked-by: Paolo Abeni
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Daniel Borkmann
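
The dst cache check from the second commit above might look roughly like the following. This is a simplified sketch under assumptions: the struct, the flag name, and its value are hypothetical, and any additional conditions the real helper tests are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TUN_NOCACHE_SKETCH (1u << 0) /* hypothetical "do not cache" flag */

/* Hypothetical per-packet tunnel metadata. */
struct tun_info_sketch { uint32_t flags; };

/* The cached dst may be reused only when no per-packet metadata is present,
 * or when the metadata has not been marked as varying per packet. */
static bool dst_cache_usable_sketch(const struct tun_info_sketch *info)
{
    if (info == NULL)
        return true;
    return !(info->flags & TUN_NOCACHE_SKETCH);
}
```

In this model, a front-end that fills in metadata per packet would set the flag, while one with stable per-flow metadata would leave it clear.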
     

23 Feb, 2016

1 commit


19 Feb, 2016

1 commit


17 Feb, 2016

2 commits

  • In case of UDP traffic with datagram length
    below MTU this gives about a 2% performance increase
    when tunneling over ipv4 and about 60% when tunneling
    over ipv6.

    Signed-off-by: Paolo Abeni
    Suggested-and-acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • The current ip_tunnel cache implementation is prone to a race
    that will cause the wrong dst to be cached on a concurrent dst cache
    miss and ip tunnel update via netlink.

    Replacing it with the generic implementation fixes the issue.

    Signed-off-by: Paolo Abeni
    Suggested-and-acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Paolo Abeni
     

12 Feb, 2016

1 commit


10 Feb, 2016

1 commit

  • Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
    transmit vxlan packets of any size, constrained only by the ability to
    send out the resulting packets. 4.3 introduced netdevs corresponding
    to tunnel vports. These netdevs have an MTU, which limits the size of
    a packet that can be successfully encapsulated. The default MTU
    values are low (1500 or less), which is awkwardly small in the context
    of physical networks supporting jumbo frames, and leads to a
    conspicuous change in behaviour for userspace.

    Instead, set the MTU on openvswitch-created netdevs to be the relevant
    maximum (i.e. the maximum IP packet size minus any relevant overhead),
    effectively restoring the behaviour prior to 4.3.

    Signed-off-by: David Wragg
    Signed-off-by: David S. Miller

    David Wragg
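
The "maximum IP packet size minus any relevant overhead" rule above amounts to a single subtraction. In the sketch below the 65535-byte limit follows from the 16-bit IPv4 total length field; the function name is hypothetical, and the exact overhead each tunnel type subtracts is not stated in the commit message.

```c
#include <assert.h>

#define IP_MAX_PACKET_SKETCH 65535 /* 16-bit IPv4 total length field */

/* Largest inner MTU that still fits in one outer IP packet, given the
 * number of bytes the encapsulation adds in front of the inner packet. */
static int tunnel_max_mtu_sketch(int overhead)
{
    return IP_MAX_PACKET_SKETCH - overhead;
}
```

For example, assuming VXLAN over IPv4 adds 50 bytes (20 outer IPv4 + 8 UDP + 8 VXLAN + 14 inner Ethernet), `tunnel_max_mtu_sketch(50)` gives 65485.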
     

26 Dec, 2015

1 commit


17 Nov, 2015

1 commit

  • Drivers like vxlan use the recently introduced
    udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
    makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
    packet, updates the struct stats using the usual
    u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
    udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
    tstats, so drivers like vxlan, immediately after, call
    iptunnel_xmit_stats, which does the same thing - calls
    u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).

    While vxlan is probably fine (I don't know?), calling a similar function
    from, say, an unbound workqueue, on a fully preemptable kernel causes
    real issues:

    [ 188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6
    [ 188.435579] caller is debug_smp_processor_id+0x17/0x20
    [ 188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2
    [ 188.435607] Call Trace:
    [ 188.435611] [] dump_stack+0x4f/0x7b
    [ 188.435615] [] check_preemption_disabled+0x19d/0x1c0
    [ 188.435619] [] debug_smp_processor_id+0x17/0x20

    The solution would be to protect the whole
    this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
    disabling preemption and then reenabling it.

    Signed-off-by: Jason A. Donenfeld
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Jason A. Donenfeld
     

25 Sep, 2015

1 commit

  • When using ip lwtunnels, the additional data for xmit (basically, the actual
    tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
    metadata dst. When replying to ARP requests, we need to send the reply to
    the same tunnel the request came from. This means we need to construct
    proper metadata dst for ARP replies.

    We could perform another route lookup to get a dst entry with the correct
    lwtstate. However, this won't always ensure that the outgoing tunnel is the
    same as the incoming one, and it won't work anyway for IPv4 duplicate
    address detection.

    The only thing to do is to "reverse" the ip_tunnel_info.

    Signed-off-by: Jiri Benc
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
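
"Reversing" the tunnel info can be pictured as swapping the endpoints of the received metadata and marking the result for transmit. The sketch below uses hypothetical simplified structures with IPv4 addresses as plain integers; which other fields the real code copies or resets is not covered here.

```c
#include <assert.h>
#include <stdint.h>

#define TUN_MD_TX_SKETCH (1u << 0) /* hypothetical "metadata is for tx" flag */

struct tun_key_sketch {
    uint32_t src;     /* outer source address */
    uint32_t dst;     /* outer destination address */
    uint64_t tun_id;  /* tunnel id, kept unchanged */
};

struct tun_md_sketch {
    struct tun_key_sketch key;
    uint32_t mode;
};

/* Build transmit metadata for a reply from the metadata of the request:
 * same tunnel id, endpoints swapped. */
static struct tun_md_sketch tun_md_reverse_sketch(const struct tun_md_sketch *rx)
{
    struct tun_md_sketch tx = *rx;

    tx.key.src = rx->key.dst;
    tx.key.dst = rx->key.src;
    tx.mode |= TUN_MD_TX_SKETCH;
    return tx;
}
```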
     

01 Sep, 2015

1 commit

  • Currently the tun-info options pointer is used in a few cases to
    pass options around. But tunnel options can be accessed using the
    ip_tunnel_info_opts() API without using the pointer. The following
    patch removes the redundant pointer and consistently makes use
    of the API.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Reviewed-by: Jesse Gross
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

30 Aug, 2015

2 commits

  • There's currently nothing preventing directing packets with IPv6
    encapsulation data to IPv4 tunnels (and vice versa). If this happens,
    IPv6 addresses are incorrectly interpreted as IPv4 ones.

    Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this
    in ip_tunnel_info. Reject packets at appropriate places if they are supposed
    to be encapsulated into an incompatible protocol.

    Signed-off-by: Jiri Benc
    Acked-by: Alexei Starovoitov
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • The mode field holds a single bit of information only (whether the
    ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags.
    This allows more mode flags to be added.

    Signed-off-by: Jiri Benc
    Acked-by: Alexei Starovoitov
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
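
Taken together, the two commits above suggest a mode field made of bit flags, one of which records the address family so that mismatched packets can be rejected. The sketch below illustrates that idea only; the flag names, values, and the enum are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mode bits: a single field carries several independent flags. */
#define TUN_MODE_TX_SKETCH   (1u << 0) /* metadata is for transmit */
#define TUN_MODE_IPV6_SKETCH (1u << 1) /* key holds IPv6 addresses */

enum af_sketch { AF_V4_SKETCH, AF_V6_SKETCH };

struct tun_mode_sketch { uint32_t mode; };

/* Which address family the key in this metadata describes. */
static enum af_sketch tun_info_af_sketch(const struct tun_mode_sketch *info)
{
    return (info->mode & TUN_MODE_IPV6_SKETCH) ? AF_V6_SKETCH : AF_V4_SKETCH;
}

/* Reject metadata whose family does not match the encapsulating tunnel. */
static int tun_check_af_sketch(const struct tun_mode_sketch *info,
                               enum af_sketch tunnel_af)
{
    return tun_info_af_sketch(info) == tunnel_af ? 0 : -EINVAL;
}
```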
     

21 Aug, 2015

5 commits


11 Aug, 2015

1 commit


23 Jul, 2015

2 commits


22 Jul, 2015

4 commits

  • This adds the ability to select a routing table based on the tunnel
    id, which allows maintaining separate routing tables for each virtual
    tunnel network.

    ip rule add from all tunnel-id 100 lookup 100
    ip rule add from all tunnel-id 200 lookup 200

    A new static key controls the collection of metadata at tunnel level
    upon demand.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • This introduces a new IP tunnel lightweight tunnel type which allows
    specifying IP tunnel instructions per route. Only IPv4 is supported
    at this point.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Allows putting a VXLAN device into a new flow-based mode in which
    skbs with an ip_tunnel_info dst metadata attached will be encapsulated
    according to the instructions stored in there with the VXLAN device
    defaults taken into consideration.

    Similarly on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
    set, the packet processing will populate an ip_tunnel_info struct for
    each packet received and attach it to the skb using the new metadata
    dst. The metadata structure will contain the outer header and tunnel
    header fields which have been stripped off. Layers further up in the
    stack such as routing, tc or netfilter can later match on these fields
    and perform forwarding. It is the responsibility of upper layers to
    ensure that the flag is set if the metadata is needed. The flag limits
    the additional cost of metadata collecting based on demand.

    This prepares the VXLAN device to be steered by the routing and other
    subsystems, which allows encapsulation to be supported for a large number
    of tunnel endpoints and tunnel ids through a single net_device, which
    improves scalability.

    It also allows for OVS to leverage this mode which in turn allows for
    the removal of the OVS specific VXLAN code.

    Because the skb is currently scrubbed in vxlan_rcv(), the attachment of
    the new dst metadata is postponed until after scrubbing which requires
    the temporary addition of a new member to vxlan_metadata. This member
    is removed again in a later commit after the indirect VXLAN receive API
    has been removed.

    Signed-off-by: Thomas Graf
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Rename the tunnel metadata data structures currently internal to
    OVS and make them generic for use by all IP tunnels.

    Both structures are kernel internal and will stay that way. Their
    members are exposed to user space through individual Netlink
    attributes by OVS. It will therefore be possible to extend/modify
    these structures without affecting user ABI.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
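
The tunnel-id rules shown in the first commit of this group (`ip rule add from all tunnel-id 100 lookup 100`) map a tunnel id to a routing table. A minimal model of that lookup, using a hypothetical rule array and a caller-supplied fallback table, might look like this; it is not how the kernel's rule engine is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: packets carrying tun_id are looked up in table. */
struct tun_rule_sketch {
    uint64_t tun_id;
    uint32_t table;
};

/* Return the table of the first rule matching tun_id, else the fallback. */
static uint32_t tun_rule_lookup_sketch(const struct tun_rule_sketch *rules,
                                       size_t n, uint64_t tun_id,
                                       uint32_t fallback)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].tun_id == tun_id)
            return rules[i].table;
    }
    return fallback;
}
```

With rules for ids 100 and 200 as in the commit message, a packet with tunnel id 200 selects table 200, and an unmatched id falls through to whatever table the caller passes as the fallback.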
     

03 Apr, 2015

1 commit


20 Jan, 2015

1 commit