17 Sep, 2016
1 commit
-
Similar to gre, vxlan and geneve tunnels, allow IPIP tunnels to
operate in 'collect metadata' mode.
bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away.
ovs can use it as well in the future (once appropriate ovs-vport
abstractions and user apis are added).
Note that just like in other tunnels we cannot cache the dst,
since tunnel_info metadata can be different for every packet.
Signed-off-by: Alexei Starovoitov
Acked-by: Thomas Graf
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
11 Sep, 2016
1 commit
-
Add utility functions to convert a 32-bit key into a 64-bit tunnel ID and
vice versa.
These functions will be used instead of cloning code in GRE and VXLAN,
and in tc act_iptunnel which will be introduced in a following patch in
this patchset.
Signed-off-by: Amir Vadai
Signed-off-by: Hadar Hen Zion
Reviewed-by: Shmulik Ladkani
Acked-by: Jiri Benc
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
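A minimal userspace sketch of what such a conversion pair could look like, assuming the 32-bit key occupies the low-order 32 bits of the big-endian 64-bit tunnel ID. The `sketch_` names and the `be32_t`/`be64_t` typedefs are illustrative stand-ins, not the helpers added by this commit, and the byte-order test relies on the GCC/Clang `__BYTE_ORDER__` macros.

```c
#include <stdint.h>

/* Stand-ins for kernel __be32/__be64: integers whose in-memory bytes
 * are in network (big-endian) order. Hypothetical names. */
typedef uint32_t be32_t;
typedef uint64_t be64_t;

/* Put a 32-bit key into the low-order 32 bits of a 64-bit tunnel ID. */
static inline be64_t sketch_key32_to_tunnel_id(be32_t key)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (be64_t)key;
#else
	/* Little-endian host: shift so the key bytes become the last
	 * four bytes of the 64-bit value in memory. */
	return (be64_t)key << 32;
#endif
}

/* Inverse: recover the 32-bit key from the 64-bit tunnel ID. */
static inline be32_t sketch_tunnel_id_to_key32(be64_t tun_id)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (be32_t)tun_id;
#else
	return (be32_t)(tun_id >> 32);
#endif
}
```

Sharing one such pair keeps the byte-order handling in a single place instead of repeating it in each tunnel driver.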
19 Jun, 2016
1 commit
-
ipgre_err() can call ip6_err_gen_icmpv6_unreach() for proper
support of ipv4+gre+icmp+ipv6+... frames, used for example
by traceroute/mtr.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
16 Jun, 2016
1 commit
-
In the presence of firewalls which improperly block ICMP Unreachable
(including Fragmentation Required) messages, Path MTU Discovery is
prevented from working.

A workaround is to handle IPv4 payloads opaquely, ignoring the DF bit--as
is done for other payloads like AppleTalk--and doing transparent
fragmentation and reassembly.

Redux includes the enforcement of mutual exclusion between this feature
and Path MTU Discovery as suggested by Alexander Duyck.
Cc: Alexander Duyck
Reviewed-by: Stephen Hemminger
Signed-off-by: Philip Prindeville
Signed-off-by: David S. Miller
21 May, 2016
1 commit
-
Consolidate all the ip_tunnel_encap definitions in one spot in the
header file. Also, move ip_encap_hlen and ip_tunnel_encap from
ip_tunnel.c to ip_tunnels.h so they can be called without a dependency
on the ip_tunnel module. Similarly, move iptun_encaps to ip_tunnel_core.c.
Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
05 May, 2016
1 commit
-
For ipgre interfaces in collect metadata mode, receive also traffic with
encapsulated Ethernet headers. The lwtunnel users are supposed to sort this
out correctly. This allows mixed Ethernet + L3-only traffic on the
same lwtunnel interface. This is the same way as VXLAN-GPE behaves.

To keep backwards compatibility and prevent any surprises, gretap interfaces
have priority in receiving packets with Ethernet headers.
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller
17 Apr, 2016
1 commit
-
This patch updates the IP tunnel core function iptunnel_handle_offloads so
that we return an int and do not free the skb inside the function. This
actually allows us to clean up several paths in several tunnels so that we
can free the skb at one point in the path without having to have a
secondary path if we are supporting tunnel offloads.

In addition it should resolve some double-free issues I have found in the
tunnels paths as I believe it is possible for us to end up triggering such
an event in the case of fou or gue.
Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller
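The ownership change described above can be sketched in userspace as follows. This is only an illustration of the pattern (the helper reports an error and leaves the buffer alone, the caller frees on one error path); all `sketch_` names and the failure condition are hypothetical and do not mirror the kernel code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_pkt {
	bool needs_offload;
	bool offload_ok;	/* false simulates a failure while preparing offloads */
};

/* Counts frees so that "freed exactly once" can be observed. */
static int sketch_free_count;

static void sketch_free_pkt(struct sketch_pkt *pkt)
{
	sketch_free_count++;
	free(pkt);
}

/* Returns 0 or a negative errno. Never frees pkt: ownership stays
 * with the caller in both the success and the failure case. */
static int sketch_handle_offloads(struct sketch_pkt *pkt)
{
	if (pkt->needs_offload && !pkt->offload_ok)
		return -ENOMEM;
	return 0;
}

static int sketch_tunnel_xmit(struct sketch_pkt *pkt)
{
	int err = sketch_handle_offloads(pkt);

	if (err)
		goto tx_error;

	/* A real path would hand the packet to the lower layer, which
	 * consumes it; the sketch simply releases it here. */
	sketch_free_pkt(pkt);
	return 0;

tx_error:
	/* Single error-path free shared by every failure in this function. */
	sketch_free_pkt(pkt);
	return err;
}
```

Because the helper never frees, a caller cannot free the same buffer a second time after an error return, which is the class of double free the commit message mentions.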
14 Apr, 2016
1 commit
-
The structure can be packed denser by doing minor rearrangement
of existing elements.
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
07 Apr, 2016
1 commit
-
Allow calling of iptunnel_pull_header without special casing ETH_P_TEB inner
protocol.
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller
21 Mar, 2016
1 commit
-
If a packet is either locally encapsulated or processed through GRO
it is marked with the offloads that it requires. However, when it is
decapsulated these tunnel offload indications are not removed. This
means that if we receive an encapsulated TCP packet, aggregate it with
GRO, decapsulate, and retransmit the resulting frame on a NIC that does
not support encapsulation, we won't be able to take advantage of hardware
offloads even though it is just a simple TCP packet at this point.

This fixes the problem by stripping off encapsulation offload indications
when packets are decapsulated.

The performance impacts of this bug are significant. In a test where a
Geneve encapsulated TCP stream is sent to a hypervisor, GRO'ed, decapsulated,
and bridged to a VM, performance is improved by 60% (5Gbps->8Gbps) as a
result of avoiding unnecessary segmentation at the VM tap interface.
Reported-by: Ramu Ramamurthy
Fixes: 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE")
Signed-off-by: Jesse Gross
Signed-off-by: David S. Miller
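A rough sketch of the fix, assuming the offload indications are bits in a per-packet mask. The bit values, the struct and the function name below are invented for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical offload bits; the real kernel values differ. */
#define SKETCH_GSO_TCPV4	(1u << 0)
#define SKETCH_GSO_GRE		(1u << 1)
#define SKETCH_GSO_UDP_TUNNEL	(1u << 2)
#define SKETCH_GSO_ENCAP_ALL	(SKETCH_GSO_GRE | SKETCH_GSO_UDP_TUNNEL)

struct sketch_gso_pkt {
	uint32_t gso_type;	/* offloads the packet still requires */
	bool encapsulation;	/* true while an outer header is present */
};

/* Called when the outer header is removed: drop every tunnel-related
 * offload bit and keep the inner ones (plain TCP segmentation here). */
static void sketch_pull_offloads(struct sketch_gso_pkt *pkt)
{
	pkt->gso_type &= ~SKETCH_GSO_ENCAP_ALL;
	pkt->encapsulation = false;
}
```

After this step a NIC without encapsulation support only sees the plain TCP offload request, so the packet no longer has to be segmented in software.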
19 Mar, 2016
1 commit
-
eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded
value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX
for this, which makes the code a bit more generic and allows removing
BPF_TUNLEN_MAX from eBPF code.
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller
12 Mar, 2016
1 commit
-
This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label
from call sites. Currently, there's no such option and it's always set to
zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so
that flow-based tunnels via collect metadata frontends can make use of it.
vxlan and geneve will be converted to add flow label support separately.
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller
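For context, the flow label is the low 20 bits of the first 32-bit word of the IPv6 header, after the 4-bit version and the 8-bit traffic class. A small sketch of composing that word in host byte order follows; the function name and mask macro are illustrative, not kernel identifiers.

```c
#include <stdint.h>

#define SKETCH_FLOWLABEL_MASK 0x000FFFFFu	/* 20-bit flow label */

/* Build the first 32-bit word of an IPv6 header in host byte order:
 * version (always 6), traffic class, flow label. The caller would
 * convert it to network byte order before writing it to the wire. */
static inline uint32_t sketch_ip6_first_word(uint8_t tclass, uint32_t flowlabel)
{
	return (6u << 28) |
	       ((uint32_t)tclass << 20) |
	       (flowlabel & SKETCH_FLOWLABEL_MASK);
}
```

With a label of zero, which is what the commit message says was always written before this change, the word reduces to the version and traffic class only.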
09 Mar, 2016
2 commits
-
Helpers like ip_tunnel_info_opts_{get,set}() are only available if
CONFIG_INET is set, thus add an empty definition into the header for
the !CONFIG_INET case, where already other empty inline helpers are
defined.

This avoids ifdef kludge inside filter.c. Also, vxlan and geneve
themselves, the only users this facility works with, depend on INET
being set. For the !INET case TUNNEL_OPTIONS_PRESENT would never be
set in flags.
Fixes: 14ca0751c96f ("bpf: support for access to tunnel options")
Reported-by: Fengguang Wu
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
-
The assumptions from commit 0c1d70af924b ("net: use dst_cache for vxlan
device"), 468dfffcd762 ("geneve: add dst caching support") and 3c1cb4d2604c
("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage
when ip_tunnel_info is used are unfortunately not always valid.

While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF
however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre
with different remote dsts, tos, etc, therefore they cannot be assumed as
packet independent.

Right now vxlan, geneve, gre would cache the dst for eBPF and every packet
would reuse the same entry that was first created on the initial route
lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have
a different one.

Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test
in vxlan needs to be handled differently in this context as it is currently
inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable()
helper is added for the three tunnel cases, which checks if we can use dst
cache.
Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device")
Fixes: 468dfffcd762 ("geneve: add dst caching support")
Fixes: 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels")
Signed-off-by: Daniel Borkmann
Acked-by: Paolo Abeni
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
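A simplified sketch of the kind of check such a helper performs, assuming the "do not cache" decision is carried as a flag bit in the per-packet tunnel info. The struct, the flag name and its value are placeholders chosen for the example; the real helper takes different arguments and may test more conditions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: set by front-ends whose tunnel info can change
 * from packet to packet (eBPF in the commit message). */
#define SKETCH_TUNNEL_NOCACHE 0x2000u

struct sketch_cache_info {
	uint16_t tun_flags;
};

/* Decide whether a cached route may be reused for this packet. */
static inline bool sketch_dst_cache_usable(const struct sketch_cache_info *info)
{
	/* No per-packet info: the tunnel uses its static device
	 * configuration, so the cached dst stays valid. */
	if (info == NULL)
		return true;

	/* Per-packet info that asked not to be cached. */
	if (info->tun_flags & SKETCH_TUNNEL_NOCACHE)
		return false;

	return true;
}
```

A transmit path would consult this before looking into the cache and fall back to a full route lookup whenever it returns false.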
23 Feb, 2016
1 commit
-
Conflicts:
drivers/net/phy/bcm7xxx.c
drivers/net/phy/marvell.c
drivers/net/vxlan.c

All three conflicts were cases of simple overlapping changes.
Signed-off-by: David S. Miller
19 Feb, 2016
1 commit
-
Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call
skb_scrub_packet directly instead.
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller
17 Feb, 2016
2 commits
-
In case of UDP traffic with datagram length
below MTU this gives about a 2% performance increase
when tunneling over ipv4 and about 60% when tunneling
over ipv6.
Signed-off-by: Paolo Abeni
Suggested-and-acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
-
The current ip_tunnel cache implementation is prone to a race
that will cause the wrong dst to be cached on a concurrent dst cache
miss and ip tunnel update via netlink.

Replacing it with the generic implementation fixes the issue.
Signed-off-by: Paolo Abeni
Suggested-and-acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
12 Feb, 2016
1 commit
-
All users now pass false, so we can remove it, and remove the code that
was conditional upon it.
Signed-off-by: Edward Cree
Signed-off-by: David S. Miller
10 Feb, 2016
1 commit
-
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.

Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.
Signed-off-by: David Wragg
Signed-off-by: David S. Miller
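The "maximum IP packet size minus overhead" rule can be written out as a small calculation. The sketch below assumes the limit is the 65535-byte ceiling imposed by the 16-bit IP total length field; the function name is illustrative and the overhead each tunnel type actually subtracts is not stated in the commit message.

```c
#include <stdint.h>

/* Largest IP packet: the total length field is 16 bits wide. */
#define SKETCH_IP_MAX_PACKET 65535u

/* Largest payload MTU for a tunnel that adds `overhead` bytes of
 * outer headers. Returns 0 if the overhead alone exceeds the limit. */
static inline uint32_t sketch_tunnel_max_mtu(uint32_t overhead)
{
	if (overhead >= SKETCH_IP_MAX_PACKET)
		return 0;
	return SKETCH_IP_MAX_PACKET - overhead;
}
```

As an illustration, an outer IPv4 header without options (20 bytes), a UDP header (8 bytes) and a VXLAN header (8 bytes) add up to 36 bytes, which gives 65535 - 36 = 65499.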
26 Dec, 2015
1 commit
-
By moving stats update into iptunnel_xmit(), we can simplify
iptunnel_xmit() usage. With this change there is no need to
call another function (iptunnel_xmit_stats()) to update stats
in tunnel xmit code path.
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller
17 Nov, 2015
1 commit
-
Drivers like vxlan use the recently introduced
udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
packet, updates the struct stats using the usual
u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
tstats, so drivers like vxlan, immediately after, call
iptunnel_xmit_stats, which does the same thing - calls
u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).

While vxlan is probably fine (I don't know?), calling a similar function
from, say, an unbound workqueue, on a fully preemptable kernel causes
real issues:

[ 188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6
[ 188.435579] caller is debug_smp_processor_id+0x17/0x20
[ 188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2
[ 188.435607] Call Trace:
[ 188.435611] [] dump_stack+0x4f/0x7b
[ 188.435615] [] check_preemption_disabled+0x19d/0x1c0
[ 188.435619] [] debug_smp_processor_id+0x17/0x20

The solution would be to protect the whole
this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
disabling preemption and then reenabling it.
Signed-off-by: Jason A. Donenfeld
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
25 Sep, 2015
1 commit
-
When using ip lwtunnels, the additional data for xmit (basically, the actual
tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
metadata dst. When replying to ARP requests, we need to send the reply to
the same tunnel the request came from. This means we need to construct
proper metadata dst for ARP replies.

We could perform another route lookup to get a dst entry with the correct
lwtstate. However, this won't always ensure that the outgoing tunnel is the
same as the incoming one, and it won't work anyway for IPv4 duplicate
address detection.

The only thing to do is to "reverse" the ip_tunnel_info.
Signed-off-by: Jiri Benc
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
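One way to picture "reversing" the tunnel info is shown below, under the assumption that it means keeping the tunnel ID, swapping the outer addresses and marking the result for transmit. The struct layout, the flag value and the function name are made up for the example; a real implementation may, for instance, leave the source address to the route lookup.

```c
#include <stdint.h>

#define SKETCH_REPLY_INFO_TX 0x01u	/* hypothetical "use for transmit" flag */

struct sketch_reply_info {
	uint64_t tun_id;	/* tunnel ID of the received packet */
	uint32_t src;		/* outer IPv4 source address */
	uint32_t dst;		/* outer IPv4 destination address */
	uint8_t mode;
};

/* Build transmit metadata that sends a reply back through the same
 * tunnel the request arrived on. */
static void sketch_metadata_reply(const struct sketch_reply_info *rx,
				  struct sketch_reply_info *tx)
{
	tx->tun_id = rx->tun_id;	/* same virtual network */
	tx->dst = rx->src;		/* reply goes to the sender's endpoint */
	tx->src = rx->dst;		/* and comes from the endpoint it reached */
	tx->mode = rx->mode | SKETCH_REPLY_INFO_TX;
}
```

Nothing here depends on a route lookup, which is why this approach also covers the duplicate address detection case mentioned above.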
01 Sep, 2015
1 commit
-
Currently the tun-info options pointer is used in a few cases to
pass options around. But tunnel options can be accessed using
the ip_tunnel_info_opts() API without using the pointer. The following
patch removes the redundant pointer and consistently makes use
of the API.
Signed-off-by: Pravin B Shelar
Acked-by: Thomas Graf
Reviewed-by: Jesse Gross
Signed-off-by: David S. Miller
30 Aug, 2015
2 commits
-
There's currently nothing preventing directing packets with IPv6
encapsulation data to IPv4 tunnels (and vice versa). If this happens,
IPv6 addresses are incorrectly interpreted as IPv4 ones.

Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this
in ip_tunnel_info. Reject packets at appropriate places if they are supposed
to be encapsulated into an incompatible protocol.
Signed-off-by: Jiri Benc
Acked-by: Alexei Starovoitov
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
-
The mode field holds a single bit of information only (whether the
ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags.
This allows more mode flags to be added.
Signed-off-by: Jiri Benc
Acked-by: Alexei Starovoitov
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
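Taken together, the two commits from this date suggest a mode field made of bit flags, with one bit for the direction and another for the address family. The sketch below assumes exactly that; the flag names, their values, the family enum and both helpers are illustrative and not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode bits. */
#define SKETCH_MODE_TX		0x01u	/* info describes a packet to transmit */
#define SKETCH_MODE_IPV6	0x02u	/* key holds IPv6 rather than IPv4 data */

enum sketch_af { SKETCH_AF_INET, SKETCH_AF_INET6 };

struct sketch_mode_info {
	uint8_t mode;
};

/* Address family of the data stored in the tunnel key. */
static inline enum sketch_af sketch_info_af(const struct sketch_mode_info *info)
{
	return (info->mode & SKETCH_MODE_IPV6) ? SKETCH_AF_INET6 : SKETCH_AF_INET;
}

/* A tunnel of family `tunnel_af` should only encapsulate packets whose
 * metadata is meant for transmit and matches its own family. */
static inline bool sketch_may_encap(const struct sketch_mode_info *info,
				    enum sketch_af tunnel_af)
{
	if (!(info->mode & SKETCH_MODE_TX))
		return false;
	return sketch_info_af(info) == tunnel_af;
}
```

Using separate bits rather than a single rx/tx value is what leaves room for further flags such as the family bit.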
21 Aug, 2015
5 commits
-
Rename the ipv4_tos and ipv4_ttl fields to just 'tos' and 'ttl', as they'll
be used with IPv6 tunnels, too.
Signed-off-by: Jiri Benc
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
-
Add the IPv6 addresses as a union with IPv4 ones. When using IPv4, the
newly introduced padding after the IPv4 addresses needs to be zeroed out.
Signed-off-by: Jiri Benc
Acked-by: Thomas Graf
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
-
Signed-off-by: Jiri Benc
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
-
The ip_tunnels.h include file uses mixture of __u16 and u16 (etc.) types.
Unify it to the non-underscore variants.
Signed-off-by: Jiri Benc
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
-
The custom alignment of struct ip_tunnel_key is unnecessary. In struct
sw_flow_key, it starts at offset 256, in struct ip_tunnel_info it's the
first field.

The structure is also packed even without the __packed keyword.
Signed-off-by: Jiri Benc
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
11 Aug, 2015
1 commit
-
This patch creates a new tunnel flag which enables
tunnel metadata collection on a given device.
Signed-off-by: Pravin B Shelar
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
23 Jul, 2015
2 commits
-
Convert the module_init() to an invocation from inet_init() since
ip_tunnel_core is part of the INET built-in.
Fixes: 3093fbe7ff4 ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
-
Account for the configuration FIB_RULES=y && INET=n as FIB_RULES can
be selected by IPV6 or DECNET without INET.
Fixes: e7030878fc84 ("fib: Add fib rule match on tunnel id")
Fixes: 3093fbe7ff4b ("route: Per route IP tunnel metadata via lightweight tunnel")
Reported-by: kbuild test robot
Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
22 Jul, 2015
4 commits
-
This adds the ability to select a routing table based on the tunnel
id, which allows maintaining separate routing tables for each virtual
tunnel network.

ip rule add from all tunnel-id 100 lookup 100
ip rule add from all tunnel-id 200 lookup 200

A new static key controls the collection of metadata at tunnel level
upon demand.
Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
-
This introduces a new IP tunnel lightweight tunnel type which allows
specifying IP tunnel instructions per route. Only IPv4 is supported
at this point.
Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
-
Allows putting a VXLAN device into a new flow-based mode in which
skbs with an ip_tunnel_info dst metadata attached will be encapsulated
according to the instructions stored in there with the VXLAN device
defaults taken into consideration.

Similarly on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
set, the packet processing will populate an ip_tunnel_info struct for
each packet received and attach it to the skb using the new metadata
dst. The metadata structure will contain the outer header and tunnel
header fields which have been stripped off. Layers further up in the
stack such as routing, tc or netfilter can later match on these fields
and perform forwarding. It is the responsibility of upper layers to
ensure that the flag is set if the metadata is needed. The flag limits
the additional cost of metadata collecting based on demand.

This prepares the VXLAN device to be steered by the routing and other
subsystems, which allows supporting encapsulation for a large number
of tunnel endpoints and tunnel ids through a single net_device, which
improves the scalability.

It also allows for OVS to leverage this mode which in turn allows for
the removal of the OVS specific VXLAN code.

Because the skb is currently scrubbed in vxlan_rcv(), the attachment of
the new dst metadata is postponed until after scrubbing which requires
the temporary addition of a new member to vxlan_metadata. This member
is removed again in a later commit after the indirect VXLAN receive API
has been removed.
Signed-off-by: Thomas Graf
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller
-
Rename the tunnel metadata data structures currently internal to
OVS and make them generic for use by all IP tunnels.

Both structures are kernel internal and will stay that way. Their
members are exposed to user space through individual Netlink
attributes by OVS. It will therefore be possible to extend/modify
these structures without affecting user ABI.
Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
03 Apr, 2015
1 commit
-
Don't use dev->iflink anymore.
CC: Steffen Klassert
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
20 Jan, 2015
1 commit
-
Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller