20 Mar, 2016
1 commit
-
Pull networking updates from David Miller:
"Highlights:
1) Support more Realtek wireless chips, from Jes Sorensen.
2) New BPF types for per-cpu hash and array maps, from Alexei
Starovoitov.
3) Make several TCP sysctls per-namespace, from Nikolay Borisov.
4) Allow the use of SO_REUSEPORT in order to do per-thread processing
of incoming TCP/UDP connections. The muxing can be done using a
BPF program which hashes the incoming packet. From Craig Gallek.
5) Add a multiplexer for TCP streams, to provide a message-based
interface. BPF programs can be used to determine the message
boundaries. From Tom Herbert.
6) Add 802.1AE MACSEC support, from Sabrina Dubroca.
7) Avoid factorial complexity when taking down an inetdev interface
with lots of configured addresses. We were doing things like
traversing the entire address list for each address removed, and
flushing the entire netfilter conntrack table for every address as
well.
8) Add and use SKB bulk free infrastructure, from Jesper Brouer.
9) Allow offloading u32 classifiers to hardware, and implement for
ixgbe, from John Fastabend.
10) Allow configuring IRQ coalescing parameters on a per-queue basis,
from Kan Liang.
11) Extend ethtool so that larger link mode masks can be supported.
From David Decotigny.
12) Introduce devlink, which can be used to configure port link types
(ethernet vs Infiniband, etc.), port splitting, and switch device
level attributes as a whole. From Jiri Pirko.
13) Hardware offload support for flower classifiers, from Amir Vadai.
14) Add "Local Checksum Offload". Basically, for a tunneled packet
the checksum of the outer header is 'constant' (because with the
checksum field filled into the inner protocol header, the payload
of the outer frame checksums to 'zero'), and we can take advantage
of that in various ways. From Edward Cree"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
bonding: fix bond_get_stats()
net: bcmgenet: fix dma api length mismatch
net/mlx4_core: Fix backward compatibility on VFs
phy: mdio-thunder: Fix some Kconfig typos
lan78xx: add ndo_get_stats64
lan78xx: handle statistics counter rollover
RDS: TCP: Remove unused constant
RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
net: smc911x: convert pxa dma to dmaengine
team: remove duplicate set of flag IFF_MULTICAST
bonding: remove duplicate set of flag IFF_MULTICAST
net: fix a comment typo
ethernet: micrel: fix some error codes
ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
bpf, dst: add and use dst_tclassid helper
bpf: make skb->tc_classid also readable
net: mvneta: bm: clarify dependencies
cls_bpf: reset class and reuse major in da
ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
ldmvsw: Add ldmvsw.c driver code
...
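The "Local Checksum Offload" claim in item 14 can be checked numerically. The following is a minimal Python sketch under simplifying assumptions: it uses a toy byte layout instead of real IP/UDP headers, ignores pseudo-headers, and the helper names are hypothetical. It shows that once the inner checksum field is filled in, the inner data sums to ones-complement 'zero' (0xFFFF), so the outer checksum depends only on the outer header and not on the payload.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit ones-complement sum with end-around carry (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def internet_checksum(data: bytes) -> int:
    """Checksum field value: complement of the ones-complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def fill_inner_checksum(inner: bytearray, offset: int) -> bytes:
    """Hypothetical helper: store the checksum at an even byte offset."""
    inner[offset:offset + 2] = b"\x00\x00"
    csum = internet_checksum(bytes(inner))
    inner[offset:offset + 2] = csum.to_bytes(2, "big")
    return bytes(inner)


# Toy outer header (even length, non-zero) and two different inner packets.
outer = bytes.fromhex("1234567800009abc")
inner_a = fill_inner_checksum(bytearray(b"\x00\x00" + b"hello world"), 0)
inner_b = fill_inner_checksum(bytearray(b"\x00\x00" + b"another payload!"), 0)
```

With the inner checksum filled, `internet_checksum(outer + inner_a)` and `internet_checksum(outer + inner_b)` both equal `internet_checksum(outer)`, which is the property the offload relies on.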
19 Mar, 2016
3 commits
-
eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded
value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX
for this, which makes the code a bit more generic and allows removing
BPF_TUNLEN_MAX from eBPF code.
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller
-
Currently output of MPLS packets on tunnel vports is not allowed by Open
vSwitch. This is because historically encapsulation was done in such a way
that the inner_protocol field of the skb needed to hold the inner protocol
for both MPLS and tunnel encapsulation in order for GSO segmentation to be
performed correctly.
Since b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of
vport") Open vSwitch makes use of lwt to output to tunnel netdevs which
perform encapsulation. As no drivers expose support for MPLS offloads this
means that GSO packets are segmented in software by validate_xmit_skb(),
which is called from __dev_queue_xmit(), before tunnel encapsulation occurs.
This means that the inner protocol of MPLS is no longer needed by the time
encapsulation occurs and the contention on the inner_protocol field of the
skb no longer occurs.
Thus it is now safe to output MPLS to tunnel vports.
Signed-off-by: Simon Horman
Reviewed-by: Jesse Gross
Signed-off-by: David S. Miller
-
Signed-off-by: Fengguang Wu
Signed-off-by: David S. Miller
18 Mar, 2016
1 commit
-
Pull trivial tree updates from Jiri Kosina.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
drivers/rtc: broken link fix
drm/i915 Fix typos in i915_gem_fence.c
Docs: fix missing word in REPORTING-BUGS
lib+mm: fix few spelling mistakes
MAINTAINERS: add git URL for APM driver
treewide: Fix typo in printk
15 Mar, 2016
8 commits
-
Pablo Neira Ayuso says:
====================
Netfilter/IPVS/OVS updates for net-next
The following patchset contains Netfilter/IPVS fixes and OVS NAT
support, more specifically this batch is composed of:
1) Fix a crash in ipset when performing a parallel flush/dump with
set:list type, from Jozsef Kadlecsik.
2) Make sure NFACCT_FILTER_* netlink attributes are in place before
accessing them, from Phil Turnbull.
3) Check return error code from ip_vs_fill_iph_skb_off() in IPVS SIP
helper, from Arnd Bergmann.
4) Add workaround to IPVS to reschedule existing connections to new
destination server by dropping the packet and waiting for retransmission
of the TCP SYN packet, from Julian Anastasov.
5) Allow connection rescheduling in IPVS when in CLOSE state, also
from Julian.
6) Fix wrong offset of SIP Call-ID in IPVS helper, from Marco Angaroni.
7) Validate IPSET_ATTR_ETHER netlink attribute length, from Jozsef.
8) Check match/targetinfo netlink attribute size in nft_compat,
patch from Florian Westphal.
9) Check for integer overflow on 32-bit systems in x_tables, from
Florian Westphal.
Several patches from Jarno Rajahalme to prepare the introduction of
NAT support to OVS based on the Netfilter infrastructure:
10) Schedule IP_CT_NEW_REPLY definition for removal in
nf_conntrack_common.h.
11) Simplify checksumming recalculation in nf_nat.
12) Add comments to the openvswitch conntrack code, from Jarno.
13) Update the CT state key only after successful nf_conntrack_in()
invocation.
14) Find existing conntrack entry after upcall.
15) Handle NF_REPEAT case due to templates in nf_conntrack_in().
16) Call the conntrack helper functions once the conntrack has been
confirmed.
17) And finally, add the NAT interface to OVS.
The batch closes with:
18) Cleanup to use spin_unlock_wait() instead of
spin_lock()/spin_unlock(), from Nicholas Mc Guire.
====================
Signed-off-by: David S. Miller
-
Extend OVS conntrack interface to cover NAT. New nested
OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
attributes, new (non-committed/non-confirmed) connections are mangled
according to the rest of the nested attributes.
The corresponding OVS userspace patch series includes test cases (in
tests/system-traffic.at) that also serve as example uses.
This work extends on a branch by Thomas Graf at
https://github.com/tgraf/ovs/tree/nat.
Signed-off-by: Jarno Rajahalme
Acked-by: Thomas Graf
Acked-by: Joe Stringer
Signed-off-by: Pablo Neira Ayuso
-
There is no need to help connections that are not confirmed, so we can
delay helping new connections to the time when they are confirmed.
This change is needed for NAT support, and having this as a separate
patch will make the following NAT patch a bit easier to review.
Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Signed-off-by: Pablo Neira Ayuso
-
Repeat the nf_conntrack_in() call when it returns NF_REPEAT. This
avoids dropping a SYN packet re-opening an existing TCP connection.
Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Signed-off-by: Pablo Neira Ayuso
-
Add a new function ovs_ct_find_existing() to find an existing
conntrack entry to which this packet was already applied. This is
only to be called when there is evidence that the packet was already
tracked and committed, but we lost the ct reference due to a
userspace upcall.
ovs_ct_find_existing() is called from skb_nfct_cached(), which can now
hide the fact that the ct reference may have been lost due to an
upcall. This allows ovs_ct_commit() to be simplified.
This patch is needed by the later "openvswitch: Interface with NAT" patch,
as we need to be able to pass the packet through NAT using the
original ct reference also after the reference is lost after an
upcall.
Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Signed-off-by: Pablo Neira Ayuso
-
Only a successful nf_conntrack_in() call can effect a connection state
change, so it suffices to update the key only after the
nf_conntrack_in() returns.
This change is needed for the later NAT patches.
Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Signed-off-by: Pablo Neira Ayuso
-
This makes the code easier to understand and the following patches
more focused.
Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Signed-off-by: Pablo Neira Ayuso
-
Remove the definition of IP_CT_NEW_REPLY from the kernel as it does
not make sense. This allows the definition of IP_CT_NUMBER to be
simplified as well.
Signed-off-by: Jarno Rajahalme
Signed-off-by: Pablo Neira Ayuso
14 Mar, 2016
1 commit
-
When we want to change a flow using netlink, we have to identify it to
be able to perform a lookup. Both the flow key and unique flow ID
(ufid) are valid identifiers, but we always have to specify the flow
key in the netlink message. When both attributes are there, the ufid
is used. The flow key is used to validate the actions provided by
the userland.
This commit allows using the ufid without having to provide the flow
key, as is already done in the netlink 'flow get' and 'flow del'
path. The flow key remains mandatory when an action is provided.
Signed-off-by: Samuel Gauthier
Reviewed-by: Simon Horman
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
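The identification rule described in this commit can be summarised as a small decision function. This is a hedged Python sketch, not the kernel code: `check_flow_set_request` and its parameters are hypothetical names, and `None` stands in for an absent netlink attribute.

```python
def check_flow_set_request(key=None, ufid=None, actions=None) -> str:
    """Return which identifier the lookup would use, or raise ValueError.

    Mirrors the rules stated in the commit message:
    - the ufid wins when both identifiers are present;
    - the flow key alone is still a valid identifier;
    - the flow key is mandatory when actions are supplied, because the
      key is what the actions are validated against.
    """
    if key is None and ufid is None:
        raise ValueError("a flow key or a ufid is needed to identify the flow")
    if actions is not None and key is None:
        raise ValueError("the flow key is mandatory when actions are provided")
    return "ufid" if ufid is not None else "key"
```

For example, `check_flow_set_request(ufid=b"\x01" * 16)` is accepted after this change, while adding `actions=[...]` without a key is still rejected.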
02 Mar, 2016
1 commit
-
This patch implements bookkeeping support to compute the maximum
headroom for all the devices in each datapath. When said value
changes, the underlying devs are notified via the
ndo_set_rx_headroom method.
This also increases the internal vports xmit performance.
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
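The bookkeeping described above can be modelled in a few lines. This is a simplified Python sketch with hypothetical names (`Datapath`, `add_port`, `del_port`); the `notified` list stands in for calls to the ndo_set_rx_headroom method, and the rule that a newly added port is told the current maximum is an assumption of the sketch.

```python
class Datapath:
    """Tracks the maximum headroom needed by any device in one datapath."""

    def __init__(self):
        self.headroom = {}      # port name -> headroom that device needs
        self.max_headroom = 0
        self.notified = []      # (port, headroom) pairs, one per notification

    def _notify(self, port):
        self.notified.append((port, self.max_headroom))

    def _recompute(self) -> bool:
        """Recompute the maximum; notify every port only if it changed."""
        new_max = max(self.headroom.values(), default=0)
        if new_max == self.max_headroom:
            return False
        self.max_headroom = new_max
        for port in self.headroom:
            self._notify(port)
        return True

    def add_port(self, port, needed_headroom):
        self.headroom[port] = needed_headroom
        if not self._recompute():
            # Maximum unchanged: only the new port needs to learn it.
            self._notify(port)

    def del_port(self, port):
        del self.headroom[port]
        self._recompute()
```

Adding a tunnel port with a large headroom raises the maximum for every port in the datapath; removing it lowers the maximum again and triggers another round of notifications.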
23 Feb, 2016
1 commit
-
Conflicts:
drivers/net/phy/bcm7xxx.c
drivers/net/phy/marvell.c
drivers/net/vxlan.c
All three conflicts were cases of simple overlapping changes.
Signed-off-by: David S. Miller
20 Feb, 2016
2 commits
-
Replace individual implementations with the recently introduced
skb_postpush_rcsum() helper.
Signed-off-by: Daniel Borkmann
Acked-by: Tom Herbert
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
-
Commit 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be
correctly controlled.") changed the default xmit checksum setting
for lwt vxlan/geneve ipv6 tunnels, so that now the checksum is not
set in the external UDP header.
This commit changes the rx checksum setting for both lwt vxlan/geneve
devices created by openvswitch accordingly, so that lwt over ipv6
tunnel pairs are again able to communicate with default values.
Signed-off-by: Paolo Abeni
Acked-by: Jiri Benc
Acked-by: Jesse Gross
Signed-off-by: David S. Miller
19 Feb, 2016
2 commits
-
This reverts commit bb9b18fb55b0 ("genl: Add genlmsg_new_unicast() for
unicast message allocation").
Nothing wrong with it; it's no longer needed since this was only for
mmapped netlink support.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
-
Revert commit 795449d8b846 ("openvswitch: Enable memory mapped Netlink i/o").
Following the mmapped netlink removal this code can be removed.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
17 Feb, 2016
1 commit
-
In case of UDP traffic with datagram length
below MTU this gives about a 2% performance increase
when tunneling over ipv4 and about 60% when tunneling
over ipv6.
Signed-off-by: Paolo Abeni
Suggested-and-acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
15 Feb, 2016
1 commit
-
This patch fixes spelling typos found in printk and Kconfig.
Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina
11 Feb, 2016
1 commit
-
Operations with the GENL_ADMIN_PERM flag fail permissions checks because
this flag means we call netlink_capable, which uses the init user ns.
Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
which should be allowed inside a user namespace.
The motivation for this is to be able to run openvswitch in unprivileged
containers. I've tested this and it seems to work, but I really have no
idea about the security consequences of this patch, so thoughts would be
much appreciated.
v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
massive one
Reported-by: James Page
Signed-off-by: Tycho Andersen
CC: Eric Biederman
CC: Pravin Shelar
CC: Justin Pettit
CC: "David S. Miller"
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
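The "separate ifs" structure mentioned in the v3 note can be sketched as follows. This is an illustrative Python model, not the kernel implementation: the flag values are chosen for the sketch, `genl_permitted` is a hypothetical name, and the two booleans stand in for the capability check against the init user namespace and the check against the user namespace owning the network namespace.

```python
# Flag values are illustrative for this sketch.
GENL_ADMIN_PERM = 0x01
GENL_UNS_ADMIN_PERM = 0x10


def genl_permitted(op_flags: int,
                   capable_in_init_userns: bool,
                   capable_in_net_userns: bool) -> bool:
    """Return True if the operation passes the permission checks."""
    # Classic check: needs the capability in the init user namespace.
    if (op_flags & GENL_ADMIN_PERM) and not capable_in_init_userns:
        return False
    # New check: the capability in the owning user namespace is enough.
    if (op_flags & GENL_UNS_ADMIN_PERM) and not capable_in_net_userns:
        return False
    return True
```

A root user inside an unprivileged container corresponds to `capable_in_init_userns=False, capable_in_net_userns=True`: operations flagged GENL_ADMIN_PERM are refused, while those flagged GENL_UNS_ADMIN_PERM are allowed.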
10 Feb, 2016
1 commit
-
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.
Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.
Signed-off-by: David Wragg
Signed-off-by: David S. Miller
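As a worked example of "maximum IP packet size minus any relevant overhead", here is a small Python calculation. The overhead breakdown is an assumption for illustration (VXLAN over IPv4 with standard header sizes); the commit message does not state the exact constants the kernel uses.

```python
IP_MAX_PACKET = 65535  # largest value of the 16-bit IPv4 total-length field

# Assumed per-packet overhead for VXLAN over IPv4, in bytes.
VXLAN_IPV4_OVERHEAD = {
    "outer IPv4 header": 20,
    "outer UDP header": 8,
    "VXLAN header": 8,
    "inner Ethernet header": 14,
}


def max_tunnel_mtu(overhead: dict) -> int:
    """Largest inner MTU that still fits in one maximum-size IP packet."""
    return IP_MAX_PACKET - sum(overhead.values())


vxlan_max_mtu = max_tunnel_mtu(VXLAN_IPV4_OVERHEAD)  # 65535 - 50
```

Under these assumptions the result is 65485 bytes, far above the 1500-byte default that constrained tunnel vports after 4.3.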
19 Jan, 2016
1 commit
-
It was seen that defective configurations of openvswitch could overwrite
the STACK_END_MAGIC and cause a hard crash of the kernel because of too
many recursions within ovs.
This problem arises due to the high stack usage of openvswitch. The rest
of the kernel is fine with the current limit of 10 (RECURSION_LIMIT).
We use the already existing recursion counter in ovs_execute_actions to
implement an upper bound of 5 recursions.
Cc: Pravin Shelar
Cc: Simon Horman
Cc: Eric Dumazet
Cc: Simon Horman
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
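The effect of bounding recursion with a level counter can be illustrated with a toy model. In this Python sketch a nested list stands in for an action that re-enters action execution; the names `OVS_RECURSION_LIMIT`, `RecursionLimitExceeded` and `execute_actions` are chosen for the sketch, and the kernel's actual error handling is not modelled.

```python
OVS_RECURSION_LIMIT = 5  # upper bound stated in the commit message


class RecursionLimitExceeded(Exception):
    """Raised instead of recursing deeper and overrunning the stack."""


def execute_actions(actions, _level=0) -> int:
    """Execute a (possibly nested) action list; return the leaf count."""
    level = _level + 1
    if level > OVS_RECURSION_LIMIT:
        raise RecursionLimitExceeded(f"nesting level {level} exceeds limit")
    executed = 0
    for action in actions:
        if isinstance(action, list):
            # A nested list models an action that recurses into execution.
            executed += execute_actions(action, level)
        else:
            executed += 1
    return executed


def nest(depth):
    """Build an action list nested to the given depth (helper for the demo)."""
    actions = ["output"]
    for _ in range(depth - 1):
        actions = [actions]
    return actions
```

`execute_actions(nest(5))` completes, while `execute_actions(nest(6))` raises before the sixth level runs, so a defective configuration fails cleanly instead of exhausting the stack.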
16 Jan, 2016
1 commit
-
skb_gso_segment() uses the skb control block during segmentation.
This patch adds 32 bytes of room for the previous control block, which
will be copied into all resulting segments.
This patch fixes a kernel crash during fragmenting of forwarded packets.
Fragmentation requires a valid IP CB in the skb for clearing ip options.
The patch also removes the custom save/restore in ovs code, which is now redundant.
Signed-off-by: Konstantin Khlebnikov
Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
Signed-off-by: David S. Miller
11 Jan, 2016
3 commits
-
commit be4ace6e6b1b ("openvswitch: Move dev pointer into vport itself")
The commit above added @dev and moved @rcu to the bottom of struct
vport, but the change was not reflected in the kernel doc. So let's
update the kernel doc as well.
Signed-off-by: Jean Sacren
Cc: Thomas Graf
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
-
commit 6b001e682e90 ("openvswitch: Use Geneve device.")
The commit above introduced 'port_no' as the name for the member of
struct geneve_port. The correct name should be 'dst_port' as described
in the kernel doc. Let's fix that member name and all the pertinent
instances so that both doc and code would be consistent.
Signed-off-by: Jean Sacren
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
-
commit 6b001e682e90 ("openvswitch: Use Geneve device.")
The commit above deleted the only call site of ovs_tunnel_route_lookup()
and now that function is not used any more. So let's delete the function
definition as well.
Signed-off-by: Jean Sacren
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
01 Jan, 2016
1 commit
30 Dec, 2015
1 commit
-
Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a
reference leak on helper objects, but inadvertently introduced a leak on
the ct template.
Previously, ct_info.ct->general.use was initialized to 0 by
nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
returned successfully. If an error occurred while adding the helper or
adding the action to the actions buffer, the __ovs_ct_free_action()
cleanup would use nf_ct_put() to free the entry; however, this relies on
atomic_dec_and_test(ct_info.ct->general.use). This reference must be
incremented first, or nf_ct_put() will never free it.
Fix the issue by acquiring a reference to the template immediately after
allocation.
Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak")
Signed-off-by: Joe Stringer
Signed-off-by: David S. Miller
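The leak follows from how a decrement-and-test release behaves on a counter that starts at zero. The Python model below is a sketch with hypothetical names (`Template`, `get`, `put`); it imitates only the "free when the decrement reaches exactly zero" semantics, not the kernel's atomics.

```python
class Template:
    """Stand-in for the ct template; 'use' models the reference count."""

    def __init__(self):
        self.use = 0        # allocation leaves the count at 0
        self.freed = False


def get(tmpl: Template) -> None:
    tmpl.use += 1


def put(tmpl: Template) -> None:
    # Free only when the decrement brings the count to exactly zero.
    tmpl.use -= 1
    if tmpl.use == 0:
        tmpl.freed = True


# Before the fix: the error path releases a template nobody referenced.
leaked = Template()
put(leaked)             # count goes 0 -> -1, never equals 0

# After the fix: take a reference right after allocation.
fixed = Template()
get(fixed)              # count 0 -> 1
put(fixed)              # count 1 -> 0, template is freed
```

In the first case the count skips past zero and the object is never released; taking the reference immediately after allocation makes every cleanup path balanced.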
19 Dec, 2015
2 commits
-
In a set action, tunnel attributes should be encoded in a
nested action.
I noticed this because ovs-dpctl was reporting an error
when dumping flows due to the incorrect encoding of tunnel attributes
in a set action.
Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.")
Signed-off-by: Simon Horman
Signed-off-by: David S. Miller
-
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains the first batch of Netfilter updates for
the upcoming 4.5 kernel. This batch contains userspace netfilter header
compilation fixes, support for packet mangling in nf_tables, the new
tracing infrastructure for nf_tables and cgroup2 support for iptables.
More specifically, they are:
1) Two patches to include dependencies in our netfilter userspace
headers to resolve compilation problems, from Mikko Rapeli.
2) Four cosmetic cleanup patches for the ebtables codebase, from Ian Morris.
3) Remove duplicate include in the netfilter reject infrastructure,
from Stephen Hemminger.
4) Two patches to simplify the netfilter defragmentation code for IPv6,
patch from Florian Westphal.
5) Fix root ownership of /proc/net netfilter for unprivileged net
namespaces, from Philip Whineray.
6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.
7) Add mangling support to our nf_tables payload expression, from
Patrick McHardy.
8) Introduce a new netlink-based tracing infrastructure for nf_tables,
from Florian Westphal.
9) Change setter functions in nfnetlink_log to be void, from
Rami Rosen.
10) Add netns support to the cttimeout infrastructure.
11) Add cgroup2 support to iptables, from Tejun Heo.
12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.
13) Add support for mangling pkttype in the nf_tables meta expression,
also from Florian.
BTW, I need that you pull net into net-next, I have another batch that
requires changes that I don't yet see in net.
====================
Signed-off-by: David S. Miller
18 Dec, 2015
1 commit
-
Conflicts:
drivers/net/geneve.c
Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.
Signed-off-by: David S. Miller
15 Dec, 2015
1 commit
-
Resolve conflict between commit 264640fc2c5f4f ("ipv6: distinguish frag
queues by device for multicast and link-local packets") from the net
tree and commit 029f7f3b8701c ("netfilter: ipv6: nf_defrag: avoid/free
clone operations") from the nf-next tree.
Signed-off-by: Pablo Neira Ayuso
Conflicts:
net/ipv6/netfilter/nf_conntrack_reasm.c
12 Dec, 2015
2 commits
-
If userspace executes ct(zone=1), and the connection tracker determines
that the packet is invalid, then the ct_zone flow key field is populated
with the default zone rather than the zone that was specified. Even
though connection tracking failed, this field should be updated with the
value that the action specified. Fix the issue.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
-
If the actions (re)allocation fails, or the actions list is larger than the
maximum size, and the conntrack action is the last action when these
problems are hit, then references to helper modules may be leaked. Fix
the issue.
Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
04 Dec, 2015
3 commits
-
Conflicts:
drivers/net/ethernet/renesas/ravb_main.c
kernel/bpf/syscall.c
net/ipv4/ipmr.c
All three conflicts were cases of overlapping changes.
Signed-off-by: David S. Miller
-
Each openvswitch tunnel vport (vxlan, gre, geneve) holds a reference
to the underlying tunnel device, but never releases it when that
device is deleted.
Deleting the underlying device via the ip tool causes the kernel to
hang in the netdev_wait_allrefs() loop.
This commit ensures that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni
Acked-by: Flavio Leitner
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
-
Sometimes drivers and other code would find it handy to know some
internal information about the upper device being changed. So allow upper-code
to pass information down to notifier listeners during linking.
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller