09 Mar, 2016
2 commits
-
One of our customers observed issues with FIB6 garbage collectors
running in different network namespaces blocking each other, resulting
in soft lockups (fib6_run_gc() initiated from timer runs always in
forced mode).

Now that FIB6 walkers are separated per namespace, there is no more need
for instances of fib6_run_gc() in different namespaces blocking each
other. There is still a call to icmp6_dst_gc() which operates on shared
data but this function is protected by its own shared lock.

Signed-off-by: Michal Kubecek
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller
-
The IPv6 FIB data structures are separated per network namespace but
there is still only one global walkers list and one global walker list
lock. This means changes in one namespace unnecessarily interfere with
walkers in other namespaces.

Replace the global list with per-netns lists (and give each its own
lock).

Signed-off-by: Michal Kubecek
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller
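
The effect of replacing the global walkers list with per-netns lists can be modelled outside the kernel. The Python sketch below is only an illustration: the names NetNamespace, add_walker and adjust_walkers are hypothetical and do not correspond to kernel symbols. It shows that with one list and one lock per namespace, work in one namespace never has to take the lock of another.

```python
import threading


class NetNamespace:
    """Hypothetical stand-in for a network namespace (not a kernel structure)."""

    def __init__(self, name):
        self.name = name
        self.walkers = []                    # per-namespace walker list
        self.walker_lock = threading.Lock()  # per-namespace lock guarding the list

    def add_walker(self, walker):
        with self.walker_lock:
            self.walkers.append(walker)

    def adjust_walkers(self):
        """Visit only this namespace's walkers, under this namespace's lock."""
        with self.walker_lock:
            return list(self.walkers)


ns_a = NetNamespace("a")
ns_b = NetNamespace("b")
ns_a.add_walker("gc-walker-a")
ns_b.add_walker("dump-walker-b")

# Holding ns_a's lock does not prevent work on ns_b's list.
with ns_a.walker_lock:
    visible_in_b = ns_b.adjust_walkers()
```

With a single global list and lock, the last statement would have had to wait for the holder of the lock, regardless of which namespace it was working on.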
10 Jul, 2015
1 commit
-
Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.

This adds the ip_nonlocal_bind sysctl under ipv6.

Testing:
Set up nonlocal binding and receive routing on a host, e.g.:
ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1

Set up routing to 2001:0:0:1::/64 on peer to go to first host
ping6 -I 2001:0:0:1::1 peer-address -- to verify
Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
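
From an application's point of view, the sysctl decides whether a bind() to an address that is not configured on the host succeeds. The following Python sketch is a rough illustration only: try_bind is a hypothetical helper, the address is the one used in the test description above, and the outcome depends on the sysctl value and on the host's IPv6 setup.

```python
import socket


def try_bind(address, port=0):
    """Return True if an IPv6 UDP socket can be bound to address, else False."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
            sock.bind((address, port))
        return True
    except OSError:
        # Typically "cannot assign requested address" when the address is
        # not local and non-local binds are not enabled.
        return False


# Expected to succeed only when non-local binds are allowed (or the
# address happens to be configured locally).
result = try_bind("2001:0:0:1::1")
```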
04 May, 2015
1 commit
-
This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disabled by sysctl.

Background:
IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to define new
encaps in UDP just for getting ECMP.

Unfortunately, the initial specifications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard for how to do this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one. Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.

This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployments. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.

Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
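
The range split described above amounts to reserving the top bit of the 20-bit flow label. The Python sketch below illustrates the arithmetic only: the function names are hypothetical, and the way a hash is mapped into the upper range is an assumption made for illustration, not taken from the patch.

```python
FLOWLABEL_MASK = 0xFFFFF   # the flow label field is 20 bits wide
STATELESS_FLAG = 0x80000   # top bit of the 20-bit label


def is_manager_range(label):
    """True if the label lies in 0-7ffff (reserved for the flow label manager)."""
    return 0 <= label <= 0x7FFFF


def is_auto_range(label):
    """True if the label lies in 80000-fffff (used for auto flow labels)."""
    return STATELESS_FLAG <= label <= FLOWLABEL_MASK


def auto_flowlabel(flow_hash):
    """Map a hash into the upper range (illustrative mapping, an assumption)."""
    return (flow_hash & FLOWLABEL_MASK) | STATELESS_FLAG


label = auto_flowlabel(0x12345678)
```

Because every generated label has the top bit set, it cannot collide with a label from the lower range handed out by the flow label manager.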
24 Mar, 2015
1 commit
-
This is specified by RFC 7217.
Cc: Erik Kline
Cc: Fernando Gont
Cc: Lorenzo Colitti
Cc: YOSHIFUJI Hideaki/吉藤英明
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
28 Feb, 2015
1 commit
-
Joining multicast group on ethernet level via "ip maddr" command would
not work if we have an Ethernet switch that does igmp snooping since
the switch would not replicate multicast packets on ports that did not
have IGMP reports for the multicast addresses.

Linux vxlan interfaces created via "ip link add vxlan" have the group option
that enables them to do the required join.

By extending the ip address command with the option "autojoin" we can get similar
functionality for openvswitch vxlan interfaces as well as other tunneling
mechanisms that need to receive multicast traffic. The kernel code is
structured similarly to how the vxlan driver does a group join / leave.

example:
ip address add 224.1.1.10/24 dev eth5 autojoin
ip address del 224.1.1.10/24 dev eth5

Signed-off-by: Madhu Challa
Signed-off-by: David S. Miller
07 Oct, 2014
1 commit
-
Try to reduce the number of possible fn_sernum mutations by constraining them
to their namespace.

Also remove rt_genid which I forgot to remove in 705f1c869d577c ("ipv6:
remove rt6i_genid").

Cc: YOSHIFUJI Hideaki
Cc: Martin Lau
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
08 Jul, 2014
1 commit
-
Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.

Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.

By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.

It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.

Performance impact:
Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet, which explains the slight regression. In the TCP case
the hash is saved in the socket so there is no regression.

Automatic flow labels disabled:
TCP_RR:
86.53% CPU utilization
127/195/322 90/95/99% latencies
1.40498e+06 tps

UDP_RR:
90.70% CPU utilization
118/168/243 90/95/99% latencies
1.50309e+06 tps

Automatic flow labels enabled:
TCP_RR:
85.90% CPU utilization
128/199/337 90/95/99% latencies
1.40051e+06 tps

UDP_RR:
92.61% CPU utilization
115/164/236 90/95/99% latencies
1.4687e+06 tps

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
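
The transmit-side rule described above (generate a label only when none has been set) can be written as a small decision function. This is a Python sketch under stated assumptions: select_flowlabel is a hypothetical name, and reducing the hash to 20 bits with a mask is an assumption about how a hash would fit into the flow label field.

```python
FLOWLABEL_MASK = 0xFFFFF  # the IPv6 flow label field is 20 bits wide


def select_flowlabel(current_label, flow_hash, auto_enabled):
    """Keep an existing label; otherwise derive one from the flow hash."""
    if current_label != 0 or not auto_enabled:
        # A label set by the flow label manager always wins, and nothing
        # is generated while the feature is disabled.
        return current_label
    return flow_hash & FLOWLABEL_MASK


kept = select_flowlabel(0x123, 0xABCDEF01, auto_enabled=True)
generated = select_flowlabel(0, 0xABCDEF01, auto_enabled=True)
disabled = select_flowlabel(0, 0xABCDEF01, auto_enabled=False)
```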
14 May, 2014
1 commit
-
Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.

This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.

Tested using user-mode linux:
- ICMP/ICMPv6 echo replies and errors.
- TCP RST packets (IPv4 and IPv6).

Signed-off-by: Lorenzo Colitti
Signed-off-by: David S. Miller
20 Jan, 2014
1 commit
-
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label uniqueness. This patch introduces a new sysctl to protect the old
behaviour, enabled by default.

Changelog of V3:
* rename ip6_flowlabel_consistency to flowlabel_consistency
* use net_info_ratelimited()
* checkpatch cleanups

Signed-off-by: Florent Fourcot
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
15 Jan, 2014
1 commit
-
This change moves the anycast_src_echo_reply sysctl alongside the other ipv6 sysctls.
Suggested-by: Hannes Frederic Sowa
Signed-off-by: Francois-Xavier Le Bail
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
08 Jan, 2014
1 commit
-
This change makes it possible to follow a recommendation of RFC4942.
- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
as source addresses for ICMPv6 echo reply. This sysctl is false by default
to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().

Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
(http://tools.ietf.org/html/rfc4942#section-2.1.6)

2.1.6. Anycast Traffic Identification and Security
[...]
To avoid exposing knowledge about the internal structure of the
network, it is recommended that anycast servers now take advantage of
the ability to return responses with the anycast address as the
source address if possible.

Signed-off-by: Francois-Xavier Le Bail
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
01 Aug, 2013
1 commit
-
The current net namespace has only one genid for both IPv4 and IPv6, which has
the following drawbacks:

- Adding or deleting an IPv4 address invalidates all IPv6 routing table entries.
- Inserting or removing an XFRM policy also invalidates both IPv4 and IPv6 routing
  table entries even when the policy applies to only one address family.

Thus, this patch attempts to split the single genid in two, so that IPv4 and IPv6
are handled separately at a finer granularity.

Signed-off-by: Fan Du
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
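
The benefit of one generation id per address family can be shown with a toy cache model. The Python sketch below is illustrative only: NetnsGenids and CachedRoute are hypothetical names, and the model assumes a cached entry is valid exactly when the genid it recorded still matches the current one for its family.

```python
class NetnsGenids:
    """Hypothetical per-namespace generation counters, one per family."""

    def __init__(self):
        self.ipv4 = 0
        self.ipv6 = 0

    def bump(self, family):
        """Invalidate cached entries of one family ("ipv4" or "ipv6")."""
        setattr(self, family, getattr(self, family) + 1)


class CachedRoute:
    """A cached entry that remembers the genid current at creation time."""

    def __init__(self, genids, family):
        self.family = family
        self.genid = getattr(genids, family)

    def is_valid(self, genids):
        return self.genid == getattr(genids, self.family)


genids = NetnsGenids()
route4 = CachedRoute(genids, "ipv4")
route6 = CachedRoute(genids, "ipv6")

# An IPv4 address change now only bumps the IPv4 counter.
genids.bump("ipv4")
```

After the bump, route4 is stale while route6 stays valid; with a single shared counter both would have been invalidated.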
25 Mar, 2013
1 commit
-
This patch adds a dev_addr_genid for IPv6. The goal is to use it, combined with
dev_base_seq to check if a change occurs during a netlink dump.
If a change is detected, the flag NLM_F_DUMP_INTR is set in the first message
after the dump was interrupted.

Note that only dump of unicast addresses is checked (multicast and anycast are
not checked).

Reported-by: Junwei Zhang
Reported-by: Hongjun Li
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
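
The consistency check can be pictured as comparing a sequence value before each message of the dump. The Python sketch below is a simplified model, not the kernel code: dump and current_seq are hypothetical names, and current_seq stands for whatever value results from combining dev_addr_genid with dev_base_seq (how they are combined is not shown here). NLM_F_DUMP_INTR is given its value from the netlink UAPI header.

```python
NLM_F_DUMP_INTR = 0x10  # netlink flag: the dump was interrupted by a change


def dump(entries, current_seq):
    """Return (entry, flags) pairs, flagging the first message after a change."""
    prev_seq = current_seq()
    messages = []
    for entry in entries:
        seq = current_seq()
        flags = NLM_F_DUMP_INTR if seq != prev_seq else 0
        prev_seq = seq
        messages.append((entry, flags))
    return messages


state = {"genid": 1}


def entries():
    yield "addr1"
    state["genid"] += 1  # an address is added or removed mid-dump
    yield "addr2"
    yield "addr3"


result = dump(entries(), lambda: state["genid"])
```

Only the message for addr2 carries the flag; a userspace reader that sees it knows the dump may be inconsistent and can restart it.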
06 Feb, 2013
1 commit
-
The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
sysctl but currently only in init_net, other namespaces always
use the default value. This can substantially limit the number
of IPsec tunnels that can be effectively used.

Signed-off-by: Michal Kubecek
Signed-off-by: Steffen Klassert
20 Sep, 2012
1 commit
-
As pointed out by Michal, it is necessary to add a new
namespace for nf_conntrack_reasm code, this prepares
for the second patch.

Cc: Herbert Xu
Cc: Michal Kubeček
Cc: David Miller
Cc: Patrick McHardy
Cc: Pablo Neira Ayuso
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
30 Aug, 2012
1 commit
-
Signed-off-by: Patrick McHardy
09 Jun, 2012
1 commit
-
Currently inetpeer does not support namespaces, so information
leaks across namespaces.

This patch moves the global variables v4_peers and v6_peers into
netns_ipv4 and netns_ipv6 as a field named peers.

Add struct pernet_operations inetpeer_ops to initialize the per-net
inetpeer data.

Also change family_to_base and inet_getpeer to support namespaces.
Signed-off-by: Gao feng
Signed-off-by: David S. Miller
21 Apr, 2012
1 commit
-
The sysctl core no longer natively understands sysctl tables
with .child entries.

Split the ipv6_table to remove the .child entries.
Signed-off-by: Eric W. Biederman
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller
11 May, 2010
4 commits
-
This patch adds support for multiple independent multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT6_TABLE. The table number is
stored in the raw socket data and affects all following ip6mr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
is created with a default routing rule pointing to it. Newly created pim6reg
devices have the table number appended ("pim6regX"), with the exception of
devices created in the default table, which are named just "pim6reg" for
compatibility reasons.

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.

Example usage:
- bind pimd/xorp/... to a specific table:
uint32_t table = 123;
setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:
# ip -6 mrule add iif eth0 lookup 123
# ip -6 mrule add oif eth0 lookup 123

Signed-off-by: Patrick McHardy
-
Signed-off-by: Patrick McHardy
-
Signed-off-by: Patrick McHardy
-
The unres_queue is currently shared between all namespaces. Following patches
will additionally allow creating multiple multicast routing tables in each
namespace. Having a single shared queue for all these users seems excessive,
so move the queue and the cleanup timer to the per-namespace data to unshare it.

As a side-effect, this fixes a bug in the seq file iteration functions: the
first entry returned is always from the current namespace, entries returned
after that may belong to any namespace.

Signed-off-by: Patrick McHardy
18 Jan, 2010
1 commit
-
'security' tables depend on SECURITY, so ifdef them.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Patrick McHardy
02 Sep, 2009
1 commit
-
struct net::ipv6.ip6_dst_ops is separately dynamically allocated,
but there is no fundamental reason for it. Embed it directly into
struct netns_ipv6.

For that:
* move struct dst_ops into a separate header to fix circular dependencies
  I honestly tried not to, it's pretty much impossible to do it any other way
* drop dynamic allocation, allocate together with netns

For a change, remove struct dst_ops::dst_net, it's deducible
by using container_of() given the dst_ops pointer.

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
11 Dec, 2008
6 commits
-
Preliminary work to make IPv6 multicast forwarding netns-aware.
Declare variable 'reg_vif_num' per-namespace, moves into struct netns_ipv6.
At the moment, this variable is only referenced in init_net.
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
-
Preliminary work to make IPv6 multicast forwarding netns-aware.
Declare IPv6 multicast forwarding variables 'mroute_do_assert' and
'mroute_do_pim' per-namespace in struct netns_ipv6.

At the moment, these variables are only referenced in init_net.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
-
Preliminary work to make IPv6 multicast forwarding netns-aware.
Declare variable cache_resolve_queue_len per-namespace: moves it into
struct netns_ipv6.

This variable counts the number of unresolved cache entries queued in the
list mfc_unres_queue. This list is kept global to all netns as the number
of entries per namespace is limited to 10 (hardcoded in routine
ip6mr_cache_unresolved).
Entries belonging to different namespaces in mfc_unres_queue will be
identified by matching the mfc_net member introduced previously in
struct mfc6_cache.

Keeping this list global to all netns, also allows us to keep a single
timer (ipmr_expire_timer) to handle their expiration.
In some places cache_resolve_queue_len value was tested for arming
or deleting the timer. These tests were equivalent to testing
mfc_unres_queue value instead and are replaced in this patch.

At the moment, cache_resolve_queue_len is only referenced in init_net.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
-
Preliminary work to make IPv6 multicast forwarding netns-aware.
Dynamically allocates IPv6 multicast forwarding cache, mfc6_cache_array,
and moves it to struct netns_ipv6.

At the moment, mfc6_cache_array is only referenced in init_net.
Replace 'ARRAY_SIZE(mfc6_cache_array)' with mfc6_cache_array size: MFC6_LINES.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
-
Preliminary work to make IPv6 multicast forwarding netns-aware.
Dynamically allocates interface table vif6_table and moves it to
struct netns_ipv6, and updates MIF_EXISTS() macro.

At the moment, vif6_table is only referenced in init_net.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
-
Preliminary work to make IPv6 multicast forwarding netns-aware.
Make IPv6 multicast forwarding mroute6_socket per-namespace,
moves it into struct netns_ipv6.

At the moment, mroute6_socket is only referenced in init_net.
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
23 Jul, 2008
1 commit
-
FIB timer list is a trivial size structure, avoid indirection and just
put it in existing ns.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
10 Jun, 2008
1 commit
-
This is a port of the IPv4 security table for IPv6.
Signed-off-by: James Morris
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
08 Mar, 2008
3 commits
-
This patch makes use of the network namespace information at the right
places to handle the multicast for several network namespaces. It
makes the socket control per namespace too.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
-
Instead of having a tcp6_socket global to all namespaces, there is a
tcp6 socket control per namespace. That is consistent with which
namespace sent a RST and allows passing the socket to the underlying
function to retrieve the network namespace.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
-
Make ndisc socket control per namespace.
Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
05 Mar, 2008
3 commits
-
This patch makes the necessary changes to make IPv6 dst_entry garbage
collection work with multiple network namespaces.

In ip6_dst_gc(), static local variables are now declared
per-namespace.

Signed-off-by: Benjamin Thery
Signed-off-by: Daniel Lezcano
Signed-off-by: David S. Miller
-
The ip6_dst_ops is moved inside the network namespace structure. All
references to this structure are now relative to the initial network
namespace.

Signed-off-by: Benjamin Thery
Signed-off-by: Daniel Lezcano
Signed-off-by: David S. Miller
-
The rt6_info structures are moved inside the network namespace
structure. All references to these structures are now relative to the
initial network namespace.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller
04 Mar, 2008
1 commit
-
The rt6_stats is now per namespace with this patch. It is allocated
when a network namespace is created and freed when the network
namespace exits and references are relative to the network namespace.

Signed-off-by: Benjamin Thery
Signed-off-by: Daniel Lezcano
Signed-off-by: David S. Miller