Eric Lee / smarc-fsl-linux-kernel

09 Mar, 2016

2 commits

3dc94f93b ipv6: per netns FIB garbage collection

One of our customers observed issues with FIB6 garbage collectors
running in different network namespaces blocking each other, resulting
in soft lockups (fib6_run_gc() initiated from timer runs always in
forced mode).

Now that FIB6 walkers are separated per namespace, there is no more need
for instances of fib6_run_gc() in different namespaces blocking each
other. There is still a call to icmp6_dst_gc() which operates on shared
data but this function is protected by its own shared lock.

Signed-off-by: Michal Kubecek
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller

Michal Kubeček
2016-03-09 04:16:51 +0800
9a03cd8f3 ipv6: per netns fib6 walkers

The IPv6 FIB data structures are separated per network namespace but
there is still only one global walkers list and one global walker list
lock. This means changes in one namespace unnecessarily interfere with
walkers in other namespaces.

Replace the global list with per-netns lists (and give each its own
lock).

Signed-off-by: Michal Kubecek
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller

Michal Kubeček
2016-03-09 04:16:51 +0800

10 Jul, 2015

1 commit

35a256fee ipv6: Nonlocal bind

Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.

This adds the ip_nonlocal_bind sysctl under ipv6.

Testing:

Set up nonlocal binding and receive routing on a host, e.g.:

ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1

Set up routing to 2001:0:0:1::/64 on peer to go to first host

ping6 -I 2001:0:0:1::1 peer-address -- to verify
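
From an application's point of view, the effect of the sysctl shows up in bind(): binding to an address that is not configured locally normally fails with EADDRNOTAVAIL, and is expected to succeed once nonlocal binding is enabled. A minimal sketch, assuming a helper named `bind_ipv6` (hypothetical, not part of the patch):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: bind fd to an IPv6 address given as text.
 * Returns 0 on success, -1 if the text is not a valid IPv6 address
 * or if bind() fails (errno is then set by bind()). */
static int bind_ipv6(int fd, const char *addr_text, unsigned short port)
{
    struct sockaddr_in6 sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin6_family = AF_INET6;
    sa.sin6_port = htons(port);
    if (inet_pton(AF_INET6, addr_text, &sa.sin6_addr) != 1)
        return -1;
    return bind(fd, (struct sockaddr *)&sa, sizeof(sa));
}
```

With the routing setup above, calling `bind_ipv6(fd, "2001:0:0:1::1", 0)` on an AF_INET6 socket would be the programmatic counterpart of the ping6 check.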

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-07-10 12:09:10 +0800

04 May, 2015

1 commit

82a584b7c ipv6: Flow label state ranges

This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disabled by sysctl.
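
The split can be sketched as a single bit test on the 20-bit label. This is only an illustration under stated assumptions: the names `FLOWLABEL_MASK`, `FLOWLABEL_STATELESS_FLAG`, `make_auto_flowlabel` and `flowlabel_is_stateful` are hypothetical and not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: the flow label is a 20-bit field, and its top
 * bit separates the manager range (0-7ffff) from the auto range. */
#define FLOWLABEL_MASK           0x000fffffu
#define FLOWLABEL_STATELESS_FLAG 0x00080000u

/* Derive an automatic flow label from a flow hash. With the range split
 * enabled, the top bit is forced on so the result lies in 80000-fffff. */
static uint32_t make_auto_flowlabel(uint32_t hash, bool state_ranges)
{
    uint32_t label = hash & FLOWLABEL_MASK;

    if (state_ranges)
        label |= FLOWLABEL_STATELESS_FLAG;
    return label;
}

/* True if a label lies in the range reserved for the flow label manager. */
static bool flowlabel_is_stateful(uint32_t label)
{
    return (label & FLOWLABEL_MASK) < FLOWLABEL_STATELESS_FLAG;
}
```

Forcing the top bit costs one bit of hash entropy in exchange for a guarantee that an automatic label never lands in the manager's range.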

Background:

IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to define new
encaps in UDP just for getting ECMP.

Unfortunately, the initial specifications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard for how to do this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one. Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.

This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.

Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-05-04 09:58:01 +0800

24 Mar, 2015

1 commit

1855b7c3e ipv6: introduce idgen_delay and idgen_retries knobs

This is specified by RFC 7217.

Cc: Erik Kline
Cc: Fernando Gont
Cc: Lorenzo Colitti
Cc: YOSHIFUJI Hideaki/吉藤英明
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-03-24 10:12:09 +0800

28 Feb, 2015

1 commit

93a714d6b multicast: Extend ip address command to enable multicast group join/leave on ...

Joining a multicast group at the Ethernet level via the "ip maddr" command does
not work if there is an Ethernet switch that does IGMP snooping, since
the switch does not replicate multicast packets on ports that did not
send IGMP reports for the multicast addresses.

Linux vxlan interfaces created via "ip link add vxlan" have the group option
that enables them to do the required join.

By extending ip address command with option "autojoin" we can get similar
functionality for openvswitch vxlan interfaces as well as other tunneling
mechanisms that need to receive multicast traffic. The kernel code is
structured similar to how the vxlan driver does a group join / leave.

example:
ip address add 224.1.1.10/24 dev eth5 autojoin
ip address del 224.1.1.10/24 dev eth5

Signed-off-by: Madhu Challa
Signed-off-by: David S. Miller

Madhu Challa
2015-02-28 05:25:25 +0800

07 Oct, 2014

1 commit

812918c46 ipv6: make fib6 serial number per namespace

Try to reduce the number of possible fn_sernum mutations by constraining them
to their namespace.

Also remove rt_genid which I forgot to remove in 705f1c869d577c ("ipv6:
remove rt6i_genid").

Cc: YOSHIFUJI Hideaki
Cc: Martin Lau
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2014-10-07 12:02:30 +0800

08 Jul, 2014

1 commit

cb1ce2ef3 ipv6: Implement automatic flow label generation on transmit

Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.

Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.
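
The per-socket switch can be sketched as follows. The helper name `set_auto_flowlabel` is hypothetical; the fallback value 70 for IPV6_AUTOFLOWLABEL is the one used by the Linux UAPI header linux/in6.h and is only defined here in case the libc headers do not provide it.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Fallback for libc headers that lack the constant (value as in the
 * Linux UAPI header linux/in6.h). */
#ifndef IPV6_AUTOFLOWLABEL
#define IPV6_AUTOFLOWLABEL 70
#endif

/* Hypothetical helper: enable (on != 0) or disable automatic flow labels
 * on one IPv6 socket. Returns 0 on success, -1 on error with errno set
 * by setsockopt(). */
static int set_auto_flowlabel(int fd, int on)
{
    return setsockopt(fd, IPPROTO_IPV6, IPV6_AUTOFLOWLABEL, &on, sizeof(on));
}
```

A caller would typically invoke `set_auto_flowlabel(fd, 1)` right after creating an AF_INET6 socket and before sending any data.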

By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.

It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.

Performance impact:

Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet, which explains the slight regression. In the TCP case
the hash is saved in the socket so there is no regression.

Automatic flow labels disabled:

TCP_RR:
86.53% CPU utilization
127/195/322 90/95/99% latencies
1.40498e+06 tps

UDP_RR:
90.70% CPU utilization
118/168/243 90/95/99% latencies
1.50309e+06 tps

Automatic flow labels enabled:

TCP_RR:
85.90% CPU utilization
128/199/337 90/95/99% latencies
1.40051e+06 tps

UDP_RR:
92.61% CPU utilization
115/164/236 90/95/99% latencies
1.4687e+06 tps

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2014-07-08 12:14:21 +0800

14 May, 2014

1 commit

e110861f8 net: add a sysctl to reflect the fwmark on replies

Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.

This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.
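
The rule described above amounts to a one-line selection. A minimal sketch, in which the function name `reply_mark` and its `reflect` parameter (standing in for the sysctl setting) are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: choose the mark for a kernel-originated reply.
 * With reflection off the reply keeps the historical mark of zero;
 * with it on, the reply inherits the mark of the packet it answers. */
static uint32_t reply_mark(bool reflect, uint32_t incoming_mark)
{
    return reflect ? incoming_mark : 0;
}
```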

Tested using user-mode linux:
- ICMP/ICMPv6 echo replies and errors.
- TCP RST packets (IPv4 and IPv6).

Signed-off-by: Lorenzo Colitti
Signed-off-by: David S. Miller

Lorenzo Colitti
2014-05-14 06:35:08 +0800

20 Jan, 2014

1 commit

6444f72b4 ipv6: add flowlabel_consistency sysctl

With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label uniqueness. This patch introduces a new sysctl to preserve the old
behaviour, enabled by default.

Changelog of V3:
* rename ip6_flowlabel_consistency to flowlabel_consistency
* use net_info_ratelimited()
* checkpatch cleanups

Signed-off-by: Florent Fourcot
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florent Fourcot
2014-01-20 09:12:31 +0800

15 Jan, 2014

1 commit

ec35b61ea IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6

This change moves the anycast_src_echo_reply sysctl in with the other ipv6 sysctls.

Suggested-by: Hannes Frederic Sowa
Signed-off-by: Francois-Xavier Le Bail
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

FX Le Bail
2014-01-15 10:18:22 +0800

08 Jan, 2014

1 commit

509aba3b0 IPv6: add the option to use anycast addresses as source addresses in echo reply

This change makes it possible to follow a recommendation of RFC4942.

- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
as source addresses for ICMPv6 echo reply. This sysctl is false by default
to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().

Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
(http://tools.ietf.org/html/rfc4942#section-2.1.6)

2.1.6. Anycast Traffic Identification and Security

[...]
To avoid exposing knowledge about the internal structure of the
network, it is recommended that anycast servers now take advantage of
the ability to return responses with the anycast address as the
source address if possible.

Signed-off-by: Francois-Xavier Le Bail
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

FX Le Bail
2014-01-08 04:51:39 +0800

01 Aug, 2013

1 commit

ca4c3fc24 net: split rt_genid for ipv4 and ipv6

The current net namespace has only one genid for both IPv4 and IPv6, which has
the following drawbacks:

- Adding/deleting an IPv4 address invalidates all IPv6 routing table entries.
- Inserting/removing an XFRM policy also invalidates both IPv4 and IPv6 routing
table entries, even when the policy only applies to one address family.

Thus, this patch attempts to split the single genid in two, to cater for IPv4
and IPv6 separately at a finer granularity.
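
The benefit of the split can be illustrated with two counters and a cached entry that remembers the counter value it was created under. All names here (`ns_genids`, `cached_route6`, and the helpers) are hypothetical and only model the idea, not the kernel structures:

```c
#include <stdbool.h>

/* Hypothetical per-namespace generation counters, one per family. */
struct ns_genids {
    unsigned int ipv4;
    unsigned int ipv6;
};

/* Hypothetical cached IPv6 route that records the IPv6 genid it saw. */
struct cached_route6 {
    unsigned int genid;
};

/* A cached entry stays valid while its family's genid is unchanged. */
static bool route6_is_valid(const struct ns_genids *ns,
                            const struct cached_route6 *rt)
{
    return rt->genid == ns->ipv6;
}

/* An IPv4 address change bumps only the IPv4 counter. */
static void ipv4_addr_changed(struct ns_genids *ns)
{
    ns->ipv4++;
}

/* An IPv6 address change bumps only the IPv6 counter. */
static void ipv6_addr_changed(struct ns_genids *ns)
{
    ns->ipv6++;
}
```

With a single shared counter, `ipv4_addr_changed()` would have invalidated the IPv6 entry as well; with two counters it does not.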

Signed-off-by: Fan Du
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

fan.du
2013-08-01 05:56:36 +0800

25 Mar, 2013

1 commit

63998ac24 ipv6: provide addr and netconf dump consistency info

This patch adds a dev_addr_genid for IPv6. The goal is to use it, combined with
dev_base_seq to check if a change occurs during a netlink dump.
If a change is detected, the flag NLM_F_DUMP_INTR is set in the first message
after the dump was interrupted.

Note that only dump of unicast addresses is checked (multicast and anycast are
not checked).
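
On the receiving side, a userspace program can test each message of a dump for the flag and restart the dump if it is set. A minimal sketch using `struct nlmsghdr` and `NLM_F_DUMP_INTR` from linux/netlink.h; the helper name `dump_was_interrupted` is hypothetical:

```c
#include <linux/netlink.h>
#include <stdbool.h>

/* Hypothetical helper: report whether the kernel flagged this netlink
 * message as belonging to a dump that was interrupted by a change. */
static bool dump_was_interrupted(const struct nlmsghdr *nlh)
{
    return (nlh->nlmsg_flags & NLM_F_DUMP_INTR) != 0;
}
```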

Reported-by: Junwei Zhang
Reported-by: Hongjun Li
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2013-03-25 05:16:29 +0800

06 Feb, 2013

1 commit

8d068875c xfrm: make gc_thresh configurable in all namespaces

The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
sysctl but currently only in init_net, other namespaces always
use the default value. This can substantially limit the number
of IPsec tunnels that can be effectively used.

Signed-off-by: Michal Kubecek
Signed-off-by: Steffen Klassert

Michal Kubecek
2013-02-06 18:36:29 +0800

20 Sep, 2012

1 commit

c038a767c ipv6: add a new namespace for nf_conntrack_reasm

As pointed out by Michal, it is necessary to add a new
namespace for nf_conntrack_reasm code, this prepares
for the second patch.

Cc: Herbert Xu
Cc: Michal Kubeček
Cc: David Miller
Cc: Patrick McHardy
Cc: Pablo Neira Ayuso
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

Amerigo Wang
2012-09-20 05:23:28 +0800

30 Aug, 2012

1 commit

58a317f10 netfilter: ipv6: add IPv6 NAT support

Signed-off-by: Patrick McHardy

Patrick McHardy
2012-08-30 09:00:17 +0800

09 Jun, 2012

1 commit

c8a627ed0 inetpeer: add namespace support for inetpeer

Currently inetpeer doesn't support namespaces, so information
leaks across namespaces.

This patch moves the global vars v4_peers and v6_peers to
netns_ipv4 and netns_ipv6 as a field named peers.

Add struct pernet_operations inetpeer_ops to initialize the pernet
inetpeer data.

Also change family_to_base and inet_getpeer to support namespaces.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller

Gao feng
2012-06-09 05:27:23 +0800

21 Apr, 2012

1 commit

6dceb0368 net ipv6: Don't use sysctl tables with .child entries.

The sysctl core no longer natively understands sysctl tables
with .child entries.

Split the ipv6_table to remove the .child entries.

Signed-off-by: Eric W. Biederman
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Eric W. Biederman
2012-04-21 09:22:29 +0800

11 May, 2010

4 commits

d1db275dd ipv6: ip6mr: support multiple tables

This patch adds support for multiple independent multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT6_TABLE. The table number is
stored in the raw socket data and affects all following ip6mr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
is created with a default routing rule pointing to it. Newly created pim6reg
devices have the table number appended ("pim6regX"), with the exception of
devices created in the default table, which are named just "pim6reg" for
compatibility reasons.

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys; source and destination addresses could be supported
in addition.

Example usage:

- bind pimd/xorp/... to a specific table:

uint32_t table = 123;
setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

# ip -6 mrule add iif eth0 lookup 123
# ip -6 mrule add oif eth0 lookup 123

Signed-off-by: Patrick McHardy

Patrick McHardy
2010-05-11 20:40:55 +0800
6bd521433 ipv6: ip6mr: move mroute data into separate structure

Signed-off-by: Patrick McHardy

Patrick McHardy
2010-05-11 20:40:53 +0800
f30a77842 ipv6: ip6mr: convert struct mfc_cache to struct list_head

Signed-off-by: Patrick McHardy

Patrick McHardy
2010-05-11 20:40:51 +0800
c476efbcd ipv6: ip6mr: move unres_queue and timer to per-namespace data

The unres_queue is currently shared between all namespaces. Following patches
will additionally allow to create multiple multicast routing tables in each
namespace. Having a single shared queue for all these users seems excessive;
move the queue and the cleanup timer to the per-namespace data to unshare it.

As a side-effect, this fixes a bug in the seq file iteration functions: the
first entry returned is always from the current namespace, entries returned
after that may belong to any namespace.

Signed-off-by: Patrick McHardy

Patrick McHardy
2010-05-11 20:40:48 +0800

18 Jan, 2010

1 commit

e9d3897cc netfilter: netns: #ifdef ->iptable_security, ->ip6table_security

'security' tables depend on SECURITY, so ifdef them.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Patrick McHardy

Alexey Dobriyan
2010-01-18 15:08:37 +0800

02 Sep, 2009

1 commit

86393e52c netns: embed ip6_dst_ops directly

struct net::ipv6.ip6_dst_ops is dynamically allocated separately,
but there is no fundamental reason for it. Embed it directly into
struct netns_ipv6.

For that:
* move struct dst_ops into a separate header to fix circular dependencies
(I honestly tried not to, but it is pretty much impossible to do it any other way)
* drop the dynamic allocation, allocate together with netns

For a change, remove struct dst_ops::dst_net, it's deducible
by using container_of() given dst_ops pointer.
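
The container_of() deduction can be sketched in plain C with stand-in types. The `_sketch` structures and the helper `dst_ops_to_net` below are hypothetical simplifications, not the kernel definitions; only the embedding relationship matters:

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(): step back from a
 * pointer to an embedded member to the start of the enclosing object. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for dst_ops, netns_ipv6 and net. */
struct dst_ops_sketch {
    int family;
};

struct netns_ipv6_sketch {
    int other_state;
    struct dst_ops_sketch ip6_dst_ops;  /* embedded, not a pointer */
};

struct net_sketch {
    int id;
    struct netns_ipv6_sketch ipv6;
};

/* Because the ops structure is embedded, its owning namespace follows
 * from its address alone, so no back-pointer field is needed. */
static struct net_sketch *dst_ops_to_net(struct dst_ops_sketch *ops)
{
    return container_of(ops, struct net_sketch, ipv6.ip6_dst_ops);
}
```

This only works once the member is embedded; with a separately allocated structure the address carries no information about the owner.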

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2009-09-02 08:40:31 +0800

11 Dec, 2008

6 commits

950d5704e netns: ip6mr: declare reg_vif_num per-namespace

Preliminary work to make IPv6 multicast forwarding netns-aware.

Declare variable 'reg_vif_num' per-namespace and move it into struct netns_ipv6.

At the moment, this variable is only referenced in init_net.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Benjamin Thery
2008-12-11 08:29:24 +0800
a21f3f997 netns: ip6mr: declare mroute_do_assert and mroute_do_pim per-namespace

Preliminary work to make IPv6 multicast forwarding netns-aware.

Declare IPv6 multicast forwarding variables 'mroute_do_assert' and
'mroute_do_pim' per-namespace in struct netns_ipv6.

At the moment, these variables are only referenced in init_net.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Benjamin Thery
2008-12-11 08:28:44 +0800
4045e57c1 netns: ip6mr: declare counter cache_resolve_queue_len per-namespace

Preliminary work to make IPv6 multicast forwarding netns-aware.

Declare variable cache_resolve_queue_len per-namespace: move it into
struct netns_ipv6.

This variable counts the number of unresolved cache entries queued in the
list mfc_unres_queue. This list is kept global to all netns as the number
of entries per namespace is limited to 10 (hardcoded in routine
ip6mr_cache_unresolved).
Entries belonging to different namespaces in mfc_unres_queue will be
identified by matching the mfc_net member introduced previously in
struct mfc6_cache.

Keeping this list global to all netns also allows us to keep a single
timer (ipmr_expire_timer) to handle their expiration.
In some places cache_resolve_queue_len value was tested for arming
or deleting the timer. These tests were equivalent to testing
mfc_unres_queue value instead and are replaced in this patch.

At the moment, cache_resolve_queue_len is only referenced in init_net.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Benjamin Thery
2008-12-11 08:27:21 +0800
4a6258a0e netns: ip6mr: dynamically allocate mfc6_cache_array

Preliminary work to make IPv6 multicast forwarding netns-aware.

Dynamically allocates IPv6 multicast forwarding cache, mfc6_cache_array,
and moves it to struct netns_ipv6.

At the moment, mfc6_cache_array is only referenced in init_net.

Replace 'ARRAY_SIZE(mfc6_cache_array)' with mfc6_cache_array size: MFC6_LINES.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Benjamin Thery
2008-12-11 08:24:07 +0800
4e16880cb netns: ip6mr: dynamically allocates vif6_table

Preliminary work to make IPv6 multicast forwarding netns-aware.

Dynamically allocates interface table vif6_table and moves it to
struct netns_ipv6, and updates MIF_EXISTS() macro.

At the moment, vif6_table is only referenced in init_net.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Benjamin Thery
2008-12-11 08:15:08 +0800
bd91b8bf3 netns: ip6mr: allocate mroute6_socket per-namespace.

Preliminary work to make IPv6 multicast forwarding netns-aware.

Make IPv6 multicast forwarding mroute6_socket per-namespace
and move it into struct netns_ipv6.

At the moment, mroute6_socket is only referenced in init_net.

Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Benjamin Thery
2008-12-11 08:07:08 +0800

23 Jul, 2008

1 commit

417f28bb3 netns: dont alloc ipv6 fib timer list

The FIB timer list is a trivially small structure; avoid the indirection and
just put it in the existing ns.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2008-07-23 05:33:45 +0800

10 Jun, 2008

1 commit

17e6e59f0 netfilter: ip6_tables: add ip6tables security table

This is a port of the IPv4 security table for IPv6.

Signed-off-by: James Morris
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

James Morris
2008-06-10 06:58:05 +0800

08 Mar, 2008

3 commits

b8ad0cbc5 [NETNS][IPV6] mcast - handle several network namespace

This patch makes use of the network namespace information at the right
places to handle multicast for several network namespaces. It
makes the socket control per namespace too.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-03-08 03:16:55 +0800
93ec926b0 [NETNS][IPV6] tcp6 - make socket control per namespace

Instead of having a tcp6_socket global to all namespaces, there is now a
tcp6 socket control per namespace. That is consistent with which
namespace sent a RST and allows passing the socket to the underlying
function to retrieve the network namespace.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-03-08 03:16:02 +0800
1762f7e88 [NETNS][IPV6] ndisc - make socket control per namespace

Make ndisc socket control per namespace.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-03-08 03:15:34 +0800

05 Mar, 2008

3 commits

6891a346c [NETNS][IPV6] route6 - make garbage collection work with multiple network namespaces

This patch makes the necessary changes to make IPv6 dst_entry garbage
collection work with multiple network namespaces.

In ip6_dst_gc(), static local variables are now declared
per-namespace.

Signed-off-by: Benjamin Thery
Signed-off-by: Daniel Lezcano
Signed-off-by: David S. Miller

Benjamin Thery
2008-03-05 05:49:47 +0800
f2fc6a545 [NETNS][IPV6] route6 - move ip6_dst_ops inside the network namespace

The ip6_dst_ops is moved inside the network namespace structure. All
references to this structure are now relative to the initial network
namespace.

Signed-off-by: Benjamin Thery
Signed-off-by: Daniel Lezcano
Signed-off-by: David S. Miller

Benjamin Thery
2008-03-05 05:49:23 +0800
8ed677896 [NETNS][IPV6] rt6_info - move rt6_info structure inside the namespace

The rt6_info structures are moved inside the network namespace
structure. All references to these structures are now relative to the
initial network namespace.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-03-05 05:48:30 +0800

04 Mar, 2008

1 commit

c572872f8 [NETNS][IPV6] rt6_stats - make the stats per network namespace

The rt6_stats is now per namespace with this patch. It is allocated
when a network namespace is created and freed when the network
namespace exits and references are relative to the network namespace.

Signed-off-by: Benjamin Thery
Signed-off-by: Daniel Lezcano
Signed-off-by: David S. Miller

Benjamin Thery
2008-03-04 15:34:17 +0800