Eric Lee / smarc-fsl-linux-kernel

09 Sep, 2017

2 commits

32a805baf ipv6: fix typo in fib6_net_exit()

IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.

Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns destruction")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2017-09-09 07:09:04 +0800
ba1cc08d9 ipv6: fix memory leak with multiple tables during netns destruction

fib6_net_exit only frees the main and local tables. If another table was
created with fib6_alloc_table, we leak it when the netns is destroyed.

Fix this in the same way ip_fib_net_exit cleans up tables, by walking
through the whole hashtable of fib6_table's. We can get rid of the
special cases for local and main, since they're also part of the
hashtable.

Reproducer:
ip netns add x
ip -net x -6 rule add from 6003:1::/64 table 100
ip netns del x

Reported-by: Jianlin Shi
Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller

Sabrina Dubroca
2017-09-09 00:35:42 +0800
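
The cleanup approach described in ba1cc08d9 (walk every hash bucket instead of special-casing the main and local tables) can be sketched in userspace C. This is a toy model, not the kernel code: toy_netns, toy_table, toy_alloc_table, toy_net_exit and TOY_TABLE_HASHSZ are hypothetical names, and the bucket count is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_TABLE_HASHSZ 8   /* hypothetical bucket count, not the kernel's value */

struct toy_table {
    unsigned int id;
    struct toy_table *next;  /* chain within one hash bucket */
};

struct toy_netns {
    struct toy_table *buckets[TOY_TABLE_HASHSZ];
    int live_tables;         /* counts allocations so a leak would be visible */
};

/* Allocate a table and link it into the bucket selected by its id. */
static struct toy_table *toy_alloc_table(struct toy_netns *ns, unsigned int id)
{
    struct toy_table *t = calloc(1, sizeof(*t));

    if (!t)
        return NULL;
    t->id = id;
    t->next = ns->buckets[id % TOY_TABLE_HASHSZ];
    ns->buckets[id % TOY_TABLE_HASHSZ] = t;
    ns->live_tables++;
    return t;
}

/* Tear the namespace down by walking every bucket, so no table is missed. */
static void toy_net_exit(struct toy_netns *ns)
{
    assert(ns != NULL);
    for (unsigned int i = 0; i < TOY_TABLE_HASHSZ; i++) {
        struct toy_table *t = ns->buckets[i];

        while (t) {
            struct toy_table *next = t->next;

            free(t);
            ns->live_tables--;
            t = next;
        }
        ns->buckets[i] = NULL;
    }
}
```

The loop bound must be the size of the array actually being walked; per 32a805baf, using the wrong hash-size constant here was the follow-up typo.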

02 Sep, 2017

1 commit

6026e043d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Three cases of simple overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2017-09-02 08:42:05 +0800

29 Aug, 2017

2 commits

1e2ea8ad3 ipv6: set dst.obsolete when a cached route has expired

Currently, ipv6's dst_ops->check() does not check for cached route
expiration, because it trusts dst_gc to clean the cached route up
when it has expired.

The problem is that dst_gc cleans the cached route up only when
its refcount is 1. Some other module (like xfrm) may keep holding
it, and that module only releases it when dst_ops->check() fails.

But without checking for the cached route expiration, .check()
may always return true. Meanwhile, without the cached route being
released, dst_gc cannot delete it. As a result, this cached route
never expires.

This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc
when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK
in .check.

Note that this is even needed when ipv6 dst_gc timer is removed
one day. It would set dst.obsolete in .redirect and .update_pmtu
instead, and check for cached route expiration when getting it,
just like what ipv4 route does.

Reported-by: Jianlin Shi
Signed-off-by: Xin Long
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Xin Long
2017-08-29 06:45:04 +0800
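
The interaction described in 1e2ea8ad3 between the garbage collector, the holder and .check() can be modelled with a small state machine. This is a minimal sketch under stated assumptions: toy_dst, toy_gc, toy_check and toy_holder_revalidate are hypothetical names, and TOY_FORCE_CHK / TOY_KILL merely stand in for DST_OBSOLETE_FORCE_CHK / DST_OBSOLETE_KILL.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel's DST_OBSOLETE_* markers. */
enum { TOY_FORCE_CHK = -1, TOY_KILL = -2 };

struct toy_dst {
    int refcnt;    /* 1 means only the routing code holds the entry */
    int obsolete;
    long expires;  /* absolute expiry time in arbitrary units */
    bool freed;
};

/* One garbage collector pass over a cached route. */
static void toy_gc(struct toy_dst *d, long now)
{
    if (now < d->expires)
        return;
    d->obsolete = TOY_KILL;  /* the fix: mark it even while still referenced */
    if (d->refcnt == 1)
        d->freed = true;     /* gc can only free an unreferenced entry */
}

/* .check(): true means the holder may keep using the cached route. */
static bool toy_check(const struct toy_dst *d)
{
    return d->obsolete == TOY_FORCE_CHK;
}

/* A holder (xfrm in the commit message) drops its reference only on failure. */
static void toy_holder_revalidate(struct toy_dst *d)
{
    assert(d->refcnt > 1);
    if (!toy_check(d))
        d->refcnt--;
}
```

Without the assignment in toy_gc(), toy_check() would keep succeeding, the holder would never release, and the refcount would never fall to 1: the deadlock the commit message describes.
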
4e587ea71 ipv6: fix sparse warning on rt6i_node

Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
net/ipv6/route.c:1394:30: error: incompatible types in comparison
expression (different address spaces)
./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
expression (different address spaces)

This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.

Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang
Acked-by: Eric Dumazet
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-08-29 06:34:40 +0800

23 Aug, 2017

1 commit

c5cff8561 ipv6: add rcu grace period before freeing fib6_node

We currently keep rt->rt6i_node pointing to the fib6_node for the route.
And some functions make use of this pointer to dereference the fib6_node
from rt structure, e.g. rt6_check(). However, as there is neither
refcount nor rcu taken when dereferencing rt->rt6i_node, it could
potentially cause crashes as rt->rt6i_node could be set to NULL by other
CPUs when doing a route deletion.
This patch introduces an rcu grace period before freeing fib6_node and
makes sure the functions that dereference it take rcu_read_lock().

Note: there is no "Fixes" tag because this bug was there in a very
early stage.

Signed-off-by: Wei Wang
Acked-by: Eric Dumazet
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-08-23 02:03:19 +0800

22 Aug, 2017

1 commit

e2a7c34fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2017-08-22 08:06:42 +0800

21 Aug, 2017

1 commit

348a40027 ipv6: repair fib6 tree in failure case

In fib6_add(), it is possible that fib6_add_1() picks an intermediate
node and sets the node's fn->leaf to NULL in order to add this new
route. However, if fib6_add_rt2node() fails to add the new
route for some reason, fn->leaf will be left as NULL and could
potentially cause a crash when fn->leaf is accessed in fib6_locate().
This patch makes sure fib6_repair_tree() is called to properly repair
fn->leaf in the above failure case.

Here is the syzkaller reported general protection fault in fib6_locate:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
RIP: 0010:[] [] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
RIP: 0010:[] [] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
RIP: 0010:[] [] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
RIP: 0010:[] [] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
RSP: 0018:ffff8801d01a36a8 EFLAGS: 00010202
RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
FS: 00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
Call Trace:
[] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
[] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
[] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
[] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
[] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
[] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
[] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
[] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
[] sock_sendmsg_nosec net/socket.c:609 [inline]
[] sock_sendmsg+0xcf/0x110 net/socket.c:619
[] sock_write_iter+0x222/0x3a0 net/socket.c:834
[] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
[] __vfs_write+0xe4/0x110 fs/read_write.c:491
[] vfs_write+0x178/0x4b0 fs/read_write.c:538
[] SYSC_write fs/read_write.c:585 [inline]
[] SyS_write+0xd9/0x1b0 fs/read_write.c:577
[] entry_SYSCALL_64_fastpath+0x12/0x17

Note: there is no "Fixes" tag as this seems to be a bug introduced
very early.

Signed-off-by: Wei Wang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Wei Wang
2017-08-21 11:06:56 +0800

19 Aug, 2017

1 commit

383143f31 ipv6: reset fn->rr_ptr when replacing route

syzkaller reported the following use-after-free issue in rt6_select():
BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
Read of size 4 by task syz-executor1/439628
CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
Call Trace:
[] __dump_stack lib/dump_stack.c:15 [inline]
[] dump_stack+0xc1/0x124 lib/dump_stack.c:51
sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
[] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
[] print_address_description mm/kasan/report.c:196 [inline]
[] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
[] kasan_report mm/kasan/report.c:305 [inline]
[] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
[] rt6_select net/ipv6/route.c:755 [inline]
[] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
[] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
[] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
[] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
[] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
[] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
[] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
[] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
[] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
[] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
[] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
[] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
[] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
[] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
[] SyS_connect+0x29/0x30 net/socket.c:1563
[] entry_SYSCALL_64_fastpath+0x12/0x17
Object at ffff8800bc699380, in cache ip6_dst_cache size: 384

The root cause of it is that in fib6_add_rt2node(), when it replaces an
existing route with the new one, it does not update fn->rr_ptr.
This commit resets fn->rr_ptr to NULL when it points to a route which is
replaced in fib6_add_rt2node().

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Wei Wang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Wei Wang
2017-08-19 07:02:22 +0800
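
The root cause given in 383143f31 (a round-robin pointer left aimed at a route that has just been replaced) can be illustrated with a toy list. The sketch below is not fib6_add_rt2node(); toy_node, toy_route and toy_replace_route are hypothetical names, and only the leaf / rr_ptr relationship is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct toy_route {
    int id;
    struct toy_route *next;
};

struct toy_node {
    struct toy_route *leaf;    /* head of the node's route list */
    struct toy_route *rr_ptr;  /* round-robin cursor into that list */
};

/* Replace old with repl in the list; returns 0 on success, -1 if not found. */
static int toy_replace_route(struct toy_node *fn, struct toy_route *old,
                             struct toy_route *repl)
{
    struct toy_route **pp;

    assert(fn && old && repl);
    for (pp = &fn->leaf; *pp; pp = &(*pp)->next) {
        if (*pp != old)
            continue;
        repl->next = old->next;
        *pp = repl;
        /* The fix: never leave the cursor pointing at the unlinked route. */
        if (fn->rr_ptr == old)
            fn->rr_ptr = NULL;
        old->next = NULL;
        return 0;
    }
    return -1;
}
```

Once old is unlinked its memory may be freed, so any later read through a stale rr_ptr would be the use-after-free that KASAN reported.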

10 Aug, 2017

1 commit

b97bac64a rtnetlink: make rtnl_register accept a flags parameter

This change allows us to later indicate to rtnetlink core that certain
doit functions should be called without acquiring rtnl_mutex.

This change should have no effect; we simply replace the last (now
unused) calcit argument with the new flag.

Signed-off-by: Florian Westphal
Reviewed-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florian Westphal
2017-08-10 07:57:38 +0800

04 Aug, 2017

6 commits

a460aa839 ipv6: fib: Add helpers to hold / drop a reference on rt6_info

Similar to commit 1c677b3d2828 ("ipv4: fib: Add fib_info_hold() helper")
and commit b423cb10807b ("ipv4: fib: Export free_fib_info()"), add a
helper to hold a reference on rt6_info and export rt6_release() to drop
it and potentially release the route.

This is needed so that drivers capable of FIB offload could hold a
reference on the route before queueing it for offload and drop it after
the route has been programmed to the device's tables.

Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2017-08-04 06:36:00 +0800
7483cea79 ipv6: fib: Unlink replaced routes from their nodes

When a route is deleted its node pointer is set to NULL to indicate it's
no longer linked to its node. Do the same for routes that are replaced.

This will later allow us to test if a route is still in the FIB by
checking its node pointer instead of its reference count.

Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2017-08-04 06:36:00 +0800
c5b12410f ipv6: fib: Don't assume only nodes hold a reference on routes

The code currently assumes that only FIB nodes can hold a reference on
routes. Therefore, after fib6_purge_rt() has run and the route is no
longer present in any intermediate nodes, it's assumed that its
reference count would be 1 - taken by the node where it's currently
stored.

However, we're going to allow users other than the FIB to take a
reference on a route, so this assumption is no longer valid and the
BUG_ON() needs to be removed.

Note that purging only takes place if the initial reference count is
different than 1. I've left that check intact, as in the majority of
systems (where routes are only referenced by the FIB), it does actually
mean the route is present in intermediate nodes.

Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2017-08-04 06:36:00 +0800
e1ee0a5ba ipv6: fib: Dump tables during registration to FIB chain

Dump all the FIB tables in each net namespace upon registration to the
FIB notification chain so that the callee will have a complete view of
the tables.

The integrity of the dump is ensured by a per-table sequence counter
that is incremented (under write lock) whenever a route is added or
deleted from the table.

All the sequence counters are read (under each table's read lock) and
summed, prior and after the dump. In case the counters differ, then the
dump is either restarted or the registration fails.

While it's possible for a table to be modified after its counter has
been read, this isn't really a problem. In case it happened before it
was read the second time, then the comparison at the end will fail. If
it happened afterwards, then we're guaranteed to be notified about the
change, as the notification block is registered prior to the second
read.

Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2017-08-04 06:36:00 +0800
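
The consistency check described in e1ee0a5ba (sum the per-table sequence counters before and after the dump, then restart or fail if they differ) can be sketched as follows. This is a simplified userspace model: locking and the notifier registration itself are left out, and toy_table, toy_seq_sum, toy_register_with_dump, toy_racy_dump, TOY_NTABLES and TOY_MAX_RETRIES are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NTABLES     3  /* hypothetical number of tables */
#define TOY_MAX_RETRIES 3  /* hypothetical retry budget before giving up */

struct toy_table {
    unsigned int change_seq;  /* bumped on every route add or delete */
};

/* Sum the sequence counters of all tables. */
static unsigned int toy_seq_sum(const struct toy_table *tbs)
{
    unsigned int sum = 0;

    for (int i = 0; i < TOY_NTABLES; i++)
        sum += tbs[i].change_seq;
    return sum;
}

/* Returns true once a dump ran with no table changing underneath it. */
static bool toy_register_with_dump(struct toy_table *tbs,
                                   void (*dump)(struct toy_table *tbs))
{
    assert(tbs && dump);
    for (int attempt = 0; attempt < TOY_MAX_RETRIES; attempt++) {
        unsigned int before = toy_seq_sum(tbs);

        dump(tbs);
        if (toy_seq_sum(tbs) == before)
            return true;   /* counters unchanged: the dump is consistent */
    }
    return false;          /* still racing after all retries: fail */
}

/* Illustrative dump callback that simulates one concurrent route change. */
static int toy_races_left = 1;

static void toy_racy_dump(struct toy_table *tbs)
{
    if (toy_races_left > 0) {
        toy_races_left--;
        tbs[0].change_seq++;
    }
}
```

With toy_racy_dump the first attempt sees the sums differ and is restarted; the second attempt succeeds.
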
df77fe4d9 ipv6: fib: Add in-kernel notifications for route add / delete

As with IPv4, allow listeners of the FIB notification chain to receive
notifications whenever a route is added, replaced or deleted. This is
done by placing calls to the FIB notification chain in the two lowest
level functions that end up performing these operations - namely,
fib6_add_rt2node() and fib6_del_route().

Unlike IPv4, APPEND notifications aren't sent as the kernel doesn't
distinguish between "append" (NLM_F_CREATE|NLM_F_APPEND) and "prepend"
(NLM_F_CREATE). If NLM_F_EXCL isn't set, duplicate routes are always
added after the existing duplicate routes.

Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2017-08-04 06:36:00 +0800
16ab6d7d4 ipv6: fib: Add FIB notifiers callbacks

We're about to add IPv6 FIB offload support, so implement the necessary
callbacks in IPv6 code, which will later allow us to add routes and
rules notifications.

Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2017-08-04 06:35:59 +0800

06 Jul, 2017

1 commit

f06b7549b net: ipv6: Compare lwstate in detecting duplicate nexthops

Lennert reported a failure to add different mpls encaps in a multipath
route:

$ ip -6 route add 1234::/16 \
nexthop encap mpls 10 via fe80::1 dev ens3 \
nexthop encap mpls 20 via fe80::1 dev ens3
RTNETLINK answers: File exists

The problem is that the duplicate nexthop detection does not compare
lwtunnel configuration. Add it.

Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
Signed-off-by: David Ahern
Reported-by: João Taveira Araújo
Reported-by: Lennert Buytenhek
Acked-by: Roopa Prabhu
Tested-by: Lennert Buytenhek
Signed-off-by: David S. Miller

David Ahern
2017-07-06 17:48:01 +0800
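
The duplicate check that f06b7549b extends can be pictured with a toy comparison function. This is only a sketch: toy_nexthop and toy_nexthop_is_dup are hypothetical, and a single MPLS label stands in for the full lwtunnel state that the kernel compares.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct toy_nexthop {
    const char *gateway;
    const char *dev;
    unsigned int mpls_label;  /* simplified stand-in for lwtunnel encap state */
};

/* Two nexthops are duplicates only if the encap matches as well. */
static bool toy_nexthop_is_dup(const struct toy_nexthop *a,
                               const struct toy_nexthop *b)
{
    assert(a && b);
    return strcmp(a->gateway, b->gateway) == 0 &&
           strcmp(a->dev, b->dev) == 0 &&
           a->mpls_label == b->mpls_label;  /* the comparison the fix adds */
}
```

With the label comparison in place, the two nexthops from the report (labels 10 and 20 via fe80::1 on ens3) are no longer treated as the same nexthop.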

22 Jun, 2017

1 commit

3d0919824 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.

Signed-off-by: David S. Miller

David S. Miller
2017-06-22 05:35:22 +0800

21 Jun, 2017

1 commit

07f615574 ipv6: Do not leak throw route references

While commit 73ba57bfae4a ("ipv6: fix backtracking for throw routes")
does a good job of propagating errors to fib_rules_lookup()
in the fib rules core framework, which also corrects throw route
handling, it does not solve the route reference leak that
happens when we return -EAGAIN to fib_rules_lookup()
and leave the routing table entry referenced in arg->result.

If the rule with the matched throw route isn't the last one matched
in the list, we overwrite arg->result, losing the reference on the
previously stored throw route forever.

We also partially revert commit ab997ad40839 ("ipv6: fix the
incorrect return value of throw route") since we never return
routing table entry with dst.error == -EAGAIN when
CONFIG_IPV6_MULTIPLE_TABLES is on. Also, there is no point
in checking for the RTF_REJECT flag, since it is always set for
throw routes.

Fixes: 73ba57bfae4a ("ipv6: fix backtracking for throw routes")
Signed-off-by: Serhey Popovych
Signed-off-by: David S. Miller

Serhey Popovych
2017-06-21 03:34:02 +0800

18 Jun, 2017

5 commits

a4c2fd7f7 net: remove DST_NOCACHE flag

DST_NOCACHE flag check has been removed from dst_release() and
dst_hold_safe() in a previous patch because all the dst are now ref
counted properly and can be released based on refcnt only.
Looking at the rest of the DST_NOCACHE use, all of them can now be
removed or replaced with other checks.
So this patch gets rid of all the DST_NOCACHE usage and removes this flag
completely.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-18 10:54:01 +0800
db916649b ipv6: get rid of icmp6 dst garbage collector

icmp6 dst route is currently ref counted during creation and will be
freed by user during its call of dst_release(). So no need of a garbage
collector for it.
Remove all icmp6 dst garbage collector related code.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-18 10:54:00 +0800
587fea741 ipv6: mark DST_NOGC and remove the operation of dst_free()

With the previous preparation patches, we are ready to get rid of the
dst gc operation in ipv6 code and release dst based on refcnt only.
So this patch adds the DST_NOGC flag for all IPv6 dst and removes the calls
to dst_free() and its related functions.
At this point, all dst created in ipv6 code do not use the dst gc
anymore and will be destroyed at the point when refcnt drops to 0.

Also, as icmp6 dst route is refcounted during creation and will be freed
by user during its call of dst_release(), there is no need to add this
dst to the icmp6 gc list as well.
Instead, we need to add it into the uncached list so that when a
NETDEV_DOWN/NETDEV_UNREGISTER event comes, we can properly go through
these icmp6 dst as well and release the net device properly.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-18 10:54:00 +0800
9514528d9 ipv6: call dst_dev_put() properly

As the intent of this patch series is to completely remove dst gc,
we need to call dst_dev_put() to release the reference to dst->dev
when removing routes from fib because we won't keep the gc list anymore
and will lose the dst pointer right after removing the routes.
Without the gc list, there is no way to find all the dst's that have
dst->dev pointing to the going-down dev.
Hence, we are doing dst_dev_put() immediately before we lose the last
reference of the dst from the routing code. The next dst_check() will
trigger a route re-lookup to find another route (if there is any).

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-18 10:54:00 +0800
1cfb71eeb ipv6: take dst->__refcnt for insertion into fib6 tree

In IPv6 routing code, struct rt6_info is created for each static route
and RTF_CACHE route and inserted into fib6 tree. In both cases, dst
ref count is not taken.
As explained in the previous patch, this leads to the need of the dst
garbage collector.

This patch holds ref count of dst before inserting the route into fib6
tree and properly releases the dst when deleting it from the fib6 tree
as a preparation in order to fully get rid of dst gc later.

Also, correct fib6_age() logic to check dst->__refcnt to be 1 to indicate
no user is referencing the dst.

And remove dst_hold() in vrf_rt6_create() as ip6_dst_alloc() already puts
dst->__refcnt to 1.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-18 10:54:00 +0800
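
The reference counting scheme that 1cfb71eeb describes (the tree holds its own reference, and a count of exactly 1 means no other user) can be modelled in a few lines. This is a toy sketch, not the kernel API: toy_dst and the toy_* helpers are hypothetical, and the initial count of 1 is assumed to come from the allocator, as the message says of ip6_dst_alloc().

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dst {
    int refcnt;      /* starts at 1, owned by whoever allocated the entry */
    bool in_tree;
    bool destroyed;
};

static void toy_hold(struct toy_dst *d)
{
    d->refcnt++;
}

/* Drop one reference; the entry is destroyed when the count reaches 0. */
static void toy_release(struct toy_dst *d)
{
    assert(d->refcnt > 0);
    if (--d->refcnt == 0)
        d->destroyed = true;
}

/* The tree takes its own reference on insertion... */
static void toy_tree_insert(struct toy_dst *d)
{
    toy_hold(d);
    d->in_tree = true;
}

/* ...and gives it back on deletion. */
static void toy_tree_delete(struct toy_dst *d)
{
    d->in_tree = false;
    toy_release(d);
}

/* Aging check: exactly 1 means only the tree still references the entry. */
static bool toy_unused(const struct toy_dst *d)
{
    return d->in_tree && d->refcnt == 1;
}
```

Because destruction is driven purely by the count reaching zero, no separate garbage collector list is needed in this model.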

23 May, 2017

2 commits

d5d531cb5 net: ipv6: Add extack messages for route add failures

Add messages for non-obvious errors (e.g., no need to add text for malloc
failures or ENODEV failures). This mostly covers the annoying EINVAL errors.
Some message strings violate the 80-column limit, but searchable strings
need to trump that rule.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-05-23 00:12:20 +0800
333c43016 net: ipv6: Plumb extack through route add functions

Plumb extack argument down to route add functions.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-05-23 00:12:20 +0800

14 Mar, 2017

1 commit

67e194007 ipv6: make ECMP route replacement less greedy

Commit 27596472473a ("ipv6: fix ECMP route replacement") introduced a
loop that removes all siblings of an ECMP route that is being
replaced. However, this loop doesn't stop when it has replaced
siblings, and keeps removing other routes with a higher metric.
We also end up triggering the WARN_ON after the loop, because after
this nsiblings < 0.

Instead, stop the loop when we have taken care of all routes with the
same metric as the route being replaced.

Reproducer:
===========
#!/bin/sh

ip netns add ns1
ip netns add ns2
ip -net ns1 link set lo up

for x in 0 1 2 ; do
ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
ip -net ns1 link set eth$x up
ip -net ns2 link set veth$x up
done

ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048

echo "before replace, 3 routes"
ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
echo

ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2

echo "after replace, only 2 routes, metric 2048 is gone"
ip -net ns1 -6 r | grep -v '^fe80\|^ff00'

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Sabrina Dubroca
Acked-by: Nicolas Dichtel
Reviewed-by: Xin Long
Reviewed-by: Michal Kubecek
Signed-off-by: David S. Miller

Sabrina Dubroca
2017-03-14 03:16:17 +0800
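
The stopping condition introduced by 67e194007 can be shown on a toy list sorted by metric. This is a minimal sketch assuming a singly linked list in ascending metric order; toy_route and toy_purge_siblings are hypothetical names and do not mirror the kernel's sibling bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

struct toy_route {
    unsigned int metric;
    struct toy_route *next;  /* list kept sorted by ascending metric */
};

/* Unlink every route with the given metric; returns how many were removed. */
static int toy_purge_siblings(struct toy_route **pp, unsigned int metric)
{
    int removed = 0;

    assert(pp != NULL);
    while (*pp) {
        struct toy_route *rt = *pp;

        if (rt->metric > metric)
            break;               /* the fix: stop once past the replaced metric */
        if (rt->metric == metric) {
            *pp = rt->next;      /* unlink a sibling of the replaced route */
            rt->next = NULL;
            removed++;
        } else {
            pp = &rt->next;      /* lower metric: leave it alone */
        }
    }
    return removed;
}
```

For a list with metrics 256, 1024, 1024, 1024, 2048, purging metric 1024 removes the three siblings and leaves the 256 and 2048 routes linked, which is the behaviour the reproducer expects after the fix.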

05 Feb, 2017

3 commits

16a16cd35 net: ipv6: Change notifications for multipath delete to RTA_MULTIPATH

If an entire multipath route is deleted using prefix and len (without any
nexthops), send a single RTM_DELROUTE notification with the full route
using RTA_MULTIPATH. This is done by generating the skb before the route
delete when all of the sibling routes are still present but sending it
after the route has been removed from the FIB. The skip_notify flag
is used to tell the lower fib code not to send notifications for the
individual nexthop routes.

If a route is deleted using RTA_MULTIPATH for any nexthops or a single
nexthop entry is deleted, then the nexthops are deleted one at a time with
notifications sent as each hop is deleted. This is necessary given that
IPv6 allows individual hops within a route to be deleted.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-02-05 08:58:14 +0800
3b1137fe7 net: ipv6: Change notifications for multipath add to RTA_MULTIPATH

Change ip6_route_multipath_add to send one notification with the full
route encoded with RTA_MULTIPATH instead of a series of individual routes.
This is done by adding a skip_notify flag to the nl_info struct. The
flag is used to skip sending of the notification in the fib code that
actually inserts the route. Once the full route has been added, a
notification is generated with all nexthops.

ip6_route_multipath_add handles 3 use cases: new routes, route replace,
and route append. The multipath notification generated needs to be
consistent with the order of the nexthops and it should be consistent
with the order in a FIB dump which means the route with the first nexthop
needs to be used as the route reference. For the first 2 cases (new and
replace), a reference to the route used to send the notification is
obtained by saving the first route added. For the append case, the last
route added is used to loop back to its first sibling route which is
the first nexthop in the multipath route.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-02-05 08:58:14 +0800
beb1afac5 net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute

IPv6 returns multipath routes as a series of individual routes making
their display and handling by userspace different and more complicated
than IPv4, putting the burden on the user to see that a route is part of
a multipath route and internally creating a multipath route if desired
(e.g., libnl does this as of commit 29b71371e764). This patch addresses
this difference, allowing multipath routes to be returned using the
RTA_MULTIPATH attribute.

The end result is that IPv6 multipath routes can be treated and displayed
in a format similar to IPv4:

$ ip -6 ro ls vrf red
2001:db8:1::/120 dev eth1 proto kernel metric 256 pref medium
2001:db8:2::/120 dev eth2 proto kernel metric 256 pref medium
2001:db8:200::/120 metric 1024
nexthop via 2001:db8:1::2 dev eth1 weight 1
nexthop via 2001:db8:2::2 dev eth2 weight 1

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-02-05 08:58:14 +0800

02 Feb, 2017

1 commit

1f5e29ce7 net: ipv6: add NLM_F_APPEND in notifications when applicable

IPv6 does not set the NLM_F_APPEND flag in notifications to signal that
a NEWROUTE is an append versus a new route or a replaced one. Add the
flag if the request has it.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-02-02 01:13:52 +0800

10 Sep, 2016

1 commit

73483c128 ipv6: report NLM_F_CREATE and NLM_F_EXCL flags in RTM_NEWROUTE events

Since commit 37a1d3611c12 ("ipv6: include NLM_F_REPLACE in route
replace notifications"), RTM_NEWROUTE notifications have their
NLM_F_REPLACE flag set if the new route replaced a preexisting one.
However, other flags aren't set.

This patch reports the missing NLM_F_CREATE and NLM_F_EXCL flag bits.

NLM_F_APPEND is not reported, because in ipv6 a NLM_F_CREATE request
is interpreted as an append request (contrary to ipv4, "prepend" is not
supported, so if NLM_F_EXCL is not set then NLM_F_APPEND is implicit).

As a result, the possible flag combinations can now be reported
(iproute2's terminology in parentheses):

* NLM_F_CREATE | NLM_F_EXCL: route didn't exist, exclusive creation
("add").
* NLM_F_CREATE: route did already exist, new route added after
preexisting ones ("append").
* NLM_F_REPLACE: route did already exist, new route replaced the
first preexisting one ("change").

Signed-off-by: Guillaume Nault
Signed-off-by: David S. Miller

Guillaume Nault
2016-09-10 07:50:23 +0800
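
A listener could decode the three flag combinations listed in 73483c128 roughly as follows. This is a sketch, not iproute2 code: toy_describe_newroute is a hypothetical helper, and the TOY_NLM_F_* constants are local copies of the NLM_F_* values from <linux/netlink.h> so that the example builds on any platform.

```c
#include <assert.h>
#include <string.h>

/* Local copies of the NLM_F_* values from <linux/netlink.h>. */
#define TOY_NLM_F_REPLACE 0x100
#define TOY_NLM_F_EXCL    0x200
#define TOY_NLM_F_CREATE  0x400

/* Map the flags of an RTM_NEWROUTE notification to iproute2's wording. */
static const char *toy_describe_newroute(unsigned int flags)
{
    unsigned int f = flags &
        (TOY_NLM_F_REPLACE | TOY_NLM_F_EXCL | TOY_NLM_F_CREATE);

    if (f == (TOY_NLM_F_CREATE | TOY_NLM_F_EXCL))
        return "add";      /* route did not exist, exclusive creation */
    if (f == TOY_NLM_F_CREATE)
        return "append";   /* added after preexisting routes */
    if (f == TOY_NLM_F_REPLACE)
        return "change";   /* replaced the first preexisting route */
    return "unknown";      /* combination not listed in the commit message */
}
```

Other flag bits are masked out before the comparison, so only the three bits named in the commit message influence the result.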

06 Jul, 2016

1 commit

903ce4abd ipv6: Fix mem leak in rt6i_pcpu

It was first reported and reproduced by Petr (thanks!) in
https://bugzilla.kernel.org/show_bug.cgi?id=119581

free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().

However, after fixing a deadlock bug in
commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.

It is worth noting that rt6i_pcpu is protected by table->tb6_lock.

kmemleak somehow did not report it. We nailed it down by
observing the pcpu entries in /proc/vmallocinfo (first suggested
by Hannes, thanks!).

Signed-off-by: Martin KaFai Lau
Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt")
Reported-by: Petr Novopashenniy
Tested-by: Petr Novopashenniy
Acked-by: Hannes Frederic Sowa
Cc: Hannes Frederic Sowa
Cc: Petr Novopashenniy
Signed-off-by: David S. Miller

Martin KaFai Lau
2016-07-06 05:09:23 +0800

07 May, 2016

1 commit

b3b4663c9 net: vrf: Create FIB tables on link create

Tables have to exist for VRFs to function. Ensure they exist
when VRF device is created.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2016-05-07 03:51:47 +0800

09 Mar, 2016

3 commits

3dc94f93b ipv6: per netns FIB garbage collection

One of our customers observed issues with FIB6 garbage collectors
running in different network namespaces blocking each other, resulting
in soft lockups (fib6_run_gc() initiated from timer runs always in
forced mode).

Now that FIB6 walkers are separated per namespace, there is no more need
for instances of fib6_run_gc() in different namespaces blocking each
other. There is still a call to icmp6_dst_gc() which operates on shared
data but this function is protected by its own shared lock.

Signed-off-by: Michal Kubecek
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller

Michal Kubeček
2016-03-09 04:16:51 +0800
9a03cd8f3 ipv6: per netns fib6 walkers

The IPv6 FIB data structures are separated per network namespace but
there is still only one global walkers list and one global walker list
lock. This means changes in one namespace unnecessarily interfere with
walkers in other namespaces.

Replace the global list with per-netns lists (and give each its own
lock).

Signed-off-by: Michal Kubecek
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller

Michal Kubeček
2016-03-09 04:16:51 +0800
3570df914 ipv6: replace global gc_args with local variable

Global variable gc_args is only used in fib6_run_gc() and functions
called from it. As fib6_run_gc() makes sure there is at most one
instance of fib6_clean_all() running at any moment, we can replace
gc_args with a local variable which will be needed once multiple
instances (per netns) of garbage collector are allowed.

Signed-off-by: Michal Kubecek
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller

Michal Kubeček
2016-03-09 04:16:50 +0800

24 Oct, 2015

1 commit

ba3e2084f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.h

The openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.

The xfrm6_output.c conflict was also a simplification
overlapping a bug fix.

Signed-off-by: David S. Miller

David S. Miller
2015-10-24 21:54:12 +0800

23 Oct, 2015

1 commit

ab997ad40 ipv6: fix the incorrect return value of throw route

The error condition -EAGAIN, which is signaled by throw routes, tells
the rules framework to walk on searching for next matches. If the walk
ends and we stop walking the rules with the result of a throw route we
have to translate the error conditions to -ENETUNREACH.

Signed-off-by: Xin Long
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

lucien
2015-10-23 17:38:18 +0800
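
The error translation described in ab997ad40 can be sketched as a rule walk over precomputed results. This is a toy model rather than the fib rules code: toy_rules_lookup is hypothetical, and each array entry stands for what one rule's action returned (0 for a usable route, -EAGAIN for a throw route).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Walk the rule results in order and return the final lookup status. */
static int toy_rules_lookup(const int *rule_results, size_t nrules)
{
    int err = -ENETUNREACH;   /* default when no rule matched at all */

    assert(rule_results != NULL || nrules == 0);
    for (size_t i = 0; i < nrules; i++) {
        err = rule_results[i];
        if (err != -EAGAIN)
            return err;       /* definitive answer: stop walking */
    }
    /* The walk ended on a throw route: report the network as unreachable. */
    if (err == -EAGAIN)
        err = -ENETUNREACH;
    return err;
}
```

A throw route followed by a matching rule still yields that rule's result; only a walk that ends on -EAGAIN is translated.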

13 Oct, 2015

1 commit

c48506877 net: Export fib6_get_table and nd_tbl

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-10-13 19:55:05 +0800