09 Sep, 2017

2 commits

  • IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.

    Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns destruction")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • fib6_net_exit only frees the main and local tables. If another table was
    created with fib6_alloc_table, we leak it when the netns is destroyed.

    Fix this in the same way ip_fib_net_exit cleans up tables, by walking
    through the whole hashtable of fib6_table's. We can get rid of the
    special cases for local and main, since they're also part of the
    hashtable.

    Reproducer:
    ip netns add x
    ip -net x -6 rule add from 6003:1::/64 table 100
    ip netns del x

    Reported-by: Jianlin Shi
    Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
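
    The cleanup described above can be sketched in userspace as follows. This is a minimal model, not the kernel code: struct netns, struct table, table_alloc(), netns_exit() and TABLE_HASHSZ are hypothetical stand-ins for the per-netns fib6 table hash, fib6_alloc_table(), fib6_net_exit() and FIB6_TABLE_HASHSZ.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_HASHSZ 4   /* hypothetical stand-in for FIB6_TABLE_HASHSZ */

struct table {
    unsigned int id;
    struct table *next;  /* hash-bucket chain */
};

struct netns {
    struct table *buckets[TABLE_HASHSZ];
};

/* Allocate a table and link it into its hash bucket. */
static struct table *table_alloc(struct netns *ns, unsigned int id)
{
    struct table *tb = calloc(1, sizeof(*tb));

    assert(tb != NULL);
    tb->id = id;
    tb->next = ns->buckets[id % TABLE_HASHSZ];
    ns->buckets[id % TABLE_HASHSZ] = tb;
    return tb;
}

/* Free every table in every bucket, with no special case for the
 * main and local tables. Returns how many tables were freed. */
static int netns_exit(struct netns *ns)
{
    int freed = 0;

    for (unsigned int h = 0; h < TABLE_HASHSZ; h++) {
        struct table *tb = ns->buckets[h];

        while (tb) {
            struct table *next = tb->next;

            free(tb);
            freed++;
            tb = next;
        }
        ns->buckets[h] = NULL;
    }
    return freed;
}
```

    Because the walk covers every bucket, a table created for "table 100" is released together with the main and local tables, and the loop bound must be the hash size of the array actually being walked.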
     

02 Sep, 2017

1 commit


29 Aug, 2017

2 commits

  • IPv6's dst_ops->check() currently doesn't check whether a cached
    route has expired, because it trusts dst_gc to clean the cached
    route up once it has expired.

    The problem is that dst_gc only cleans up a cached route when its
    refcount is 1. Some other module (like xfrm) may keep holding the
    route and only release it when dst_ops->check() fails.

    But without a check for cached route expiration, .check() may
    always succeed. Meanwhile, as long as the cached route is not
    released, dst_gc can't delete it. As a result, this cached route
    never expires.

    This patch sets dst.obsolete to DST_OBSOLETE_KILL in .gc when the
    route has expired, and checks obsolete != DST_OBSOLETE_FORCE_CHK
    in .check.

    Note that this will still be needed if the ipv6 dst_gc timer is
    removed one day. In that case dst.obsolete would be set in
    .redirect and .update_pmtu instead, and cached route expiration
    would be checked when getting the route, just like the ipv4 route
    code does.

    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Xin Long
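
    The interaction between the gc pass and .check() can be modelled in a few lines. This is a simplified sketch under stated assumptions: struct cached_route, gc_visit(), route_check() and holder_revalidate() are hypothetical names, and the OBSOLETE_* constants are local to the example rather than taken from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local constants named after the kernel's DST_OBSOLETE_* values. */
enum { OBSOLETE_FORCE_CHK = -1, OBSOLETE_KILL = -2 };

struct cached_route {
    int refcnt;     /* 1 means only the routing code holds it */
    int obsolete;
    long expires;   /* absolute expiry time */
};

/* gc pass: mark an expired route and report whether it can be freed now. */
static bool gc_visit(struct cached_route *rt, long now)
{
    if (now < rt->expires)
        return false;
    rt->obsolete = OBSOLETE_KILL;   /* the step the patch adds */
    return rt->refcnt == 1;
}

/* check(): return the route if still valid, NULL to force a re-lookup. */
static struct cached_route *route_check(struct cached_route *rt)
{
    if (rt->obsolete != OBSOLETE_FORCE_CHK)
        return NULL;
    return rt;
}

/* A holder such as xfrm drops its reference once check() fails. */
static void holder_revalidate(struct cached_route *rt)
{
    if (route_check(rt) == NULL) {
        assert(rt->refcnt > 1);
        rt->refcnt--;
    }
}
```

    Without the assignment in gc_visit(), route_check() would keep succeeding, the holder would never drop its reference, and the gc pass would never see a refcount of 1.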
     
  • Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
    generates a new sparse warning on rt->rt6i_node related code:
    net/ipv6/route.c:1394:30: error: incompatible types in comparison
    expression (different address spaces)
    ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
    expression (different address spaces)

    This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
    rcu API is used for it.
    After this fix, sparse no longer generates the above warning.

    Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
    Signed-off-by: Wei Wang
    Acked-by: Eric Dumazet
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     

23 Aug, 2017

1 commit

  • We currently keep rt->rt6i_node pointing to the fib6_node for the route.
    And some functions make use of this pointer to dereference the fib6_node
    from rt structure, e.g. rt6_check(). However, as there is neither
    refcount nor rcu taken when dereferencing rt->rt6i_node, it could
    potentially cause crashes as rt->rt6i_node could be set to NULL by other
    CPUs when doing a route deletion.
    This patch introduces an rcu grace period before freeing fib6_node and
    makes sure the functions that dereference it take rcu_read_lock().

    Note: there is no "Fixes" tag because this bug was there in a very
    early stage.

    Signed-off-by: Wei Wang
    Acked-by: Eric Dumazet
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     

22 Aug, 2017

1 commit


21 Aug, 2017

1 commit

  • In fib6_add(), it is possible that fib6_add_1() picks an intermediate
    node and sets the node's fn->leaf to NULL in order to add this new
    route. However, if fib6_add_rt2node() fails to add the new
    route for some reason, fn->leaf will be left as NULL and could
    potentially cause a crash when fn->leaf is accessed in fib6_locate().
    This patch makes sure fib6_repair_tree() is called to properly repair
    fn->leaf in the above failure case.

    Here is the syzkaller reported general protection fault in fib6_locate:
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    Modules linked in:
    CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
    RIP: 0010:[] [] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
    RIP: 0010:[] [] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
    RIP: 0010:[] [] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
    RIP: 0010:[] [] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
    RSP: 0018:ffff8801d01a36a8 EFLAGS: 00010202
    RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
    RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
    RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
    R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
    R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
    FS: 00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Stack:
    ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
    ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
    ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
    Call Trace:
    [] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
    [] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
    [] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
    [] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
    [] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
    [] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
    [] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
    [] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
    [] sock_sendmsg_nosec net/socket.c:609 [inline]
    [] sock_sendmsg+0xcf/0x110 net/socket.c:619
    [] sock_write_iter+0x222/0x3a0 net/socket.c:834
    [] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
    [] __vfs_write+0xe4/0x110 fs/read_write.c:491
    [] vfs_write+0x178/0x4b0 fs/read_write.c:538
    [] SYSC_write fs/read_write.c:585 [inline]
    [] SyS_write+0xd9/0x1b0 fs/read_write.c:577
    [] entry_SYSCALL_64_fastpath+0x12/0x17

    Note: there is no "Fixes" tag as this seems to be a bug introduced
    very early.

    Signed-off-by: Wei Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Wei Wang
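
    The failure path can be illustrated with a toy tree. This is only a sketch: the real fib6_repair_tree() does considerably more, and struct node, find_leaf(), repair_node() and add_route() are hypothetical names. The link_fails parameter simulates fib6_add_rt2node() returning an error.

```c
#include <assert.h>
#include <stddef.h>

struct route { int id; };

struct node {
    struct route *leaf;
    struct node *left, *right;
};

/* Find any route stored at or below fn (hypothetical helper). */
static struct route *find_leaf(struct node *fn)
{
    struct route *rt;

    if (!fn)
        return NULL;
    if (fn->leaf)
        return fn->leaf;
    rt = find_leaf(fn->left);
    return rt ? rt : find_leaf(fn->right);
}

/* Restore the invariant that an intermediate node's leaf is non-NULL. */
static void repair_node(struct node *fn)
{
    assert(fn != NULL);
    if (!fn->leaf) {
        struct route *rt = find_leaf(fn->left);

        fn->leaf = rt ? rt : find_leaf(fn->right);
    }
}

/* Returns 0 on success, -1 if linking the new route fails. */
static int add_route(struct node *fn, struct route *rt, int link_fails)
{
    fn->leaf = NULL;          /* node picked to receive the new route */
    if (link_fails) {
        repair_node(fn);      /* the fix: never leave leaf == NULL */
        return -1;
    }
    fn->leaf = rt;
    return 0;
}
```

    With the repair call in the error path, a later lookup that reads fn->leaf finds a valid route again instead of a NULL pointer.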
     

19 Aug, 2017

1 commit

  • syzkaller reported the following use-after-free issue in rt6_select():
    BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
    BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
    Read of size 4 by task syz-executor1/439628
    CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
    ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
    ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
    Call Trace:
    [] __dump_stack lib/dump_stack.c:15 [inline]
    [] dump_stack+0xc1/0x124 lib/dump_stack.c:51
    sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
    Use struct sctp_sack_info instead
    [] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
    [] print_address_description mm/kasan/report.c:196 [inline]
    [] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
    [] kasan_report mm/kasan/report.c:305 [inline]
    [] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
    [] rt6_select net/ipv6/route.c:755 [inline]
    [] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
    [] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
    [] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
    [] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
    [] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
    [] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
    [] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
    [] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
    [] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
    [] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
    [] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
    [] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
    [] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
    [] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
    [] SyS_connect+0x29/0x30 net/socket.c:1563
    [] entry_SYSCALL_64_fastpath+0x12/0x17
    Object at ffff8800bc699380, in cache ip6_dst_cache size: 384

    The root cause of it is that in fib6_add_rt2node(), when it replaces an
    existing route with the new one, it does not update fn->rr_ptr.
    This commit resets fn->rr_ptr to NULL when it points to a route which is
    replaced in fib6_add_rt2node().

    Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
    Signed-off-by: Wei Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Wei Wang
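
    A minimal model of the fix, assuming a singly linked route list per node: replace_route() and the struct layouts below are hypothetical simplifications, with only the rr_ptr name borrowed from the commit text.

```c
#include <assert.h>
#include <stddef.h>

struct route {
    int id;
    struct route *next;
};

struct node {
    struct route *leaf;     /* head of the route list */
    struct route *rr_ptr;   /* round-robin cursor into that list */
};

/* Replace old with new_rt in fn's list.
 * Returns 0 on success, -1 if old is not in the list. */
static int replace_route(struct node *fn, struct route *old,
                         struct route *new_rt)
{
    struct route **pp;

    assert(fn && old && new_rt);
    for (pp = &fn->leaf; *pp; pp = &(*pp)->next) {
        if (*pp != old)
            continue;
        new_rt->next = old->next;
        *pp = new_rt;
        if (fn->rr_ptr == old)   /* the fix: drop the stale cursor */
            fn->rr_ptr = NULL;
        old->next = NULL;
        return 0;
    }
    return -1;
}
```

    If the cursor were left pointing at the replaced route, the next round-robin selection would read memory that may already have been freed, which matches the KASAN report above.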
     

10 Aug, 2017

1 commit

  • This change allows us to later indicate to rtnetlink core that certain
    doit functions should be called without acquiring rtnl_mutex.

    This change should have no effect, we simply replace the last (now
    unused) calcit argument with the new flag.

    Signed-off-by: Florian Westphal
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     

04 Aug, 2017

6 commits

  • Similar to commit 1c677b3d2828 ("ipv4: fib: Add fib_info_hold() helper")
    and commit b423cb10807b ("ipv4: fib: Export free_fib_info()") add a
    helper to hold a reference on rt6_info and export rt6_release() to drop
    it and potentially release the route.

    This is needed so that drivers capable of FIB offload could hold a
    reference on the route before queueing it for offload and drop it after
    the route has been programmed to the device's tables.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
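
    The hold/release pattern a driver would rely on looks roughly like this. It is a sketch only: route_alloc(), route_hold() and route_release() are hypothetical userspace names, the counter is a plain int where the kernel uses atomic operations, and freed_flag exists purely so the example can observe the free.

```c
#include <assert.h>
#include <stdlib.h>

struct route {
    int refcnt;
    int *freed_flag;   /* set to 1 on free, for demonstration only */
};

/* New routes start with one reference, owned by the creator. */
static struct route *route_alloc(int *freed_flag)
{
    struct route *rt = malloc(sizeof(*rt));

    assert(rt != NULL);
    rt->refcnt = 1;
    rt->freed_flag = freed_flag;
    return rt;
}

/* Take an extra reference, e.g. before queueing the route for offload. */
static void route_hold(struct route *rt)
{
    assert(rt->refcnt > 0);
    rt->refcnt++;
}

/* Drop a reference and free the route when the last one goes away. */
static void route_release(struct route *rt)
{
    assert(rt->refcnt > 0);
    if (--rt->refcnt == 0) {
        *rt->freed_flag = 1;
        free(rt);
    }
}
```

    A driver would call route_hold() before deferring work and route_release() once the device tables are programmed, so the route stays valid even if the FIB drops it in between.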
     
  • When a route is deleted its node pointer is set to NULL to indicate it's
    no longer linked to its node. Do the same for routes that are replaced.

    This will later allow us to test if a route is still in the FIB by
    checking its node pointer instead of its reference count.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • The code currently assumes that only FIB nodes can hold a reference on
    routes. Therefore, after fib6_purge_rt() has run and the route is no
    longer present in any intermediate nodes, it's assumed that its
    reference count would be 1 - taken by the node where it's currently
    stored.

    However, we're going to allow users other than the FIB to take a
    reference on a route, so this assumption is no longer valid and the
    BUG_ON() needs to be removed.

    Note that purging only takes place if the initial reference count is
    different than 1. I've left that check intact, as in the majority of
    systems (where routes are only referenced by the FIB), it does actually
    mean the route is present in intermediate nodes.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Dump all the FIB tables in each net namespace upon registration to the
    FIB notification chain so that the callee will have a complete view of
    the tables.

    The integrity of the dump is ensured by a per-table sequence counter
    that is incremented (under write lock) whenever a route is added or
    deleted from the table.

    All the sequence counters are read (under each table's read lock) and
    summed, prior and after the dump. In case the counters differ, then the
    dump is either restarted or the registration fails.

    While it's possible for a table to be modified after its counter has
    been read, this isn't really a problem. In case it happened before it
    was read the second time, then the comparison at the end will fail. If
    it happened afterwards, then we're guaranteed to be notified about the
    change, as the notification block is registered prior to the second
    read.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
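
    The counter comparison can be sketched as a retry loop. This is a single-threaded model with the table locks omitted: struct table, seq_sum(), register_with_dump() and the max_retries parameter are assumptions made for illustration, and dump_with_races() only exists to simulate a concurrent route change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct table {
    unsigned int change_seq;  /* bumped on every route add or delete */
};

typedef void (*dump_fn)(struct table *tbs, size_t n, void *ctx);

/* Sum the per-table sequence counters. */
static unsigned int seq_sum(const struct table *tbs, size_t n)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += tbs[i].change_seq;
    return sum;
}

/* Dump all tables; retry while the tables changed underneath the dump.
 * Returns false if no consistent dump was obtained. */
static bool register_with_dump(struct table *tbs, size_t n,
                               dump_fn dump, void *ctx, int max_retries)
{
    assert(dump != NULL);
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        unsigned int before = seq_sum(tbs, n);

        dump(tbs, n, ctx);
        if (seq_sum(tbs, n) == before)
            return true;
    }
    return false;
}

/* Demo callback: simulates a concurrent route add during the first
 * *ctx dumps by bumping the first table's counter. */
static void dump_with_races(struct table *tbs, size_t n, void *ctx)
{
    int *races_left = ctx;

    if (n > 0 && *races_left > 0) {
        tbs[0].change_seq++;
        (*races_left)--;
    }
}
```

    A dump that races with one change succeeds on the second attempt; a table that keeps changing exhausts the retries and the registration fails.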
     
  • As with IPv4, allow listeners of the FIB notification chain to receive
    notifications whenever a route is added, replaced or deleted. This is
    done by placing calls to the FIB notification chain in the two lowest
    level functions that end up performing these operations - namely,
    fib6_add_rt2node() and fib6_del_route().

    Unlike IPv4, APPEND notifications aren't sent as the kernel doesn't
    distinguish between "append" (NLM_F_CREATE|NLM_F_APPEND) and "prepend"
    (NLM_F_CREATE). If NLM_F_EXCL isn't set, duplicate routes are always
    added after the existing duplicate routes.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
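
    The placement of the notification calls can be modelled with a small notifier chain. Everything here is a hypothetical userspace sketch: the event names, struct notifier, add_rt2node() and del_route() only mirror the roles of the kernel's FIB notification chain, fib6_add_rt2node() and fib6_del_route().

```c
#include <assert.h>
#include <stddef.h>

enum fib_event { EVENT_ADD, EVENT_REPLACE, EVENT_DEL };

struct notifier {
    void (*call)(struct notifier *nb, enum fib_event ev, int route_id);
    struct notifier *next;
};

static struct notifier *chain;

static void notifier_register(struct notifier *nb)
{
    assert(nb->call != NULL);
    nb->next = chain;
    chain = nb;
}

static void call_notifiers(enum fib_event ev, int route_id)
{
    for (struct notifier *nb = chain; nb; nb = nb->next)
        nb->call(nb, ev, route_id);
}

/* Lowest-level insert: the single place that emits ADD or REPLACE. */
static void add_rt2node(int route_id, int replaces_existing)
{
    call_notifiers(replaces_existing ? EVENT_REPLACE : EVENT_ADD, route_id);
}

/* Lowest-level delete: the single place that emits DEL. */
static void del_route(int route_id)
{
    call_notifiers(EVENT_DEL, route_id);
}

/* Example listener that counts events per type. */
struct counting_listener {
    struct notifier nb;   /* must be first so the cast below is valid */
    int seen[3];
};

static void count_event(struct notifier *nb, enum fib_event ev, int route_id)
{
    struct counting_listener *l = (struct counting_listener *)nb;

    (void)route_id;
    l->seen[ev]++;
}
```

    Emitting events from the two lowest-level functions means every higher-level path that adds, replaces or deletes a route is covered without further changes.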
     
  • We're about to add IPv6 FIB offload support, so implement the necessary
    callbacks in IPv6 code, which will later allow us to add routes and
    rules notifications.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     

06 Jul, 2017

1 commit

  • Lennert reported a failure to add different mpls encaps in a multipath
    route:

    $ ip -6 route add 1234::/16 \
    nexthop encap mpls 10 via fe80::1 dev ens3 \
    nexthop encap mpls 20 via fe80::1 dev ens3
    RTNETLINK answers: File exists

    The problem is that the duplicate nexthop detection does not compare
    lwtunnel configuration. Add it.

    Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
    Signed-off-by: David Ahern
    Reported-by: João Taveira Araújo
    Reported-by: Lennert Buytenhek
    Acked-by: Roopa Prabhu
    Tested-by: Lennert Buytenhek
    Signed-off-by: David S. Miller

    David Ahern
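
    A sketch of a duplicate check that takes the encapsulation into account. struct encap is a deliberately reduced, hypothetical stand-in for lwtunnel state, and nexthop_is_duplicate() is not the kernel's comparison function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical simplification of lwtunnel state: type plus one label. */
struct encap {
    int type;
    unsigned int label;
};

struct nexthop {
    char gateway[40];
    char dev[16];
    bool has_encap;
    struct encap encap;
};

static bool encap_equal(const struct nexthop *a, const struct nexthop *b)
{
    if (a->has_encap != b->has_encap)
        return false;
    if (!a->has_encap)
        return true;
    return a->encap.type == b->encap.type &&
           a->encap.label == b->encap.label;
}

/* Two nexthops are duplicates only if device, gateway and encap match. */
static bool nexthop_is_duplicate(const struct nexthop *a,
                                 const struct nexthop *b)
{
    assert(a && b);
    return strcmp(a->dev, b->dev) == 0 &&
           strcmp(a->gateway, b->gateway) == 0 &&
           encap_equal(a, b);
}
```

    With the encap comparison in place, the two nexthops from the report (labels 10 and 20 via the same gateway and device) are no longer treated as the same nexthop.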
     

22 Jun, 2017

1 commit


21 Jun, 2017

1 commit

  • While commit 73ba57bfae4a ("ipv6: fix backtracking for throw routes")
    does a good job of propagating errors to fib_rules_lookup() in the
    fib rules core framework, which also corrects throw route handling,
    it does not solve the route reference leak that happens when we
    return -EAGAIN to fib_rules_lookup() and leave the routing table
    entry referenced in arg->result.

    If the rule with the matched throw route isn't the last one matched
    in the list, we overwrite arg->result and lose the reference on the
    previously stored throw route forever.

    We also partially revert commit ab997ad40839 ("ipv6: fix the
    incorrect return value of throw route"), since we never return a
    routing table entry with dst.error == -EAGAIN when
    CONFIG_IPV6_MULTIPLE_TABLES is on. There is also no point in
    checking for the RTF_REJECT flag, since it is always set for throw
    routes.

    Fixes: 73ba57bfae4a ("ipv6: fix backtracking for throw routes")
    Signed-off-by: Serhey Popovych
    Signed-off-by: David S. Miller

    Serhey Popovych
     

18 Jun, 2017

5 commits

  • DST_NOCACHE flag check has been removed from dst_release() and
    dst_hold_safe() in a previous patch because all the dst are now ref
    counted properly and can be released based on refcnt only.
    Looking at the rest of the DST_NOCACHE use, all of them can now be
    removed or replaced with other checks.
    So this patch gets rid of all the DST_NOCACHE usage and removes this flag
    completely.

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     
  • The icmp6 dst route is currently ref counted during creation and is
    freed by its user through dst_release(), so it needs no garbage
    collector.
    Remove all icmp6 dst garbage collector related code.

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     
  • With the previous preparation patches, we are ready to get rid of the
    dst gc operation in ipv6 code and release dst based on refcnt only.
    So this patch adds the DST_NOGC flag for all IPv6 dst and removes the
    calls to dst_free() and its related functions.
    At this point, all dst created in ipv6 code do not use the dst gc
    anymore and will be destroyed at the point when refcnt drops to 0.

    Also, as icmp6 dst route is refcounted during creation and will be freed
    by user during its call of dst_release(), there is no need to add this
    dst to the icmp6 gc list as well.
    Instead, we need to add it to the uncached list so that when a
    NETDEV_DOWN/NETDEV_UNREGISTER event comes, we can properly go through
    these icmp6 dst as well and release the net device properly.

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     
  • As the intent of this patch series is to completely remove dst gc,
    we need to call dst_dev_put() to release the reference to dst->dev
    when removing routes from fib because we won't keep the gc list anymore
    and will lose the dst pointer right after removing the routes.
    Without the gc list, there is no way to find all the dst's that have
    dst->dev pointing to the going-down dev.
    Hence, we are doing dst_dev_put() immediately before we lose the last
    reference of the dst from the routing code. The next dst_check() will
    trigger a route re-lookup to find another route (if there is any).

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     
  • In IPv6 routing code, struct rt6_info is created for each static route
    and RTF_CACHE route and inserted into fib6 tree. In both cases, dst
    ref count is not taken.
    As explained in the previous patch, this leads to the need of the dst
    garbage collector.

    This patch holds ref count of dst before inserting the route into fib6
    tree and properly releases the dst when deleting it from the fib6 tree
    as a preparation in order to fully get rid of dst gc later.

    Also, correct fib6_age() logic to check dst->__refcnt to be 1 to indicate
    no user is referencing the dst.

    And remove dst_hold() in vrf_rt6_create() as ip6_dst_alloc() already puts
    dst->__refcnt to 1.

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     

23 May, 2017

2 commits


14 Mar, 2017

1 commit

  • Commit 27596472473a ("ipv6: fix ECMP route replacement") introduced a
    loop that removes all siblings of an ECMP route that is being
    replaced. However, this loop doesn't stop when it has replaced
    siblings, and keeps removing other routes with a higher metric.
    We also end up triggering the WARN_ON after the loop, because after
    this nsiblings < 0.

    Instead, stop the loop when we have taken care of all routes with the
    same metric as the route being replaced.

    Reproducer:
    ===========
    #!/bin/sh

    ip netns add ns1
    ip netns add ns2
    ip -net ns1 link set lo up

    for x in 0 1 2 ; do
    ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
    ip -net ns1 link set eth$x up
    ip -net ns2 link set veth$x up
    done

    ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
    nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
    ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
    ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048

    echo "before replace, 3 routes"
    ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
    echo

    ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
    nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2

    echo "after replace, only 2 routes, metric 2048 is gone"
    ip -net ns1 -6 r | grep -v '^fe80\|^ff00'

    Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
    Signed-off-by: Sabrina Dubroca
    Acked-by: Nicolas Dichtel
    Reviewed-by: Xin Long
    Reviewed-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Sabrina Dubroca
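
    The corrected loop can be sketched on a metric-sorted list. This is a simplified model, not the kernel loop: remove_siblings() is a hypothetical helper that unlinks the routes following the replaced one and stops at the first route with a different metric.

```c
#include <assert.h>
#include <stddef.h>

struct route {
    int metric;
    int nexthop;
    struct route *next;   /* list sorted by ascending metric */
};

/* Unlink the siblings that follow `first` and share its metric.
 * Returns the number of routes removed. */
static int remove_siblings(struct route *first)
{
    int removed = 0;
    struct route **pp;

    assert(first != NULL);
    pp = &first->next;
    while (*pp) {
        struct route *victim = *pp;

        if (victim->metric != first->metric)
            break;            /* the fix: stop at the metric boundary */
        *pp = victim->next;
        victim->next = NULL;
        removed++;
    }
    return removed;
}
```

    Applied to the reproducer's layout (three nexthops at one metric followed by a route with metric 2048), only the two siblings are removed and the metric 2048 route stays in the list.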
     

05 Feb, 2017

3 commits

  • If an entire multipath route is deleted using prefix and len (without any
    nexthops), send a single RTM_DELROUTE notification with the full route
    using RTA_MULTIPATH. This is done by generating the skb before the route
    delete when all of the sibling routes are still present but sending it
    after the route has been removed from the FIB. The skip_notify flag
    is used to tell the lower fib code not to send notifications for the
    individual nexthop routes.

    If a route is deleted using RTA_MULTIPATH for any nexthops or a single
    nexthop entry is deleted, then the nexthops are deleted one at a time with
    notifications sent as each hop is deleted. This is necessary given that
    IPv6 allows individual hops within a route to be deleted.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Change ip6_route_multipath_add to send one notification with the full
    route encoded with RTA_MULTIPATH instead of a series of individual routes.
    This is done by adding a skip_notify flag to the nl_info struct. The
    flag is used to skip sending of the notification in the fib code that
    actually inserts the route. Once the full route has been added, a
    notification is generated with all nexthops.

    ip6_route_multipath_add handles 3 use cases: new routes, route replace,
    and route append. The multipath notification generated needs to be
    consistent with the order of the nexthops and it should be consistent
    with the order in a FIB dump which means the route with the first nexthop
    needs to be used as the route reference. For the first 2 cases (new and
    replace), a reference to the route used to send the notification is
    obtained by saving the first route added. For the append case, the last
    route added is used to loop back to its first sibling route which is
    the first nexthop in the multipath route.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
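
    The effect of the skip_notify flag can be shown with a counter. This sketch borrows the nl_info and skip_notify names from the commit text; insert_route(), multipath_add() and the two global counters are hypothetical and stand in for the fib insert path and the netlink notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct nl_info {
    bool skip_notify;
};

static int notifications_sent;        /* number of messages emitted */
static size_t last_notified_nexthops; /* nexthops in the last message */

static void notify(size_t nexthops)
{
    notifications_sent++;
    last_notified_nexthops = nexthops;
}

/* Low-level insert: notifies per route unless told to stay quiet. */
static void insert_route(const struct nl_info *info)
{
    assert(info != NULL);
    if (!info->skip_notify)
        notify(1);
}

/* Multipath add: insert every nexthop quietly, then notify once. */
static void multipath_add(size_t nexthops)
{
    struct nl_info info = { .skip_notify = true };

    for (size_t i = 0; i < nexthops; i++)
        insert_route(&info);
    notify(nexthops);   /* one message covering the full route */
}
```

    A three-nexthop add therefore produces a single message describing all three nexthops, while a plain single-route insert still notifies as before.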
     
  • IPv6 returns multipath routes as a series of individual routes making
    their display and handling by userspace different and more complicated
    than IPv4, putting the burden on the user to see that a route is part of
    a multipath route and internally creating a multipath route if desired
    (e.g., libnl does this as of commit 29b71371e764). This patch addresses
    this difference, allowing multipath routes to be returned using the
    RTA_MULTIPATH attribute.

    The end result is that IPv6 multipath routes can be treated and displayed
    in a format similar to IPv4:

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256 pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256 pref medium
    2001:db8:200::/120 metric 1024
    nexthop via 2001:db8:1::2 dev eth1 weight 1
    nexthop via 2001:db8:2::2 dev eth2 weight 1

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

02 Feb, 2017

1 commit


10 Sep, 2016

1 commit

  • Since commit 37a1d3611c12 ("ipv6: include NLM_F_REPLACE in route
    replace notifications"), RTM_NEWROUTE notifications have their
    NLM_F_REPLACE flag set if the new route replaced a preexisting one.
    However, other flags aren't set.

    This patch reports the missing NLM_F_CREATE and NLM_F_EXCL flag bits.

    NLM_F_APPEND is not reported, because in ipv6 a NLM_F_CREATE request
    is interpreted as an append request (contrary to ipv4, "prepend" is not
    supported, so if NLM_F_EXCL is not set then NLM_F_APPEND is implicit).

    As a result, the possible flag combinations can now be reported
    (iproute2's terminology in parentheses):

    * NLM_F_CREATE | NLM_F_EXCL: route didn't exist, exclusive creation
    ("add").
    * NLM_F_CREATE: route did already exist, new route added after
    preexisting ones ("append").
    * NLM_F_REPLACE: route did already exist, new route replaced the
    first preexisting one ("change").

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
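
    A listener could map the reported flags back to iproute2's terminology along these lines. The flag values are the ones defined in <linux/netlink.h>, repeated under #ifndef so the sketch builds without kernel headers; enum route_op and describe_newroute() are hypothetical names.

```c
#include <assert.h>

#ifndef NLM_F_REPLACE
#define NLM_F_REPLACE 0x100
#define NLM_F_EXCL    0x200
#define NLM_F_CREATE  0x400
#endif

enum route_op { OP_UNKNOWN, OP_ADD, OP_APPEND, OP_CHANGE };

/* Classify an RTM_NEWROUTE notification by its nlmsg_flags. */
static enum route_op describe_newroute(unsigned int flags)
{
    unsigned int f = flags & (NLM_F_REPLACE | NLM_F_EXCL | NLM_F_CREATE);

    if (f == (NLM_F_CREATE | NLM_F_EXCL))
        return OP_ADD;      /* route didn't exist ("add") */
    if (f == NLM_F_CREATE)
        return OP_APPEND;   /* added after existing ones ("append") */
    if (f == NLM_F_REPLACE)
        return OP_CHANGE;   /* replaced the first existing one ("change") */
    return OP_UNKNOWN;
}
```

    Other bits in nlmsg_flags are masked out first, so only the three combinations listed above are distinguished.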
     

06 Jul, 2016

1 commit

  • It was first reported and reproduced by Petr (thanks!) in
    https://bugzilla.kernel.org/show_bug.cgi?id=119581

    free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().

    However, after fixing a deadlock bug in
    commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
    free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.

    It is worth noting that rt6i_pcpu is protected by table->tb6_lock.

    kmemleak somehow did not report it. We nailed it down by
    observing the pcpu entries in /proc/vmallocinfo (first suggested
    by Hannes, thanks!).

    Signed-off-by: Martin KaFai Lau
    Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt")
    Reported-by: Petr Novopashenniy
    Tested-by: Petr Novopashenniy
    Acked-by: Hannes Frederic Sowa
    Cc: Hannes Frederic Sowa
    Cc: Petr Novopashenniy
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

07 May, 2016

1 commit


09 Mar, 2016

3 commits

  • One of our customers observed issues with FIB6 garbage collectors
    running in different network namespaces blocking each other, resulting
    in soft lockups (fib6_run_gc() initiated from timer runs always in
    forced mode).

    Now that FIB6 walkers are separated per namespace, there is no more need
    for instances of fib6_run_gc() in different namespaces blocking each
    other. There is still a call to icmp6_dst_gc() which operates on shared
    data but this function is protected by its own shared lock.

    Signed-off-by: Michal Kubecek
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Michal Kubeček
     
  • The IPv6 FIB data structures are separated per network namespace but
    there is still only one global walkers list and one global walker list
    lock. This means changes in one namespace unnecessarily interfere with
    walkers in other namespaces.

    Replace the global list with per-netns lists (and give each its own
    lock).

    Signed-off-by: Michal Kubecek
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Michal Kubeček
     
  • Global variable gc_args is only used in fib6_run_gc() and functions
    called from it. As fib6_run_gc() makes sure there is at most one
    instance of fib6_clean_all() running at any moment, we can replace
    gc_args with a local variable which will be needed once multiple
    instances (per netns) of garbage collector are allowed.

    Signed-off-by: Michal Kubecek
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Michal Kubeček
     

24 Oct, 2015

1 commit

  • Conflicts:
    net/ipv6/xfrm6_output.c
    net/openvswitch/flow_netlink.c
    net/openvswitch/vport-gre.c
    net/openvswitch/vport-vxlan.c
    net/openvswitch/vport.c
    net/openvswitch/vport.h

    The openvswitch conflicts were overlapping changes. One was
    the egress tunnel info fix in 'net' and the other was the
    vport ->send() op simplification in 'net-next'.

    The xfrm6_output.c conflict was also a simplification
    overlapping a bug fix.

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Oct, 2015

1 commit

  • The error condition -EAGAIN, which is signaled by throw routes, tells
    the rules framework to keep searching for further matches. If the walk
    ends with the result of a throw route, we have to translate the error
    condition to -ENETUNREACH.

    Signed-off-by: Xin Long
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    lucien
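
    The translation at the end of the walk can be sketched as follows. This is a reduced model: rules_lookup() is a hypothetical function that receives each rule's result as an integer (0 for a usable route, a negative errno otherwise) instead of walking real rule and route structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Walk the rule results in order. -EAGAIN (a throw route) means
 * "try the next rule"; any other result ends the walk. */
static int rules_lookup(const int *rule_results, size_t n)
{
    assert(rule_results != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int err = rule_results[i];

        if (err != -EAGAIN)
            return err;       /* a definite answer ends the walk */
    }
    /* Every rule asked us to keep searching: do not leak -EAGAIN
     * to the caller, report the destination as unreachable. */
    return -ENETUNREACH;
}
```

    A throw route followed by a matching rule still resolves normally; only a walk that ends on throw routes is reported as -ENETUNREACH.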
     

13 Oct, 2015

1 commit