23 Nov, 2018

1 commit

  • [ Upstream commit 7ddacfa564870cdd97275fd87decb6174abc6380 ]

    Preethi reported that PMTU discovery for UDP/raw applications is not
    working in the presence of VRF when the socket is not bound to a device.
    The problem is that ip6_sk_update_pmtu does not consider the L3 domain
    of the skb device if the socket is not bound. Update the function to
    set oif to the L3 master device if relevant.

    Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack")
    Reported-by: Preethi Ramachandra
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

26 Sep, 2018

1 commit

  • commit f7225172f25aaf0dfd9ad65f05be8da5d6108b12 upstream.

    syzbot reported a use-after-free:

    BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
    Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555

    CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
    ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
    ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
    inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
    ...

    Allocated by task 4555:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
    dst_alloc+0xbb/0x1d0 net/core/dst.c:104
    __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
    ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
    ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
    ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
    inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
    ...

    Freed by task 4555:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
    dst_destroy+0x267/0x3c0 net/core/dst.c:140
    dst_release_immediate+0x71/0x9e net/core/dst.c:205
    fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
    __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
    ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
    inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
    ...

    The problem is that rt_last can point to a deleted route if the insert
    fails.

    One reproducer is to insert a route and then add a multipath route that
    has a duplicate nexthop, e.g.:
    $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
    $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2

    Fix by not setting rt_last until it is verified that the insert succeeded.

    Backport Note:
    - Upstream has replaced rt6_info usage with fib6_info in 8d1c802b281
    ("net/ipv6: Flip FIB entries to fib6_info")
    - fib6_info_release was introduced upstream in 93531c674315
    ("net/ipv6: separate handling of FIB entries from dst based routes"),
    but is not present in stable kernels; 4.14.y relies on dst_release/
    ip6_rt_put/dst_release_immediate.

    Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
    Cc: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David Ahern
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Zubin Mithra
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
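
The ordering issue described in the commit above can be illustrated with a small user-space model. This is only a sketch: `multipath_add`, the set-based `table`, and the duplicate check are hypothetical stand-ins for the kernel's route insertion, and Python has no use-after-free, so the model only shows that the "last inserted" marker must be updated after a successful insert and never before.

```python
def multipath_add(table, nexthops):
    """Insert each nexthop into `table` (a set standing in for the FIB).

    Returns (rt_last, error). rt_last names the last nexthop that was
    actually inserted, so it never refers to an entry whose insert failed.
    """
    rt_last = None
    for nh in nexthops:
        if nh in table:
            # The insert fails on a duplicate nexthop; rt_last is left
            # pointing at the previous successful insert (or None).
            return rt_last, "duplicate nexthop " + nh
        table.add(nh)
        # Record the entry only once the insert is known to have succeeded.
        rt_last = nh
    return rt_last, None
```

With a table that already holds `2001:db8:1::2`, appending `2001:db8:1::4` followed by `2001:db8:1::2` returns `2001:db8:1::4` as the last inserted entry together with an error for the duplicate.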
     

26 Jun, 2018

1 commit

  • [ Upstream commit 0975764684487bf3f7a47eef009e750ea41bd514 ]

    IPVS setups with a local client and a remote tunnel server need
    to create an exception for the local virtual IP. What we do is
    change the PMTU from 64KB (on "lo") to 1460 in the common case.

    Suggested-by: Martin KaFai Lau
    Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
    Fixes: 7343ff31ebf0 ("ipv6: Don't create clones of host routes.")
    Signed-off-by: Julian Anastasov
    Acked-by: David Ahern
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Julian Anastasov
     

12 Jun, 2018

1 commit

  • [ Upstream commit fa1be7e01ea863e911349e30456706749518eeab ]

    Some of the code paths calculating flow hash for IPv6 use flowlabel member
    of struct flowi6 which, despite its name, encodes both flow label and
    traffic class. If traffic class changes within a TCP connection (as e.g.
    ssh does), the ECMP route can switch between paths. This is also inconsistent with
    other code paths where ip6_flowlabel() (returning only flow label) is used
    to feed the key.

    Use only flow label everywhere, including one place where hash key is set
    using ip6_flowinfo().

    Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
    Fixes: f70ea018da06 ("net: Add functions to get skb->hash based on flow structures")
    Signed-off-by: Michal Kubecek
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Michal Kubecek
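
The difference between hashing on the full flowinfo value and hashing on the flow label alone can be sketched in user space. The two mask values match the kernel's `IPV6_FLOWINFO_MASK` and `IPV6_FLOWLABEL_MASK` (shown here in host byte order); `make_flowinfo`, `flowlabel`, and `pick_path` are hypothetical helpers, and `zlib.crc32` is only a stand-in for the kernel's flow hash.

```python
import zlib

IPV6_FLOWINFO_MASK = 0x0FFFFFFF   # traffic class (8 bits) + flow label (20 bits)
IPV6_FLOWLABEL_MASK = 0x000FFFFF  # flow label only


def make_flowinfo(traffic_class, flow_label):
    """Pack traffic class and flow label as they sit in the IPv6 header."""
    value = ((traffic_class & 0xFF) << 20) | (flow_label & IPV6_FLOWLABEL_MASK)
    return value & IPV6_FLOWINFO_MASK


def flowlabel(flowinfo):
    """Strip the traffic class, keeping only the 20-bit flow label."""
    return flowinfo & IPV6_FLOWLABEL_MASK


def pick_path(key, n_paths):
    """Map a 32-bit hash key to one of n_paths (stand-in hash, not the kernel's)."""
    return zlib.crc32(key.to_bytes(4, "big")) % n_paths
```

Two packets of the same connection that differ only in traffic class produce different flowinfo values, so a key built from flowinfo may select different paths, while a key built from `flowlabel()` is identical for both and always selects the same path.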
     

19 May, 2018

1 commit

  • [ Upstream commit cea67a2dd6b2419dcc13a39309b9a79a1f773193 ]

    syzbot/KMSAN reported an uninit-value in ip6_multipath_l3_keys(),
    root caused to a bad assumption of ICMP header being already
    pulled in skb->head.

    ip_multipath_l3_keys() does the correct thing, so it is an IPv6 only bug.

    BUG: KMSAN: uninit-value in ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline]
    BUG: KMSAN: uninit-value in rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858
    CPU: 0 PID: 4507 Comm: syz-executor661 Not tainted 4.16.0+ #87
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
    ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline]
    rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858
    ip6_route_input+0x65a/0x920 net/ipv6/route.c:1884
    ip6_rcv_finish+0x413/0x6e0 net/ipv6/ip6_input.c:69
    NF_HOOK include/linux/netfilter.h:288 [inline]
    ipv6_rcv+0x1e16/0x2340 net/ipv6/ip6_input.c:208
    __netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4562
    __netif_receive_skb net/core/dev.c:4627 [inline]
    netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
    netif_receive_skb+0x230/0x240 net/core/dev.c:4725
    tun_rx_batched drivers/net/tun.c:1555 [inline]
    tun_get_user+0x740f/0x7c60 drivers/net/tun.c:1962
    tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
    call_write_iter include/linux/fs.h:1782 [inline]
    new_sync_write fs/read_write.c:469 [inline]
    __vfs_write+0x7fb/0x9f0 fs/read_write.c:482
    vfs_write+0x463/0x8d0 fs/read_write.c:544
    SYSC_write+0x172/0x360 fs/read_write.c:589
    SyS_write+0x55/0x80 fs/read_write.c:581
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    Fixes: 23aebdacb05d ("ipv6: Compute multipath hash for ICMP errors from offending packet")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Jakub Sitnicki
    Acked-by: Jakub Sitnicki
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

29 Apr, 2018

1 commit

  • [ Upstream commit aa8f8778493c85fff480cdf8b349b1e1dcb5f243 ]

    KMSAN reported use of uninit-value that I tracked to lack
    of proper size check on RTA_TABLE attribute.

    I also believe RTA_PREFSRC lacks a similar check.

    Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config")
    Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Acked-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
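
The kind of size check the commit above refers to can be sketched as a user-space parser for a single route attribute. The `struct rtattr` layout (two 16-bit fields, `rta_len` then `rta_type`) and the type numbers `RTA_PREFSRC = 7` and `RTA_TABLE = 15` come from the rtnetlink UAPI; the `MIN_PAYLOAD` table, `make_attr`, and `parse_attr` are hypothetical names for this illustration and are not the kernel's policy code.

```python
import struct

RTA_PREFSRC = 7
RTA_TABLE = 15
RTA_HDRLEN = 4  # sizeof(struct rtattr): u16 rta_len + u16 rta_type

# Minimum payload size per attribute type (illustrative policy table).
MIN_PAYLOAD = {
    RTA_TABLE: 4,     # u32 table id
    RTA_PREFSRC: 16,  # IPv6 address
}


def make_attr(rta_type, payload):
    """Build one attribute: header followed by the raw payload."""
    return struct.pack("=HH", RTA_HDRLEN + len(payload), rta_type) + payload


def parse_attr(buf):
    """Return (type, payload) or raise ValueError if the attribute is too short."""
    if len(buf) < RTA_HDRLEN:
        raise ValueError("truncated attribute header")
    rta_len, rta_type = struct.unpack_from("=HH", buf, 0)
    if rta_len < RTA_HDRLEN or rta_len > len(buf):
        raise ValueError("bad rta_len")
    payload = buf[RTA_HDRLEN:rta_len]
    # Reject short payloads before anyone reads a fixed-size value from them.
    if len(payload) < MIN_PAYLOAD.get(rta_type, 0):
        raise ValueError("attribute payload too short")
    return rta_type, payload
```

Without the length check, a reader that unpacks a 32-bit table id from a one-byte payload would consume bytes that were never supplied, which is the user-space analogue of the uninitialized read KMSAN reported.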
     

12 Apr, 2018

1 commit

  • [ Upstream commit b6cdbc85234b072340b8923e69f49ec293f905dc ]

    Donald reported that IPv6 route leaking between VRFs is not working.
    The root cause is the strict argument in the call to rt6_lookup when
    validating the nexthop spec.

    ip6_route_check_nh validates the gateway and device (if given) of a
    route spec. It in turn could call rt6_lookup (e.g., lookup in a given
    table did not succeed so it falls back to a full lookup) and if so
    sets the strict argument to 1. That means if the egress device is given,
    the route lookup needs to return a result with the same device. This
    strict requirement does not work with VRFs (IPv4 or IPv6) because the
    oif in the flow struct is overridden with the index of the VRF device
    to trigger a match on the l3mdev rule and force the lookup to its table.

    The right long term solution is to add an l3mdev index to the flow
    struct such that the oif is not overridden. That solution will not
    backport well, so this patch aims for a simpler solution to relax the
    strict argument if the route spec device is an l3mdev slave. As done
    in other places, use the FLOWI_FLAG_SKIP_NH_OIF to know that the
    RT6_LOOKUP_F_IFACE flag needs to be removed.

    Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack")
    Reported-by: Donald Sharp
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

03 Mar, 2018

1 commit

  • [ Upstream commit 588753f1eb18978512b1c9b85fddb457d46f9033 ]

    One example of when an ICMPv6 packet is required to be looped back is
    when a host acts as both a Multicast Listener and a Multicast Router.

    A Multicast Router will listen on address ff02::16 for MLDv2 messages.

    Currently, MLDv2 messages originating from a Multicast Listener running
    on the same host as the Multicast Router are not being delivered to the
    Multicast Router. This is due to dst.input being assigned the default
    value of dst_discard.

    This results in the packet being looped back but discarded before being
    delivered to the Multicast Router.

    This patch sets dst.input to ip6_input to ensure a looped back packet
    is delivered to the Multicast Router.

    Signed-off-by: Brendan McGrath
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Brendan McGrath
     

03 Jan, 2018

1 commit

  • [ Upstream commit 58acfd714e6b02e8617448b431c2b64a2f1f0792 ]

    Currently, parameters such as oif and source address are not taken into
    account during fibmatch lookup. Example (IPv4 for reference) before
    patch:

    $ ip -4 route show
    192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
    198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1

    $ ip -6 route show
    2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
    2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
    fe80::/64 dev dummy0 proto kernel metric 256 pref medium
    fe80::/64 dev dummy1 proto kernel metric 256 pref medium

    $ ip -4 route get fibmatch 192.0.2.2 oif dummy0
    192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
    $ ip -4 route get fibmatch 192.0.2.2 oif dummy1
    RTNETLINK answers: No route to host

    $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
    2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
    $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
    2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium

    After:

    $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
    2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
    $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
    RTNETLINK answers: Network is unreachable

    The problem stems from the fact that the necessary route lookup flags
    are not set based on these parameters.

    Instead of duplicating the same logic for fibmatch, we can simply
    resolve the original route from its copy and dump it instead.

    Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
    Signed-off-by: Ido Schimmel
    Acked-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ido Schimmel
     

30 Dec, 2017

1 commit

  • This reverts commit 9704f8147e88213f2fa580f713b42b08a4f1a7d2 which was
    upstream commit a94b9367e044ba672c9f4105eb1516ff6ff4948a.

    Shouldn't have been here, sorry about that.

    Reported-by: Chris Rankin
    Reported-by: Willy Tarreau
    Cc: Ido Schimmel
    Cc: Ozgur
    Cc: Wei Wang
    Cc: Martin KaFai Lau
    Cc: Eric Dumazet
    Cc: David S. Miller
    Cc: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

25 Dec, 2017

1 commit

  • [ Upstream commit a94b9367e044ba672c9f4105eb1516ff6ff4948a ]

    After rwlock is replaced with rcu and spinlock, ip6_pol_route() will be
    called with only rcu held. That means rt6 route deletion could happen
    simultaneously with rt6_make_pcpu_rt(). This could potentially cause
    memory leak if rt6_release() is called right before rt6_make_pcpu_rt()
    on the same route.

    This patch grabs rt->rt6i_ref safely before calling rt6_make_pcpu_rt()
    to make sure rt6_release() will not get triggered while
    rt6_make_pcpu_rt() is in progress. And rt6_release() is called after
    rt6_make_pcpu_rt() is finished.

    Note: As we are incrementing rt->rt6i_ref in ip6_pol_route(), there is a
    very slim chance that fib6_purge_rt() will be triggered unnecessarily
    when deleting a route if ip6_pol_route() running on another thread picks
    this route as well and tries to make pcpu cache for it.

    Signed-off-by: Wei Wang
    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Wei Wang
     

17 Dec, 2017

1 commit

  • [ Upstream commit 98d11291d189cb5adf49694d0ad1b971c0212697 ]

    Florian reported a breakage with anycast routes due to commit
    4832c30d5458 ("net: ipv6: put host and anycast routes on device with
    address"). Prior to this commit anycast routes were added against the
    loopback device causing repetitive route entries with no insight into
    why they existed. e.g.:
    $ ip -6 ro ls table local type anycast
    anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
    anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
    anycast fe80:: dev lo proto kernel metric 0 pref medium
    anycast fe80:: dev lo proto kernel metric 0 pref medium

    The point of commit 4832c30d5458 is to add the routes using the device
    with the address which is causing the route to be added. e.g.,:
    $ ip -6 ro ls table local type anycast
    anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
    anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
    anycast fe80:: dev eth2 proto kernel metric 0 pref medium
    anycast fe80:: dev eth1 proto kernel metric 0 pref medium

    For traffic to work as it did before, the dst device needs to be switched
    to the loopback when the copy is created similar to local routes.

    Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

10 Oct, 2017

1 commit

  • A recent patch removed the dst_free() on the allocated
    dst_entry in ipv6_blackhole_route(). The dst_free() marked
    the dst_entry as dead and added it to the gc list, i.e. it
    was set up for one-time use. As a result we may now have
    a blackhole route cached at a socket in some IPsec scenarios.
    This makes the connection unusable.

    Fix this by marking the dst_entry directly at allocation time
    as 'dead', so it is used only once.

    Fixes: 587fea741134 ("ipv6: mark DST_NOGC and remove the operation of dst_free()")
    Reported-by: Tobias Brunner
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

02 Sep, 2017

1 commit


29 Aug, 2017

2 commits

  • IPv6's dst_ops->check() currently does not check whether a cached
    route has expired, because it trusts dst_gc to clean up the cached
    route once it expires.

    The problem is that dst_gc cleans up a cached route only when its
    refcount is 1. Another module (such as xfrm) may keep holding the
    route and release it only when dst_ops->check() fails.

    But without a check for cached route expiration, .check() may always
    succeed. Meanwhile, because the cached route is never released,
    dst_gc cannot delete it. As a result, the cached route never
    expires.

    This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc
    when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK
    in .check.

    Note that this will still be needed even if the ipv6 dst_gc timer is
    removed one day. In that case dst.obsolete would be set in .redirect
    and .update_pmtu instead, and cached route expiration would be checked
    when getting the route, just like the ipv4 route code does.

    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Xin Long
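
The interaction between the garbage collector and `.check()` described above can be modelled in a few lines. This is a simplified sketch, not kernel code: `CachedRoute`, `run_gc`, and `check` are hypothetical names, and only the two `DST_OBSOLETE_*` values mirror the kernel's constants.

```python
from dataclasses import dataclass

DST_OBSOLETE_FORCE_CHK = -1
DST_OBSOLETE_KILL = -2


@dataclass
class CachedRoute:
    expires: float
    refcnt: int = 1  # 1 means only the cache itself holds the route
    obsolete: int = DST_OBSOLETE_FORCE_CHK


def run_gc(routes, now):
    """Mark expired routes obsolete; drop those nobody else references."""
    kept = []
    for rt in routes:
        if now >= rt.expires:
            # Mark the route even when it cannot be freed yet, so that
            # holders find out through check() and release it.
            rt.obsolete = DST_OBSOLETE_KILL
            if rt.refcnt == 1:
                continue
        kept.append(rt)
    return kept


def check(rt):
    """A holder may keep using the route only while this returns True."""
    return rt.obsolete == DST_OBSOLETE_FORCE_CHK
```

A route held by a second user survives the first collection pass after it expires, but `check()` now fails for it; once that user drops its reference, the next pass removes the route.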
     
  • Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
    generates a new sparse warning on rt->rt6i_node related code:
    net/ipv6/route.c:1394:30: error: incompatible types in comparison
    expression (different address spaces)
    ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
    expression (different address spaces)

    This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
    rcu API is used for it.
    After this fix, sparse no longer generates the above warning.

    Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
    Signed-off-by: Wei Wang
    Acked-by: Eric Dumazet
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     

26 Aug, 2017

1 commit


25 Aug, 2017

3 commits

  • Allow our callers to influence the choice of ECMP link by honoring the
    hash passed together with the flow info. This allows for special
    treatment of ICMP errors which we would like to route over the same path
    as the IPv6 datagram that triggered the error.

    Also go through rt6_multipath_hash(), in the usual case when we aren't
    dealing with an ICMP error, so that there is one central place where
    multipath hash is computed.

    Signed-off-by: Jakub Sitnicki
    Signed-off-by: David S. Miller

    Jakub Sitnicki
     
  • Commit 644d0e656958 ("ipv6 Use get_hash_from_flowi6 for rt6 hash") has
    turned rt6_info_hash_nhsfn() into a one-liner, so it no longer makes
    sense to keep it around. Also remove the accompanying comment that has
    become outdated.

    Signed-off-by: Jakub Sitnicki
    Signed-off-by: David S. Miller

    Jakub Sitnicki
     
  • When forwarding or sending out an ICMPv6 error, look at the embedded
    packet that triggered the error and compute a flow hash over its
    headers.

    This lets us route the ICMP error together with the flow it belongs to
    when multipath (ECMP) routing is in use, which in turn makes Path MTU
    Discovery work in ECMP load-balanced or anycast setups (RFC 7690).

    Granted, end-hosts behind the ECMP router (aka servers) need to reflect
    the IPv6 Flow Label for PMTUD to work.

    The code is organized to be in parallel with ipv4 stack:

    ip_multipath_l3_keys -> ip6_multipath_l3_keys
    fib_multipath_hash -> rt6_multipath_hash

    Signed-off-by: Jakub Sitnicki
    Signed-off-by: David S. Miller

    Jakub Sitnicki
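
The idea of hashing an ICMPv6 error on the packet embedded in it can be sketched in user space. The header offsets follow the IPv6 and ICMPv6 wire formats, and the four error types are the standard ICMPv6 error messages; `build_ipv6`, `l3_keys`, and `multipath_hash` are hypothetical helpers, `zlib.crc32` stands in for the kernel's flow hash, and sorting the two addresses is an assumption of this sketch to make the hash independent of direction.

```python
import ipaddress
import struct
import zlib

IPPROTO_ICMPV6 = 58
ICMPV6_ERRORS = {1, 2, 3, 4}  # dest unreachable, packet too big, time exceeded, param problem
IPV6_HDRLEN = 40
ICMPV6_HDRLEN = 8


def build_ipv6(src, dst, flow_label, next_header, payload):
    """Build a minimal IPv6 packet (no extension headers)."""
    first_word = (6 << 28) | (flow_label & 0xFFFFF)
    return (struct.pack("!IHBB", first_word, len(payload), next_header, 64)
            + ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + payload)


def l3_keys(packet):
    """Return (src, dst, flow_label, next_header) to hash on.

    For ICMPv6 errors the keys come from the embedded offending packet.
    """
    if len(packet) < IPV6_HDRLEN:
        raise ValueError("truncated IPv6 header")
    key = packet
    if packet[6] == IPPROTO_ICMPV6 and len(packet) >= IPV6_HDRLEN + ICMPV6_HDRLEN:
        inner = packet[IPV6_HDRLEN + ICMPV6_HDRLEN:]
        # Only use the inner header if it is completely present.
        if packet[IPV6_HDRLEN] in ICMPV6_ERRORS and len(inner) >= IPV6_HDRLEN:
            key = inner
    (first_word,) = struct.unpack_from("!I", key, 0)
    return key[8:24], key[24:40], first_word & 0xFFFFF, key[6]


def multipath_hash(packet):
    src, dst, label, proto = l3_keys(packet)
    low, high = sorted((src, dst))  # direction-independent ordering (sketch assumption)
    return zlib.crc32(low + high + struct.pack("!IB", label, proto))
```

In this model, a Packet Too Big error sent by an intermediate router to a server hashes to the same value as the client-to-server packets of the flow, provided the server reflected the client's flow label in the reply that triggered the error.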
     

23 Aug, 2017

1 commit

  • We currently keep rt->rt6i_node pointing to the fib6_node for the route.
    And some functions make use of this pointer to dereference the fib6_node
    from rt structure, e.g. rt6_check(). However, as there is neither
    refcount nor rcu taken when dereferencing rt->rt6i_node, it could
    potentially cause crashes as rt->rt6i_node could be set to NULL by other
    CPUs when doing a route deletion.
    This patch introduces an rcu grace period before freeing fib6_node and
    makes sure the functions that dereference it take rcu_read_lock().

    Note: there is no "Fixes" tag because this bug was there in a very
    early stage.

    Signed-off-by: Wei Wang
    Acked-by: Eric Dumazet
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     

22 Aug, 2017

1 commit

  • One nagging difference between ipv4 and ipv6 is host routes for ipv6
    addresses are installed using the loopback device or VRF / L3 Master
    device. e.g.,

    2001:db8:1::/120 dev veth0 proto kernel metric 256 pref medium
    local 2001:db8:1::1 dev lo table local proto kernel metric 0 pref medium

    Using the loopback device is convenient -- necessary for local tx, but
    has some nasty side effects, most notably setting the 'lo' device down
    causes all host routes for all local IPv6 addresses to be removed from the
    FIB and completely breaks IPv6 networking across all interfaces.

    This patch puts FIB entries for IPv6 routes against the device. This
    simplifies the routes in the FIB, for example by making dst->dev and
    rt6i_idev->dev the same (a future patch can look at removing the device
    reference taken for rt6i_idev for FIB entries).

    When copies are made on FIB lookups, the cloned route has dst->dev
    set to loopback (or the L3 master device). This is needed for the
    local Tx of packets to local addresses.

    With fib entries allocated against the real network device, the addrconf
    code that reinserts host routes on admin up of 'lo' is no longer needed.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

19 Aug, 2017

1 commit

  • Adding a lock around one of the assignments prevents gcc from
    tracking the state of the local 'fibmatch' variable, so it can no
    longer prove that 'dst' is always initialized, leading to a bogus
    warning:

    net/ipv6/route.c: In function 'inet6_rtm_getroute':
    net/ipv6/route.c:3659:2: error: 'dst' may be used uninitialized in this function [-Werror=maybe-uninitialized]

    This moves the other assignment into the same lock to shut up the
    warning.

    Fixes: 121622dba8da ("ipv6: route: make rtm_getroute not assume rtnl is locked")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

16 Aug, 2017

5 commits

  • David S. Miller
     
  • Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • __dev_get_by_index assumes RTNL is held, use _rcu version instead.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • Based on a syzkaller report [1], I found that a per cpu allocation
    failure in snmp6_alloc_dev() would then lead to NULL dereference in
    ip6_route_dev_notify().

    It seems this is a very old bug, thus no Fixes tag in this submission.

    Let's add in6_dev_put_clear() helper, as we will probably use
    it elsewhere (once available/present in net-next)

    [1]
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    task: ffff88019f456680 task.stack: ffff8801c6e58000
    RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline]
    RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
    RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178
    RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202
    RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000
    RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001
    RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37
    R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8
    FS: 00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0
    DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
    Call Trace:
    refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
    in6_dev_put include/net/addrconf.h:335 [inline]
    ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732
    notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
    __raw_notifier_call_chain kernel/notifier.c:394 [inline]
    raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
    call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678
    call_netdevice_notifiers net/core/dev.c:1694 [inline]
    rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107
    rollback_registered+0x1be/0x3c0 net/core/dev.c:7149
    register_netdevice+0xbcd/0xee0 net/core/dev.c:7587
    register_netdev+0x1a/0x30 net/core/dev.c:7669
    loopback_net_init+0x76/0x160 drivers/net/loopback.c:214
    ops_init+0x10a/0x570 net/core/net_namespace.c:118
    setup_net+0x313/0x710 net/core/net_namespace.c:294
    copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418
    create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
    unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
    SYSC_unshare kernel/fork.c:2347 [inline]
    SyS_unshare+0x653/0xfa0 kernel/fork.c:2297
    entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x4512c9
    RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110
    RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200
    RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d
    R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd
    Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3
    f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 b6
    0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
    RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP:
    ffff8801c6e5f1b0
    RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP:
    ffff8801c6e5f1b0
    RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP:
    ffff8801c6e5f1b0
    ---[ end trace e441d046c6410d31 ]---

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • IPv6 routes currently lack nexthop flags as in IPv4. This has several
    implications.

    In the forwarding path, it requires us to check the carrier state of the
    nexthop device and potentially ignore a linkdown route, instead of
    checking for RTNH_F_LINKDOWN.

    It also requires capable drivers to use the user facing IPv6-specific
    route flags to provide offload indication, instead of using the nexthop
    flags as in IPv4.

    Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
    provide offload indication instead of the RTF_OFFLOAD flag, which is
    removed while it's still not part of any official kernel release.

    In the near future we would like to use the field for the
    RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
    not be ready in time for the current cycle.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     

15 Aug, 2017

1 commit

  • When a dst is created by addrconf_dst_alloc() for a host route or an
    anycast route, dst->dev points to loopback dev while rt6->rt6i_idev
    points to a real device.
    When the real device goes down, the current cleanup code only checks for
    dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the
    refcount leak on the real device in the above situation.
    This patch makes sure to always release the refcount taken on
    rt6->rt6i_idev during dst_dev_put().

    Fixes: 587fea741134 ("ipv6: mark DST_NOGC and remove the operation of dst_free()")
    Reported-by: John Stultz
    Tested-by: John Stultz
    Tested-by: Martin KaFai Lau
    Signed-off-by: Wei Wang
    Signed-off-by: Martin KaFai Lau
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Wei Wang
     

10 Aug, 2017

2 commits

  • This change allows us to later indicate to rtnetlink core that certain
    doit functions should be called without acquiring rtnl_mutex.

    This change should have no effect, we simply replace the last (now
    unused) calcit argument with the new flag.

    Signed-off-by: Florian Westphal
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • The UDP offload conflict is dealt with by simply taking what is
    in net-next where we have removed all of the UFO handling code
    entirely.

    The TCP conflict was a case of local variables in a function
    being removed from both net and net-next.

    In netvsc we had an assignment right next to where a missing
    set of u64 stats sync object inits were added.

    Signed-off-by: David S. Miller

    David S. Miller
     

09 Aug, 2017

1 commit

  • If the user hasn't installed any custom rules, don't go through the
    whole FIB rules layer. This is pretty similar to f4530fa574df (ipv4:
    Avoid overhead when no custom FIB rules are installed).

    Using a micro-benchmark module [1], timing ip6_route_output() with
    get_cycles(), with 40,000 routes in the main routing table, before this
    patch:

    min=606 max=12911 count=627 average=1959 95th=4903 90th=3747 50th=1602 mad=821
    table=254 avgdepth=21.8 maxdepth=39
    value │ ┊ count
    600 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 199
    880 │▒▒▒░░░░░░░░░░░░░░░░ 43
    1160 │▒▒▒░░░░░░░░░░░░░░░░░░░░ 48
    1440 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░ 43
    1720 │▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░ 59
    2000 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 50
    2280 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 26
    2560 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 31
    2840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 28
    3120 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 17
    3400 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 17
    3680 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8
    3960 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 11
    4240 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 6
    4520 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 6
    4800 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9

    After:

    min=544 max=11687 count=627 average=1776 95th=4546 90th=3585 50th=1227 mad=565
    table=254 avgdepth=21.8 maxdepth=39
    value │ ┊ count
    540 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 201
    800 │▒▒▒▒▒░░░░░░░░░░░░░░░░ 63
    1060 │▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░ 68
    1320 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░ 39
    1580 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 32
    1840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 32
    2100 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 34
    2360 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 33
    2620 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 26
    2880 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 22
    3140 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9
    3400 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8
    3660 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9
    3920 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8
    4180 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8
    4440 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8

    At the frequency of the host during the bench (~ 3.7 GHz), this is
    about a 100 ns difference on the median value.
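
    The conversion behind that figure can be checked with a small sketch.
    The function names are illustrative and not part of the benchmark
    module; the inputs are the 50th-percentile values of the two runs
    above and the approximate 3.7 GHz host frequency.

    ```c
    #include <assert.h>

    /* At 1 GHz one cycle takes one nanosecond, so ns = cycles / GHz. */
    static double cycles_to_ns(double cycles, double freq_ghz)
    {
        assert(freq_ghz > 0.0);
        return cycles / freq_ghz;
    }

    /* Median (50th percentile) cycle counts reported before and after. */
    static const double median_before_cycles = 1602.0;
    static const double median_after_cycles = 1227.0;

    /* Difference between the two medians, in nanoseconds at ~3.7 GHz. */
    static double median_gain_ns(void)
    {
        return cycles_to_ns(median_before_cycles - median_after_cycles, 3.7);
    }
    ```

    The medians differ by 375 cycles, which at 3.7 GHz is a little over
    101 ns.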

    A next step would be to collapse local and main tables, as in
    0ddcf43d5d4a (ipv4: FIB Local/MAIN table collapse).

    [1]: https://github.com/vincentbernat/network-lab/blob/master/lab-routes-ipv6/kbench_mod.c

    Signed-off-by: Vincent Bernat
    Reviewed-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Vincent Bernat
     

04 Aug, 2017

2 commits

  • Allow user space applications to see which routes are offloaded and
    which aren't by setting the RTNH_F_OFFLOAD flag when dumping them.

    To be consistent with IPv4, offload indication is provided on a
    per-nexthop basis.
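
    A user space consumer would test the flag on each nexthop of a dumped
    route. The sketch below is illustrative only: the helper name is
    hypothetical, and the flag value is defined locally on the assumption
    that it matches RTNH_F_OFFLOAD in <linux/rtnetlink.h>.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Assumed to match RTNH_F_OFFLOAD in <linux/rtnetlink.h>; defined
     * locally so the sketch builds without kernel headers. */
    #define TOY_RTNH_F_OFFLOAD 8u

    /* Count how many nexthops of a dumped route carry the offload flag. */
    static size_t toy_count_offloaded(const unsigned int *nh_flags, size_t n)
    {
        size_t i, count = 0;

        assert(nh_flags != NULL || n == 0);
        for (i = 0; i < n; i++)
            if (nh_flags[i] & TOY_RTNH_F_OFFLOAD)
                count++;
        return count;
    }
    ```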

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • After commit c2ed1880fd61 ("net: ipv6: check route protocol when
    deleting routes"), the IPv6 route code checks the route protocol
    when removing a route entry.

    This had the side effect of causing 'ip -6 route flush cache' to
    stop working properly. When flushing the cache, iproute dumps all
    cached routes from the kernel and then removes them one by one by
    sending a DELROUTE request to the kernel for each entry.

    iproute sends each request with the protocol that rt6_fill_node()
    reported in the dump, which is RTPROT_REDIRECT. In the kernel,
    however, the protocol of the cached route is still 0, so the entry
    does not match and is not removed.

    The root cause is that rt6i_protocol is not set when the route is
    allocated. As suggested by David Ahern, this patch sets
    rt6i_protocol properly when the route is installed and removes the
    code in rt6_fill_node that set rtm_protocol according to
    rt6i_flags.

    This is also an improvement to keep rt6i_protocol consistent with
    rtm_protocol.
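
    The mismatch can be reproduced in a toy model. This is a sketch with
    hypothetical names, not the kernel code; it also assumes that a
    requested protocol of 0 acts as a wildcard, which the commit message
    does not state.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Local copies of the two protocol values involved in this sketch. */
    enum {
        TOY_RTPROT_UNSPEC = 0,
        TOY_RTPROT_REDIRECT = 1
    };

    /* Hypothetical cached route holding only the stored protocol. */
    struct toy_cached_route {
        unsigned char protocol;
    };

    /* Delete-time check: a non-zero requested protocol must equal the
     * protocol stored in the route (0 is assumed to match anything). */
    static bool toy_delete_matches(const struct toy_cached_route *rt,
                                   unsigned char requested)
    {
        assert(rt != NULL);
        return requested == TOY_RTPROT_UNSPEC || requested == rt->protocol;
    }
    ```

    A route stored with protocol 0 does not match a delete request
    carrying TOY_RTPROT_REDIRECT; once the stored value is set at install
    time, the same request matches.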

    Fixes: c2ed1880fd61 ("net: ipv6: check route protocol when deleting routes")
    Reported-by: Jianlin Shi
    Suggested-by: David Ahern
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

06 Jul, 2017

1 commit

  • Lennert reported a failure to add different mpls encaps in a multipath
    route:

    $ ip -6 route add 1234::/16 \
    nexthop encap mpls 10 via fe80::1 dev ens3 \
    nexthop encap mpls 20 via fe80::1 dev ens3
    RTNETLINK answers: File exists

    The problem is that the duplicate nexthop detection does not compare
    lwtunnel configuration. Add it.
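
    A simplified model of the comparison, for illustration only: the
    struct and function below are hypothetical, and the encap is reduced
    to a single MPLS label rather than a full lwtunnel state.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical, simplified nexthop; not the kernel's representation. */
    struct toy_nexthop {
        const char *gateway;     /* e.g. "fe80::1" */
        const char *dev;         /* e.g. "ens3" */
        unsigned int mpls_label; /* encap label, 0 when there is no encap */
    };

    /* Two nexthops are duplicates when gateway and device match and, if
     * compare_encap is set, the encap matches as well. Passing false
     * models the behaviour before the fix. */
    static bool toy_duplicate_nexthop(const struct toy_nexthop *a,
                                      const struct toy_nexthop *b,
                                      bool compare_encap)
    {
        assert(a != NULL && b != NULL);
        if (strcmp(a->gateway, b->gateway) != 0)
            return false;
        if (strcmp(a->dev, b->dev) != 0)
            return false;
        if (compare_encap && a->mpls_label != b->mpls_label)
            return false;
        return true;
    }
    ```

    With the two nexthops from the report (labels 10 and 20, same gateway
    and device), the check without encap comparison reports a duplicate,
    while the check with it does not.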

    Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
    Signed-off-by: David Ahern
    Reported-by: João Taveira Araújo
    Reported-by: Lennert Buytenhek
    Acked-by: Roopa Prabhu
    Tested-by: Lennert Buytenhek
    Signed-off-by: David S. Miller

    David Ahern
     

01 Jul, 2017

1 commit


22 Jun, 2017

1 commit

  • In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
    I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired.
    Unfortunately, as reported by jeffy, netdev_wait_allrefs()
    can rebroadcast the NETDEV_UNREGISTER event until all refs are
    gone.

    We have to add an additional check to avoid this corner case.
    For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
    for dev_change_net_namespace(), dev->reg_state is
    NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.
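
    The effect of the extra check can be sketched in user space. The enums
    and handler below are hypothetical stand-ins that only borrow the
    names from the text above; the reference accounting is reduced to a
    single counter.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Local stand-ins named after the states and events discussed above. */
    enum toy_reg_state { TOY_NETREG_REGISTERED, TOY_NETREG_UNREGISTERED };
    enum toy_event { TOY_NETDEV_REGISTER, TOY_NETDEV_UNREGISTER };

    struct toy_dev {
        enum toy_reg_state reg_state;
        int refs; /* references taken by the notifier */
    };

    /* Take a reference on REGISTER. Drop it on UNREGISTER only while the
     * device is not yet UNREGISTERED, so that rebroadcast events, which
     * arrive with reg_state == UNREGISTERED, are ignored. */
    static void toy_route_dev_notify(struct toy_dev *dev, enum toy_event event)
    {
        assert(dev != NULL);
        if (event == TOY_NETDEV_REGISTER) {
            dev->refs++;
        } else if (event == TOY_NETDEV_UNREGISTER &&
                   dev->reg_state != TOY_NETREG_UNREGISTERED) {
            dev->refs--;
        }
    }
    ```

    Without the reg_state test, every rebroadcast would decrement refs
    again and the counter would go negative.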

    Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
    Reported-by: jeffy
    Cc: David Ahern
    Signed-off-by: Cong Wang
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    WANG Cong
     

18 Jun, 2017

3 commits

  • The DST_NOCACHE flag check has been removed from dst_release() and
    dst_hold_safe() in a previous patch because all dsts are now
    reference counted properly and can be released based on refcnt only.
    All remaining uses of DST_NOCACHE can now be removed or replaced
    with other checks.
    So this patch gets rid of all DST_NOCACHE usage and removes the flag
    completely.

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     
  • Now that all the components have been changed to release dst based on
    refcnt only and no longer depend on dst gc, we can remove the
    temporary flag DST_NOGC.

    Note that we also need to remove the DST_NOCACHE check in dst_release()
    and dst_hold_safe() because all dsts are now released based on refcnt
    and behave as DST_NOCACHE.

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang
     
  • The icmp6 dst route is currently ref counted at creation and is
    freed by its user through dst_release(), so it does not need a
    garbage collector.
    Remove all icmp6 dst garbage collector related code.

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Wei Wang