28 Oct, 2016

1 commit

  • Similar to IPv4, do not consider link state when validating next hops.

    Currently, if the link is down, default routes can fail to insert:
    $ ip -6 ro add vrf blue default via 2100:2::64 dev eth2
    RTNETLINK answers: No route to host

    With this patch the command succeeds.

    Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
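A minimal model of the behaviour change, for illustration only. It assumes nexthop validation boils down to a table lookup that skips routes on a down link unless the caller passes an "ignore link state" flag. `struct model_route`, `lookup_gw()` and `LOOKUP_F_IGNORE_LINKSTATE` are hypothetical names, not the kernel's fib6 code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, loosely modelled on an "ignore link state" lookup
 * flag; not the kernel's definition. */
#define LOOKUP_F_IGNORE_LINKSTATE 0x1

/* Hypothetical, flattened stand-in for a routing table entry. */
struct model_route {
	const char *dev;   /* egress device name */
	bool link_up;      /* is the device's link up? */
	bool covers_gw;    /* does the route's prefix contain the gateway? */
};

/* Return the first route that can reach the gateway, or NULL (which the
 * caller would report as "No route to host"). Routes on a down link are
 * skipped unless LOOKUP_F_IGNORE_LINKSTATE is passed. */
static const struct model_route *
lookup_gw(const struct model_route *routes, size_t n, int flags)
{
	for (size_t i = 0; i < n; i++) {
		if (!routes[i].covers_gw)
			continue;
		if (!routes[i].link_up && !(flags & LOOKUP_F_IGNORE_LINKSTATE))
			continue;
		return &routes[i];
	}
	return NULL;
}
```

With a single route on a down eth2, `lookup_gw(r, 1, 0)` returns NULL (the old failure), while passing the flag returns that route.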
     

19 Sep, 2016

1 commit


18 Jun, 2016

1 commit

  • The VRF driver needs access to the ip6_route_get_saddr code. Since it
    does little beyond calling ipv6_dev_get_saddr, and ipv6_dev_get_saddr
    is already exported for modules, move ip6_route_get_saddr to the header
    as an inline.

    Code move only; no functional change.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
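A sketch of the shape of such an inline, under simplified assumptions: `struct model_rt6` keeps only a preferred-source prefix, and `model_dev_get_saddr()` is a hypothetical stand-in for ipv6_dev_get_saddr() that just copies a caller-supplied device address. The point is that the function only chooses between the route's prefsrc and a call into an already exported helper, which is why it can live in a header.

```c
#include <arpa/inet.h>   /* inet_pton() for building example addresses */
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rt6_info: only the preferred-source prefix. */
struct model_rt6 {
	struct in6_addr prefsrc;  /* route's preferred source address */
	int prefsrc_plen;         /* 0 when no prefsrc is configured */
};

/* Hypothetical stand-in for ipv6_dev_get_saddr(): copies a fixed
 * per-device address supplied by the caller. */
static int model_dev_get_saddr(const struct in6_addr *dev_addr,
			       struct in6_addr *saddr)
{
	if (dev_addr == NULL)
		return -1;
	*saddr = *dev_addr;
	return 0;
}

/* Use the route's prefsrc when set, otherwise defer to device-level
 * source address selection. */
static inline int model_route_get_saddr(const struct model_rt6 *rt,
					const struct in6_addr *dev_addr,
					struct in6_addr *saddr)
{
	int err = 0;

	if (rt != NULL && rt->prefsrc_plen)
		*saddr = rt->prefsrc;
	else
		err = model_dev_get_saddr(dev_addr, saddr);

	return err;
}
```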
     

16 Jun, 2016

1 commit

  • IPv6 multicast and link-local addresses require special handling by the
    VRF driver:
    1. Rather than using the VRF device index and full FIB lookups,
    packets to/from these addresses should use direct FIB lookups based on
    the VRF device table.

    2. Fail sends/receives on a VRF device to/from a multicast address
    (e.g., make ping6 ff02::1% fail)

    3. Move the setting of the flow oif to the first dst lookup and revert
    the change in icmpv6_echo_reply made in ca254490c8dfd ("net: Add VRF
    support to IPv6 stack"). Link-local/mcast addresses require use of the
    skb->dev.

    With this change, connections into and out of a VRF-enslaved device
    work for multicast and link-local addresses (ICMP, TCP, and UDP),
    e.g.,

    1. packets into VM with VRF config:
    ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1
    ping6 -c3 ff02::1%br1

    ssh -6 fe80::e0:f9ff:fe1c:b974%br1

    2. packets going out a VRF enslaved device:
    ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1
    ping6 -c3 ff02::1%eth1
    ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
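The first rule above depends only on the destination's address class. A userspace sketch of that test, assuming the decision is exactly "link-local unicast or multicast"; `vrf_needs_direct_lookup()` is a hypothetical name, while IN6_IS_ADDR_LINKLOCAL and IN6_IS_ADDR_MULTICAST are the standard <netinet/in.h> macros.

```c
#include <arpa/inet.h>   /* inet_pton() for building example addresses */
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical helper: link-local unicast and multicast destinations
 * skip the full FIB lookup and use the VRF device's table directly. */
static bool vrf_needs_direct_lookup(const struct in6_addr *daddr)
{
	return IN6_IS_ADDR_LINKLOCAL(daddr) || IN6_IS_ADDR_MULTICAST(daddr);
}
```

For the addresses in the examples above, fe80::e0:f9ff:fe1c:b974 and ff02::1 qualify, while a global address such as 2100:2::64 does not.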
     

12 Apr, 2016

1 commit

  • Vivek reported a kernel exception deleting a VRF with an active
    connection through it. The root cause is that the socket has a cached
    reference to a dst that is destroyed. Converting the dst_destroy to
    dst_release and letting proper reference counting kick in does not
    work, as the dst has a reference to the device which needs to be
    released as well.

    I talked to Hannes about this at netdev and he pointed out the ipv4 and
    ipv6 dst handling has dst_ifdown for just this scenario. Rather than
    continuing with the reinvented dst wheel in VRF, just remove it and
    leverage the ipv4 and ipv6 versions.

    Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver")
    Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
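dst_ifdown() avoids the dangling device reference by moving a still-referenced dst off the departing device. The toy model below, with hypothetical structures and helpers, sketches that idea under the assumption that "moving" means repointing the dst at the loopback device and swapping the device references; the real code path also updates protocol-specific state through the dst's ops.

```c
#include <stddef.h>

/* Hypothetical toy device with a reference count. */
struct model_dev {
	const char *name;
	int refcnt;
};

/* Hypothetical toy dst: pins the device it points at. */
struct model_dst {
	struct model_dev *dev;
	int refcnt;            /* e.g. held by a socket's cached route */
};

static void dev_hold_m(struct model_dev *d) { d->refcnt++; }
static void dev_put_m(struct model_dev *d)  { d->refcnt--; }

/* On unregister, a dst that is still referenced elsewhere is repointed
 * at loopback, so the dying device's reference is dropped without
 * destroying a dst that a socket may still use. */
static void model_dst_ifdown(struct model_dst *dst, struct model_dev *dying,
			     struct model_dev *loopback)
{
	if (dst->dev != dying)
		return;
	dst->dev = loopback;
	dev_hold_m(loopback);
	dev_put_m(dying);
}
```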
     

30 Jan, 2016

1 commit

  • The current implementation of ip6_dst_lookup_tail basically
    ignores the egress ifindex match: if the saddr is set,
    ip6_route_output() purposefully ignores flowi6_oif, due
    to commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
    flag if saddr set"); if the saddr is 'any', the first route lookup
    in ip6_dst_lookup_tail fails, but upon failure a second lookup is
    performed with the saddr set, thus ignoring the ifindex constraint.

    This commit adds an output route lookup function variant, which
    allows the caller to specify lookup flags, and modifies
    ip6_dst_lookup_tail() to enforce the ifindex match on the second
    lookup via said helper.

    ip6_route_output() now becomes a static inline function built on
    top of ip6_route_output_flags(); as a side effect, out-of-tree
    modules now need a GPL license to access the output route lookup
    functionality.

    Signed-off-by: Paolo Abeni
    Acked-by: Hannes Frederic Sowa
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Paolo Abeni
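The API split can be modelled as a flags-taking lookup plus a zero-flags inline wrapper. The types, the linear table scan, and `MODEL_LOOKUP_F_IFACE` (standing in for RT6_LOOKUP_F_IFACE) are hypothetical simplifications; only the wrapper pattern reflects the message.

```c
#include <stddef.h>

/* Hypothetical stand-in for RT6_LOOKUP_F_IFACE. */
#define MODEL_LOOKUP_F_IFACE 0x1

/* Hypothetical flattened route: just its egress ifindex. */
struct model_route { int ifindex; };

/* Return the first route; when MODEL_LOOKUP_F_IFACE is set and an oif is
 * given, only routes on that interface qualify. */
static const struct model_route *
model_route_output_flags(const struct model_route *tbl, size_t n,
			 int oif, int flags)
{
	for (size_t i = 0; i < n; i++) {
		if ((flags & MODEL_LOOKUP_F_IFACE) && oif &&
		    tbl[i].ifindex != oif)
			continue;
		return &tbl[i];
	}
	return NULL;
}

/* The old entry point becomes a thin inline wrapper with no flags. */
static inline const struct model_route *
model_route_output(const struct model_route *tbl, size_t n, int oif)
{
	return model_route_output_flags(tbl, n, oif, 0);
}
```

Because the wrapper is inline, every caller ends up linking against the flags variant, which is where the licensing side effect mentioned above comes from.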
     

04 Dec, 2015

1 commit

  • While testing the np->opt RCU conversion, I found that UDP/IPv6 was
    using a mixture of xchg() and sk_dst_lock to protect concurrent changes
    to sk->sk_dst_cache, leading to possible corruptions and crashes.

    ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
    way to fix the mess is to remove sk_dst_lock completely, as we did for
    IPv4.

    __ip6_dst_store() and ip6_dst_store() share the same implementation.

    Since sk_setup_caps() can be called with or without the socket lock
    held, we have to use sk_dst_set() instead of __sk_dst_set().

    Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
    in ip6_dst_store() before the sk_setup_caps(sk, dst) call.

    This is because ip6_dst_store() can be called from process context,
    without any lock held.

    As soon as the dst is installed in sk->sk_dst_cache, it can be freed
    from another CPU doing a concurrent ip6_dst_store().

    Doing the dst dereference before doing the install is needed to make
    sure no use after free would trigger.

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller

    Eric Dumazet
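The ordering argument can be sketched with C11 atomics. This is an illustration under stated assumptions, not the kernel code: `atomic_exchange()` plays the role of installing into sk->sk_dst_cache without a lock, `free()` stands in for releasing the previous dst, and `struct model_dst`/`struct model_sock` are hypothetical. The only point is the order: read the cookie, then publish.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached route. */
struct model_dst {
	uint32_t cookie;   /* value a rt6_get_cookie()-like helper would return */
};

/* Hypothetical stand-in for the socket fields involved. */
struct model_sock {
	_Atomic(struct model_dst *) dst_cache;  /* installed without a lock */
	uint32_t dst_cookie;                    /* plain field in this model */
};

/* Install a new dst. Everything needed from it is read BEFORE it is
 * published: once visible in dst_cache, a concurrent store on another
 * thread could exchange it out and release it. */
static void model_dst_store(struct model_sock *sk, struct model_dst *dst)
{
	sk->dst_cookie = dst->cookie;                 /* 1: dereference first */
	struct model_dst *old =
		atomic_exchange(&sk->dst_cache, dst); /* 2: then publish */
	free(old);                                    /* 3: stands in for dst_release() */
}
```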
     

30 Sep, 2015

1 commit


26 May, 2015

3 commits

  • Instead of doing the rt6->rt6i_node check whenever we need
    to get the route's cookie, refactor it into rt6_get_cookie().
    This is prep work to handle FLOWI_FLAG_KNOWN_NH and also
    percpu rt6_info later.

    Signed-off-by: Martin KaFai Lau
    Cc: Hannes Frederic Sowa
    Cc: Steffen Klassert
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • This patch creates RTF_CACHE routes only after encountering a PMTU
    exception.

    After ip6_rt_update_pmtu() has inserted the RTF_CACHE route to the fib6
    tree, the rt->rt6i_node->fn_sernum is bumped which will fail the
    ip6_dst_check() and trigger a relookup.

    Signed-off-by: Martin KaFai Lau
    Cc: Hannes Frederic Sowa
    Cc: Steffen Klassert
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst.
    Also, rt6i_gateway is always set to the nexthop while the nexthop
    could be a gateway or the rt6i_dst.addr.

    After removing the rt6i_dst and rt6i_src dependency in the last patch,
    we also need to stop the caller from depending on rt6i_gateway and
    RTF_ANYCAST.

    Signed-off-by: Martin KaFai Lau
    Cc: Hannes Frederic Sowa
    Cc: Steffen Klassert
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
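A simplified sketch of the cookie helper and of why bumping fn_sernum forces a relookup. `struct model_fib6_node`, `struct model_rt6` and `model_dst_check()` are hypothetical stand-ins; the check assumes the saved cookie is simply compared with the node's current serial number, in the spirit of ip6_dst_check().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for fib6_node and rt6_info. */
struct model_fib6_node { uint32_t fn_sernum; };
struct model_rt6 { struct model_fib6_node *rt6i_node; };

/* The refactored check in one place: a route not linked into the tree
 * has cookie 0. */
static inline uint32_t model_rt6_get_cookie(const struct model_rt6 *rt)
{
	return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
}

/* A cached route stays valid only while its node's serial number still
 * matches the cookie saved when it was cached. */
static bool model_dst_check(const struct model_rt6 *rt, uint32_t cookie)
{
	return rt->rt6i_node != NULL && rt->rt6i_node->fn_sernum == cookie;
}
```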
     

08 Apr, 2015

1 commit

  • On the output paths in particular, we sometimes have to deal with two
    socket contexts. The first, and usually skb->sk, is the local socket
    that generated the frame.

    The second is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

07 Apr, 2015

1 commit

  • We should not consult skb->sk for output decisions in xmit recursion
    levels > 0 in the stack. Otherwise local socket settings could influence
    the result of e.g. tunnel encapsulation process.

    ipv6 does not conform with this in three places:

    1) ip6_fragment: we do consult ipv6_pinfo for frag_size

    2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
    loop the packet back to the local socket

    3) ip6_skb_dst_mtu could query the settings from the user socket and
    force a wrong MTU

    Furthermore:
    In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
    PF_PACKET socket on top of an IPv6-backed vxlan device.

    Reuse xmit_recursion as we are currently only interested in protecting
    tunnel devices.

    Cc: Jiri Pirko
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    hannes@stressinduktion.org
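A model of the rule for the MTU case (item 3). It assumes the decision has the form "use the device MTU when the socket asked for PROBE or a stronger mode, else the route's MTU", and that a plain global counter can stand in for the per-CPU xmit_recursion value. All names are hypothetical; the MODEL_PMTUDISC_* values mirror the IPV6_PMTUDISC_* constants.

```c
#include <stddef.h>

/* Values mirror IPV6_PMTUDISC_DONT..IPV6_PMTUDISC_PROBE. */
enum {
	MODEL_PMTUDISC_DONT = 0, MODEL_PMTUDISC_WANT = 1,
	MODEL_PMTUDISC_DO = 2, MODEL_PMTUDISC_PROBE = 3,
};

struct model_sock { int pmtudisc; };
struct model_skb {
	struct model_sock *sk;   /* originating local socket, may be NULL */
	unsigned int dev_mtu;    /* MTU of the egress device */
	unsigned int path_mtu;   /* MTU cached on the route */
};

/* Hypothetical stand-in for the per-CPU xmit_recursion counter. */
static int xmit_recursion_level;

/* Honour the socket's setting only at recursion level 0, i.e. not while
 * a tunnel device re-enters the output path for the same skb. */
static unsigned int model_skb_dst_mtu(const struct model_skb *skb)
{
	const struct model_sock *np =
		(skb->sk && xmit_recursion_level == 0) ? skb->sk : NULL;

	return (np && np->pmtudisc >= MODEL_PMTUDISC_PROBE) ? skb->dev_mtu
							    : skb->path_mtu;
}
```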
     

24 May, 2014

1 commit

  • Conflicts:
    drivers/net/bonding/bond_alb.c
    drivers/net/ethernet/altera/altera_msgdma.c
    drivers/net/ethernet/altera/altera_sgdma.c
    net/ipv6/xfrm6_output.c

    Several cases of overlapping changes.

    The xfrm6_output.c has a bug fix which overlaps the renaming
    of skb->local_df to skb->ignore_df.

    In the Altera TSE driver cases, the register access cleanups
    in net-next overlapped with bug fixes done in net.

    Similarly a bug fix to send ALB packets in the bonding driver using
    the right source address overlaps with cleanups in net-next.

    Signed-off-by: David S. Miller

    David S. Miller
     

16 May, 2014

1 commit

  • RFC 4861 states in 7.2.5:

    The IsRouter flag in the cache entry MUST be set based on the
    Router flag in the received advertisement. In those cases
    where the IsRouter flag changes from TRUE to FALSE as a result
    of this update, the node MUST remove that router from the
    Default Router List and update the Destination Cache entries
    for all destinations using that neighbor as a router as
    specified in Section 7.3.3. This is needed to detect when a
    node that is used as a router stops forwarding packets due to
    being configured as a host.

    Currently, when handling an NA message whose IsRouter flag changes from
    TRUE to FALSE, the kernel only removes the router from the Default Router
    List and does not update the Destination Cache entries.

    To update those Destination Cache entries, I introduce the
    function rt6_clean_tohost().

    Signed-off-by: Duan Jiong
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Duan Jiong
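A flat-array sketch of the cleanup, assuming it should drop every cached, gatewayed route whose gateway is the neighbour that stopped being a router. The kernel walks the FIB with a cleaner callback instead; the structure, flag values and `model_clean_tohost()` here are hypothetical.

```c
#include <arpa/inet.h>   /* inet_pton() for building example addresses */
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits, not the kernel's RTF_* values. */
#define M_RTF_GATEWAY 0x1
#define M_RTF_CACHE   0x2

struct model_rt6 {
	unsigned int flags;
	struct in6_addr gateway;
	bool present;            /* false once removed */
};

/* Remove cached, gatewayed routes whose next hop is gw; return how many
 * were removed. */
static size_t model_clean_tohost(struct model_rt6 *rts, size_t n,
				 const struct in6_addr *gw)
{
	const unsigned int want = M_RTF_CACHE | M_RTF_GATEWAY;
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!rts[i].present || (rts[i].flags & want) != want)
			continue;
		if (memcmp(&rts[i].gateway, gw, sizeof(*gw)) == 0) {
			rts[i].present = false;
			removed++;
		}
	}
	return removed;
}
```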
     

13 May, 2014

1 commit

  • As suggested by several people, rename local_df to ignore_df,
    since it means "ignore df bit if it is set".

    Cc: Maciej Żenczykowski
    Cc: Florian Westphal
    Cc: David S. Miller
    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Acked-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    WANG Cong
     

15 Apr, 2014

1 commit

  • Francois reported that setting a big MTU on the loopback device could
    prevent TCP sessions from making progress.

    We do not support IPv6 Jumbograms (yet?), so with such an MTU we cook
    corrupted packets.

    In theory, we must limit the IPv6 MTU to (65535 + 40) bytes.

    Tested:

    ifconfig lo mtu 70000
    netperf -H ::1

    Before patch : Throughput : 0.05 Mbits

    After patch : Throughput : 35484 Mbits

    Reported-by: Francois WELLENREITER
    Signed-off-by: Eric Dumazet
    Acked-by: YOSHIFUJI Hideaki
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
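The (65535 + 40) limit is the largest payload a 16-bit Payload Length field can describe without a Jumbo Payload option, plus the fixed 40-byte IPv6 header. A minimal clamp, with hypothetical macro and function names:

```c
/* Fixed IPv6 header size in bytes. */
#define MODEL_IPV6_HDR_LEN 40u
/* Largest non-jumbogram IPv6 packet: 0xFFFF payload bytes + header. */
#define MODEL_IP6_MAX_MTU  (0xFFFFu + MODEL_IPV6_HDR_LEN)

/* Clamp a device MTU (e.g. 70000 on lo) to what IPv6 can carry. */
static unsigned int model_ip6_mtu_clamp(unsigned int dev_mtu)
{
	return dev_mtu < MODEL_IP6_MAX_MTU ? dev_mtu : MODEL_IP6_MAX_MTU;
}
```

For the test case above, `model_ip6_mtu_clamp(70000)` yields 65575, while ordinary MTUs such as 1500 pass through unchanged.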
     

01 Apr, 2014

1 commit


27 Feb, 2014

1 commit

  • This option has the same semantics as IP_PMTUDISC_OMIT for IPv4, which
    was recently introduced. It doesn't honor the path MTU discovered by the
    host but, in contrast to IPV6_PMTUDISC_INTERFACE, allows the generation
    of fragments if the packet size exceeds the MTU of the outgoing
    interface.

    Fixes: 93b36cf3425b9b ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
    Cc: Florian Weimer
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
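The properties described above can be written as two small predicates over the socket's mode. This is a sketch: the M_PMTUDISC_* values match IPV6_PMTUDISC_DONT through IPV6_PMTUDISC_OMIT in <linux/in6.h>, but the function names are hypothetical and the predicates only encode the semantics stated in the message.

```c
#include <stdbool.h>

/* Numeric values of the IPV6_PMTUDISC_* modes. */
enum {
	M_PMTUDISC_DONT = 0, M_PMTUDISC_WANT = 1, M_PMTUDISC_DO = 2,
	M_PMTUDISC_PROBE = 3, M_PMTUDISC_INTERFACE = 4, M_PMTUDISC_OMIT = 5,
};

/* May the stack generate local fragments? OMIT allows it, INTERFACE
 * does not. */
static bool model_sk_ignore_df(int pmtudisc)
{
	return pmtudisc < M_PMTUDISC_DO || pmtudisc == M_PMTUDISC_OMIT;
}

/* Should learned path MTU information be honoured? Neither OMIT nor
 * INTERFACE does. */
static bool model_sk_accept_pmtu(int pmtudisc)
{
	return pmtudisc != M_PMTUDISC_INTERFACE && pmtudisc != M_PMTUDISC_OMIT;
}
```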
     

08 Jan, 2014

1 commit

  • This change makes it possible to follow a recommendation of RFC 4942.

    - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
    as source addresses for ICMPv6 echo reply. This sysctl is false by default
    to preserve existing behavior.
    - Add inline check ipv6_anycast_destination().
    - Use them in icmpv6_echo_reply().

    Reference:
    RFC4942 - IPv6 Transition/Coexistence Security Considerations
    (http://tools.ietf.org/html/rfc4942#section-2.1.6)

    2.1.6. Anycast Traffic Identification and Security

    [...]
    To avoid exposing knowledge about the internal structure of the
    network, it is recommended that anycast servers now take advantage of
    the ability to return responses with the anycast address as the
    source address if possible.

    Signed-off-by: Francois-Xavier Le Bail
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    FX Le Bail
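The decision in icmpv6_echo_reply() can be modelled as choosing between the request's destination address and "let normal source address selection decide". A hypothetical sketch, assuming the address class of the request's destination is already known (the real code derives it from the skb's route); the enum and function names are illustrative only.

```c
#include <arpa/inet.h>   /* inet_pton() for building example addresses */
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the echo request's destination. */
enum model_dst_kind { M_DST_UNICAST, M_DST_ANYCAST, M_DST_MULTICAST };

/* Return the address to use as the reply's source, or NULL to let
 * normal source address selection pick one. */
static const struct in6_addr *
model_echo_reply_saddr(const struct in6_addr *req_daddr,
		       enum model_dst_kind kind, bool anycast_src_echo_reply)
{
	if (kind == M_DST_UNICAST)
		return req_daddr;
	if (kind == M_DST_ANYCAST && anycast_src_echo_reply)
		return req_daddr;   /* RFC 4942 recommendation, opt-in */
	return NULL;
}
```

With the sysctl left at its default (false), anycast requests keep the pre-existing behaviour of a selected unicast source.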
     

02 Jan, 2014

1 commit

  • Running 'make namespacecheck' shows:
    net/ipv6/route.o
    ipv6_route_table_template
    rt6_bind_peer
    net/ipv6/icmp.o
    icmpv6_route_lookup
    ipv6_icmp_table_template

    This addresses some of those warnings by:
    * make icmpv6_route_lookup static
    * move inlines out of ip6_route.h since they are only used in route.c
    * move rt6_bind_peer into route.c

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

19 Dec, 2013

1 commit

  • IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it
    nonetheless for symmetry with IPv4 sockets. Also drop incoming MTU
    information if this mode is enabled.

    The additional bit in ipv6_pinfo just eats into the padding behind the
    bitfield. There are no changes to the layout of the struct at all.

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
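A small model of the two behaviours named above, under simplifying assumptions: `struct model_pinfo` and both functions are hypothetical, and the constants mirror IPV6_PMTUDISC_WANT, _PROBE and _INTERFACE from <linux/in6.h>.

```c
/* Values of IPV6_PMTUDISC_WANT / _PROBE / _INTERFACE. */
#define M_PMTUDISC_WANT      1
#define M_PMTUDISC_PROBE     3
#define M_PMTUDISC_INTERFACE 4

/* Hypothetical per-socket state. */
struct model_pinfo {
	int pmtudisc;
	unsigned int path_mtu;  /* last learned path MTU */
};

/* Drop incoming Packet Too Big information in INTERFACE mode. */
static void model_handle_ptb(struct model_pinfo *np, unsigned int mtu)
{
	if (np->pmtudisc == M_PMTUDISC_INTERFACE)
		return;
	np->path_mtu = mtu;
}

/* PROBE and INTERFACE both size packets by the egress device MTU. */
static unsigned int model_tx_mtu(const struct model_pinfo *np,
				 unsigned int dev_mtu)
{
	return np->pmtudisc >= M_PMTUDISC_PROBE ? dev_mtu : np->path_mtu;
}
```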
     

24 Oct, 2013

1 commit


22 Oct, 2013

2 commits

  • Make sure rt6i_gateway contains nexthop information in
    all routes returned from lookup or when routes are directly
    attached to skb for generated ICMP packets.

    The effect of this patch should be a faster version of
    rt6_nexthop() and the consideration of local addresses as
    nexthop.

    Signed-off-by: Julian Anastasov
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • In v3.9, 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
    ip6_finish_output2().") changed the behaviour of ip6_finish_output2()
    such that the recently introduced rt6_nexthop() is used
    instead of an assigned neighbor.

    As rt6_nexthop() prefers rt6i_gateway only for gatewayed
    routes, this causes a problem for users like IPVS, xt_TEE and
    RAW(hdrincl) if they want to use a different address for routing
    compared to the destination address.

    Another case is when a redirect creates an RTF_DYNAMIC
    route without the RTF_GATEWAY flag: we ignore the rt6i_gateway
    in rt6_nexthop().

    Fix the above problems by considering the rt6i_gateway if
    present, so that traffic routed to address on local subnet is
    not wrongly diverted to the destination address.

    Thanks to Simon Horman and Phil Oester for spotting the
    problematic commit.

    Thanks to Hannes Frederic Sowa for his review and help in testing.

    Reported-by: Phil Oester
    Reported-by: Mark Brooks
    Signed-off-by: Julian Anastasov
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Julian Anastasov
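The difference between the two rt6_nexthop() behaviours can be modelled in userspace. Everything here is a hypothetical stand-in (`struct model_rt6`, `M_RTF_GATEWAY`, the two functions), and "gateway present" is assumed to mean "not the unspecified address ::". Once rt6i_gateway always holds the nexthop, as the companion commit arranges, the check can in principle collapse to returning the gateway.

```c
#include <arpa/inet.h>   /* inet_pton() for building example addresses */
#include <netinet/in.h>
#include <string.h>

/* Hypothetical stand-in for RTF_GATEWAY. */
#define M_RTF_GATEWAY 0x2

struct model_rt6 {
	unsigned int flags;
	struct in6_addr gateway;  /* :: when unset */
};

/* v3.9 behaviour: the gateway is used only for RTF_GATEWAY routes. */
static const struct in6_addr *
nexthop_before(const struct model_rt6 *rt, const struct in6_addr *daddr)
{
	if (rt->flags & M_RTF_GATEWAY)
		return &rt->gateway;
	return daddr;
}

/* Fixed behaviour: any non-zero gateway wins, e.g. for IPVS/xt_TEE or a
 * redirect-created RTF_DYNAMIC route without RTF_GATEWAY. */
static const struct in6_addr *
nexthop_after(const struct model_rt6 *rt, const struct in6_addr *daddr)
{
	if ((rt->flags & M_RTF_GATEWAY) ||
	    !IN6_IS_ADDR_UNSPECIFIED(&rt->gateway))
		return &rt->gateway;
	return daddr;
}
```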
     

22 Sep, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
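A short demonstration of the point: both declarations below refer to the same function with external linkage, and the compiler accepts them together. `model_sum()` is a hypothetical example function.

```c
/* "extern" adds nothing to a function prototype: these two lines
 * declare the same function with external linkage. */
extern int model_sum(int a, int b);
int model_sum(int a, int b);

int model_sum(int a, int b)
{
	return a + b;
}
```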
     

01 Sep, 2013

1 commit


23 Aug, 2013

1 commit


19 Jan, 2013

1 commit


18 Jan, 2013

1 commit


16 Nov, 2012

1 commit

  • The kernel uses default metrics when routes are managed. For example, a
    static route added with a metric set to 0 is inserted in the kernel with
    metric 1024 (IP6_RT_PRIO_USER).
    It is useful for routing daemons to know these values, to be able to set routes
    without interfering with what the kernel does.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
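How a routing daemon might use the exported value, as a sketch: the `#ifndef` fallback assumes the definition normally comes from <linux/ipv6_route.h>, and `model_effective_metric()` is a hypothetical helper encoding only the rule stated above (a requested metric of 0 is stored as 1024).

```c
#include <stdint.h>

/* Normally provided by <linux/ipv6_route.h>; fallback for this sketch. */
#ifndef IP6_RT_PRIO_USER
#define IP6_RT_PRIO_USER 1024
#endif

/* Predict the metric the kernel stores for a static route, so the
 * daemon can match its own routes against what it reads back. */
static uint32_t model_effective_metric(uint32_t requested)
{
	return requested == 0 ? IP6_RT_PRIO_USER : requested;
}
```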
     

18 Jul, 2012

1 commit

  • We should provide inet6_csk_route_socket() with a struct flowi6 pointer,
    so that inet6_csk_xmit() works correctly instead of sending garbage.

    Also add some consts.

    Signed-off-by: Eric Dumazet
    Reported-by: Yuchung Cheng
    Cc: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Jul, 2012

3 commits


16 Jun, 2012

1 commit

  • One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
    to handle the error pass the 32-bit info cookie in network byte order
    whereas ipv4 passes it around in host byte order.

    Like the ipv4 side, we have two helper functions. One for when we
    have a socket context and one for when we do not.

    ip6ip6 tunnels are not handled here, because they handle PMTU events
    by essentially relaying another ICMP packet-too-big message back to
    the original sender.

    This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all
    kinds of situations that simply cannot happen when we do the PMTU
    update directly using a fully resolved route.

    In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
    likely be removed or changed into a BUG_ON() check. We should never
    have a prefixed ipv6 route when we get there.

    Another piece of strange history here is that TCP and DCCP, unlike in
    ipv4, never invoke the update_pmtu() method from their ICMP error
    handlers. This is incredibly astonishing, since this is where we have
    the most accurate context in which to make a PMTU update, namely a
    fully connected socket and its associated cached socket route.

    Signed-off-by: David S. Miller

    David S. Miller
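The byte-order point in a few lines: the info field is taken as it sits in the packet and converted exactly once before reaching the route-level update. `model_update_pmtu()`, `model_rt_update_pmtu()` and `last_pmtu` are hypothetical; ntohl()/htonl() are the standard <arpa/inet.h> functions.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical route-level update, taking the MTU in host byte order. */
static uint32_t last_pmtu;
static void model_rt_update_pmtu(uint32_t mtu_host)
{
	last_pmtu = mtu_host;
}

/* ICMPv6 error handlers see the 32-bit info field in network byte
 * order, so convert once at this boundary. */
static void model_update_pmtu(uint32_t info_be)
{
	model_rt_update_pmtu(ntohl(info_be));
}
```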
     

12 Jun, 2012

1 commit


11 Jun, 2012

1 commit

  • We encode the pointer(s) into an unsigned long with one state bit.

    The state bit is used so we can store the inetpeer tree root to use
    when resolving the peer later.

    Later the peer roots will be per-FIB table, and this change works to
    facilitate that.

    Signed-off-by: David S. Miller

    David S. Miller
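The encoding trick, sketched with hypothetical types and helper names: because both structures are at least 2-byte aligned, bit 0 of a real pointer is always clear and can record which kind of pointer the word holds. Like the message, this stores the value in an unsigned long, which assumes pointers fit in one (true on Linux).

```c
#include <stdbool.h>

/* Hypothetical state bit: set when the word holds a tree-root pointer. */
#define M_BASE_BIT 0x1UL

struct model_peer { int id; };
struct model_peer_base { int id; };

static inline bool ptr_is_base(unsigned long v) { return v & M_BASE_BIT; }

/* Store the tree root to use when the peer is resolved later. */
static inline void ptr_set_base(unsigned long *v, struct model_peer_base *b)
{
	*v = (unsigned long)b | M_BASE_BIT;
}

static inline struct model_peer_base *ptr_base(unsigned long v)
{
	return (struct model_peer_base *)(v & ~M_BASE_BIT);
}

/* Store a resolved peer; bit 0 stays clear. */
static inline void ptr_set_peer(unsigned long *v, struct model_peer *p)
{
	*v = (unsigned long)p;
}

/* Only meaningful when !ptr_is_base(v). */
static inline struct model_peer *ptr_peer(unsigned long v)
{
	return (struct model_peer *)v;
}
```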
     

09 Jun, 2012

1 commit

  • There are a lot of places that open-code rt{,6}_get_peer() only because
    they want to set 'create' to one. So add an rt{,6}_get_peer_create()
    for their sake.

    There were also a few spots open-coding plain rt{,6}_get_peer() and
    those are transformed here as well.

    Signed-off-by: David S. Miller

    David S. Miller
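The refactor amounts to one worker taking a `create` argument plus two named wrappers. A sketch with hypothetical types and names, using calloc() as a stand-in for real peer creation:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for an inet peer and a route holding one. */
struct model_peer { int refs; };
struct model_rt { struct model_peer *peer; };

/* Worker: return the route's peer, creating it first if asked to. */
static struct model_peer *model_get_peer_impl(struct model_rt *rt, int create)
{
	if (rt->peer == NULL && create)
		rt->peer = calloc(1, sizeof(*rt->peer));
	return rt->peer;
}

/* Call sites state their intent by name instead of passing 0 or 1. */
static inline struct model_peer *model_get_peer(struct model_rt *rt)
{
	return model_get_peer_impl(rt, 0);
}

static inline struct model_peer *model_get_peer_create(struct model_rt *rt)
{
	return model_get_peer_impl(rt, 1);
}
```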
     

19 May, 2012

1 commit