21 Apr, 2020

1 commit

  • [ Upstream commit 03e2a984b6165621f287fadf5f4b5cd8b58dcaba ]

    The behaviour for what is considered an anycast address changed in
    commit 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after
    encountering pmtu exception"). This now considers the first
    address in a subnet where there is a route via a gateway
    to be an anycast address.

    This breaks path MTU discovery and traceroutes when a host in a
    remote network uses the address at the start of a prefix
    (e.g. 2600:: advertised as 2600::/48 in the DFZ), as ICMP errors
    will not be sent to anycast addresses.

    This patch excludes any routes with a gateway, or via point-to-point
    links, restoring the behaviour previously provided by
    rt6_is_gw_or_nonexthop in net/ipv6/route.c.
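    A minimal userspace sketch of the check as it behaves after this fix, loosely modelled on the kernel's anycast-destination test. The `sketch_` struct and names are hypothetical. The /127 cut-off reflects our understanding of the kernel, not something stated here. With a gatewayed route to 2001:db8:100::/64, as in the test below, the prefix address is no longer treated as anycast.

```c
#include <stdbool.h>
#include <string.h>

/* Same values as RTF_GATEWAY and RTF_NONEXTHOP in <linux/ipv6_route.h>. */
#define SKETCH_RTF_GATEWAY   0x00000002u
#define SKETCH_RTF_NONEXTHOP 0x00200000u

/* Hypothetical, heavily simplified view of the route matched for daddr. */
struct sketch_route {
    unsigned char dst[16]; /* route prefix, host bits zero */
    int plen;              /* prefix length */
    unsigned int flags;    /* RTF_* style flags */
};

/*
 * daddr counts as the subnet-router anycast address only when it equals
 * the prefix of an on-link route. Routes via a gateway or over
 * point-to-point (no-nexthop) links are excluded, which is the behaviour
 * this fix restores. /127 and /128 prefixes are never treated as anycast.
 */
static bool sketch_is_anycast_dst(const struct sketch_route *rt,
                                  const unsigned char daddr[16])
{
    if (rt->flags & (SKETCH_RTF_GATEWAY | SKETCH_RTF_NONEXTHOP))
        return false;
    return rt->plen < 127 && memcmp(rt->dst, daddr, 16) == 0;
}
```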

    This can be tested with:
    ip link add v1 type veth peer name v2
    ip netns add test
    ip netns exec test ip link set lo up
    ip link set v2 netns test
    ip link set v1 up
    ip netns exec test ip link set v2 up
    ip addr add 2001:db8::1/64 dev v1 nodad
    ip addr add 2001:db8:100:: dev lo nodad
    ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad
    ip netns exec test ip route add unreachable 2001:db8:1::1
    ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1
    ip netns exec test sysctl net.ipv6.conf.all.forwarding=1
    ip route add 2001:db8:1::1 via 2001:db8::2
    ping -I 2001:db8::1 2001:db8:1::1 -c1
    ping -I 2001:db8:100:: 2001:db8:1::1 -c1
    ip addr delete 2001:db8:100:: dev lo
    ip netns delete test

    Currently the first ping will get back a destination unreachable ICMP
    error, but the second will never get a response, with "icmp6_send:
    acast source" logged. After this patch, both get destination
    unreachable ICMP replies.

    Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
    Signed-off-by: Tim Stallard
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tim Stallard
     

28 Jun, 2019

1 commit

  • The new route handling in ip_mc_finish_output() from 'net' overlapped
    with the new support for returning congestion notifications from BPF
    programs.

    In order to handle this I had to take the dev_loopback_xmit() calls
    out of the switch statement.

    The aquantia driver conflicts were simple overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Jun, 2019

1 commit

  • There is no functional change in this patch; it only prepares the next one.

    rt6_nexthop() will be used by ip6_dst_lookup_neigh(), which uses const
    variables.
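    A minimal sketch of why const matters here, assuming simplified types. The hypothetical `sketch_rt6_nexthop()` takes and returns const pointers. A caller that only holds const pointers (standing in for ip6_dst_lookup_neigh()) can therefore use it without casts. The real rt6_nexthop() handles more route flags than this gateway-or-destination choice.

```c
/* Same value as RTF_GATEWAY in <linux/ipv6_route.h>. */
#define SKETCH_RTF_GATEWAY 0x00000002u

struct sketch_in6_addr { unsigned char s6_addr[16]; };

/* Hypothetical, trimmed-down route: only what the helper reads. */
struct sketch_rt6_info {
    unsigned int rt6i_flags;
    struct sketch_in6_addr rt6i_gateway;
};

/*
 * Both parameters and the return value are const-qualified.
 * Simplified: returns the gateway for gatewayed routes, otherwise the
 * destination itself (the neighbour is on-link).
 */
static const struct sketch_in6_addr *
sketch_rt6_nexthop(const struct sketch_rt6_info *rt,
                   const struct sketch_in6_addr *daddr)
{
    if (rt->rt6i_flags & SKETCH_RTF_GATEWAY)
        return &rt->rt6i_gateway;
    return daddr;
}

/* A caller holding only const pointers, standing in for ip6_dst_lookup_neigh(). */
static const struct sketch_in6_addr *
sketch_lookup_neigh_key(const struct sketch_rt6_info *rt,
                        const struct sketch_in6_addr *daddr)
{
    return sketch_rt6_nexthop(rt, daddr);
}
```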

    Signed-off-by: Nicolas Dichtel
    Reported-by: kbuild test robot
    Acked-by: Nick Desaulniers
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

25 Jun, 2019

1 commit

  • Since commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst
    cache"), route exceptions reside in a separate hash table, and won't be
    found by walking the FIB, so they won't be dumped to userspace on a
    RTM_GETROUTE message.

    This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to
    have no function anymore:

    # ip -6 route get fc00:3::1
    fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium
    # ip -6 route get fc00:4::1
    fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium
    # ip -6 route list cache
    # ip -6 route flush cache
    # ip -6 route get fc00:3::1
    fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium
    # ip -6 route get fc00:4::1
    fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium

    because iproute2 lists cached routes using RTM_GETROUTE, and flushes them
    by listing all the routes, and deleting them with RTM_DELROUTE one by one.

    If cached routes are requested using the RTM_F_CLONED flag together with
    strict checking, or if no strict checking is requested (and hence we can't
    consistently apply filters), look up exceptions in the hash table
    associated with the current fib6_info in rt6_dump_route(), and, if present
    and not expired, add them to the dump.
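    The selection logic can be sketched as follows, combining this paragraph with the v4 note further down. Without strict checking, everything is returned. With strict checking, RTM_F_CLONED selects only cached routes, and its absence selects only non-cached routes. The struct and function names are hypothetical, and expiry is reduced to a simple timestamp comparison.

```c
#include <stdbool.h>

/* Same value as RTM_F_CLONED in <linux/rtnetlink.h>. */
#define SKETCH_RTM_F_CLONED 0x200u

struct sketch_dump_plan {
    bool dump_routes;     /* regular FIB entries */
    bool dump_exceptions; /* cached routes from the exception hash table */
};

static struct sketch_dump_plan sketch_plan_dump(bool strict_check,
                                                unsigned int rtm_flags)
{
    /* Without strict checking, filters can't be applied consistently,
     * so both kinds of entries are returned. */
    struct sketch_dump_plan p = { true, true };

    if (strict_check) {
        bool cloned = (rtm_flags & SKETCH_RTM_F_CLONED) != 0;

        p.dump_routes = !cloned;
        p.dump_exceptions = cloned;
    }
    return p;
}

/* An exception is added to the dump only while it has not expired. */
static bool sketch_exception_live(bool has_expiry, long expires, long now)
{
    return !has_expiry || now < expires;
}
```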

    We might be unable to dump all the entries for a given node in a single
    message, so keep track of how many entries were handled for the current
    node in fib6_walker, and skip that amount in case we start from the same
    partially dumped node.

    When a partial dump restarts, as the starting node might change when
    'sernum' changes, we have no guarantee that we need to skip the same
    amount of in-node entries. Therefore, we need two counters, and we need to
    zero the in-node counter if the node from which the dump is resumed
    differs.
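    A minimal model of the two counters, assuming a flat array of nodes in walk order and a per-message entry budget. The names are hypothetical, and the kernel's fib6_walker bookkeeping is more involved. `node_idx` plays the role of the overall position and `skip_in_node` is the in-node counter. The in-node counter is reset when the node at the resume position is not the one the previous pass stopped in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: an id plus the number of entries it contributes
 * (the route itself and its exceptions). */
struct sketch_node {
    int id;
    int n_entries;
};

/* Resume state kept between dump passes. */
struct sketch_cursor {
    size_t node_idx;    /* where the walk resumes */
    int resume_node_id; /* node the previous pass stopped in */
    int skip_in_node;   /* entries of that node already sent */
};

/*
 * Emit at most `budget` entries (encoded as id * 100 + index) into out[].
 * Sets *done once every node has been fully dumped.
 */
static int sketch_dump_pass(const struct sketch_node *nodes, size_t n_nodes,
                            struct sketch_cursor *c, int budget,
                            int *out, bool *done)
{
    int emitted = 0;

    assert(budget > 0);
    *done = false;

    /* Resuming from a different node: the in-node count is meaningless. */
    if (c->node_idx < n_nodes && nodes[c->node_idx].id != c->resume_node_id)
        c->skip_in_node = 0;

    while (c->node_idx < n_nodes) {
        const struct sketch_node *n = &nodes[c->node_idx];

        for (int e = c->skip_in_node; e < n->n_entries; e++) {
            if (emitted == budget) {
                /* Message full: remember the partially dumped node. */
                c->resume_node_id = n->id;
                c->skip_in_node = e;
                return emitted;
            }
            out[emitted++] = n->id * 100 + e;
        }
        c->node_idx++;
        c->skip_in_node = 0;
    }
    *done = true;
    return emitted;
}
```

    A node with more entries than fit in one message is drained over several passes. Without the in-node counter, each pass would restart at that node's first entry, which is the non-termination described in the v2 note.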

    Note that, with the current version of iproute2, this only fixes the
    'ip -6 route list cache': on a flush command, iproute2 doesn't pass
    RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is
    still unable to fetch the routes to be flushed. This will be addressed in
    a patch for iproute2.

    To flush cached routes, a procfs entry could be introduced instead: that's
    how it works for IPv4. We already have a rt6_flush_exception() function
    ready to be wired to it. However, this would not solve the issue for
    listing.

    Versions of iproute2 and kernel tested:

                    iproute2
    kernel          4.14.0  4.15.0  4.19.0  5.0.0  5.1.0  5.1.0, patched
    3.18    list      +       +       +       +      +    +
            flush     +       +       +       +      +    +
    4.4     list      +       +       +       +      +    +
            flush     +       +       +       +      +    +
    4.9     list      +       +       +       +      +    +
            flush     +       +       +       +      +    +
    4.14    list      +       +       +       +      +    +
            flush     +       +       +       +      +    +
    4.15    list
            flush
    4.19    list
            flush
    5.0     list
            flush
    5.1     list
            flush
    with    list      +       +       +       +      +    +
    fix     flush     +       +       +                   +

    v7:
    - Explain usage of "skip" counters in commit message (suggested by
    David Ahern)

    v6:
    - Rebase onto net-next, use recently introduced nexthop walker
    - Make rt6_nh_dump_exceptions() a separate function (suggested by David
    Ahern)

    v5:
    - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH,
    update test results (flushing works with iproute2 < 5.0.0 now)

    v4:
    - Split NLM_F_MATCH and strict check handling in separate patches
    - Filter routes using RTM_F_CLONED: if it's not set, only return
    non-cached routes, and if it's set, only return cached routes:
    change requested by David Ahern and Martin Lau. This implies that
    iproute2 needs a separate patch to be able to flush IPv6 cached
    routes. This is not ideal because we can't fix the breakage caused
    by 2b760fcf5cfb entirely in kernel. However, two years have passed
    since then, and this makes it more tolerable.

    v3:
    - More descriptive comment about expired exceptions in rt6_dump_route()
    - Swap return values of rt6_dump_route() (suggested by Martin Lau)
    - Don't zero skip_in_node in case we don't dump anything in a given pass
    (also suggested by Martin Lau)
    - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic,
    it's just a flag to indicate the route was cloned, not to filter on
    routes

    v2: Add tracking of number of entries to be skipped in current node after
    a partial dump. As we restart from the same node, if not all the
    exceptions for a given node fit in a single message, the dump will
    not terminate, as suggested by Martin Lau. This is a concrete
    possibility: setting up a large number of exceptions for the same route
    actually causes the issue, as suggested by David Ahern.

    Reported-by: Jianlin Shi
    Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
     

24 Jun, 2019

3 commits

  • For the tx path, in most cases, we still have to take a refcnt on the dst
    because the caller is caching the dst somewhere. But it is still
    beneficial to make use of the RT6_LOOKUP_F_DST_NOREF flag while doing the
    route lookup, because this flag prevents manipulating the refcnt on
    net->ipv6.ip6_null_entry when doing fib6_rule_lookup() to traverse each
    routing table. The null_entry is a shared object, and constant updates on
    it cause false sharing.

    We converted the current major lookup function ip6_route_output_flags()
    to make use of RT6_LOOKUP_F_DST_NOREF.

    Together with the change in the rx path, we see a noticeable performance
    boost:
    I ran synflood tests between 2 hosts under the same switch. Both hosts
    have a 20G mlx NIC and 8 tx/rx queues.
    Sender sends pure SYN flood with random src IPs and ports using trafgen.
    Receiver has a simple TCP listener on the target port.
    Both hosts have multiple custom rules:
    - For incoming packets, only local table is traversed.
    - For outgoing packets, 3 tables are traversed to find the route.
    The packet processing rate on the receiver is as follows:
    - Before the fix: 3.78Mpps
    - After the fix: 5.50Mpps

    Signed-off-by: Wei Wang
    Signed-off-by: David S. Miller

    Wei Wang
     
  • This patch specifically converts the rule lookup logic to honor this
    flag and not release refcnt when traversing each rule and calling
    lookup() on each routing table.
    Similar to the previous patch, we also need some special handling of dst
    entries in uncached list because there is always 1 refcnt taken for them
    even if RT6_LOOKUP_F_DST_NOREF flag is set.

    Signed-off-by: Wei Wang
    Signed-off-by: David S. Miller

    Wei Wang
     
  • This new flag is to instruct the route lookup function to not take
    refcnt on the dst entry. The user which does route lookup with this flag
    must properly use rcu protection.
    ip6_pol_route() is the major route lookup function for both tx and rx
    path.
    In this function:
    Do not take refcnt on dst if RT6_LOOKUP_F_DST_NOREF flag is set, and
    directly return the route entry. The caller should be holding rcu lock
    when using this flag, and decide whether to take refcnt or not.

    One note on the dst cache in the uncached_list:
    As uncached_list does not consume refcnt, one refcnt is always returned
    back to the caller even if RT6_LOOKUP_F_DST_NOREF flag is set.
    Uncached dst is only possible in the output path. So in such call path,
    caller MUST check if the dst is in the uncached_list before assuming
    that there is no refcnt taken on the returned dst.
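    The reference-counting contract above, as a single-threaded sketch with hypothetical names. `SKETCH_LOOKUP_F_DST_NOREF` stands in for RT6_LOOKUP_F_DST_NOREF, and the sketch uses neither RCU nor atomics.

```c
#include <stdbool.h>

/* Hypothetical stand-in for RT6_LOOKUP_F_DST_NOREF. */
#define SKETCH_LOOKUP_F_DST_NOREF 0x1u

struct sketch_dst {
    int refcnt;
    bool uncached; /* on the uncached_list: always handed out with a ref */
};

/* Lookup side: skip the refcount update when NOREF is requested, except
 * for uncached entries, which always come back with one reference. */
static struct sketch_dst *sketch_route_lookup(struct sketch_dst *d,
                                              unsigned int flags)
{
    if (!(flags & SKETCH_LOOKUP_F_DST_NOREF) || d->uncached)
        d->refcnt++;
    return d;
}

/* Caller side (under rcu_read_lock() in the kernel): a NOREF caller must
 * check for an uncached dst before assuming no reference was taken. */
static void sketch_noref_caller_finish(struct sketch_dst *d)
{
    if (d->uncached)
        d->refcnt--;
}
```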

    Signed-off-by: Wei Wang
    Acked-by: Eric Dumazet
    Acked-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Wei Wang
     

05 Jun, 2019

1 commit

  • Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
    fib6_info side of the nexthop fib_info relationship. Since a fib6_info
    referencing a nexthop object can not have 'sibling' entries (the old way
    of doing multipath routes), the nh_list is a union with fib6_siblings.

    Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
    using a nexthop instance. Update __remove_nexthop_fib to walk f6i_list
    and delete fib entries using the nexthop.

    Add a few nexthop helpers for use when a nexthop is added to fib6_info:
    - nexthop_fib6_nh - return first fib6_nh in a nexthop object
    - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
    if the fib6_info references a nexthop object
    - nexthop_path_fib6_result - similar to ipv4, select a path within a
    multipath nexthop object. If the nexthop is a blackhole, set
    fib6_result type to RTN_BLACKHOLE, and set the REJECT flag

    Update the fib6_info references to check for nh and take a different path
    as needed:
    - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
    be coalesced with other fib entries into a multipath route
    - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
    a nexthop
    - addrconf (host routes), RAs and info entries (anything configured via
    ndisc) do not use nexthop objects
    - fib6_info_destroy_rcu - put reference to nexthop object
    - fib6_purge_rt - drop fib6_info from f6i_list
    - fib6_select_path - update to use the new nexthop_path_fib6_result when
    fib entry uses a nexthop object
    - rt6_device_match - update to catch use of nexthop object as a blackhole
    and set fib6_type and flags.
    - ip6_route_info_create - don't add space for fib6_nh if fib entry is
    going to reference a nexthop object, take a reference to nexthop object,
    disallow use of source routing
    - rt6_nlmsg_size - add space for RTA_NH_ID
    - add rt6_fill_node_nexthop to add nexthop data on a dump

    As with ipv4, most of the changes push existing code into the else branch
    of whether the fib entry uses a nexthop object.

    Update the nexthop code to walk f6i_list when a nexthop is deleted, to remove
    fib entries referencing it.
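    A simplified sketch of the data-structure change, assuming trimmed-down hypothetical types. `nh_list` shares storage with `fib6_siblings` because an entry that uses a nexthop object never has siblings. The helper picks the object's first path, as nexthop_fib6_nh() is described above.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_list_head { struct sketch_list_head *next, *prev; };

struct sketch_fib6_nh { int ifindex; bool has_gw; };

/* Hypothetical shared nexthop object; may hold several paths. */
struct sketch_nexthop {
    struct sketch_fib6_nh nh[2];
    int num_nh;
};

struct sketch_fib6_info {
    struct sketch_nexthop *nh; /* non-NULL when a nexthop object is used */
    union {                    /* never both: a nexthop user has no siblings */
        struct sketch_list_head fib6_siblings; /* legacy multipath legs */
        struct sketch_list_head nh_list;       /* entry on the nexthop's f6i_list */
    };
    struct sketch_fib6_nh fib6_nh; /* inline nexthop when nh == NULL */
};

/* First path of the nexthop object if there is one, else the inline nexthop. */
static const struct sketch_fib6_nh *
sketch_first_fib6_nh(const struct sketch_fib6_info *f6i)
{
    return f6i->nh ? &f6i->nh->nh[0] : &f6i->fib6_nh;
}

/* Entries using a nexthop object are never coalesced into multipath routes. */
static bool sketch_qualify_for_ecmp(const struct sketch_fib6_info *f6i)
{
    return f6i->nh == NULL && f6i->fib6_nh.has_gw;
}
```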

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

25 May, 2019

1 commit

  • Move fib6_nh to the end of fib6_info and make it an array of
    size 0. Pass a flag to fib6_info_alloc indicating if the
    allocation needs to add space for a fib6_nh.

    The current code path always has a fib6_nh allocated with a
    fib6_info; with nexthop objects they will be separate.
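    A userspace sketch of the allocation pattern, with hypothetical types. It uses a C99 flexible array member (`fib6_nh[]`) in place of the zero-length array the commit describes. In both cases the member adds nothing to the struct size until the allocator reserves room for it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down types. */
struct sketch_fib6_nh {
    int ifindex;
    unsigned char gw[16];
};

struct sketch_fib6_info {
    int refcnt;
    unsigned int flags;
    struct sketch_fib6_nh fib6_nh[]; /* flexible array member; must be last */
};

/* Reserve room for the inline fib6_nh only when the caller asks for it;
 * entries that will reference a nexthop object skip it. */
static struct sketch_fib6_info *sketch_fib6_info_alloc(bool with_fib6_nh)
{
    size_t sz = sizeof(struct sketch_fib6_info);
    struct sketch_fib6_info *f6i;

    if (with_fib6_nh)
        sz += sizeof(struct sketch_fib6_nh);
    f6i = calloc(1, sz);
    if (f6i)
        f6i->refcnt = 1;
    return f6i;
}
```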

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

24 Apr, 2019

1 commit

  • nhc_flags holds the RTNH_F flags for a given nexthop (fib{6}_nh).
    All of the RTNH_F_ flags fit in an unsigned char, and since the API to
    userspace (rtnh_flags and lower byte of rtm_flags) is 1 byte it can not
    grow. Make nhc_flags in fib_nh_common an unsigned char and shrink the
    size of the struct by 8, from 56 to 48 bytes.

    Update the flags arguments for up netdevice events and fib_nexthop_info
    which determines the RTNH_F flags to return on a dump/event. The RTNH_F
    flags are passed in the lower byte of rtm_flags which is an unsigned int
    so use a temp variable for the flags to fib_nexthop_info and combine
    with rtm_flags in the caller.
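    A sketch of the flag plumbing, assuming trimmed-down hypothetical types. The constants copy the RTNH_F_* and RTM_F_CLONED values from the rtnetlink uapi header. The nexthop flags pass through a one-byte temporary, and the caller ORs them into the low byte of the 32-bit rtm_flags.

```c
/* Same values as RTNH_F_DEAD, RTNH_F_ONLINK, RTNH_F_LINKDOWN and
 * RTM_F_CLONED in <linux/rtnetlink.h>. */
#define SKETCH_RTNH_F_DEAD     1u
#define SKETCH_RTNH_F_ONLINK   4u
#define SKETCH_RTNH_F_LINKDOWN 16u
#define SKETCH_RTM_F_CLONED    0x200u

/* Hypothetical slice of fib_nh_common: the flags fit in one byte. */
struct sketch_nh_common {
    unsigned char nhc_flags;
};

/* Fill a byte-sized temporary with the nexthop flags to report. */
static void sketch_nexthop_info(const struct sketch_nh_common *nhc,
                                unsigned char *flags)
{
    *flags |= nhc->nhc_flags &
              (SKETCH_RTNH_F_DEAD | SKETCH_RTNH_F_ONLINK | SKETCH_RTNH_F_LINKDOWN);
}

/* The caller merges the temporary into the low byte of rtm_flags. */
static unsigned int sketch_fill_rtm_flags(unsigned int rtm_flags,
                                          const struct sketch_nh_common *nhc)
{
    unsigned char nh_flags = 0;

    sketch_nexthop_info(nhc, &nh_flags);
    return rtm_flags | nh_flags;
}
```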

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

22 Apr, 2019

2 commits

  • The RTF_ADDRCONF flag filters out routes added by RAs in determining
    which routes can be appended to an existing one to create a multipath
    route. Restore the flag check and add a comment to document the RA piece.
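    With the flag restored, the check reduces to the RTF_ADDRCONF test plus the gateway test described in the entry below. Here is a sketch with hypothetical trimmed types. It uses the gateway-family field from the 09 Apr 2019 entry as the "has a gateway" signal.

```c
#include <stdbool.h>

/* Same value as RTF_ADDRCONF in <linux/ipv6_route.h>. */
#define SKETCH_RTF_ADDRCONF 0x00040000u

/* Hypothetical, trimmed-down FIB entry. */
struct sketch_fib6_nh {
    unsigned char fib_nh_gw_family; /* 0 when there is no gateway */
};

struct sketch_fib6_info {
    unsigned int fib6_flags;
    struct sketch_fib6_nh fib6_nh;
};

static bool sketch_rt6_qualify_for_ecmp(const struct sketch_fib6_info *f6i)
{
    /* The RTF_ADDRCONF flag filters out routes installed from RAs. */
    return !(f6i->fib6_flags & SKETCH_RTF_ADDRCONF) &&
           f6i->fib6_nh.fib_nh_gw_family != 0;
}
```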

    Fixes: 4e54507ab1a9 ("ipv6: Simplify rt6_qualify_for_ecmp")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • After commit c7a1ce397ada ("ipv6: Change addrconf_f6i_alloc to use
    ip6_route_info_create"), the gateway is no longer filled in for fib6_nh
    structs in a prefix route. Accordingly, the RTF_ADDRCONF flag check can
    be dropped from rt6_qualify_for_ecmp().

    Further, RTF_DYNAMIC is only set in rt6_info instances, so it can be
    removed from the check as well.

    This reduces rt6_qualify_for_ecmp and the mlxsw version to just checking
    if the nexthop has a gateway which is the real indication of whether
    entries can be coalesced into a multipath route.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

18 Apr, 2019

1 commit


09 Apr, 2019

1 commit

  • Allow the gateway in a fib_nh_common to be from a different address
    family than the outer fib{6}_nh. To that end, replace nhc_has_gw with
    nhc_gw_family and update users of nhc_has_gw to check nhc_gw_family.
    Now nhc_family is used to know if the nh_common is part of a fib_nh
    or fib6_nh (used for container_of to get to route family specific data),
    and nhc_gw_family represents the address family for the gateway.
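    A sketch of the resulting layout, assuming a union for the gateway address. The field names follow the text, but the struct itself is hypothetical. A zero `nhc_gw_family` means there is no gateway, and an IPv4 entry can now carry an IPv6 gateway.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical slice of fib_nh_common. */
struct sketch_nh_common {
    unsigned char nhc_family;    /* AF_INET: part of a fib_nh; AF_INET6: fib6_nh */
    unsigned char nhc_gw_family; /* 0: no gateway; else family of nhc_gw */
    union {
        uint32_t ipv4;
        unsigned char ipv6[16];
    } nhc_gw;
};

/* Replaces the old boolean nhc_has_gw. */
static bool sketch_nhc_has_gw(const struct sketch_nh_common *nhc)
{
    return nhc->nhc_gw_family != 0;
}

/* The case this change makes possible: an IPv4 route via an IPv6 gateway. */
static bool sketch_v4_route_via_v6_gw(const struct sketch_nh_common *nhc)
{
    return nhc->nhc_family == AF_INET && nhc->nhc_gw_family == AF_INET6;
}
```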

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     

30 Mar, 2019

2 commits

  • Rename fib6_nh entries that will be moved to a fib_nh_common struct.
    Specifically, the device, gateway, flags, and lwtstate are common
    with all nexthop definitions. In some places new temporary variables
    are declared or local variables renamed to maintain line lengths.

    Rename only; no functional change intended.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • The gateway setting is not per fib6_info entry but per-fib6_nh. Add a new
    fib_nh_has_gw flag to fib6_nh and convert references to RTF_GATEWAY to
    the new flag. For IPv6 addresses, checking the flag is cheaper than
    checking that nh_gw is non-zero, as IPv4 does.

    While this increases fib6_nh by 8 bytes, the effective allocation size of
    a fib6_info is unchanged. The 8 bytes are recovered later with a
    fib_nh_common change.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     

16 Oct, 2018

1 commit

  • Add struct fib_dump_filter for options on limiting which routes are
    returned in a dump request. The current list is table id, protocol,
    route type, rtm_flags and nexthop device index. struct net is needed
    to look up the net_device from the index.

    Declare the filter for each route dump handler and plumb the new
    arguments from dump handlers to ip_valid_fib_dump_req.
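    A hypothetical sketch of how such a filter might be applied to each candidate route. Zero means "match anything", and all requested flags must be present. The device is compared by ifindex rather than resolved to a net_device through struct net.

```c
#include <stdbool.h>

/* Hypothetical filter with the options listed above; 0 means "any". */
struct sketch_dump_filter {
    unsigned int table_id;
    unsigned char protocol;
    unsigned char rt_type;
    unsigned int flags; /* rtm_flags bits that must all be set */
    int ifindex;        /* nexthop device index */
};

struct sketch_route {
    unsigned int table_id;
    unsigned char protocol;
    unsigned char rt_type;
    unsigned int flags;
    int nh_ifindex;
};

static bool sketch_filter_match(const struct sketch_dump_filter *f,
                                const struct sketch_route *r)
{
    if (f->table_id && f->table_id != r->table_id)
        return false;
    if (f->protocol && f->protocol != r->protocol)
        return false;
    if (f->rt_type && f->rt_type != r->rt_type)
        return false;
    if ((r->flags & f->flags) != f->flags)
        return false;
    if (f->ifindex && f->ifindex != r->nh_ifindex)
        return false;
    return true;
}
```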

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

03 Oct, 2018

1 commit


04 Jul, 2018

1 commit

  • NetworkManager likes to manage linklocal prefix routes and does so with
    the NLM_F_APPEND flag, breaking attempts to simplify the IPv6 route
    code and by extension enable multipath routes with device only nexthops.

    Revert f34436a43092 and these followup patches:
    6eba08c3626b ("ipv6: Only emit append events for appended routes").
    ce45bded6435 ("mlxsw: spectrum_router: Align with new route replace logic")
    53b562df8c20 ("mlxsw: spectrum_router: Allow appending to dev-only routes")

    Update the fib_tests cases to reflect the old behavior.

    Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
    Signed-off-by: David Ahern

    David Ahern
     

25 May, 2018

1 commit

  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2018-05-24

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).

    2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.

    3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.

    4) Jiong Wang adds support for indirect and arithmetic shifts to NFP.

    5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.

    6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
    it to edit/grow/shrink an SRH and apply generic SRv6 actions on a packet.

    7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.

    8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.

    9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

23 May, 2018

1 commit

  • Bring consistency to ipv6 route replace and append semantics.

    Remove rt6_qualify_for_ecmp, which is just guesswork. It fails in 2 cases:
    1. can not replace a route with a reject route. Existing code appends
    a new route instead of replacing the existing one.

    2. can not have a multipath route where a leg uses a dev only nexthop

    Existing use cases affected by this change:
    1. adding a route with existing prefix and metric using NLM_F_CREATE
    without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls
    'prepend'). Existing code auto-determines that the new nexthop can
    be appended to an existing route to create a multipath route. This
    change breaks that by requiring the APPEND flag for the new route
    to be added to an existing one. Instead the prepend just adds another
    route entry.

    2. route replace. Existing code replaces the first matching multipath route
    if the new route is multipath capable, and falls back to the first matching
    non-ECMP route (reject or dev only route) in case one isn't available.
    New behavior replaces the first matching route. (Thanks to Ido for spotting
    this one)

    Note: Newer iproute2 is needed to display multipath routes with a dev-only
    nexthop. This is due to a bug in iproute2 and parsing nexthops.
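    The semantics above can be written as a small decision function for a route whose prefix and metric already exist. The flag values match <linux/netlink.h>, and the names are hypothetical. The EXCL branch is ordinary netlink behaviour rather than something this commit describes. Note that this behaviour was reverted by the 04 Jul 2018 entry above.

```c
/* Same values as NLM_F_REPLACE, NLM_F_EXCL, NLM_F_CREATE and NLM_F_APPEND
 * in <linux/netlink.h>. */
#define SKETCH_NLM_F_REPLACE 0x100u
#define SKETCH_NLM_F_EXCL    0x200u
#define SKETCH_NLM_F_CREATE  0x400u
#define SKETCH_NLM_F_APPEND  0x800u

enum sketch_action {
    SKETCH_ADD_SEPARATE_ENTRY,  /* "prepend": just another route entry */
    SKETCH_APPEND_TO_MULTIPATH, /* APPEND is now required to join a route */
    SKETCH_REPLACE_FIRST_MATCH, /* replace the first matching route */
    SKETCH_FAIL_EXISTS          /* EXCL: prefix and metric already exist */
};

/* Outcome for a new route whose prefix and metric already exist. */
static enum sketch_action sketch_classify(unsigned int nlflags)
{
    if (nlflags & SKETCH_NLM_F_REPLACE)
        return SKETCH_REPLACE_FIRST_MATCH;
    if (nlflags & SKETCH_NLM_F_EXCL)
        return SKETCH_FAIL_EXISTS;
    if (nlflags & SKETCH_NLM_F_APPEND)
        return SKETCH_APPEND_TO_MULTIPATH;
    return SKETCH_ADD_SEPARATE_ENTRY;
}
```

    iproute2 sends these flag sets for each command:
    - `ip -6 route prepend`: NLM_F_CREATE alone
    - `append`: NLM_F_CREATE|NLM_F_APPEND
    - `replace`: NLM_F_CREATE|NLM_F_REPLACE
    - `add`: NLM_F_CREATE|NLM_F_EXCL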

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

22 May, 2018

1 commit


07 May, 2018

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for your net-next
    tree, more relevant updates in this batch are:

    1) Add Maglev support to IPVS. Moreover, store latest server weight in
    IPVS since this is needed by maglev, patches from Inju Song.

    2) Preparation works to add iptables flowtable support, patches
    from Felix Fietkau.

    3) Hand flows back over to the conntrack slow path when a TCP RST/FIN
    packet is seen, via the new teardown state, also from Felix.

    4) Add support for extended netlink error reporting for nf_tables.

    5) Support for timeouts larger than 23 days in nf_tables, patch from
    Florian Westphal.

    6) Always set an upper limit to dynamic sets, also from Florian.

    7) Allow number generator to make map lookups, from Laura Garcia.

    8) Use hash_32() instead of open-coded hashing in IPVS, from Vicent Bernat.

    9) Extend ip6tables SRH match to support previous, next and last SID,
    from Ahmed Abdelsalam.

    10) Move Passive OS fingerprint to nf_osf.c, from Fernando Fernandez.

    11) Expose nf_conntrack_max through ctnetlink, from Florent Fourcot.

    12) Several housekeeping patches for xt_NFLOG, x_tables and ebtables,
    from Taehee Yoo.

    13) Unify meta bridge with core nft_meta, then make nft_meta built-in.
    Make rt and exthdr built-in too, again from Florian.

    14) Missing initialization of tbl->entries in IPVS, from Cong Wang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Apr, 2018

1 commit


20 Apr, 2018

4 commits

  • After 4832c30d5458 ("net: ipv6: put host and anycast routes on device
    with address") the comparison of idev does not add value since it
    correlates to the nexthop device which is already compared. Remove
    the idev comparison.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Prior to 4832c30d5458 ("net: ipv6: put host and anycast routes on device
    with address") host routes and anycast routes were installed with the
    device set to loopback (or VRF device once that feature was added). In the
    older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
    was used to denote the actual interface.

    Commit 4832c30d5458 changed the code to have dst.dev pointing to the real
    device with the switch to lo or vrf device done on dst clones. As a
    consequence of this change ip6_route_get_saddr can just pass the nexthop
    device to ipv6_dev_get_saddr.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • addrconf_dst_alloc now returns a fib6_info. Update the name
    and its users to reflect the change.

    Rename only; no functional change intended.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Change the prefix for fib6_info struct elements from rt6i_ to fib6_.
    rt6i_pcpu and rt6i_exception_bucket are left as is given that they
    point to rt6_info entries.

    Rename only; no functional change intended.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

18 Apr, 2018

7 commits

  • Convert all code paths referencing a FIB entry from
    rt6_info to fib6_info.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Last step before flipping the data type for FIB entries:
    - use fib6_info_alloc to create FIB entries in ip6_route_info_create
    and addrconf_dst_alloc
    - use fib6_info_release in place of dst_release, ip6_rt_put and
    rt6_release
    - remove the dst_hold before calling __ip6_ins_rt or ip6_del_rt
    - when purging routes, drop per-cpu routes
    - replace inc and dec of rt6i_ref with fib6_info_hold and fib6_info_release
    - use rt->from since it points to the FIB entry
    - drop references to exception bucket, fib6_metrics and per-cpu from
    dst entries (those are relevant for fib entries only)
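    The hold/release pairing that replaces the raw rt6i_ref updates, as a single-threaded sketch with hypothetical names. The kernel version uses atomic refcounts, and it frees the entry after an RCU grace period rather than immediately.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical FIB entry: only the reference count matters here. */
struct sketch_fib6_info {
    int refcnt;
};

static struct sketch_fib6_info *sketch_fib6_info_alloc(void)
{
    struct sketch_fib6_info *f6i = calloc(1, sizeof(*f6i));

    if (f6i)
        f6i->refcnt = 1; /* the creator's reference */
    return f6i;
}

/* Takes the place of incrementing rt6i_ref. */
static void sketch_fib6_info_hold(struct sketch_fib6_info *f6i)
{
    f6i->refcnt++;
}

/* Takes the place of dst_release()/ip6_rt_put()/rt6_release(); returns
 * true if the last reference was dropped and the entry freed. */
static bool sketch_fib6_info_release(struct sketch_fib6_info *f6i)
{
    if (f6i && --f6i->refcnt == 0) {
        free(f6i);
        return true;
    }
    return false;
}
```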

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • IPv6 FIB will only contain FIB entries with exception routes added to
    the FIB entry. Once this transformation is complete, FIB lookups will
    return a fib6_info with the lookup functions still returning a dst
    based rt6_info. The current code uses rt6_info for both paths and
    overloads the rt6_info variable usually called 'rt'.

    This patch introduces a new 'f6i' variable name for the result of the FIB
    lookup and keeps 'rt' as the dst based return variable. 'f6i' becomes a
    fib6_info in a later patch, which is why it is introduced as f6i now;
    this avoids additional churn in the later patch.

    In addition, remove RTF_CACHE and dst checks from fib6 add and delete
    since they can not happen now and will never happen after the data
    type flip.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Most FIB entries can be added using memory allocated with GFP_KERNEL.
    Add gfp_flags to ip6_route_add and addrconf_dst_alloc. Code paths that
    can be reached from the packet path (e.g., ndisc and autoconfig) or
    atomic notifiers use GFP_ATOMIC; paths from user context (adding
    addresses and routes) use GFP_KERNEL.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • The router discovery code has a FIB entry and wants to validate the
    gateway has a neighbor entry. Refactor the existing dst_neigh_lookup
    for IPv6 and create a new function that takes the gateway and device
    and returns a neighbor entry. Use the new function in
    ndisc_router_discovery to validate the gateway.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Introduce fib6_nh structure and move nexthop related data from
    rt6_info and rt6_info.dst to fib6_nh. References to dev, gateway or
    lwtstate from a FIB lookup perspective are converted to use fib6_nh;
    datapath references to dst version are left as is.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Pass network namespace reference into route add, delete and get
    functions.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

04 Apr, 2018

1 commit

  • Move commonly used pattern of ip6_dst_store() usage to a separate
    function - ip6_sk_dst_store_flow(), which will check the addresses
    for equality using the flow information, before saving them.

    There are no functional changes in this patch. In addition, it will
    be used in the next patch, in ip6_sk_dst_lookup_flow().
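    A sketch of the factored-out pattern with hypothetical types. The cached route is keyed on the socket's own address only when the flow actually used that address; otherwise the key is left NULL. We believe the source-address half applies in the kernel only with CONFIG_IPV6_SUBTREES.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down types. */
struct sketch_in6 { unsigned char a[16]; };

struct sketch_sock {
    struct sketch_in6 daddr, saddr;       /* connected addresses */
    const void *dst;                      /* cached route */
    const struct sketch_in6 *daddr_cache; /* what the cached dst is valid for */
    const struct sketch_in6 *saddr_cache;
};

struct sketch_flow { struct sketch_in6 daddr, saddr; };

static int sketch_addr_equal(const struct sketch_in6 *x,
                             const struct sketch_in6 *y)
{
    return memcmp(x->a, y->a, sizeof x->a) == 0;
}

/* Stand-in for ip6_dst_store(). */
static void sketch_dst_store(struct sketch_sock *sk, const void *dst,
                             const struct sketch_in6 *daddr_cache,
                             const struct sketch_in6 *saddr_cache)
{
    sk->dst = dst;
    sk->daddr_cache = daddr_cache;
    sk->saddr_cache = saddr_cache;
}

/* Compare the flow's addresses with the socket's before storing. */
static void sketch_sk_dst_store_flow(struct sketch_sock *sk, const void *dst,
                                     const struct sketch_flow *fl6)
{
    sketch_dst_store(sk, dst,
                     sketch_addr_equal(&fl6->daddr, &sk->daddr) ? &sk->daddr : NULL,
                     sketch_addr_equal(&fl6->saddr, &sk->saddr) ? &sk->saddr : NULL);
}
```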

    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller

    Alexey Kodanev
     

23 Mar, 2018

1 commit

  • Fun set of conflict resolutions here...

    For the mac80211 stuff, these were fortunately just parallel
    adds. Trivially resolved.

    In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
    function phy_disable_interrupts() earlier in the file, whilst in
    'net-next' the phy_error() call from this function was removed.

    In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
    'rt_table_id' member of rtable collided with a bug fix in 'net' that
    added a new struct member "rt_mtu_locked" which needs to be copied
    over here.

    The mlxsw driver conflict consisted of net-next separating
    the span code and definitions into separate files, whilst
    a 'net' bug fix made some changes to that moved code.

    The mlx5 infiniband conflict resolution was quite non-trivial,
    the RDMA tree's merge commit was used as a guide here, and
    here are their notes:

    ====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch. This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
    drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524
    (IB/mlx5: Fix cleanup order on unload) added to for-rc and
    commit b5ca15ad7e61 (IB/mlx5: Add proper representors support)
    added as part of the devel cycle, both needed to modify the
    init/de-init functions used by mlx5. To support the new
    representors, the new functions added by the cleanup patch
    needed to be made non-static, and the init/de-init list
    added by the representors patch needed to be modified to
    match the init/de-init list changes made by the cleanup
    patch.
    Updates:
    drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
    prototypes added by representors patch to reflect new function
    names as changed by cleanup patch
    drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
    stage list to match new order from cleanup patch
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Mar, 2018

2 commits

  • Some operators prefer IPv6 path selection to use a standard 5-tuple
    hash rather than just an L3 hash with the flow label. To that end
    add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
    ("net: ipv4: add support for ECMP hash policy choice"). The default
    is still L3 which covers source and destination addresses along with
    flow label and IPv6 protocol.
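    A sketch of the two policies, which are selected in practice through the net.ipv6.fib_multipath_hash_policy sysctl (0 for L3, 1 for L4). FNV-1a is used only to keep the example short and is not the kernel's flow hash. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow description. */
struct sketch_flow6 {
    uint8_t saddr[16], daddr[16];
    uint32_t flowlabel; /* 20-bit IPv6 flow label */
    uint8_t proto;      /* next header */
    uint16_t sport, dport;
};

/* FNV-1a, used only to keep the sketch short; not the kernel's hash. */
static uint32_t sketch_fnv1a(uint32_t h, const void *p, size_t n)
{
    const uint8_t *b = p;

    for (size_t i = 0; i < n; i++) {
        h ^= b[i];
        h *= 16777619u;
    }
    return h;
}

/* policy 0: L3 (addresses, flow label, protocol); policy 1: L4 5-tuple. */
static uint32_t sketch_multipath_hash(const struct sketch_flow6 *f, int policy)
{
    uint32_t h = 2166136261u;

    h = sketch_fnv1a(h, f->saddr, sizeof f->saddr);
    h = sketch_fnv1a(h, f->daddr, sizeof f->daddr);
    if (policy == 0) {
        h = sketch_fnv1a(h, &f->flowlabel, sizeof f->flowlabel);
        h = sketch_fnv1a(h, &f->proto, sizeof f->proto);
    } else {
        h = sketch_fnv1a(h, &f->proto, sizeof f->proto);
        h = sketch_fnv1a(h, &f->sport, sizeof f->sport);
        h = sketch_fnv1a(h, &f->dport, sizeof f->dport);
    }
    return h;
}

/* Pick one of n_paths equal-cost legs. */
static unsigned int sketch_select_path(uint32_t hash, unsigned int n_paths)
{
    return hash % n_paths;
}
```

    Two flows that differ only in source port land on the same leg under the L3 policy but can be spread across legs under the L4 policy.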

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Tested-by: Ido Schimmel
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    David Ahern
     
  • IPv6 does path selection for multipath routes deep in the lookup
    functions. The next patch adds L4 hash option and needs the skb
    for the forward path. To get the skb to the relevant FIB lookup
    functions it needs to go through the fib rules layer, so add a
    lookup_data argument to the fib_lookup_arg struct.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    David Ahern
     

01 Mar, 2018

1 commit

  • Dissect flow in fwd path if fib rules require it. Controlled by
    a flag to avoid a penalty in the common case. The flag is set when fib
    rules with sport, dport and proto match that require flow dissect
    are installed. Also passes the dissected hash keys to the multipath
    hash function when applicable to avoid dissecting the flow again.
    ICMP packets will continue to use the inner header for hash
    calculations.
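    A sketch of the gating idea with hypothetical names. It keeps a per-namespace count of rules that match on sport, dport or protocol, and the forwarding path checks that count before dissecting. Using a counter rather than a boolean is our assumption. It means deleting one such rule does not switch dissection off while others remain.

```c
#include <stdbool.h>

/* Hypothetical rule: which selectors it matches on. */
struct sketch_rule {
    bool match_sport, match_dport, match_proto;
};

/* Hypothetical per-namespace state. */
struct sketch_netns {
    unsigned int rules_require_fldissect;
};

static bool sketch_rule_needs_dissect(const struct sketch_rule *r)
{
    return r->match_sport || r->match_dport || r->match_proto;
}

static void sketch_rule_add(struct sketch_netns *net, const struct sketch_rule *r)
{
    if (sketch_rule_needs_dissect(r))
        net->rules_require_fldissect++;
}

static void sketch_rule_del(struct sketch_netns *net, const struct sketch_rule *r)
{
    if (sketch_rule_needs_dissect(r))
        net->rules_require_fldissect--;
}

/* Forwarding path: pay for flow dissection only when some rule needs it. */
static bool sketch_fwd_should_dissect(const struct sketch_netns *net)
{
    return net->rules_require_fldissect != 0;
}
```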

    Signed-off-by: Roopa Prabhu
    Acked-by: Paolo Abeni
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Roopa Prabhu