13 Mar, 2020

1 commit

  • Convert the various uses of fallthrough comments to fallthrough;

    Done via script
    Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

    And by hand:

    net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block
    that causes gcc to emit a warning if converted in-place.

    So move the new fallthrough; inside the containing #ifdef/#endif too.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
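
The placement issue described in this commit can be illustrated with a small sketch. The macro below is a stand-in modeled on the kernel's `fallthrough` pseudo-keyword; `DEMO_SUBTREES`, `demo_state` and `demo_walk()` are hypothetical names invented for the example and do not come from ip6_fib.c.

```c
#include <assert.h>

/* Stand-in for the kernel's pseudo-keyword (assumption: modeled on the
 * definition in include/linux/compiler_attributes.h). */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

#define DEMO_SUBTREES 1	/* hypothetical config switch */

enum demo_state { DEMO_S, DEMO_L, DEMO_R };

/* Counts how many case bodies run when the walk starts in `state`. */
static int demo_walk(enum demo_state state)
{
	int steps = 0;

	assert(state <= DEMO_R);

	switch (state) {
#ifdef DEMO_SUBTREES
	case DEMO_S:
		steps++;
		fallthrough;	/* stays inside the #ifdef, next to its case */
#endif
	case DEMO_L:
		steps++;
		fallthrough;
	case DEMO_R:
		steps++;
		break;
	}
	return steps;
}
```

If the first `fallthrough;` sat after the `#endif` instead, it would become the first statement of the switch whenever `DEMO_SUBTREES` is undefined, with no case body in front of it, which is the situation the commit avoids.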
     

17 Feb, 2020

1 commit

  • After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no
    longer possible to replace an ECMP-able route by a non ECMP-able route.
    For example,
    ip route add 2001:db8::1/128 via fe80::1 dev dummy0
    ip route replace 2001:db8::1/128 dev dummy0
    does not work as expected.

    Tweak the replacement logic so that point 3 in the log of the above commit
    becomes:
    3. If the new route is not ECMP-able, and no matching non-ECMP-able route
    exists, replace matching ECMP-able route (if any) or add the new route.

    We can now summarize the entire replace semantics to:
    When doing a replace, prefer replacing a matching route of the same
    "ECMP-able-ness" as the replace argument. If there is no such candidate,
    fall back to the first route found.

    Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
    Signed-off-by: Benjamin Poirier
    Reviewed-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Benjamin Poirier
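
The summarized replace semantics can be sketched as a selection function. This is a simplified model, not the kernel code: `demo_route` and `demo_pick_replace_target()` are hypothetical, and "matching" is reduced to "same metric", which is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified route: only the fields the rule needs. */
struct demo_route {
	int metric;
	bool ecmp_able;	/* e.g. the route has a gateway */
};

/*
 * Return the index of the route a "replace" should overwrite, or -1 if
 * nothing matches and the new route has to be added instead.
 */
static int demo_pick_replace_target(const struct demo_route *routes, size_t n,
				    const struct demo_route *new_rt)
{
	int first_match = -1;
	size_t i;

	assert(routes != NULL || n == 0);
	assert(new_rt != NULL);

	for (i = 0; i < n; i++) {
		if (routes[i].metric != new_rt->metric)
			continue;
		/* Preferred: same "ECMP-able-ness" as the new route. */
		if (routes[i].ecmp_able == new_rt->ecmp_able)
			return (int)i;
		/* Otherwise remember the first matching route as fallback. */
		if (first_match < 0)
			first_match = (int)i;
	}
	return first_match;
}
```

With a single ECMP-able route in the node, a non-ECMP-able replacement falls back to that route, which corresponds to the `ip route replace` example in the commit message.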
     

26 Jan, 2020

1 commit


24 Jan, 2020

1 commit


25 Dec, 2019

5 commits

  • Now that mlxsw is converted to use the new FIB notifications it is
    possible to delete the old ones and use the new replace / append /
    delete notifications.

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • For the purpose of route offload, when a single route is deleted, it is
    only of interest if it is the first route in the node or if it is
    sibling to such a route.

    In the first case, distinguish between several possibilities:

    1. Route is the last route in the node. Emit a delete notification

    2. Route is followed by a non-multipath route. Emit a replace
    notification for the non-multipath route.

    3. Route is followed by a multipath route. Emit a replace notification
    for the multipath route.

    In the second case, only emit a delete notification to ensure the route
    is no longer used as a valid nexthop.

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
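
The case analysis for a single-route deletion can be written down as a small decision function. The types and names below (`demo_event`, `demo_route`, `demo_delete_event()`) are hypothetical and only mirror the wording of the commit message; the real logic lives in the kernel's route deletion path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_event {
	DEMO_EV_NONE,			/* deletion is of no interest to offload */
	DEMO_EV_DEL,			/* delete notification for the route */
	DEMO_EV_REPLACE_SINGLE,		/* replace with the next non-multipath route */
	DEMO_EV_REPLACE_MULTIPATH,	/* replace with the next multipath route */
};

struct demo_route {
	bool first_in_node;		/* route is the first one in its node */
	bool sibling_of_first;		/* route is a sibling of the first route */
	const struct demo_route *next;	/* route that follows in the node, or NULL */
	unsigned int nsiblings;		/* non-zero for a multipath route */
};

/* Decide which notification the deletion of `rt` should trigger. */
static enum demo_event demo_delete_event(const struct demo_route *rt)
{
	assert(rt != NULL);

	if (rt->first_in_node) {
		if (rt->next == NULL)
			return DEMO_EV_DEL;		/* case 1 */
		return rt->next->nsiblings ?
		       DEMO_EV_REPLACE_MULTIPATH :	/* case 3 */
		       DEMO_EV_REPLACE_SINGLE;		/* case 2 */
	}
	if (rt->sibling_of_first)
		return DEMO_EV_DEL;
	return DEMO_EV_NONE;
}
```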
     
  • When a new listener is registered to the FIB notification chain it
    receives a dump of all the available routes in the system. Instead, make
    sure to only replay the IPv6 routes that are actually used in the data
    path and are of any interest to the new listener.

    This is done by iterating over all the routing tables in the given
    namespace, but from each traversed node only the first route ('leaf') is
    notified. Multipath routes are notified in a single notification instead
    of one for each nexthop.

    Add fib6_rt_dump_tmp() to do that. Later on in the patch set it will be
    renamed to fib6_rt_dump() instead of the existing one.

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Similar to the corresponding IPv4 patch, only notify the new route if it
    is replacing the currently offloaded one. Meaning, the one pointed to by
    'fn->leaf'.

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • fib6_add_rt2node() takes care of adding a single route ('struct
    fib6_info') to a FIB node. The route in question should only be notified
    in case it is added as the first route in the node (lowest metric) or if
    it is added as a sibling route to the first route in the node.

    The first criterion can be tested by checking if the route is pointed to
    by 'fn->leaf'. The second criterion can be tested by checking the new
    'notify_sibling_rt' variable that is set when the route is added as a
    sibling to the first route in the node.

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     

22 Nov, 2019

1 commit

  • Use a per namespace counter, increment it on successful creation
    of any route using the source address, decrement it on deletion
    of such routes.

    This allows us to check easily if the routing decision in the
    current namespace depends on the packet source. Will be used
    by the next patch.

    Suggested-by: David Ahern
    Signed-off-by: Paolo Abeni
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Paolo Abeni
     

05 Oct, 2019

3 commits


21 Sep, 2019

1 commit

  • Yi Ren reported an issue discovered by syzkaller, and bisected
    to the cited commit.

    Many thanks to Yi; this trivial patch does not reflect the patient
    work that has been done.

    Fixes: d64a1f574a29 ("ipv6: honor RT6_LOOKUP_F_DST_NOREF in rule lookup logic")
    Signed-off-by: Eric Dumazet
    Acked-by: Wei Wang
    Bisected-and-reported-by: Yi Ren
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     

19 Jul, 2019

1 commit

  • When a route needs to be appended to an existing multipath route,
    fib6_add_rt2node() first appends it to the siblings list and increments
    the number of sibling routes on each sibling.

    Later, the function notifies the route via call_fib6_entry_notifiers().
    In case the notification is vetoed, the route is not unlinked from the
    siblings list, which can result in a use-after-free.

    Fix this by unlinking the route from the siblings list before returning
    an error.

    Audited the rest of the call sites from which the FIB notification chain
    is called and could not find more problems.

    Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds")
    Signed-off-by: Ido Schimmel
    Reported-by: Alexander Petrovskiy
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
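
The shape of the fix can be shown with a toy siblings list. Everything here is hypothetical (`demo_multipath`, `demo_append_sibling()`, the notifier callbacks); the point is only the ordering: link, notify, and undo the link if the notification is vetoed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_SIBLINGS 8

/* Hypothetical multipath route: a fixed array instead of a linked list. */
struct demo_multipath {
	int nexthops[DEMO_MAX_SIBLINGS];
	size_t count;
};

typedef int (*demo_notifier_t)(int nexthop);

/* Example listeners: one accepts every route, one vetoes every route. */
static int demo_accept(int nexthop) { (void)nexthop; return 0; }
static int demo_veto(int nexthop) { (void)nexthop; return -EINVAL; }

static int demo_append_sibling(struct demo_multipath *mp, int nexthop,
			       demo_notifier_t notify)
{
	int err;

	assert(mp != NULL && notify != NULL);

	if (mp->count == DEMO_MAX_SIBLINGS)
		return -ENOSPC;

	/* Link first, as the original function does. */
	mp->nexthops[mp->count++] = nexthop;

	err = notify(nexthop);
	if (err) {
		/* Vetoed: unlink again so no stale entry stays behind. */
		mp->count--;
		return err;
	}
	return 0;
}
```

Without the rollback branch, a vetoed nexthop would remain reachable through the list after the caller has released it, which is the use-after-free pattern the commit describes.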
     

25 Jun, 2019

5 commits

  • When we perform an inexact match on FIB nodes via fib6_locate_1(), longer
    prefixes will be preferred to shorter ones. However, it might happen that
    a node, with higher fn_bit value than some other, has no valid routing
    information.

    In this case, we'll pick that node, but it will be discarded by the check
    on RTN_RTINFO in fib6_locate(), and we might miss nodes with valid routing
    information but with lower fn_bit value.

    This is apparent when a routing exception is created for a default route:
    # ip -6 route list
    fc00:1::/64 dev veth_A-R1 proto kernel metric 256 pref medium
    fc00:2::/64 dev veth_A-R2 proto kernel metric 256 pref medium
    fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 pref medium
    fe80::/64 dev veth_A-R1 proto kernel metric 256 pref medium
    fe80::/64 dev veth_A-R2 proto kernel metric 256 pref medium
    default via fc00:1::2 dev veth_A-R1 metric 1024 pref medium
    # ip -6 route list cache
    fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 expires 593sec mtu 1500 pref medium
    fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 593sec mtu 1500 pref medium
    # ip -6 route flush cache # node for default route is discarded
    Failed to send flush request: No such process
    # ip -6 route list cache
    fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 586sec mtu 1500 pref medium

    Check right away if the node has a RTN_RTINFO flag, before replacing the
    'prev' pointer, that indicates the longest matching prefix found so far.

    Fixes: 38fbeeeeccdb ("ipv6: prepare fib6_locate() for exception table")
    Signed-off-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Stefano Brivio
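
A simplified view of the corrected inexact match, assuming the nodes on the lookup path are already known to match and are ordered by increasing prefix length. `demo_node` and `demo_locate()` are hypothetical; the actual traversal in fib6_locate_1() walks a tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node on the lookup path. */
struct demo_node {
	unsigned int prefix_len;	/* plays the role of fn_bit */
	bool has_rtinfo;		/* node carries valid routing information */
};

/*
 * Return the longest-prefix node that actually has routing information,
 * or NULL if none does.
 */
static const struct demo_node *demo_locate(const struct demo_node *path,
					   size_t n)
{
	const struct demo_node *prev = NULL;
	size_t i;

	assert(path != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Only remember nodes that can be used; an empty intermediate
		 * node must not hide a shorter prefix with valid routes. */
		if (path[i].has_rtinfo)
			prev = &path[i];
	}
	return prev;
}
```

With a default route (prefix length 0) followed by an intermediate node without routing information, the sketch returns the default route's node instead of discarding the lookup.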
     
  • Since commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst
    cache"), route exceptions reside in a separate hash table, and won't be
    found by walking the FIB, so they won't be dumped to userspace on a
    RTM_GETROUTE message.

    This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to
    have no function anymore:

    # ip -6 route get fc00:3::1
    fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium
    # ip -6 route get fc00:4::1
    fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium
    # ip -6 route list cache
    # ip -6 route flush cache
    # ip -6 route get fc00:3::1
    fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium
    # ip -6 route get fc00:4::1
    fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium

    because iproute2 lists cached routes using RTM_GETROUTE, and flushes them
    by listing all the routes, and deleting them with RTM_DELROUTE one by one.

    If cached routes are requested using the RTM_F_CLONED flag together with
    strict checking, or if no strict checking is requested (and hence we can't
    consistently apply filters), look up exceptions in the hash table
    associated with the current fib6_info in rt6_dump_route(), and, if present
    and not expired, add them to the dump.

    We might be unable to dump all the entries for a given node in a single
    message, so keep track of how many entries were handled for the current
    node in fib6_walker, and skip that amount in case we start from the same
    partially dumped node.

    When a partial dump restarts, as the starting node might change when
    'sernum' changes, we have no guarantee that we need to skip the same
    amount of in-node entries. Therefore, we need two counters, and we need to
    zero the in-node counter if the node from which the dump is resumed
    differs.

    Note that, with the current version of iproute2, this only fixes the
    'ip -6 route list cache': on a flush command, iproute2 doesn't pass
    RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is
    still unable to fetch the routes to be flushed. This will be addressed in
    a patch for iproute2.

    To flush cached routes, a procfs entry could be introduced instead: that's
    how it works for IPv4. We already have a rt6_flush_exception() function
    ready to be wired to it. However, this would not solve the issue for
    listing.

    Versions of iproute2 and kernel tested:

                     iproute2
    kernel       4.14.0  4.15.0  4.19.0  5.0.0   5.1.0   5.1.0, patched
    3.18  list   +       +       +       +       +       +
          flush  +       +       +       +       +       +
    4.4   list   +       +       +       +       +       +
          flush  +       +       +       +       +       +
    4.9   list   +       +       +       +       +       +
          flush  +       +       +       +       +       +
    4.14  list   +       +       +       +       +       +
          flush  +       +       +       +       +       +
    4.15  list
          flush
    4.19  list
          flush
    5.0   list
          flush
    5.1   list
          flush
    with  list   +       +       +       +       +       +
    fix   flush  +       +       +                       +

    v7:
    - Explain usage of "skip" counters in commit message (suggested by
    David Ahern)

    v6:
    - Rebase onto net-next, use recently introduced nexthop walker
    - Make rt6_nh_dump_exceptions() a separate function (suggested by David
    Ahern)

    v5:
    - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH,
    update test results (flushing works with iproute2 < 5.0.0 now)

    v4:
    - Split NLM_F_MATCH and strict check handling in separate patches
    - Filter routes using RTM_F_CLONED: if it's not set, only return
    non-cached routes, and if it's set, only return cached routes:
    change requested by David Ahern and Martin Lau. This implies that
    iproute2 needs a separate patch to be able to flush IPv6 cached
    routes. This is not ideal because we can't fix the breakage caused
    by 2b760fcf5cfb entirely in kernel. However, two years have passed
    since then, and this makes it more tolerable

    v3:
    - More descriptive comment about expired exceptions in rt6_dump_route()
    - Swap return values of rt6_dump_route() (suggested by Martin Lau)
    - Don't zero skip_in_node in case we don't dump anything in a given pass
    (also suggested by Martin Lau)
    - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic,
    it's just a flag to indicate the route was cloned, not to filter on
    routes

    v2: Add tracking of number of entries to be skipped in current node after
    a partial dump. As we restart from the same node, if not all the
    exceptions for a given node fit in a single message, the dump will
    not terminate, as suggested by Martin Lau. This is a concrete
    possibility, setting up a big number of exceptions for the same route
    actually causes the issue, suggested by David Ahern.

    Reported-by: Jianlin Shi
    Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
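
The "skip" bookkeeping described in this commit (keep an in-node counter, and zero it when the dump resumes from a different node) can be sketched as follows. The walker layout and `demo_dump()` are hypothetical simplifications: nodes are array indices and each node only has an entry count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dump state kept between two partial dumps. */
struct demo_walker {
	size_t node;		/* node the previous dump stopped in */
	size_t skip_in_node;	/* entries of that node already dumped */
};

/*
 * Dump at most `room` entries, starting at `resume_node`.
 * entries[i] is the number of entries stored in node i.
 * Returns the number of entries dumped by this call.
 */
static size_t demo_dump(struct demo_walker *w, const size_t *entries,
			size_t nnodes, size_t resume_node, size_t room)
{
	size_t dumped = 0;

	assert(w != NULL && entries != NULL);

	/* Resuming elsewhere: the in-node progress no longer applies. */
	if (resume_node != w->node)
		w->skip_in_node = 0;
	w->node = resume_node;

	for (; w->node < nnodes; w->node++) {
		while (w->skip_in_node < entries[w->node]) {
			if (dumped == room)
				return dumped;	/* message full, resume here */
			w->skip_in_node++;
			dumped++;
		}
		w->skip_in_node = 0;	/* node complete */
	}
	return dumped;
}
```

Calling `demo_dump()` repeatedly with the stored `w->node` walks every entry exactly once, even when a node's entries do not fit in one message.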
     
  • In the next patch, we are going to add optional dump of exceptions to
    rt6_dump_route().

    Change the return code of rt6_dump_route() to accommodate partial node
    dumps: we might dump multiple routes per node, and might be able to dump
    only a given number of them, so fib6_dump_node() will need to know how
    many routes have been dumped on partial dump, to restart the dump from the
    point where it was interrupted.

    Note that fib6_dump_node() is the only caller and already handles all
    non-negative return codes as success: those become -1 to signal that we're
    done with the node. If we fail, return 0, as we were unable to dump the
    single route in the node, but we're not done with it.

    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
     
  • This reverts commit 08e814c9e8eb5a982cbd1e8f6bd255d97c51026f: as we
    are preparing to fix listing and dumping of IPv6 cached routes, we
    need to allow RTM_F_CLONED as a flag to match routes against while
    dumping them.

    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
     
  • The following patches add back the ability to dump IPv4 and IPv6 exception
    routes, and we need to allow selection of regular routes or exceptions.

    Use RTM_F_CLONED as filter to decide whether to dump routes or exceptions:
    iproute2 passes it in dump requests (except for IPv6 cache flush requests,
    this will be fixed in iproute2) and this used to work as long as
    exceptions were stored directly in the FIB, for both IPv4 and IPv6.

    Caveat: if strict checking is not requested (that is, if the dump request
    doesn't go through ip_valid_fib_dump_req()), we can't filter on protocol,
    tables or route types.

    In this case, filtering on RTM_F_CLONED would be inconsistent: we would
    fix 'ip route list cache' by returning exception routes and at the same
    time introduce another bug in case another selector is present, e.g. on
    'ip route list cache table main' we would return all exception routes,
    without filtering on tables.

    Keep this consistent by applying no filters at all, and dumping both
    routes and exceptions, if strict checking is not requested. iproute2
    currently filters results anyway, and no unwanted results will be
    presented to the user. The kernel will just dump more data than needed.

    v7: No changes

    v6: Rebase onto net-next, no changes

    v5: New patch: add dump_routes and dump_exceptions flags in filter and
    simply clear the unwanted one if strict checking is enabled, don't
    ignore NLM_F_MATCH and don't set filter_set if NLM_F_MATCH is set.
    Skip filtering altogether if no strict checking is requested:
    selecting routes or exceptions only would be inconsistent with the
    fact we can't filter on tables.

    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
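
The filter selection can be condensed into a few lines. `demo_filter`, `demo_filter_init()` and `DEMO_RTM_F_CLONED` are stand-ins for illustration; only the two flag names `dump_routes` and `dump_exceptions` come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the RTM_F_CLONED route flag (value assumed for the demo). */
#define DEMO_RTM_F_CLONED 0x200u

struct demo_filter {
	bool dump_routes;
	bool dump_exceptions;
};

/* Decide what a dump request should return. */
static void demo_filter_init(struct demo_filter *f, bool strict_check,
			     unsigned int rtm_flags)
{
	assert(f != NULL);

	/* Without strict checking no filter can be applied consistently,
	 * so dump both regular routes and exceptions. */
	f->dump_routes = true;
	f->dump_exceptions = true;

	if (!strict_check)
		return;

	/* With strict checking, clear the unwanted kind. */
	if (rtm_flags & DEMO_RTM_F_CLONED)
		f->dump_routes = false;
	else
		f->dump_exceptions = false;
}
```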
     

24 Jun, 2019

1 commit

  • This patch specifically converts the rule lookup logic to honor this
    flag and not release refcnt when traversing each rule and calling
    lookup() on each routing table.
    Similar to previous patch, we also need some special handling of dst
    entries in uncached list because there is always 1 refcnt taken for them
    even if RT6_LOOKUP_F_DST_NOREF flag is set.

    Signed-off-by: Wei Wang
    Signed-off-by: David S. Miller

    Wei Wang
     

19 Jun, 2019

2 commits

  • Both listeners - mlxsw and netdevsim - of IPv6 FIB notifications are now
    ready to handle IPv6 multipath notifications.

    Therefore, stop ignoring such notifications in both drivers and stop
    sending notification for each added / deleted nexthop.

    v2:
    * Remove 'multipath_rt' from 'struct fib6_entry_notifier_info'

    Signed-off-by: Ido Schimmel
    Acked-by: Jiri Pirko
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Extend the IPv6 FIB notifier info with number of sibling routes being
    notified.

    This will later allow listeners to process one notification for a
    multipath route instead of N, where N is the number of nexthops.

    Signed-off-by: Ido Schimmel
    Acked-by: Jiri Pirko
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     

11 Jun, 2019

1 commit


08 Jun, 2019

1 commit


05 Jun, 2019

1 commit

  • Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
    fib6_info side of the nexthop fib_info relationship. Since a fib6_info
    referencing a nexthop object can not have 'sibling' entries (the old way
    of doing multipath routes), the nh_list is a union with fib6_siblings.

    Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
    using a nexthop instance. Update __remove_nexthop_fib to walk f6i_list
    and delete fib entries using the nexthop.

    Add a few nexthop helpers for use when a nexthop is added to fib6_info:
    - nexthop_fib6_nh - return first fib6_nh in a nexthop object
    - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
    if the fib6_info references a nexthop object
    - nexthop_path_fib6_result - similar to ipv4, select a path within a
    multipath nexthop object. If the nexthop is a blackhole, set
    fib6_result type to RTN_BLACKHOLE, and set the REJECT flag

    Update the fib6_info references to check for nh and take a different path
    as needed:
    - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
    be coalesced with other fib entries into a multipath route
    - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
    a nexthop
    - addrconf (host routes), RAs and info entries (anything configured via
    ndisc) do not use nexthop objects
    - fib6_info_destroy_rcu - put reference to nexthop object
    - fib6_purge_rt - drop fib6_info from f6i_list
    - fib6_select_path - update to use the new nexthop_path_fib6_result when
    fib entry uses a nexthop object
    - rt6_device_match - update to catch use of nexthop object as a blackhole
    and set fib6_type and flags.
    - ip6_route_info_create - don't add space for fib6_nh if fib entry is
    going to reference a nexthop object, take a reference to nexthop object,
    disallow use of source routing
    - rt6_nlmsg_size - add space for RTA_NH_ID
    - add rt6_fill_node_nexthop to add nexthop data on a dump

    As with ipv4, most of the changes push existing code into the else branch
    of whether the fib entry uses a nexthop object.

    Update the nexthop code to walk f6i_list when a nexthop is deleted to remove
    fib entries referencing it.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

25 May, 2019

4 commits

  • Move fib6_nh to the end of fib6_info and make it an array of
    size 0. Pass a flag to fib6_info_alloc indicating if the
    allocation needs to add space for a fib6_nh.

    The current code path always has a fib6_nh allocated with a
    fib6_info; with nexthop objects they will be separate.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Similar to the pcpu routes, exceptions are really per nexthop, so move
    rt6i_exception_bucket from fib6_info to fib6_nh.

    To avoid additional increases to the size of fib6_nh for a 1-bit flag,
    use the lowest bit in the allocated memory pointer for the flushed flag.
    Add helpers for retrieving the bucket pointer to mask off the flag.

    The cleanup of the exception bucket is moved to fib6_nh_release.

    fib6_nh_flush_exceptions can now be called from 2 contexts:
    1. deleting a fib entry
    2. deleting a fib6_nh

    For 1., fib6_nh_flush_exceptions is called for a specific fib6_info that
    is getting deleted. All exceptions in the cache using the entry are
    deleted. For 2., the fib6_nh itself is getting destroyed so
    fib6_nh_flush_exceptions is called for a NULL fib6_info which means
    flush all entries.

    The pmtu.sh selftest exercises the affected code paths - from creating
    exceptions to cleaning them up on device delete. All tests pass without
    any rcu locking or memleak warnings.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
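
Storing a 1-bit flag in the lowest bit of a pointer relies on the pointed-to object being aligned to at least two bytes. A minimal sketch of the idea, with hypothetical names (`demo_nh`, `demo_bucket`, the `demo_*` helpers) that are not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FLUSHED_BIT ((uintptr_t)1)

struct demo_bucket {
	int depth;
};

/* Hypothetical nexthop: bucket pointer and "flushed" flag share one word. */
struct demo_nh {
	uintptr_t bucket;
};

static void demo_set_bucket(struct demo_nh *nh, struct demo_bucket *b)
{
	/* The trick only works if the low bit of the pointer is free. */
	assert(((uintptr_t)b & DEMO_FLUSHED_BIT) == 0);
	nh->bucket = (uintptr_t)b;
}

/* Helper that masks off the flag before the pointer is used. */
static struct demo_bucket *demo_get_bucket(const struct demo_nh *nh)
{
	return (struct demo_bucket *)(nh->bucket & ~DEMO_FLUSHED_BIT);
}

static void demo_mark_flushed(struct demo_nh *nh)
{
	nh->bucket |= DEMO_FLUSHED_BIT;
}

static bool demo_is_flushed(const struct demo_nh *nh)
{
	return (nh->bucket & DEMO_FLUSHED_BIT) != 0;
}
```

Every reader has to go through the masking helper; dereferencing the raw word after the flag is set would use a misaligned, invalid pointer.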
     
  • Move the existing pcpu walk in fib6_drop_pcpu_from to a new
    helper, __fib6_drop_pcpu_from, that can be invoked per fib6_nh with a
    reference to the from entries that need to be evicted. If the passed
    in 'from' is non-NULL then only entries associated with that fib6_info
    are removed (e.g., case where fib entry is deleted); if the 'from' is
    NULL all entries are flushed (e.g., fib6_nh is deleted).

    For fib6_info entries with builtin fib6_nh (i.e., current code) there
    is no change in behavior.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • rt6_info are specific instances of a fib entry and are tied to a
    device and gateway, i.e., a nexthop. Before nexthop objects, IPv6 fib
    entries had a separate fib6_info for each nexthop in a multipath route,
    so the location of the pcpu cache in the fib6_info struct worked.
    However, with nexthop objects a fib6_info can point to a set of nexthops
    (yet another alignment of ipv6 with ipv4). Accordingly, the pcpu
    cache needs to be moved to the fib6_nh struct so the cached entries
    are local to the nexthop specification used to create the rt6_info.

    Initialization and free of the pcpu entries moved to fib6_nh_init and
    fib6_nh_release.

    Change in location only, from fib6_info down to fib6_nh; no other
    functional change intended.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

23 May, 2019

2 commits

  • Add fib6_rt_update to send RTM_NEWROUTE with NLM_F_REPLACE set. This
    helper will be used by the nexthop code to notify userspace of routes
    that are impacted when a nexthop config is updated via replace.

    This notification is needed for legacy apps that do not understand
    the new nexthop object. Apps that are nexthop aware can use the
    RTA_NH_ID attribute in the route notification to just ignore it.

    In the future this should be wrapped in a sysctl to allow OSes that
    are fully updated to avoid the notification storm.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add hook to ipv6 stub to bump the sernum up to the root node for a
    route. This is needed by the nexthop code when a nexthop config changes.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

17 May, 2019

1 commit

  • At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
    for finding all percpu routes and setting their ->from pointer
    to NULL, so that fib6_ref can reach its expected value (1).

    The problem right now is that other cpus can still catch the
    route being deleted, since there is no rcu grace period
    between the route deletion and call to fib6_drop_pcpu_from()

    This can leak the fib6 and associated resources, since no
    notifier will take care of removing the last reference(s).

    I decided to add another boolean (fib6_destroying) instead
    of reusing/renaming exception_bucket_flushed to ease stable backports,
    and properly document the memory barriers used to implement this fix.

    This patch has been co-developed with Wei Wang.

    Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Wei Wang
    Cc: David Ahern
    Cc: Martin Lau
    Acked-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 May, 2019

1 commit


01 May, 2019

1 commit

  • We had many syzbot reports that seem to be caused by use-after-free
    of struct fib6_info.

    ip6_dst_destroy(), fib6_drop_pcpu_from() and rt6_remove_exception()
    are writers vs rt->from, and use inconsistent synchronization among
    themselves.

    Switching to xchg() will solve the issues with no possible
    lockdep issues.

    BUG: KASAN: user-memory-access in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
    BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:294 [inline]
    BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:292 [inline]
    BUG: KASAN: user-memory-access in fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
    BUG: KASAN: user-memory-access in fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
    Write of size 4 at addr 0000000000ffffb4 by task syz-executor.1/7649

    CPU: 0 PID: 7649 Comm: syz-executor.1 Not tainted 5.1.0-rc6+ #183
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    kasan_report.cold+0x5/0x40 mm/kasan/report.c:321
    check_memory_region_inline mm/kasan/generic.c:185 [inline]
    check_memory_region+0x123/0x190 mm/kasan/generic.c:191
    kasan_check_write+0x14/0x20 mm/kasan/common.c:108
    atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
    fib6_info_release include/net/ip6_fib.h:294 [inline]
    fib6_info_release include/net/ip6_fib.h:292 [inline]
    fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
    fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
    fib6_del_route net/ipv6/ip6_fib.c:1813 [inline]
    fib6_del+0xac2/0x10a0 net/ipv6/ip6_fib.c:1844
    fib6_clean_node+0x3a8/0x590 net/ipv6/ip6_fib.c:2006
    fib6_walk_continue+0x495/0x900 net/ipv6/ip6_fib.c:1928
    fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:1976
    fib6_clean_tree+0xe0/0x120 net/ipv6/ip6_fib.c:2055
    __fib6_clean_all+0x118/0x2a0 net/ipv6/ip6_fib.c:2071
    fib6_clean_all+0x2b/0x40 net/ipv6/ip6_fib.c:2082
    rt6_sync_down_dev+0x134/0x150 net/ipv6/route.c:4057
    rt6_disable_ip+0x27/0x5f0 net/ipv6/route.c:4062
    addrconf_ifdown+0xa2/0x1220 net/ipv6/addrconf.c:3705
    addrconf_notify+0x19a/0x2260 net/ipv6/addrconf.c:3630
    notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
    __raw_notifier_call_chain kernel/notifier.c:394 [inline]
    raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
    call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1753
    call_netdevice_notifiers_extack net/core/dev.c:1765 [inline]
    call_netdevice_notifiers net/core/dev.c:1779 [inline]
    dev_close_many+0x33f/0x6f0 net/core/dev.c:1522
    rollback_registered_many+0x43b/0xfd0 net/core/dev.c:8177
    rollback_registered+0x109/0x1d0 net/core/dev.c:8242
    unregister_netdevice_queue net/core/dev.c:9289 [inline]
    unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9282
    unregister_netdevice include/linux/netdevice.h:2658 [inline]
    __tun_detach+0xd5b/0x1000 drivers/net/tun.c:727
    tun_detach drivers/net/tun.c:744 [inline]
    tun_chr_close+0xe0/0x180 drivers/net/tun.c:3443
    __fput+0x2e5/0x8d0 fs/file_table.c:278
    ____fput+0x16/0x20 fs/file_table.c:309
    task_work_run+0x14a/0x1c0 kernel/task_work.c:113
    exit_task_work include/linux/task_work.h:22 [inline]
    do_exit+0x90a/0x2fa0 kernel/exit.c:876
    do_group_exit+0x135/0x370 kernel/exit.c:980
    __do_sys_exit_group kernel/exit.c:991 [inline]
    __se_sys_exit_group kernel/exit.c:989 [inline]
    __x64_sys_exit_group+0x44/0x50 kernel/exit.c:989
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x458da9
    Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffeafc2a6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
    RAX: ffffffffffffffda RBX: 000000000000001c RCX: 0000000000458da9
    RDX: 0000000000412a80 RSI: 0000000000a54ef0 RDI: 0000000000000043
    RBP: 00000000004be552 R08: 000000000000000c R09: 000000000004c0d1
    R10: 0000000002341940 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 00007ffeafc2a7f0 R14: 000000000004c065 R15: 00007ffeafc2a800

    Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: David Ahern
    Reviewed-by: David Ahern
    Acked-by: Martin KaFai Lau
    Acked-by: Wei Wang
    Signed-off-by: David S. Miller

    Eric Dumazet
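
Why an atomic exchange helps here can be shown in userspace with C11 atomics, used as an analogue of the kernel's xchg(). The structures and `demo_release_from()` are hypothetical, and the plain integer reference count is a simplification: whichever writer swaps the pointer out first is the only one that gets to drop the reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for fib6_info and rt6_info. */
struct demo_info {
	int refcnt;
};

struct demo_rt {
	_Atomic(struct demo_info *) from;
};

/*
 * Detach rt->from and drop the reference it held.
 * Returns true if this caller did the release, false if another
 * writer had already taken the pointer.
 */
static bool demo_release_from(struct demo_rt *rt)
{
	struct demo_info *from;

	assert(rt != NULL);

	/* Read and clear in one atomic step: at most one caller sees
	 * the non-NULL value. */
	from = atomic_exchange(&rt->from, (struct demo_info *)NULL);
	if (from == NULL)
		return false;

	from->refcnt--;
	return true;
}
```

A separate "read, then clear" sequence would let two writers both observe the old pointer and release it twice, which matches the kind of corruption in the KASAN report above.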
     

24 Apr, 2019

3 commits

  • We suspect some issues involving fib6_ref 0 -> 1 transitions might
    cause strange syzbot reports.

    Let's convert fib6_ref to refcount_t to catch them earlier.

    Signed-off-by: Eric Dumazet
    Cc: Wei Wang
    Acked-by: Wei Wang
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Instead of using atomic_inc(), prefer fib6_info_hold()
    so that upcoming refcount_t conversion is simpler.

    Only fib6_info_alloc() is using atomic_set() since we
    just allocated a new object.

    Signed-off-by: Eric Dumazet
    Cc: Wei Wang
    Acked-by: Wei Wang
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We do not need to clear f6i->rt6i_exception_bucket right before
    freeing f6i.

    Note that f6i->rt6i_exception_bucket is properly protected by
    f6i->exception_bucket_flushed being set to one in rt6_flush_exceptions()
    under the protection of rt6_exception_lock.

    Signed-off-by: Eric Dumazet
    Cc: Wei Wang
    Acked-by: Wei Wang
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Apr, 2019

1 commit

  • Change fib6_lookup and fib6_table_lookup to take a fib6_result and set
    f6i and nh rather than returning a fib6_info. For now both always
    return 0.

    A later patch set can make these more like the IPv4 counterparts and
    return EINVAL, EACCES, etc. based on fib6_type.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern