13 Jan, 2021

1 commit

  • [ Upstream commit 21fdca22eb7df2a1e194b8adb812ce370748b733 ]

    RT_TOS() only clears one of the ECN bits. Therefore, when
    fib_compute_spec_dst() resorts to a fib lookup, it can return
    different results depending on the value of the second ECN bit.

    For example, ECT(0) and ECT(1) packets could be treated differently.

    $ ip netns add ns0
    $ ip netns add ns1
    $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
    $ ip -netns ns0 link set dev lo up
    $ ip -netns ns1 link set dev lo up
    $ ip -netns ns0 link set dev veth01 up
    $ ip -netns ns1 link set dev veth10 up

    $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
    $ ip -netns ns1 address add 192.0.2.11/24 dev veth10

    $ ip -netns ns1 address add 192.0.2.21/32 dev lo
    $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
    $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0

    With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
    (ping uses -Q to set all TOS and ECN bits):

    $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
    [...]
    64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms

    But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
    because the "tos 4" route isn't matched:

    $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
    [...]
    64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms

    After this patch the ECN bits don't affect the result anymore:

    $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
    [...]
    64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms

    Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper.")
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
     

05 Dec, 2020

1 commit

  • Fix to return a negative error code from the error handling
    case instead of 0, as done elsewhere in this function.

    Fixes: d15662682db2 ("ipv4: Allow ipv6 gateway with ipv4 routes")
    Reported-by: Hulk Robot
    Signed-off-by: Zhang Changzhong
    Reviewed-by: David Ahern
    Link: https://lore.kernel.org/r/1607071695-33740-1-git-send-email-zhangchangzhong@huawei.com
    Signed-off-by: Jakub Kicinski

    Zhang Changzhong
     

18 Nov, 2020

1 commit

  • Checking for ifdef CONFIG_x fails if CONFIG_x=m.

    Use IS_ENABLED instead, which is true for both built-ins and modules.

    Otherwise, a
    > ip -4 route add 1.2.3.4/32 via inet6 fe80::2 dev eth1
    fails with the message "Error: IPv6 support not enabled in kernel." if
    CONFIG_IPV6 is `m`.

    In the spirit of b8127113d01e53adba15b41aefd37b90ed83d631.

    Fixes: d15662682db2 ("ipv4: Allow ipv6 gateway with ipv4 routes")
    Cc: Kim Phillips
    Signed-off-by: Florian Klink
    Reviewed-by: David Ahern
    Link: https://lore.kernel.org/r/20201115224509.2020651-1-flokli@flokli.de
    Signed-off-by: Jakub Kicinski

    Florian Klink
     

15 Sep, 2020

1 commit

  • flowi4_multipath_hash was added by the commit referenced below for
    tunnels. Unfortunately, the patch did not initialize the new field
    for several fast path lookups that do not initialize the entire flow
    struct to 0. Fix those locations. Currently, flowi4_multipath_hash
    is random garbage and affects the hash value computed by
    fib_multipath_hash for multipath selection.

    Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash")
    Signed-off-by: David Ahern
    Cc: wenxu
    Signed-off-by: David S. Miller

    David Ahern
     

27 May, 2020

1 commit

  • Similar to the last patch, we need to fix fib_info_nh_uses_dev for
    external nexthops to avoid referencing multiple nh_grp structs.
    Move the device check in fib_info_nh_uses_dev to a helper and
    create a nexthop version that is called if the fib_info uses an
    external nexthop.

    Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
    Signed-off-by: David Ahern
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    David Ahern
     

22 May, 2020

1 commit

  • In case we can't find a ->dumpit callback for the requested
    (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're
    in the same situation as if userspace had requested a PF_UNSPEC
    dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all
    the registered RTM_GETROUTE handlers.

    The requested table id may or may not exist for all of those
    families. commit ae677bbb4441 ("net: Don't return invalid table id
    error when dumping all families") fixed the problem when userspace
    explicitly requests a PF_UNSPEC dump, but missed the fallback case.

    For example, when we pass ipv6.disable=1 to a kernel with
    CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y,
    the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in
    rtnl_dump_all, and listing IPv6 routes will unexpectedly print:

    # ip -6 r
    Error: ipv4: MR table does not exist.
    Dump terminated

    commit ae677bbb4441 introduced the dump_all_families variable, which
    gets set when userspace requests a PF_UNSPEC dump. However, we can't
    simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the
    fallback case to get dump_all_families == true, because some message
    types (for example RTM_GETRULE and RTM_GETNEIGH) only register the
    PF_UNSPEC handler and use the family to filter in the kernel what is
    dumped to userspace. We would then export more entries that userspace
    would have to filter. iproute does that, but other programs may not.

    Instead, this patch removes dump_all_families and updates the
    RTM_GETROUTE handlers to check if the family that is being dumped is
    their own. When it's not, which covers both the intentional PF_UNSPEC
    dumps (as dump_all_families did) and the fallback case, ignore the
    missing table id error.

    Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

24 Mar, 2020

1 commit

  • The following call path walks an RCU-protected list without holding
    rcu_read_lock():

    inet_dump_fib()
    fib_table_dump()
    fn_trie_dump_leaf()
    hlist_for_each_entry_rcu()

    and triggers a warning:

    WARNING: suspicious RCU usage
    -----------------------------
    net/ipv4/fib_trie.c:2216 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ip/1923:
    #0: ffffffff8ce76e40 (rtnl_mutex){+.+.}, at: netlink_dump+0xd6/0x840

    Call Trace:
    dump_stack+0xa1/0xea
    lockdep_rcu_suspicious+0x103/0x10d
    fn_trie_dump_leaf+0x581/0x590
    fib_table_dump+0x15f/0x220
    inet_dump_fib+0x4ad/0x5d0
    netlink_dump+0x350/0x840
    __netlink_dump_start+0x315/0x3e0
    rtnetlink_rcv_msg+0x4d1/0x720
    netlink_rcv_skb+0xf0/0x220
    rtnetlink_rcv+0x15/0x20
    netlink_unicast+0x306/0x460
    netlink_sendmsg+0x44b/0x770
    __sys_sendto+0x259/0x270
    __x64_sys_sendto+0x80/0xa0
    do_syscall_64+0x69/0xf4
    entry_SYSCALL_64_after_hwframe+0x49/0xb3

    Fixes: 18a8021a7be3 ("net/ipv4: Plumb support for filtering route dumps")
    Signed-off-by: Qian Cai
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Qian Cai
     

22 Nov, 2019

1 commit


27 Oct, 2019

1 commit

  • Since commit af4d768ad28c ("net/ipv4: Add support for specifying metric
    of connected routes"), when updating an IP address with a different metric,
    the associated connected route is updated, too.

    Still, the mentioned commit doesn't properly handle some corner cases:

    $ ip addr add dev eth0 192.168.1.0/24
    $ ip addr add dev eth0 192.168.2.1/32 peer 192.168.2.2
    $ ip addr add dev eth0 192.168.3.1/24
    $ ip addr change dev eth0 192.168.1.0/24 metric 10
    $ ip addr change dev eth0 192.168.2.1/32 peer 192.168.2.2 metric 10
    $ ip addr change dev eth0 192.168.3.1/24 metric 10
    $ ip -4 route
    192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.0
    192.168.2.2 dev eth0 proto kernel scope link src 192.168.2.1
    192.168.3.0/24 dev eth0 proto kernel scope link src 192.168.2.1 metric 10

    Only the last route is correctly updated.

    The problem is the current test in fib_modify_prefix_metric():

    if (!(dev->flags & IFF_UP) ||
        ifa->ifa_flags & (IFA_F_SECONDARY | IFA_F_NOPREFIXROUTE) ||
        ipv4_is_zeronet(prefix) ||
        prefix == ifa->ifa_local || ifa->ifa_prefixlen == 32)

    This test should be the logical negation of the pre-existing test in
    fib_add_ifaddr():

    if (!ipv4_is_zeronet(prefix) && !(ifa->ifa_flags & IFA_F_SECONDARY) &&
        (prefix != addr || ifa->ifa_prefixlen < 32))

    To properly negate the original expression, we need to change the last
    logical 'or' to a logical 'and'.

    Fixes: af4d768ad28c ("net/ipv4: Add support for specifying metric of connected routes")
    Reported-and-suggested-by: Beniamino Galvani
    Signed-off-by: Paolo Abeni
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Paolo Abeni
     

10 Aug, 2019

1 commit


18 Jul, 2019

1 commit

  • In a rare case where we redirect local packets from veth to lo,
    these packets fail to pass the source validation when rp_filter
    is turned on, as the tracing shows:

    -311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0
    -311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130

    So the fib table lookup returns eth0 as the nexthop, even though the
    packets are local and should be routed to loopback nonetheless. Without
    this patch, they can't pass the device match check in
    fib_info_nh_uses_dev().

    It should be safe to relax this check for this special case, as
    packets coming out of the loopback device normally still have an
    skb_dst, so they won't even hit this slow path.

    Cc: Julian Anastasov
    Cc: David Ahern
    Signed-off-by: Cong Wang
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Cong Wang
     

25 Jun, 2019

2 commits

  • This functionally reverts the check introduced by commit
    e8ba330ac0c5 ("rtnetlink: Update fib dumps for strict data checking")
    as modified by commit e4e92fb160d7 ("net/ipv4: Bail early if user only
    wants prefix entries").

    As we are preparing to fix listing of IPv4 cached routes, we need to
    give userspace a way to request them.

    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
     
  • The following patches add back the ability to dump IPv4 and IPv6 exception
    routes, and we need to allow selection of regular routes or exceptions.

    Use RTM_F_CLONED as filter to decide whether to dump routes or exceptions:
    iproute2 passes it in dump requests (except for IPv6 cache flush requests;
    this will be fixed in iproute2) and this used to work as long as
    exceptions were stored directly in the FIB, for both IPv4 and IPv6.

    Caveat: if strict checking is not requested (that is, if the dump request
    doesn't go through ip_valid_fib_dump_req()), we can't filter on protocol,
    tables or route types.

    In this case, filtering on RTM_F_CLONED would be inconsistent: we would
    fix 'ip route list cache' by returning exception routes and at the same
    time introduce another bug in case another selector is present, e.g. on
    'ip route list cache table main' we would return all exception routes,
    without filtering on tables.

    Keep this consistent by applying no filters at all, and dumping both
    routes and exceptions, if strict checking is not requested. iproute2
    currently filters results anyway, and no unwanted results will be
    presented to the user. The kernel will just dump more data than needed.

    v7: No changes

    v6: Rebase onto net-next, no changes

    v5: New patch: add dump_routes and dump_exceptions flags in filter and
    simply clear the unwanted one if strict checking is enabled, don't
    ignore NLM_F_MATCH and don't set filter_set if NLM_F_MATCH is set.
    Skip filtering altogether if no strict checking is requested:
    selecting routes or exceptions only would be inconsistent with the
    fact we can't filter on tables.

    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
     

11 Jun, 2019

1 commit

  • Add support for RTA_NH_ID attribute to allow a user to specify a
    nexthop id to use with a route. fc_nh_id is added to fib_config to
    hold the value passed in the RTA_NH_ID attribute. If a nexthop id
    is given, the gateway, device, encap and multipath attributes cannot
    be set.

    Update fib_nh_match to check ids on a route delete.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

08 Jun, 2019

1 commit


05 Jun, 2019

2 commits

  • Convert more IPv4 code to use fib_nh_common over fib_nh to enable routes
    to use a fib6_nh based nexthop. In the end, only code not using a
    nexthop object in a fib_info should directly access fib_nh in a fib_info
    without checking the family and going through fib_nh_common. Those
    functions will be marked when it is not directly evident.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Use helpers to access fib_nh and fib_nhs fields of a fib_info. Drop the
    fib_dev macro which is an alias for the first nexthop. Replacements:

    fi->fib_dev --> fib_info_nh(fi, 0)->fib_nh_dev
    fi->fib_nh --> fib_info_nh(fi, 0)
    fi->fib_nh[i] --> fib_info_nh(fi, i)
    fi->fib_nhs --> fib_info_num_path(fi)

    where fib_info_nh(fi, i) returns &fi->fib_nh[i] and fib_info_num_path(fi)
    returns fi->fib_nhs.

    Move the existing fib_info_nhc to nexthop.h and define the new ones
    there. A later patch adds a check if a fib_info uses a nexthop object,
    and defining the helpers in nexthop.h avoids circular header
    dependencies.

    After this all remaining open coded references to fi->fib_nhs and
    fi->fib_nh are in:
    - fib_create_info and helpers used to lookup an existing fib_info
    entry, and
    - the netdev event functions fib_sync_down_dev and fib_sync_up.

    The latter two will not be reused for nexthops, and the fib_create_info
    will be updated to handle a nexthop in a fib_info.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

03 Jun, 2019

1 commit


31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

23 May, 2019

2 commits

  • New userspace on an older kernel can send unknown and unsupported
    attributes, resulting in an incomplete config which is almost
    always wrong for routing (the few exceptions are passthrough settings
    like the protocol that installed the route).

    Set strict_start_type in the policies for IPv4 and IPv6 routes and
    rules to detect new, unsupported attributes and fail the route add.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • As nexthops are deleted, fib entries referencing them are marked dead.
    Export fib_flush so those entries can be removed in a timely manner.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

28 Apr, 2019

1 commit

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally, it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then, going
    forward, prevent forgetting attribute entries. The same can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated
    * nlmsg_validate -> nlmsg_validate_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

11 Apr, 2019

1 commit

  • Govindarajulu reported a regression with Network Manager which sends an
    RTA_GATEWAY attribute with the address set to 0. Fix up the handling of
    RTA_GATEWAY to only set fc_gw_family if the gateway address is actually
    set.

    Fixes: f35b794b3b405 ("ipv4: Prepare fib_config for IPv6 gateway")
    Reported-by: Govindarajulu Varadarajan
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

09 Apr, 2019

2 commits

  • Add support for RTA_VIA and allow an IPv6 nexthop for v4 routes:
    $ ip ro add 172.16.1.0/24 via inet6 2001:db8::1 dev eth0
    $ ip ro ls
    ...
    172.16.1.0/24 via inet6 2001:db8::1 dev eth0

    For convenience and simplicity, userspace can use RTA_VIA to specify
    an AF_INET or AF_INET6 gateway.

    The common fib_nexthop_info dump function compares the gateway address
    family to the nh_common family to know if the gateway should be encoded
    as RTA_VIA or RTA_GATEWAY.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Similar to rtable, fib_config needs to allow the gateway to be either an
    IPv4 or an IPv6 address. To that end, rename fc_gw to fc_gw4 to mean an
    IPv4 address and add fc_gw_family. Checks on 'is a gateway set' are changed
    to see if fc_gw_family is set. In the process prepare the code for a
    fc_gw_family == AF_INET6.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     

04 Apr, 2019

1 commit

  • Most of the ipv4 code only needs data from fib_nh_common. Add
    fib_nh_common selection to fib_result and update users to use it.

    Right now, fib_nh_common in fib_result will point to a fib_nh struct
    that is embedded within a fib_info:

    fib_info --> fib_nh
                 fib_nh
                 ...
                 fib_nh
                     ^
    fib_result->nhc -+

    Later, nhc can point to a fib_nh within a nexthop struct:

    fib_info --> nexthop --> fib_nh
                               ^
    fib_result->nhc -----------+

    or for a nexthop group:

    fib_info --> nexthop --> nexthop --> fib_nh
                             nexthop --> fib_nh
                             ...
                             nexthop --> fib_nh
                                           ^
    fib_result->nhc -----------------------+

    In all cases, nhsel within fib_result indicates which leg of the
    multipath route is used.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

30 Mar, 2019

1 commit

  • Rename fib_nh entries that will be moved to a fib_nh_common struct.
    Specifically, the device, oif, gateway, flags, scope, lwtstate,
    nh_weight and nh_upper_bound are common with all nexthop definitions.
    In the process shorten fib_nh_lwtstate to fib_nh_lws to avoid really
    long lines.

    Rename only; no functional change intended.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

27 Feb, 2019

1 commit

  • IPv4 currently does not support nexthops outside of the AF_INET family.
    Specifically, it does not handle the RTA_VIA attribute. If it is passed
    in a route add request, the actual route added only uses the device
    which is clearly not what the user intended:

    $ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0
    $ ip ro ls
    ...
    172.16.1.0/24 dev eth0

    Catch this and fail the route add:
    $ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0
    Error: IPv4 does not support RTA_VIA attribute.

    Fixes: 03c0566542f4c ("mpls: Netlink commands to add, remove, and dump routes")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

16 Jan, 2019

1 commit

  • IPv4 routing tables are flushed in two cases:

    1. In response to events in the netdev and inetaddr notification chains
    2. When a network namespace is being dismantled

    In both cases only routes associated with a dead nexthop group are
    flushed. However, a nexthop group will only be marked as dead in case it
    is populated with actual nexthops using a nexthop device. This is not
    the case when the route in question is an error route (e.g.,
    'blackhole', 'unreachable').

    Therefore, when a network namespace is being dismantled such routes are
    not flushed and leaked [1].

    To reproduce:
    # ip netns add blue
    # ip -n blue route add unreachable 192.0.2.0/24
    # ip netns del blue

    Fix this by not skipping error routes that are not marked with
    RTNH_F_DEAD when flushing the routing tables.

    To prevent the flushing of such routes in case #1, add a parameter to
    fib_table_flush() that indicates if the table is flushed as part of
    namespace dismantle or not.

    Note that this problem does not exist in IPv6 since error routes are
    associated with the loopback device.

    [1]
    unreferenced object 0xffff888066650338 (size 56):
    comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff ..........ba....
    e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00 ...d............
    backtrace:
    [] inet_rtm_newroute+0x129/0x220
    [] rtnetlink_rcv_msg+0x397/0xa20
    [] netlink_rcv_skb+0x132/0x380
    [] netlink_unicast+0x4c0/0x690
    [] netlink_sendmsg+0x929/0xe10
    [] sock_sendmsg+0xc8/0x110
    [] ___sys_sendmsg+0x77a/0x8f0
    [] __sys_sendmsg+0xf7/0x250
    [] do_syscall_64+0x14d/0x610
    [] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [] 0xffffffffffffffff
    unreferenced object 0xffff888061621c88 (size 48):
    comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
    hex dump (first 32 bytes):
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff kkkkkkkk..&_....
    backtrace:
    [] fib_table_insert+0x978/0x1500
    [] inet_rtm_newroute+0x129/0x220
    [] rtnetlink_rcv_msg+0x397/0xa20
    [] netlink_rcv_skb+0x132/0x380
    [] netlink_unicast+0x4c0/0x690
    [] netlink_sendmsg+0x929/0xe10
    [] sock_sendmsg+0xc8/0x110
    [] ___sys_sendmsg+0x77a/0x8f0
    [] __sys_sendmsg+0xf7/0x250
    [] do_syscall_64+0x14d/0x610
    [] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [] 0xffffffffffffffff

    Fixes: 8cced9eff1d4 ("[NETNS]: Enable routing configuration in non-initial namespace.")
    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     

25 Oct, 2018

1 commit

  • When doing a route dump across all address families, do not error out
    if the table does not exist. This allows a route dump for AF_UNSPEC
    with a table id that may only exist for some of the families.

    Do return the "table does not exist" error if dumping routes for a
    specific family and the table does not exist.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

16 Oct, 2018

4 commits

  • Unlike IPv6, IPv4 does not have routes marked with RTF_PREFIX_RT. If the
    flag is set in the dump request, just return.

    In the process of this change, move the CLONE check to use the new
    filter flags.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Update parsing of route dump request to enable kernel side filtering.
    Allow filtering results by protocol (e.g., which routing daemon installed
    the route), route type (e.g., unicast), table id and nexthop device. These
    amount to the low-hanging fruit, yet a huge improvement, for dumping
    routes.

    ip_valid_fib_dump_req is called with RTNL held, so __dev_get_by_index can
    be used to look up the device index without taking a reference. From
    there filter->dev is only used during dump loops with the lock still held.

    Set NLM_F_DUMP_FILTERED in the answer_flags so the user knows the results
    have been filtered should no entries be returned.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Implement kernel side filtering of routes by table id, egress device index,
    protocol and route type. If the table id is given in the filter, lookup the
    table and call fib_table_dump directly for it.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add struct fib_dump_filter for options on limiting which routes are
    returned in a dump request. The current list is table id, protocol,
    route type, rtm_flags and nexthop device index. struct net is needed
    to look up the net_device from the index.

    Declare the filter for each route dump handler and plumb the new
    arguments from dump handlers to ip_valid_fib_dump_req.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

13 Oct, 2018

1 commit


11 Oct, 2018

1 commit

  • Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop
    exceptions"), exceptions get deprecated separately from cached
    routes. In particular, administrative changes don't clear PMTU anymore.

    As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes
    on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
    the local MTU change can become stale:
    - if the local MTU is now lower than the PMTU, that PMTU is now
    incorrect
    - if the local MTU was the lowest value in the path, and is increased,
    we might discover a higher PMTU

    Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those
    cases.

    If the exception was locked, the discovered PMTU was smaller than the
    minimal accepted PMTU. In that case, if the new local MTU is smaller
    than the current PMTU, let PMTU discovery figure out if locking of the
    exception is still needed.

    To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
    notifier. By the time the notifier is called, dev->mtu has been
    changed. This patch adds the old MTU as additional information in the
    notifier structure, and a new call_netdevice_notifiers_u32() function.

    Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

09 Oct, 2018

1 commit

  • Add helper to check netlink message for route dumps. If the strict flag
    is set, the dump request is expected to have an rtmsg struct as the header.
    All elements of the struct are expected to be 0 with the exception of
    rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
    can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
    set.

    Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
    and ip6mr_rtm_dumproute to call this helper if strict data checking is
    enabled.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

22 Sep, 2018

1 commit

  • net/ipv4/fib_frontend.c: In function 'fib_info_nh_uses_dev':
    net/ipv4/fib_frontend.c:322:6: error: unused variable 'ret' [-Werror=unused-variable]
    cc1: all warnings being treated as errors

    Fixes: 78f2756c5fc0 ("net/ipv4: Move device validation to helper")
    Signed-off-by: Eric Dumazet
    Cc: David Ahern
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Sep, 2018

1 commit


29 Jul, 2018

1 commit

  • Remove BUG_ON() from fib_compute_spec_dst routine and check
    in_dev pointer during flowi4 data structure initialization.
    fib_compute_spec_dst routine can be run concurrently with device removal,
    where the net_device ip_ptr pointer is set to NULL. This can happen
    if userspace enables pkt info on a UDP rx socket and the device
    is removed while traffic is flowing.

    Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper")
    Signed-off-by: Lorenzo Bianconi
    Signed-off-by: David S. Miller

    Lorenzo Bianconi