27 Aug, 2020

1 commit

  • fib_info_notify_update() is always called with RTNL held, but not from
    an RCU read-side critical section. This leads to the following warning
    [1] when the FIB table list is traversed with
    hlist_for_each_entry_rcu(), but without a proper lockdep expression.

    Since modification of the list is protected by RTNL, silence the warning
    by adding a lockdep expression which verifies RTNL is held.

    [1]
    =============================
    WARNING: suspicious RCU usage
    5.9.0-rc1-custom-14233-g2f26e122d62f #129 Not tainted
    -----------------------------
    net/ipv4/fib_trie.c:2124 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ip/834:
    #0: ffffffff85a3b6b0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0

    stack backtrace:
    CPU: 0 PID: 834 Comm: ip Not tainted 5.9.0-rc1-custom-14233-g2f26e122d62f #129
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
    Call Trace:
    dump_stack+0x100/0x184
    lockdep_rcu_suspicious+0x143/0x14d
    fib_info_notify_update+0x8d1/0xa60
    __nexthop_replace_notify+0xd2/0x290
    rtm_new_nexthop+0x35e2/0x5946
    rtnetlink_rcv_msg+0x4f7/0xbd0
    netlink_rcv_skb+0x17a/0x480
    rtnetlink_rcv+0x22/0x30
    netlink_unicast+0x5ae/0x890
    netlink_sendmsg+0x98a/0xf40
    ____sys_sendmsg+0x879/0xa00
    ___sys_sendmsg+0x122/0x190
    __sys_sendmsg+0x103/0x1d0
    __x64_sys_sendmsg+0x7d/0xb0
    do_syscall_64+0x32/0x50
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7fde28c3be57
    Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
    RSP: 002b:00007ffc09330028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fde28c3be57
    RDX: 0000000000000000 RSI: 00007ffc09330090 RDI: 0000000000000003
    RBP: 000000005f45f911 R08: 0000000000000001 R09: 00007ffc0933012c
    R10: 0000000000000076 R11: 0000000000000246 R12: 0000000000000001
    R13: 00007ffc09330290 R14: 00007ffc09330eee R15: 00005610e48ed020

    Fixes: 1bff1a0c9bbd ("ipv4: Add function to send route updates")
    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     

02 Aug, 2020

1 commit


30 Jul, 2020

1 commit

  • fib_trie_unmerge() is called with RTNL held, but not from an RCU
    read-side critical section. This leads to the following warning [1] when
    the FIB alias list in a leaf is traversed with
    hlist_for_each_entry_rcu().

    Since the function is always called with RTNL held and since
    modification of the list is protected by RTNL, simply use
    hlist_for_each_entry() and silence the warning.

    [1]
    WARNING: suspicious RCU usage
    5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Not tainted
    -----------------------------
    net/ipv4/fib_trie.c:1867 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ip/164:
    #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0

    stack backtrace:
    CPU: 0 PID: 164 Comm: ip Not tainted 5.8.0-rc4-custom-01520-gc1f937f3f83b #30
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
    Call Trace:
    dump_stack+0x100/0x184
    lockdep_rcu_suspicious+0x153/0x15d
    fib_trie_unmerge+0x608/0xdb0
    fib_unmerge+0x44/0x360
    fib4_rule_configure+0xc8/0xad0
    fib_nl_newrule+0x37a/0x1dd0
    rtnetlink_rcv_msg+0x4f7/0xbd0
    netlink_rcv_skb+0x17a/0x480
    rtnetlink_rcv+0x22/0x30
    netlink_unicast+0x5ae/0x890
    netlink_sendmsg+0x98a/0xf40
    ____sys_sendmsg+0x879/0xa00
    ___sys_sendmsg+0x122/0x190
    __sys_sendmsg+0x103/0x1d0
    __x64_sys_sendmsg+0x7d/0xb0
    do_syscall_64+0x54/0xa0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7fc80a234e97
    Code: Bad RIP value.
    RSP: 002b:00007ffef8b66798 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc80a234e97
    RDX: 0000000000000000 RSI: 00007ffef8b66800 RDI: 0000000000000003
    RBP: 000000005f141b1c R08: 0000000000000001 R09: 0000000000000000
    R10: 00007fc80a2a8ac0 R11: 0000000000000246 R12: 0000000000000001
    R13: 0000000000000000 R14: 00007ffef8b67008 R15: 0000556fccb10020

    Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     

07 Jul, 2020

1 commit

  • Rationale:
    Reduces the attack surface for MITM attacks on kernel developers opening
    the links, as HTTPS traffic is much harder to manipulate.

    Deterministic algorithm:
    For each file:
      If not .svg:
        For each line:
          If doesn't contain `\bxmlns\b`:
            For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
              If both the HTTP and HTTPS versions
              return 200 OK and serve the same content:
                Replace HTTP with HTTPS.
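    The per-line step of the algorithm above can be sketched in C. This is an
    illustration under stated assumptions, not the script that produced the
    patch: both regexes are approximated by hand (`\w` is taken as ASCII
    letters, digits and `_`), the file loop and the .svg check are left out,
    and the network comparison of the HTTP and HTTPS versions is delegated to
    a caller-supplied predicate. `upgrade_line()` and `approve_all()` are
    hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Approximates regex \w with ASCII letters, digits and '_'. */
static bool is_word(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* True if s contains "xmlns" as a whole word, like \bxmlns\b. */
static bool has_xmlns_word(const char *s)
{
	for (const char *p = s; (p = strstr(p, "xmlns")) != NULL; p++)
		if ((p == s || !is_word(p[-1])) && !is_word(p[5]))
			return true;
	return false;
}

/* Stand-in for "both versions return 200 OK and serve the same content". */
typedef bool (*same_content_fn)(const char *url, size_t len);

/* Hypothetical predicate that approves every link (no network access). */
static bool approve_all(const char *url, size_t len)
{
	(void)url;
	(void)len;
	return true;
}

/*
 * Copy line into out (capacity cap), rewriting approved http:// links to
 * https://. Returns false if out is too small.
 */
static bool upgrade_line(const char *line, char *out, size_t cap,
			 same_content_fn ok)
{
	bool eligible = !has_xmlns_word(line);
	size_t o = 0;

	if (cap == 0)
		return false;
	for (const char *p = line; *p;) {
		if (eligible && strncmp(p, "http://", 7) == 0 &&
		    (p == line || !is_word(p[-1]))) {
			/* [^# \t\r\n]* greedily, then back off to a \w or '/' */
			size_t len = 7 + strcspn(p + 7, "# \t\r\n");

			while (len > 7 && !is_word(p[len - 1]) && p[len - 1] != '/')
				len--;
			if (len > 7) {
				const char *src = p;
				size_t n = len;

				if (ok(p, len)) {
					if (o + 8 >= cap)
						return false;
					memcpy(out + o, "https://", 8);
					o += 8;
					src = p + 7;	/* copy the rest after the scheme */
					n = len - 7;
				}
				if (o + n >= cap)
					return false;
				memcpy(out + o, src, n);
				o += n;
				p += len;	/* consume the whole link */
				continue;
			}
		}
		if (o + 1 >= cap)
			return false;
		out[o++] = *p++;
	}
	out[o] = '\0';
	return true;
}
```

    Matched links are consumed whole, mirroring non-overlapping regex
    matching, so an `http://` nested inside an already matched link is not
    considered separately.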

    Signed-off-by: Alexander A. Klimov
    Signed-off-by: David S. Miller

    Alexander A. Klimov
     

27 May, 2020

1 commit

  • FIB lookups can return an entry that references an external nexthop.
    While walking the nexthop struct, we do not want to make multiple calls
    into the nexthop code, which can result in two different structs being
    accessed: one returning the number of paths, and the rest of the loop
    seeing a different nh_grp struct. If the nexthop group shrank, the
    result is an attempt to access a fib_nh_common that does not exist in
    the new nh_grp struct but did in the old one.

    To fix that, move the device evaluation code to a helper that can be
    used for the inline fib_nh path as well as for external nexthops.

    Update the existing check for fi->nh in fib_table_lookup to call a
    new helper, nexthop_get_nhc_lookup, which walks the external nexthop
    with a single rcu dereference.
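    The core of the fix is that the group pointer is dereferenced once, and
    both the path count and the path entries are read from that single
    snapshot. The userspace sketch below models this with a C11 atomic load
    standing in for rcu_dereference(); the struct and function names are
    hypothetical and much simpler than the kernel's, and freeing of old
    groups (which RCU takes care of in the kernel) is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct nhc_sketch {
	int dev_up;			/* 1 if this path's device is usable */
};

struct nh_grp_sketch {
	int num_nh;			/* number of valid entries in nh[] */
	struct nhc_sketch nh[4];
};

/* Writers may publish a new, possibly smaller, group at any time. */
static _Atomic(struct nh_grp_sketch *) cur_grp;

/*
 * Return the first usable path. The group pointer is loaded exactly once,
 * so num_nh and nh[] always come from the same struct even if a writer
 * replaces cur_grp concurrently. The acquire load stands in for
 * rcu_dereference().
 */
static struct nhc_sketch *pick_usable_path(void)
{
	struct nh_grp_sketch *grp =
		atomic_load_explicit(&cur_grp, memory_order_acquire);

	if (grp == NULL)
		return NULL;
	assert(grp->num_nh >= 0 && grp->num_nh <= 4);
	for (int i = 0; i < grp->num_nh; i++)
		if (grp->nh[i].dev_up)
			return &grp->nh[i];
	return NULL;
}
```

    Calling an accessor for the count and then another one for each entry
    would load the pointer several times, which is the pattern the patch
    removes.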

    Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
    Signed-off-by: David Ahern
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    David Ahern
     

31 Mar, 2020

1 commit


30 Mar, 2020

2 commits

  • The build_state callback of lwtunnel does not yet receive the net
    namespace structure. This patch adds it so that specific address
    configuration can be checked when RPL source routes are created.

    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • fib_triestat_seq_show() calls hlist_for_each_entry_rcu(tb, head,
    tb_hlist) without rcu_read_lock(), which triggers the following warning:

    net/ipv4/fib_trie.c:2579 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by proc01/115277:
    #0: c0000014507acf00 (&p->lock){+.+.}-{3:3}, at: seq_read+0x58/0x670

    Call Trace:
    dump_stack+0xf4/0x164 (unreliable)
    lockdep_rcu_suspicious+0x140/0x164
    fib_triestat_seq_show+0x750/0x880
    seq_read+0x1a0/0x670
    proc_reg_read+0x10c/0x1b0
    __vfs_read+0x3c/0x70
    vfs_read+0xac/0x170
    ksys_read+0x7c/0x140
    system_call+0x5c/0x68

    Fix it by adding a pair of rcu_read_lock/unlock() calls and using
    cond_resched_rcu() to avoid the situation where walking a large
    number of items may prevent scheduling for a long time.

    Signed-off-by: Qian Cai
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Qian Cai
     

21 Feb, 2020

1 commit


20 Jan, 2020

1 commit


15 Jan, 2020

3 commits

  • When performing L3 offload, routes and nexthops are usually programmed
    into two different tables in the underlying device. Therefore, the fact
    that a nexthop resides in hardware does not necessarily mean that all
    the associated routes also reside in hardware and vice-versa.

    While the kernel can signal to user space the presence of a nexthop in
    hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
    for routes. In addition, the fact that a route resides in hardware does
    not necessarily mean that the traffic is offloaded. For example,
    unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
    packets to the CPU so that the kernel will be able to generate the
    appropriate ICMP error packet.

    This patch adds "offload" and "trap" indications to IPv4 routes, so
    that users will have better visibility into the offload process.

    'struct fib_alias' is extended with two new fields that indicate if the
    route resides in hardware or not and if it is offloading traffic from
    the kernel or trapping packets to it. Note that the new fields are added
    in the 6-byte hole and therefore the struct still fits in a single
    cache line [1].

    Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
    route's key in order to set the flags.

    The indications are dumped to user space via new flags (i.e.,
    'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
    ancillary header.

    v2:
    * Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()

    [1]
    struct fib_alias {
        struct hlist_node fa_list; /* 0 16 */
        struct fib_info * fa_info; /* 16 8 */
        u8 fa_tos; /* 24 1 */
        u8 fa_type; /* 25 1 */
        u8 fa_state; /* 26 1 */
        u8 fa_slen; /* 27 1 */
        u32 tb_id; /* 28 4 */
        s16 fa_default; /* 32 2 */
        u8 offload:1; /* 34: 0 1 */
        u8 trap:1; /* 34: 1 1 */
        u8 unused:6; /* 34: 2 1 */

        /* XXX 5 bytes hole, try to pack */

        struct callback_head rcu __attribute__((__aligned__(8))); /* 40 16 */

        /* size: 56, cachelines: 1, members: 12 */
        /* sum members: 50, holes: 1, sum holes: 5 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
        /* last cacheline: 56 bytes */
    } __attribute__((__aligned__(8)));
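    The layout in [1] can be reproduced with a userspace stand-in. This
    sketch assumes an LP64 target such as x86_64 with GCC or Clang (both
    accept `uint8_t` bit-fields) and replaces the kernel's hlist_node,
    fib_info pointer and callback_head with look-alike types of the same
    sizes; the `_sketch` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Look-alike types with the kernel's sizes on LP64 (assumption). */
struct hlist_node_sketch {
	void *next, **pprev;				/* 16 bytes */
};

struct callback_head_sketch {
	void *next;
	void (*func)(struct callback_head_sketch *);	/* 16 bytes total */
};

struct fib_alias_sketch {
	struct hlist_node_sketch fa_list;	/* offset  0, size 16 */
	void *fa_info;				/* offset 16, size  8 */
	uint8_t fa_tos;				/* offset 24 */
	uint8_t fa_type;			/* offset 25 */
	uint8_t fa_state;			/* offset 26 */
	uint8_t fa_slen;			/* offset 27 */
	uint32_t tb_id;				/* offset 28, size  4 */
	int16_t fa_default;			/* offset 32, size  2 */
	uint8_t offload:1;			/* offset 34, bit 0 */
	uint8_t trap:1;				/* offset 34, bit 1 */
	uint8_t unused:6;			/* offset 34, bits 2-7 */
	/* bytes 35-39: 5-byte hole, rcu needs 8-byte alignment */
	struct callback_head_sketch rcu __attribute__((__aligned__(8)));
						/* offset 40, size 16 */
} __attribute__((__aligned__(8)));

/* Fails to compile if the struct no longer fits one 64-byte cache line. */
static_assert(sizeof(struct fib_alias_sketch) <= 64,
	      "fib_alias_sketch exceeds one cache line");
```

    Adding the two flag bits consumed one byte of the former 6-byte hole, so
    the total size stays at 56 bytes.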

    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages
    using the passed arguments. Currently, the function takes 11 arguments,
    6 of which are attributes of the route being dumped (e.g., prefix, TOS).

    The next patch will need the function to also dump to user space an
    indication if the route is present in hardware or not. Instead of
    passing yet another argument, change the function to take a struct
    containing the different route attributes.

    v2:
    * Name last argument of fib_dump_info()
    * Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could
      later be passed to fib_alias_hw_flags_set()

    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Subsequent patches will add an offload / trap indication to routes which
    will signal if the route is present in hardware or not.

    After programming the route to the hardware, drivers will have to ask
    the IPv4 code to set the flags by passing the route's key.

    In the case of route replace, the new route is notified before it is
    actually inserted into the FIB alias list. This can prevent simple
    drivers (e.g., netdevsim) that program the route to the hardware in the
    same context it is notified in from being able to set the flag.

    Solve this by first inserting the new route into the list and rolling
    back the operation if the route is vetoed.
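    A sketch of the insert-then-rollback pattern for a replace, using a
    hypothetical singly linked list instead of the kernel's hlist and RCU
    primitives. The notifier runs while the new route is already reachable
    from the list head, which is what lets a driver that programs the route
    in the same context find it; on a veto the old entry is put back. The
    two example notifiers are illustrative stand-ins for drivers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route entry in a singly linked alias list. */
struct alias {
	int id;
	struct alias *next;
};

/* A notifier returns nonzero to veto the route. */
typedef int (*notifier_fn)(const struct alias *head,
			   const struct alias *new_fa);

/*
 * Replace the entry that *pp links to with new_fa, notify, and restore the
 * old entry if the notifier vetoes. Returns 0 on success, -1 on veto.
 */
static int replace_and_notify(struct alias **head, struct alias **pp,
			      struct alias *new_fa, notifier_fn notify)
{
	struct alias *old = *pp;

	assert(old != NULL && new_fa != NULL);
	new_fa->next = old->next;	/* 1. new route takes old's place */
	*pp = new_fa;

	if (notify(*head, new_fa)) {	/* 2. new route is visible here */
		*pp = old;		/* 3. vetoed: roll back */
		new_fa->next = NULL;
		return -1;
	}
	old->next = NULL;		/* old route is now unlinked */
	return 0;
}

/* Example notifier: accept only if new_fa is reachable from the head. */
static int require_visible(const struct alias *head,
			   const struct alias *new_fa)
{
	for (const struct alias *fa = head; fa; fa = fa->next)
		if (fa == new_fa)
			return 0;
	return 1;
}

/* Example notifier standing in for a driver that rejects the route. */
static int veto_all(const struct alias *head, const struct alias *new_fa)
{
	(void)head;
	(void)new_fa;
	return 1;
}
```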

    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     

11 Jan, 2020

1 commit

  • Sven-Haegar reported looping on fib dumps when a 255.255.255.255 route
    has been added to a table. The looping is caused by the key rolling over
    from FFFFFFFF to 0. When dumping a specific table only, we need a means
    to detect when the table dump is done. The key and count saved to cb args
    are both 0 only at the start of the table dump. If key is 0 and count > 0,
    then we are in the rollover case. Detect this and return to avoid looping.

    This only affects dumps of a specific table; for dumps of all tables
    (the case prior to the change in the Fixes tag) inet_dump_fib moved
    the entry counter to the next table and reset the cb args used by
    fib_table_dump and fn_trie_dump_leaf, so the rollover from ffffffff back
    to 0 did not cause looping with the dumps.
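    The rollover check can be modeled in userspace with a sorted array of
    32-bit keys and a cursor that survives between calls, playing the role of
    the cb args. The names below are hypothetical and the loop is a
    simplification of the trie walk, kept only to show why `count > 0 &&
    key == 0` marks the end of the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resumable dump state, standing in for the cb args. */
struct dump_cursor {
	uint32_t key;		/* next key to look at */
	unsigned count;		/* entries dumped so far */
};

/*
 * Dump up to 'budget' keys from the sorted array 'keys', resuming from
 * cur. Returns the number of keys emitted by this call (0 means done).
 */
static size_t dump_some(const uint32_t *keys, size_t n, size_t budget,
			struct dump_cursor *cur)
{
	size_t emitted = 0;

	/* Only the very first call sees count == 0 and key == 0; a zero key
	 * with a non-zero count means the key wrapped past 0xffffffff. */
	if (cur->count && !cur->key)
		return 0;

	for (size_t i = 0; i < n && emitted < budget; i++) {
		if (keys[i] < cur->key)
			continue;		/* dumped by an earlier call */
		emitted++;
		cur->count++;
		cur->key = keys[i] + 1;		/* 0xffffffff + 1 wraps to 0 */
		if (cur->key == 0)
			break;			/* nothing can follow this key */
	}
	return emitted;
}
```

    Without the early return, the third call would see key 0 again and start
    over from the first entry, so the loop below would only stop at its
    safety limit.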

    Fixes: effe67926624 ("net: Enable kernel side filtering of route dumps")
    Reported-by: Sven-Haegar Koch
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

17 Dec, 2019

8 commits

  • Unlike mlxsw, the other listeners to the FIB notification chain do not
    require any special modifications as they never considered multiple
    identical routes.

    This patch removes the old route notifications and converts all the
    listeners to use the new replace / delete notifications.

    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • When a new listener is registered to the FIB notification chain it
    receives a dump of all the available routes in the system. Instead, make
    sure to only replay the IPv4 routes that are actually used in the data
    path and are of any interest to the new listener.

    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • In a similar fashion to the previous patch, when a route is deleted as
    part of table flushing, promote the next route in the list, if one
    exists. Otherwise, simply emit a delete notification.

    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • When a route is deleted, we potentially need to promote the next route
    in the FIB alias list (e.g., one with a higher metric). If we find such
    a route, a replace notification is emitted for it. Otherwise, a delete
    notification is emitted for the deleted route.
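    The decision described above can be sketched over a hypothetical list in
    which routes sharing {prefix length, table ID} sit next to each other,
    the first one being the route used in the data path. The sketch returns
    which notification to send rather than sending it; the enum, struct and
    function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notification kinds for this sketch. */
enum fib_event_sketch { EV_NONE, EV_REPLACE, EV_DEL };

/* Hypothetical alias: entries with equal {slen, tb_id} are adjacent. */
struct alias {
	int slen;		/* suffix length, i.e. 32 - prefix length */
	unsigned tb_id;
	struct alias *next;
};

/*
 * Decide what to notify when 'victim' is about to be removed from the
 * list starting at 'head'. *promoted is set to the route that takes over.
 */
static enum fib_event_sketch on_delete(struct alias *head,
				       struct alias *victim,
				       struct alias **promoted)
{
	struct alias *first = NULL;

	assert(victim != NULL);
	*promoted = NULL;
	for (struct alias *fa = head; fa; fa = fa->next)
		if (fa->slen == victim->slen && fa->tb_id == victim->tb_id) {
			first = fa;
			break;
		}
	if (first != victim)
		return EV_NONE;		/* victim was not in the data path */

	/* The next entry with the same key, if any, is promoted. */
	struct alias *nx = victim->next;
	if (nx && nx->slen == victim->slen && nx->tb_id == victim->tb_id) {
		*promoted = nx;
		return EV_REPLACE;
	}
	return EV_DEL;
}
```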

    v2:
    * Convert to use fib_find_alias() instead of fib_find_first_alias()

    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • When a route is added, it should only be notified in case it is the
    first route in the FIB alias list with the given {prefix, prefix length,
    table ID}. Otherwise, it is not used in the data path and should not be
    considered by switch drivers.

    v2:
    * Convert to use fib_find_alias() instead of fib_find_first_alias()

    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • When replacing a route, its replacement should only be notified in case
    the replaced route is of any interest to listeners. In other words, if
    the replaced route is currently used in the data path, which means it is
    the first route in the FIB alias list with the given {prefix, prefix
    length, table ID}.

    v2:
    * Convert to use fib_find_alias() instead of fib_find_first_alias()

    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Extend the function with another argument, 'find_first'. When set, the
    function returns the first FIB alias with the matching {prefix, prefix
    length, table ID}. The TOS and priority parameters are ignored. Current
    callers are converted to pass 'false' in order to maintain existing
    behavior.
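    A sketch of how a 'find_first' flag can short-circuit such a lookup,
    assuming a list ordered by suffix length (ascending), then table ID
    (descending), then TOS (descending), then priority (ascending). The
    function and field names are simplified stand-ins, not the kernel's
    definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified alias, kept in the order described above. */
struct alias {
	uint8_t slen, tos;
	uint32_t tb_id, prio;
	struct alias *next;
};

static struct alias *find_alias(struct alias *head, uint8_t slen,
				uint8_t tos, uint32_t prio, uint32_t tb_id,
				bool find_first)
{
	for (struct alias *fa = head; fa; fa = fa->next) {
		if (fa->slen < slen)
			continue;
		if (fa->slen != slen)
			break;		/* past all entries with this slen */
		if (fa->tb_id > tb_id)
			continue;
		if (fa->tb_id != tb_id)
			break;		/* past all entries for this table */
		if (find_first)
			return fa;	/* TOS and priority are ignored */
		if (fa->tos > tos)
			continue;
		if (fa->prio >= prio || fa->tos < tos)
			return fa;	/* match or insertion point */
	}
	return NULL;
}
```

    With 'find_first' set, the returned entry is the one used in the data
    path for that {prefix, prefix length, table ID}.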

    This will be used by subsequent patches in the series.

    v2:
    * New patch

    Signed-off-by: Ido Schimmel
    Suggested-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Currently, a new route is notified in the FIB notification chain before
    it is inserted into the FIB alias list.

    Subsequent patches will use the placement of the new route in the
    ordered FIB alias list in order to determine if the route should be
    notified or not.

    As a preparatory step, change the order so that the route is first
    inserted into the FIB alias list and only then notified.

    Signed-off-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     

05 Oct, 2019

3 commits


25 Aug, 2019

1 commit

  • An excerpt from the netlink(7) man page:

    In multipart messages (multiple nlmsghdr headers with associated payload
    in one byte stream) the first and all following headers have the
    NLM_F_MULTI flag set, except for the last header which has the type
    NLMSG_DONE.

    However, after commit ee28906 the NLM_F_MULTI flag is missing in the
    middle of a FIB dump. As a result, user space applications that follow
    the man page excerpt above may get confused and stop parsing messages,
    believing something went wrong.

    In the golang netlink library [0], the parsing logic stops, believing
    the message is not a multipart message. I found this running Cilium [1]
    against net-next while adding a feature to auto-detect routes: with
    multiple route tables, we could no longer detect the default routes on
    net tree kernels because the library was not returning them.

    Fix this by handling the fib_dump_info_fnhe() case the same way the
    fib_dump_info() handles it by passing the flags argument through the
    call chain and adding a flags argument to rt_fill_info().

    Tested with the Cilium stack, and auto-detection of routes works again.
    I also annotated the libraries to dump netlink messages and inspected
    them: the NLM_F_MULTI and NLMSG_DONE flags look correct after this.

    Note: in inet_rtm_getroute(), pass '0' as flags to rt_fill_info(), the
    same as is done for fib_dump_info(); this looks correct to me.

    [0] https://github.com/vishvananda/netlink/
    [1] https://github.com/cilium/
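    To illustrate the failure mode, here is a sketch of a strict user-space
    parser that follows the man page excerpt: it keeps reading while parts
    carry NLM_F_MULTI and stops at NLMSG_DONE. It works on an in-memory array
    rather than a socket; `struct msg_sketch` and `count_routes()` are
    hypothetical, while the NLM_F_MULTI (0x2), NLMSG_DONE (0x3) and
    RTM_NEWROUTE (24) values are the ones defined in the Linux UAPI headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in <linux/netlink.h> and <linux/rtnetlink.h>. */
#define NLM_F_MULTI	0x2
#define NLMSG_DONE	0x3
#define RTM_NEWROUTE	24

/* Hypothetical reduced header: only the fields the parser looks at. */
struct msg_sketch {
	uint16_t type;
	uint16_t flags;
};

/*
 * Count route messages the way a parser following netlink(7) would:
 * every part of a multipart reply carries NLM_F_MULTI and the reply ends
 * with NLMSG_DONE. A part without NLM_F_MULTI is taken to be a complete,
 * single-part reply, so parsing stops after it.
 */
static size_t count_routes(const struct msg_sketch *msgs, size_t n)
{
	size_t routes = 0;

	for (size_t i = 0; i < n; i++) {
		if (msgs[i].type == NLMSG_DONE)
			break;
		routes++;
		if (!(msgs[i].flags & NLM_F_MULTI))
			break;
	}
	return routes;
}
```

    A single part without the flag in the middle of the dump is enough to
    hide every route that follows it.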

    Fixes: ee28906fd7a14 ("ipv4: Dump route exceptions if requested")
    Signed-off-by: John Fastabend
    Reviewed-by: Stefano Brivio
    Signed-off-by: David S. Miller

    John Fastabend
     

03 Jul, 2019

1 commit

  • In commit ee28906fd7a1 ("ipv4: Dump route exceptions if requested") I
    added a counter of per-node dumped routes (including actual routes and
    exceptions), analogous to the existing counter for dumped nodes. Dumping
    exceptions means we need to also keep track of how many routes are dumped
    for each node: this would be just one route per node, without exceptions.

    When netlink strict checking is not enabled, we dump both routes and
    exceptions at the same time: the RTM_F_CLONED flag is not used as a
    filter. In this case, the per-node counter 'i_fa' is incremented by one
    to track the single dumped route, then also incremented by one for each
    exception dumped, and then stored as netlink callback argument as skip
    counter, 's_fa', to be used when a partial dump operation restarts.

    The per-node counter needs to be increased by one also when we skip a
    route (exception) due to a previous non-zero skip counter, because it
    needs to match the existing skip counter, if we are dumping both routes
    and exceptions. I missed this, and only incremented the counter, for
    regular routes, if the previous skip counter was zero. This means that,
    in case of a mixed dump, partial dump operations after the first one
    will start with a mismatching skip counter value, one less than expected.

    This means in turn that the first exception for a given node is skipped
    every time a partial dump operation restarts, if netlink strict checking
    is not enabled (iproute < 5.0).

    It turns out I didn't repeat the test in its final version, commit
    de755a85130e ("selftests: pmtu: Introduce list_flush_ipv4_exception test
    case"), which also counts the number of route exceptions returned, with
    iproute2 versions < 5.0 -- I was instead using the equivalent of the IPv6
    test as it was before commit b964641e9925 ("selftests: pmtu: Make
    list_flush_ipv6_exception test more demanding").

    Always increment the per-node counter by one if we previously dumped
    a regular route, so that it matches the current skip counter.
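    The counting rule can be sketched with a hypothetical node holding one
    regular route (entry 0) followed by its exceptions. Every entry advances
    the per-node counter, whether it is dumped or skipped, so the saved skip
    counter always points at the first entry not yet dumped. The function
    name and the budget-per-call model of a full netlink message are
    illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of one node: entry 0 is the regular route and
 * entries 1..n_exc are its exceptions. Each call emits at most 'budget'
 * entries into out[], skipping the first *s_fa entries, which earlier
 * calls already dumped, and updates *s_fa for the next call.
 */
static size_t dump_node(size_t n_exc, size_t *s_fa, size_t budget,
			size_t *out)
{
	size_t i_fa = 0, emitted = 0;

	for (size_t e = 0; e <= n_exc; e++) {
		if (i_fa < *s_fa) {
			i_fa++;		/* a skipped entry still counts */
			continue;
		}
		if (emitted == budget)
			break;		/* message full: resume here */
		out[emitted++] = e;
		i_fa++;
	}
	*s_fa = i_fa;
	return emitted;
}
```

    In this model, not counting a skipped regular route makes a restarted
    call skip one entry too many, so an exception is silently lost.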

    Fixes: ee28906fd7a1 ("ipv4: Dump route exceptions if requested")
    Signed-off-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Stefano Brivio
     

25 Jun, 2019

1 commit

  • Since commit 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions."), cached
    exception routes are stored as a separate entity, so they are not dumped
    on a FIB dump, even if the RTM_F_CLONED flag is passed.

    This implies that the command 'ip route list cache' doesn't return any
    result anymore.

    If the RTM_F_CLONED flag is passed and strict checking is requested,
    retrieve nexthop exception routes and dump them. If no strict checking
    is requested, filtering can't be performed consistently: dump everything
    in that case.

    With this, we need to add an argument to the netlink callback in order to
    track how many entries were already dumped for the last leaf included in
    a partial netlink dump.

    A single additional argument is sufficient, even if we traverse logically
    nested structures (nexthop objects, hash table buckets, bucket chains): it
    doesn't matter if we stop in the middle of any of those, because they are
    always traversed the same way. As an example, s_i values in [], s_fa
    values in ():

    node (fa) #1 [1]
      nexthop #1
        bucket #1 -> #0 in chain (1)
        bucket #2 -> #0 in chain (2) -> #1 in chain (3) -> #2 in chain (4)
        bucket #3 -> #0 in chain (5) -> #1 in chain (6)

      nexthop #2
        bucket #1 -> #0 in chain (7) -> #1 in chain (8)
        bucket #2 -> #0 in chain (9)
    --
    node (fa) #2 [2]
      nexthop #1
        bucket #1 -> #0 in chain (1) -> #1 in chain (2)
        bucket #2 -> #0 in chain (3)

    it doesn't matter if we stop at (3), (4), (7) for "node #1", or at (2)
    for "node #2": walking flattens all that.
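    The flattening argument can be checked with a sketch that walks
    nexthops, buckets and chains in a fixed order and keeps a single counter
    across all three levels. The node shape and function name are
    hypothetical; the test uses the chain lengths of node #1 above (nine
    entries in total) and a budget of four entries per partial dump.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NH		2
#define MAX_BUCKETS	3

/* Hypothetical node: chain_len[nh][b] exceptions in each bucket chain. */
struct node_sketch {
	size_t n_nh;
	size_t chain_len[MAX_NH][MAX_BUCKETS];
};

/*
 * Walk nexthops, buckets and chains in a fixed order with ONE counter.
 * Entries whose flattened index is below *skip were emitted earlier; at
 * most 'budget' entries are emitted per call. Returns the number emitted.
 */
static size_t walk_node(const struct node_sketch *n, size_t *skip,
			size_t budget)
{
	size_t idx = 0, emitted = 0;

	for (size_t nh = 0; nh < n->n_nh; nh++)
		for (size_t b = 0; b < MAX_BUCKETS; b++)
			for (size_t c = 0; c < n->chain_len[nh][b]; c++, idx++) {
				if (idx < *skip)
					continue;
				if (emitted == budget) {
					*skip = idx;	/* resume point */
					return emitted;
				}
				emitted++;
			}
	*skip = idx;
	return emitted;
}
```

    Because the traversal order never changes, the single index identifies
    the stopping point no matter which level it falls in.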

    It would even be possible to drop the distinction between the in-tree
    (s_i) and in-node (s_fa) counter, but a further improvement might
    advise against this. This is only as accurate as the existing tracking
    mechanism for leaves: if a partial dump is restarted after exceptions
    are removed or expired, we might skip some non-dumped entries.

    To improve this, we could attach a 'sernum' attribute (similar to the
    one used for IPv6) to nexthop entities, and bump this counter whenever
    exceptions change: having a distinction between the two counters would
    make this more convenient.

    Listing of exception routes (modified routes pre-3.5) was tested against
    these versions of kernel and iproute2:

                              iproute2
    kernel      4.14.0  4.15.0  4.19.0  5.0.0  5.1.0
    3.5-rc4        +       +       +      +      +
    4.4
    4.9
    4.14
    4.15
    4.19
    5.0
    5.1
    fixed          +       +       +      +      +

    v7:
    - Move loop over nexthop objects to route.c, and pass struct fib_info
      and table ID to it, not a struct fib_alias (suggested by David Ahern)
    - While at it, note that the NULL check on fa->fa_info is redundant,
      and the check on RTNH_F_DEAD is also not consistent with what's done
      with regular route listing: just keep it for nhc_flags
    - Rename entry point function for dumping exceptions to
      fib_dump_info_fnhe(), and rearrange arguments for consistency with
      fib_dump_info()
    - Rename fnhe_dump_buckets() to fnhe_dump_bucket() and make it handle
      one bucket at a time
    - Expand commit message to describe why we can have a single "skip"
      counter for all exceptions stored in bucket chains in nexthop objects
      (suggested by David Ahern)

    v6:
    - Rebased onto net-next
    - Loop over nexthop paths too. Move loop over fnhe buckets to route.c,
      avoids need to export rt_fill_info() and to touch exceptions from
      fib_trie.c. Pass NULL as flow to rt_fill_info(), it now allows that
      (suggested by David Ahern)

    Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
     

20 Jun, 2019

1 commit

  • empty_child_inc/dec() use the ternary operator for conditional
    operations. The conditions involve the post/pre in/decrement
    operator and the operation is only performed when the condition
    is *not* true. This is hard for humans to parse; use a regular
    'if' construct instead and perform the in/decrement separately.

    This also fixes two warnings that are emitted about the value
    of the ternary expression being unused, when building the kernel
    with clang + "kbuild: Remove unnecessary -Wno-unused-value"
    (https://lore.kernel.org/patchwork/patch/1089869/):

    CC net/ipv4/fib_trie.o
    net/ipv4/fib_trie.c:351:2: error: expression result unused [-Werror,-Wunused-value]
    ++tn_info(n)->empty_children ? : ++tn_info(n)->full_children;
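    The if-based form can be sketched in userspace as below. The struct is a
    hypothetical stand-in for the tnode info; its counters are deliberately
    8 bits wide so that the case in which empty_children wraps, and
    full_children is adjusted instead, is easy to exercise. The kernel's
    counters are wider.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the tnode info counters (narrow on purpose). */
struct tn_info_sketch {
	uint8_t empty_children;
	uint8_t full_children;
};

/* Equivalent of: ++t->empty_children ? : ++t->full_children;
 * (the GNU "?:" form with the middle operand omitted) */
static void empty_child_inc(struct tn_info_sketch *t)
{
	t->empty_children++;
	if (!t->empty_children)		/* wrapped around to 0 */
		t->full_children++;
}

/* Equivalent of: t->empty_children-- ? : t->full_children--; */
static void empty_child_dec(struct tn_info_sketch *t)
{
	if (!t->empty_children)		/* about to wrap below 0 */
		t->full_children--;
	t->empty_children--;
}
```

    Both statements are now plain side effects, so there is no expression
    value left for clang to flag as unused.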

    Fixes: 95f60ea3e99a ("fib_trie: Add collapse() and should_collapse() to resize")
    Signed-off-by: Matthias Kaehlcke
    Reviewed-by: Douglas Anderson
    Reviewed-by: Nick Desaulniers
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Matthias Kaehlcke
     

08 Jun, 2019

1 commit


05 Jun, 2019

3 commits

  • Add 'struct nexthop' and nh_list list_head to fib_info. nh_list is the
    fib_info side of the nexthop fib_info relationship.

    Add fi_list list_head to 'struct nexthop' to track fib_info entries
    using a nexthop instance. Add __remove_nexthop_fib and add it to
    __remove_nexthop to walk the new list_head and mark those fib entries
    as dead when the nexthop is deleted.

    Add a few nexthop helpers for use when a nexthop is added to fib_info:
    - nexthop_cmp to determine if 2 nexthops are the same
    - nexthop_path_fib_result to select a path for a multipath
      'struct nexthop'
    - nexthop_fib_nhc to select a specific fib_nh_common within a
      multipath 'struct nexthop'

    Update existing fib_info_nhc to use nexthop_fib_nhc if a fib_info uses
    a 'struct nexthop', and mark fib_info_nh as only used for the non-nexthop
    case.

    Update the fib_info functions to check for fi->nh and take a different
    path as needed:
    - free_fib_info_rcu - put the nexthop object reference
    - fib_release_info - remove the fib_info from the nexthop's fi_list
    - nh_comp - use nexthop_cmp when either fib_info references a nexthop
      object
    - fib_info_hashfn - use the nexthop id for the hashing vs the oif of
      each fib_nh in a fib_info
    - fib_nlmsg_size - add space for the RTA_NH_ID attribute
    - fib_create_info - verify nexthop reference can be taken, verify
      nexthop spec is valid for fib entry, and add fib_info to fi_list for
      a nexthop
    - fib_select_multipath - use the new nexthop_path_fib_result to select a
      path when nexthop objects are used
    - fib_table_lookup - if the 'struct nexthop' is a blackhole nexthop, treat
      it the same as a fib entry using 'blackhole'

    The bulk of the changes are in fib_semantics.c and most of that is
    moving the existing change_nexthops into an else branch.

    Update the nexthop code to walk fi_list when a nexthop is deleted, to
    remove the fib entries referencing it.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Convert more IPv4 code to use fib_nh_common over fib_nh to enable routes
    to use a fib6_nh based nexthop. In the end, only code not using a
    nexthop object in a fib_info should directly access fib_nh in a fib_info
    without checking the family and going through fib_nh_common. Those
    functions will be marked when it is not directly evident.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Use helpers to access fib_nh and fib_nhs fields of a fib_info. Drop the
    fib_dev macro which is an alias for the first nexthop. Replacements:

    fi->fib_dev --> fib_info_nh(fi, 0)->fib_nh_dev
    fi->fib_nh --> fib_info_nh(fi, 0)
    fi->fib_nh[i] --> fib_info_nh(fi, i)
    fi->fib_nhs --> fib_info_num_path(fi)

    where fib_info_nh(fi, i) returns fi->fib_nh[nhsel] and fib_info_num_path
    returns fi->fib_nhs.
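    The point of the accessors is that callers stop indexing fi->fib_nh
    directly, so a later patch can change what sits behind them without
    touching every caller. A minimal userspace sketch with hypothetical,
    much simplified structures (the `_sketch` names are not the kernel's):

```c
#include <assert.h>

/* Hypothetical, heavily simplified versions of the structures. */
struct fib_nh_sketch {
	int fib_nh_oif;
};

struct fib_info_sketch {
	int fib_nhs;				/* number of paths */
	struct fib_nh_sketch fib_nh[2];		/* inline nexthops */
};

/* Analogous to fib_info_nh(fi, nhsel): address of the nhsel-th nexthop. */
static struct fib_nh_sketch *fib_info_nh_sketch(struct fib_info_sketch *fi,
						int nhsel)
{
	assert(nhsel >= 0 && nhsel < fi->fib_nhs);
	return &fi->fib_nh[nhsel];
}

/* Analogous to fib_info_num_path(fi). */
static int fib_info_num_path_sketch(const struct fib_info_sketch *fi)
{
	return fi->fib_nhs;
}
```

    A caller that sums or inspects paths through these two helpers needs no
    change when the storage behind them changes.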

    Move the existing fib_info_nhc to nexthop.h and define the new ones
    there. A later patch adds a check if a fib_info uses a nexthop object,
    and defining the helpers in nexthop.h avoids circular header
    dependencies.

    After this, all remaining open coded references to fi->fib_nhs and
    fi->fib_nh are in:
    - fib_create_info and helpers used to lookup an existing fib_info
      entry, and
    - the netdev event functions fib_sync_down_dev and fib_sync_up.

    The latter two will not be reused for nexthops, and fib_create_info
    will be updated to handle a nexthop in a fib_info.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

27 May, 2019

1 commit

  • The pointer n is assigned a value, but this value is never read:
    the end of the code block continues to the next loop iteration.
    Clean up the code by removing the redundant assignment.

    Fixes: 1bff1a0c9bbda ("ipv4: Add function to send route updates")
    Addresses-Coverity: ("Unused value")
    Signed-off-by: Colin Ian King
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Colin Ian King
     

23 May, 2019

1 commit

  • Add fib_info_notify_update to walk the fib and send RTM_NEWROUTE
    notifications with NLM_F_REPLACE set for entries linked to a fib_info
    that have nh_updated flag set. This helper will be used by the nexthop
    code to notify userspace of routes that are impacted when a nexthop
    config is updated via replace. The new function and its helper are
    similar to how fib_flush and fib_table_flush work for address delete
    and link down events.
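    The walk can be sketched as a filter over routes whose fib_info carries
    an update flag. The types and the notify hook below are hypothetical; in
    the kernel the hook corresponds to sending RTM_NEWROUTE with
    NLM_F_REPLACE, which this sketch only records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced types for this sketch. */
struct fib_info_sketch {
	bool nh_updated;		/* set when the nexthop was replaced */
};

struct route_sketch {
	unsigned dst;
	const struct fib_info_sketch *fi;
};

/* Records which routes would be re-announced (capacity 8). */
struct notify_log {
	unsigned dst[8];
	size_t n;
};

static void notify_replace(struct notify_log *nlog,
			   const struct route_sketch *rt)
{
	if (nlog->n < 8)
		nlog->dst[nlog->n++] = rt->dst;
}

/* Walk every route and re-announce those whose fib_info was updated. */
static size_t notify_updated(const struct route_sketch *rts, size_t n,
			     struct notify_log *nlog)
{
	size_t before = nlog->n;

	for (size_t i = 0; i < n; i++)
		if (rts[i].fi->nh_updated)
			notify_replace(nlog, &rts[i]);
	return nlog->n - before;
}
```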

    This notification is needed for legacy apps that do not understand
    the new nexthop object. Apps that are nexthop aware can use the
    RTA_NH_ID attribute in the route notification to just ignore it.

    In the future this should be wrapped in a sysctl to allow OSes that
    are fully updated to avoid the notification storm.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

04 Apr, 2019

2 commits

  • Most of the ipv4 code only needs data from fib_nh_common. Add
    fib_nh_common selection to fib_result and update users to use it.

    Right now, fib_nh_common in fib_result will point to a fib_nh struct
    that is embedded within a fib_info:

        fib_info --> fib_nh
                     fib_nh
                     ...
                     fib_nh
                        ^
    fib_result->nhc ----+

    Later, nhc can point to a fib_nh within a nexthop struct:

        fib_info --> nexthop --> fib_nh
                                   ^
    fib_result->nhc ---------------+

    or for a nexthop group:

        fib_info --> nexthop --> nexthop --> fib_nh
                                 nexthop --> fib_nh
                                 ...
                                 nexthop --> fib_nh
                                               ^
    fib_result->nhc ---------------------------+

    In all cases nhsel within fib_result will point to which leg in the
    multipath route is used.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Update fib_table_lookup tracepoint to take a fib_nh_common struct and
    dump the v6 gateway address if the nexthop uses it.

    Over the years saddr has not proven useful and the output of the
    tracepoint produces very long lines. Since saddr is not part of
    fib_nh_common, drop it. If it needs to be added later, fib_nh which
    contains saddr can be obtained from a fib_nh_common via container_of.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

30 Mar, 2019

2 commits

  • Rename fib_nh entries that will be moved to a fib_nh_common struct.
    Specifically, the device, oif, gateway, flags, scope, lwtstate,
    nh_weight and nh_upper_bound are common with all nexthop definitions.
    In the process shorten fib_nh_lwtstate to fib_nh_lws to avoid really
    long lines.

    Rename only; no functional change intended.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • in_dev lookup followed by IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN check
    is called in several places, some with the rcu lock and others with the
    rtnl held.

    Move the check to a helper similar to what IPv6 has. Since the helper
    can be invoked from either context use rcu_dereference_rtnl to
    dereference ip_ptr.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern