03 Dec, 2020

1 commit

  • Fix to return a negative error code from the error handling
    case instead of 0, as done elsewhere in this function.

    Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level")
    Reported-by: Hulk Robot
    Signed-off-by: Zhang Changzhong
    Link: https://lore.kernel.org/r/1606903122-2098-1-git-send-email-zhangchangzhong@huawei.com
    Signed-off-by: Jakub Kicinski

    Zhang Changzhong
     

01 Dec, 2020

2 commits

  • While vxlan doesn't need any extra tailroom, the lowerdev might need it. In
    that case, copy it over to reduce the chance for additional (re)allocations
    in the transmit path.

    Signed-off-by: Sven Eckelmann
    Link: https://lore.kernel.org/r/20201126125247.1047977-2-sven@narfation.org
    Signed-off-by: Jakub Kicinski

    Sven Eckelmann
     
  • It was observed that sending data via batadv over vxlan (on top of
    wireguard) reduced performance massively compared to raw ethernet or
    batadv on raw ethernet. A check of perf data showed that
    vxlan_build_skb was constantly calling pskb_expand_head to allocate
    enough headroom for:

    min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
    + VXLAN_HLEN + iphdr_len;

    But the vxlan_config_apply only requested needed headroom for:

    lowerdev->hard_header_len + VXLAN6_HEADROOM or VXLAN_HEADROOM

    So it completely ignored the needed_headroom of the lower device. The first
    caller of net_dev_xmit could therefore never make sure that enough headroom
    was allocated for the rest of the transmit path.

    Cc: Annika Wickert
    Signed-off-by: Sven Eckelmann
    Tested-by: Annika Wickert
    Link: https://lore.kernel.org/r/20201126125247.1047977-1-sven@narfation.org
    Signed-off-by: Jakub Kicinski

    Sven Eckelmann
     

06 Oct, 2020

1 commit


27 Sep, 2020

1 commit

  • This reverts commit 546c044c9651e81a16833806feff6b369bb5de33.

    Nothing prevents a user from sending frames to "external" VxLAN devices.
    In fact, the kernel itself may generate ICMP chatter.

    This is fine, such frames should be dropped.

    The point of the "missing encapsulation" warning was that
    frames with missing encap should not make it into vxlan_xmit_one().
    And vxlan_xmit() drops them cleanly, so let it just do that.

    Without this revert the warning is triggered by the udp_tunnel_nic.sh
    test, but the minimal repro is:

    $ ip link add vxlan0 type vxlan \
    group 239.1.1.1 \
    dev lo \
    dstport 1234 \
    external
    $ ip li set dev vxlan0 up

    [ 419.165981] vxlan0: Missing encapsulation instructions
    [ 419.166551] WARNING: CPU: 0 PID: 1041 at drivers/net/vxlan.c:2889 vxlan_xmit+0x15c0/0x1fc0 [vxlan]

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

26 Sep, 2020

5 commits


06 Aug, 2020

1 commit

  • This reverts commit 71130f29979c7c7956b040673e6b9d5643003176.

    In commit 71130f29979c ("vxlan: fix tos value before xmit") we wanted to
    make sure the tos value is filtered by RT_TOS() based on RFC1349.

       0     1     2     3     4     5     6     7
    +-----+-----+-----+-----+-----+-----+-----+-----+
    |   PRECEDENCE    |          TOS          | MBZ |
    +-----+-----+-----+-----+-----+-----+-----+-----+

    But RFC1349 has been obsoleted by RFC2474. The new DSCP field is defined as:

       0     1     2     3     4     5     6     7
    +-----+-----+-----+-----+-----+-----+-----+-----+
    |          DS FIELD, DSCP           | ECN FIELD |
    +-----+-----+-----+-----+-----+-----+-----+-----+

    So with

    IPTOS_TOS_MASK 0x1E
    RT_TOS(tos) ((tos)&IPTOS_TOS_MASK)

    the first 3 bits of the DSCP info will get lost.

    To take all the DSCP info in xmit, we should revert the patch and just push
    all tos bits to ip_tunnel_ecn_encap(), which will handle the ECN field later.

    Fixes: 71130f29979c ("vxlan: fix tos value before xmit")
    Signed-off-by: Hangbin Liu
    Acked-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Hangbin Liu
     

05 Aug, 2020

2 commits

  • If the interface is a bridge or Open vSwitch port, and we can't
    forward a packet because it exceeds the local PMTU estimate,
    trigger an ICMP or ICMPv6 reply to the sender, using the same
    interface to forward it back.

    If metadata collection is enabled, reverse destination and source
    addresses, so that Open vSwitch is able to match this packet against
    the existing, reverse flow.

    v2: Use netif_is_any_bridge_port() (David Ahern)

    Signed-off-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Stefano Brivio
     
  • It's currently possible to bridge Ethernet tunnels carrying IP
    packets directly to external interfaces without assigning them
    addresses and routes on the bridged network itself: this is the case
    for UDP tunnels bridged with a standard bridge or by Open vSwitch.

    PMTU discovery is currently broken with those configurations, because
    the encapsulation effectively decreases the MTU of the link, and
    while we are able to account for this using PMTU discovery on the
    lower layer, we don't have a way to relay ICMP or ICMPv6 messages
    needed by the sender, because we don't have valid routes to it.

    On the other hand, as a tunnel endpoint, we can't fragment packets
    as a general approach: this is for instance clearly forbidden for
    VXLAN by RFC 7348, section 4.3:

    VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may
    fragment encapsulated VXLAN packets due to the larger frame size.
    The destination VTEP MAY silently discard such VXLAN fragments.

    The same paragraph recommends that the MTU over the physical network
    accommodates for encapsulations, but this isn't a practical option for
    complex topologies, especially for typical Open vSwitch use cases.

    Further, it states that:

    Other techniques like Path MTU discovery (see [RFC1191] and
    [RFC1981]) MAY be used to address this requirement as well.

    Now, PMTU discovery already works for routed interfaces, we get
    route exceptions created by the encapsulation device as they receive
    ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
    we already rebuild those messages with the appropriate MTU and route
    them back to the sender.

    Add the missing bits for bridged cases:

    - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
    to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
    RFC 4443 section 2.4 for ICMPv6. This function is already called by
    UDP tunnels

    - a new function generating those ICMP or ICMPv6 replies. We can't
    reuse icmp_send() and icmp6_send() as we don't see the sender as a
    valid destination. This doesn't need to be generic, as we don't
    cover any other type of ICMP errors given that we only provide an
    encapsulation function to the sender

    While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
    we might receive GSO buffers here, and the passed headroom already
    includes the inner MAC length, so we don't have to account for it
    a second time (that would imply three MAC headers on the wire, but
    there are just two).

    This issue became visible while bridging IPv6 packets with 4500 bytes
    of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
    bytes of encapsulation headroom, we would advertise MTU as 3950, and
    we would reject fragmented IPv6 datagrams of 3958 bytes size on the
    wire. We're exclusively dealing with network MTU here, though, so we
    could get Ethernet frames up to 3964 octets in that case.

    v2:
    - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
    - split IPv4/IPv6 functions (David Ahern)

    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
     

02 Aug, 2020

2 commits

  • Resolved kernel/bpf/btf.c using instructions from merge commit
    69138b34a7248d2396ab85c8652e20c0c39beaba

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When a vxlan interface is deleted, all fdbs are supposed to be deleted by
    vxlan_flush(). However, vxlan_flush() skips any fdb that contains the
    all-zeros MAC, because that entry is expected to be deleted by
    vxlan_uninit(). But vxlan_uninit() deletes only the fdb that contains both
    the all-zeros MAC and the default vni.
    So an fdb that contains the all-zeros MAC and a non-default vni
    is never deleted.

    Test commands:
    ip link add vxlan0 type vxlan dstport 4789 external
    ip link set vxlan0 up
    bridge fdb add to 00:00:00:00:00:00 dst 172.0.0.1 dev vxlan0 via lo \
    src_vni 10000 self permanent
    ip link del vxlan0

    kmemleak reports as follows:
    unreferenced object 0xffff9486b25ced88 (size 96):
    comm "bridge", pid 2151, jiffies 4294701712 (age 35506.901s)
    hex dump (first 32 bytes):
    02 00 00 00 ac 00 00 01 40 00 09 b1 86 94 ff ff ........@.......
    46 02 00 00 00 00 00 00 a7 03 00 00 12 b5 6a 6b F.............jk
    backtrace:
    [] vxlan_fdb_append.part.51+0x3c/0xf0 [vxlan]
    [] vxlan_fdb_create+0x184/0x1a0 [vxlan]
    [] vxlan_fdb_update+0x12f/0x220 [vxlan]
    [] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
    [] rtnl_fdb_add+0x187/0x270
    [] rtnetlink_rcv_msg+0x264/0x490
    [] netlink_rcv_skb+0x4a/0x110
    [] netlink_unicast+0x18e/0x250
    [] netlink_sendmsg+0x2e9/0x400
    [] ____sys_sendmsg+0x237/0x260
    [] ___sys_sendmsg+0x88/0xd0
    [] __sys_sendmsg+0x4e/0x80
    [] do_syscall_64+0x56/0xe0
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    unreferenced object 0xffff9486b1c40080 (size 128):
    comm "bridge", pid 2157, jiffies 4294701754 (age 35506.866s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 f8 dc 42 b2 86 94 ff ff ..........B.....
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    backtrace:
    [] vxlan_fdb_create+0x67/0x1a0 [vxlan]
    [] vxlan_fdb_update+0x12f/0x220 [vxlan]
    [] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
    [] rtnl_fdb_add+0x187/0x270
    [] rtnetlink_rcv_msg+0x264/0x490
    [] netlink_rcv_skb+0x4a/0x110
    [] netlink_unicast+0x18e/0x250
    [] netlink_sendmsg+0x2e9/0x400
    [] ____sys_sendmsg+0x237/0x260
    [] ___sys_sendmsg+0x88/0xd0
    [] __sys_sendmsg+0x4e/0x80
    [] do_syscall_64+0x56/0xe0
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode")
    Signed-off-by: Taehee Yoo
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Taehee Yoo
     

30 Jul, 2020

1 commit

  • The commit cited below removed the RCU read-side critical section from
    rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked
    without RCU protection.

    This results in the following warning [1] in the VXLAN driver, which
    relied on the callback being invoked from an RCU read-side critical
    section.

    Fix this by calling rcu_read_lock() in the VXLAN driver, as already done
    in the bridge driver.

    [1]
    WARNING: suspicious RCU usage
    5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted
    -----------------------------
    drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by bridge/166:
    #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090

    stack backtrace:
    CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
    Call Trace:
    dump_stack+0x100/0x184
    lockdep_rcu_suspicious+0x153/0x15d
    vxlan_fdb_dump+0x51e/0x6d0
    rtnl_fdb_dump+0x4dc/0xad0
    netlink_dump+0x540/0x1090
    __netlink_dump_start+0x695/0x950
    rtnetlink_rcv_msg+0x802/0xbd0
    netlink_rcv_skb+0x17a/0x480
    rtnetlink_rcv+0x22/0x30
    netlink_unicast+0x5ae/0x890
    netlink_sendmsg+0x98a/0xf40
    __sys_sendto+0x279/0x3b0
    __x64_sys_sendto+0xe6/0x1a0
    do_syscall_64+0x54/0xa0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7fe14fa2ade0
    Code: Bad RIP value.
    RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0
    RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003
    RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

    Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl")
    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
     

11 Jul, 2020

1 commit

  • Cater to devices which:
    (a) may want to sleep in the callbacks;
    (b) only have IPv4 support;
    (c) need all the programming to happen while the netdev is up.

    Drivers attach UDP tunnel offload info struct to their netdevs,
    where they declare how many UDP ports of various tunnel types
    they support. Core takes care of tracking which ports to offload.

    Use a fixed-size array since this matches what almost all drivers
    do, and avoids the complexity and uncertainty around memory allocations
    in an atomic context.

    Make sure that tunnel drivers don't try to replay the ports when
    new NIC netdev is registered. Automatic replays would mess up
    reference counting, and will be removed completely once all drivers
    are converted.

    v4:
    - use a #define NULL to avoid build issues with CONFIG_INET=n.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

26 Jun, 2020

1 commit


11 Jun, 2020

2 commits

  • The vxlan driver should be using helpers to access nexthop struct
    internals. Remove the open-coded check of whether a nexthop is multipath
    in favor of the existing nexthop_is_multipath helper. Add a new
    helper, nexthop_has_v4, to cover the need to check has_v4 in
    a group.

    Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
    Cc: Roopa Prabhu
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • fdb nexthops are marked with a flag. For standalone nexthops, a flag was
    added to the nh_info struct. For groups that flag was added to struct
    nexthop when it should have been added to the group information. Fix
    by removing the flag from the nexthop struct and adding a flag to nh_group
    that mirrors nh_info and is really only a caching of the individual types.
    Add a helper, nexthop_is_fdb, for use by the vxlan code and fixup the
    internal code to use the flag from either nh_info or nh_group.

    v2
    - propagate fdb_nh in remove_nh_grp_entry

    Fixes: 38428d68719c ("nexthop: support for fdb ecmp nexthops")
    Cc: Roopa Prabhu
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

10 Jun, 2020

1 commit

  • The dynamic key update for addr_list_lock still causes troubles,
    for example the following race condition still exists:

    CPU 0: CPU 1:
    (RCU read lock) (RTNL lock)
    dev_mc_seq_show() netdev_update_lockdep_key()
    -> lockdep_unregister_key()
    -> netif_addr_lock_bh()

    because lockdep doesn't provide an API to update it atomically.
    Therefore, we have to move it back to static keys and use subclass
    for nest locking like before.

    In commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key
    changes"), I already reverted most parts of commit ab92d68fc22f
    ("net: core: add generic lockdep keys").

    This patch reverts the rest and also part of commit f3b0a18bb6cb
    ("net: remove unnecessary variables and callback"). After this
    patch, addr_list_lock changes back to using static keys and
    subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
    not have to change back to ->ndo_get_lock_subclass().

    And hopefully this reduces some syzbot lockdep noises too.

    Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
    Cc: Taehee Yoo
    Cc: Dmitry Vyukov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

02 Jun, 2020

2 commits

  • fix dereference of nexthop group in fdb nexthop group
    update validation path.

    Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
    Reported-by: Ido Schimmel
    Suggested-by: Ido Schimmel
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     
  • When proxy mode is enabled the vxlan device might reply to Neighbor
    Solicitation (NS) messages on behalf of remote hosts.

    In case the NS message includes the "Source link-layer address" option
    [1], the vxlan device will use the specified address as the link-layer
    destination address in its reply.

    To avoid an infinite loop, break out of the options parsing loop when
    encountering an option with length zero and disregard the NS message.

    This is consistent with the IPv6 ndisc code and RFC 4861 which states
    that "Nodes MUST silently discard an ND packet that contains an option
    with length zero" [2].

    [1] https://tools.ietf.org/html/rfc4861#section-4.3
    [2] https://tools.ietf.org/html/rfc4861#section-4.6

    Fixes: 4b29dba9c085 ("vxlan: fix nonfunctional neigh_reduce()")
    Signed-off-by: Ido Schimmel
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Ido Schimmel
     

31 May, 2020

2 commits

  • - remove fdb from nh_list before the rcu grace period
    - protect fdb->vdev with rcu
    - hold spin lock before destroying fdb

    Fixes: c7cdbe2efc40 ("vxlan: support for nexthop notifiers")
    Signed-off-by: Roopa Prabhu
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Roopa Prabhu
     
  • NDA_NH_ID represents a remote ip or a group of remote ips.
    It allows use of nexthop groups in lieu of a remote ip or a
    list of remote ips supported by the fdb api.

    Current code ignores the other remote ip attrs when NDA_NH_ID is
    specified. In the spirit of strict checking, this commit adds a
    check to explicitly return an error on incorrect usage.

    Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

25 May, 2020

1 commit

  • vxlan_fdb_info() is not always called with RTNL held or from an RCU
    read-side critical section. For example, in the following call path:

    vxlan_cleanup()
    vxlan_fdb_destroy()
    vxlan_fdb_notify()
    __vxlan_fdb_notify()
    vxlan_fdb_info()

    The use of rtnl_dereference() can therefore result in the following
    splat [1].

    Fix this by dereferencing the nexthop under RCU read-side critical
    section.

    [1]
    [May24 22:56] =============================
    [ +0.004676] WARNING: suspicious RCU usage
    [ +0.004614] 5.7.0-rc5-custom-16219-g201392003491 #2772 Not tainted
    [ +0.007116] -----------------------------
    [ +0.004657] drivers/net/vxlan.c:276 suspicious rcu_dereference_check() usage!
    [ +0.008164]
    other info that might help us debug this:

    [ +0.009126]
    rcu_scheduler_active = 2, debug_locks = 1
    [ +0.007504] 5 locks held by bash/6892:
    [ +0.004392] #0: ffff8881d47e3410 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: __do_execve_file.isra.27+0x392/0x23c0
    [ +0.011795] #1: ffff8881d47e34b0 (&sig->exec_update_mutex){+.+.}-{3:3}, at: flush_old_exec+0x510/0x2030
    [ +0.010947] #2: ffff8881a141b0b0 (ptlock_ptr(page)#2){+.+.}-{2:2}, at: unmap_page_range+0x9c0/0x2590
    [ +0.010585] #3: ffff888230009d50 ((&vxlan->age_timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x800
    [ +0.010192] #4: ffff888183729bc8 (&vxlan->hash_lock[h]){+.-.}-{2:2}, at: vxlan_cleanup+0x133/0x4a0
    [ +0.010382]
    stack backtrace:
    [ +0.005103] CPU: 1 PID: 6892 Comm: bash Not tainted 5.7.0-rc5-custom-16219-g201392003491 #2772
    [ +0.009675] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
    [ +0.010155] Call Trace:
    [ +0.002775]
    [ +0.002313] dump_stack+0xfd/0x178
    [ +0.003895] lockdep_rcu_suspicious+0x14a/0x153
    [ +0.005157] vxlan_fdb_info+0xe39/0x12a0
    [ +0.004775] __vxlan_fdb_notify+0xb8/0x160
    [ +0.004672] vxlan_fdb_notify+0x8e/0xe0
    [ +0.004370] vxlan_fdb_destroy+0x117/0x330
    [ +0.004662] vxlan_cleanup+0x1aa/0x4a0
    [ +0.004329] call_timer_fn+0x1c4/0x800
    [ +0.004357] run_timer_softirq+0x129d/0x17e0
    [ +0.004762] __do_softirq+0x24c/0xaef
    [ +0.004232] irq_exit+0x167/0x190
    [ +0.003767] smp_apic_timer_interrupt+0x1dd/0x6a0
    [ +0.005340] apic_timer_interrupt+0xf/0x20
    [ +0.004620]

    Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
    Signed-off-by: Ido Schimmel
    Reported-by: Amit Cohen
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Ido Schimmel
     

23 May, 2020

2 commits

  • vxlan driver registers for nexthop add/del notifiers to
    cleanup fdb entries pointing to such nexthops.

    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     
  • Today's vxlan mac fdb entries can point to multiple remote
    ips (rdsts) with the sole purpose of replicating
    broadcast-multicast and unknown unicast packets to those remote ips.

    E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
    load balanced to remote switches (vteps) belonging to the
    same multi-homed ethernet segment (E-VPN multihoming is analogous
    to multi-homed LAG implementations, but with the inter-switch
    peerlink replaced with a vxlan tunnel). In other words it needs
    support for mac ecmp. Furthermore, for faster convergence, E-VPN
    multihoming needs the ability to update fdb ecmp nexthops independent
    of the fdb entries.

    The new route nexthop API is perfect for this use case.
    This patch extends the vxlan fdb code to take a nexthop id
    pointing to an ecmp nexthop group.

    Changes include:
    - New NDA_NH_ID attribute for fdbs
    - Use the newly added fdb nexthop groups
    - makes vxlan rdsts and nexthop handling code mutually
    exclusive
    - since this is a new use-case and the requirement is for ecmp
    nexthop groups, the fdb add and update path checks that the
    nexthop is really an ecmp nexthop group. This check can be relaxed
    in the future, if we want to introduce replication fdb nexthop groups
    and allow its use in lieu of current rdst lists.
    - fdb update requests with nexthop id's only allowed for existing
    fdb's that have nexthop id's
    - learning will not override an existing fdb entry with nexthop
    group
    - I have wrapped the switchdev offload code around the presence of
    rdst

    [1] E-VPN RFC https://tools.ietf.org/html/rfc7432
    [2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
    [3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

    Includes a null check fix in vxlan_xmit from Nikolay

    v2 - Fixed build issue:
    Reported-by: kbuild test robot
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

24 Apr, 2020

1 commit

  • IFLA_VXLAN_* attributes are in the data array, which is correctly
    used when fetching the value, but not when setting the extended
    ack. Because IFLA_VXLAN_MAX < IFLA_MAX, we avoid out of bounds
    array accesses, but we don't provide a pointer to the invalid
    attribute to userspace.

    Fixes: 653ef6a3e4af ("vxlan: change vxlan_[config_]validate() to use netlink_ext_ack for error reporting")
    Fixes: b4d3069783bc ("vxlan: Allow configuration of DF behaviour")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

19 Mar, 2020

1 commit


10 Jan, 2020

1 commit


03 Jan, 2020

2 commits

  • Before ip_tunnel_ecn_encap() and udp_tunnel_xmit_skb() we should filter
    tos value by RT_TOS() instead of using config tos directly.

    vxlan_get_route() would filter the tos to fl4.flowi4_tos but we didn't
    return it back, as geneve_get_v4_rt() did. So we have to use RT_TOS()
    directly in function ip_tunnel_ecn_encap().

    Fixes: 206aaafcd279 ("VXLAN: Use IP Tunnels tunnel ENC encap API")
    Fixes: 1400615d64cf ("vxlan: allow setting ipv6 traffic class")
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     
  • Fixed Coding function and style issues

    Signed-off-by: Niu Xilei
    Signed-off-by: David S. Miller

    Niu Xilei
     

10 Dec, 2019

1 commit

  • Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using the following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do

        if [[ "$file" =~ $EXCLUDE_FILES ]]; then
            continue
        fi
        sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
     

05 Dec, 2019

1 commit

  • ipv6_stub uses the ip6_dst_lookup function to allow other modules to
    perform IPv6 lookups. However, this function skips the XFRM layer
    entirely.

    All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
    ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
    which calls xfrm_lookup_route(). This patch fixes this inconsistent
    behavior by switching the stub to ip6_dst_lookup_flow, which also calls
    xfrm_lookup_route().

    This requires some changes in all the callers, as these two functions
    take different arguments and have different return types.

    Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
    Reported-by: Xiumei Mu
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

13 Nov, 2019

1 commit


03 Nov, 2019

1 commit


31 Oct, 2019

2 commits

  • This parameter has never been used.

    Signed-off-by: Guillaume Nault
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • After commit 0ce1822c2a08 ("vxlan: add adjacent link to limit depth
    level"), vxlan_changelink() could fail because of
    netdev_adjacent_change_prepare().
    netdev_adjacent_change_prepare() returns -EEXIST when the old lower device
    and the new lower device are the same.
    (The old lower device is "dst->remote_dev" and the new lower device is
    "lowerdev".)
    So, before calling it, lowerdev should be set to NULL if these devices are
    the same.

    Test command1:
    ip link add dummy0 type dummy
    ip link add vxlan0 type vxlan dev dummy0 dstport 4789 vni 1
    ip link set vxlan0 type vxlan ttl 5
    RTNETLINK answers: File exists

    Reported-by: Dan Carpenter
    Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

30 Oct, 2019

1 commit

  • This patch improves the tun_info options_len check by dropping
    the skb when TUNNEL_VXLAN_OPT is set but options_len is less
    than the size of vxlan_metadata. This avoids a potential out-of-bounds
    access on ip_tun_info.

    Fixes: ee122c79d422 ("vxlan: Flow based tunneling")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long