Eric Lee / smarc-fsl-linux-kernel

03 Dec, 2020

1 commit

832e09798 vxlan: fix error return code in __vxlan_dev_create()

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level")
Reported-by: Hulk Robot
Signed-off-by: Zhang Changzhong
Link: https://lore.kernel.org/r/1606903122-2098-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski

Zhang Changzhong
2020-12-03 10:04:02 +0800
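
To illustrate the class of bug fixed by the commit above, here is a minimal user-space sketch. It is not the real __vxlan_dev_create(); `struct dev`, `lookup_dev()` and `create_sketch()` are hypothetical names, and the choice of -ENODEV is an assumption made for the example. The point is only that an error path reached after earlier steps succeeded must set a negative error code explicitly, otherwise the function returns 0.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal device record used only for this sketch. */
struct dev {
	int ifindex;
};

/* Hypothetical lookup: returns NULL when no device has this index. */
struct dev *lookup_dev(struct dev *table, size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].ifindex == ifindex)
			return &table[i];
	return NULL;
}

int create_sketch(struct dev *table, size_t n, int remote_ifindex)
{
	int err = 0;	/* earlier setup steps succeeded, so err is 0 here */
	struct dev *remote;

	remote = lookup_dev(table, n, remote_ifindex);
	if (!remote) {
		/* Without this assignment the error path would return 0. */
		err = -ENODEV;
		goto unwind;
	}
	return 0;

unwind:
	/* Undo the earlier setup steps here, then report the failure. */
	return err;
}
```

A caller that tests the result with `< 0` now sees the failure instead of treating a half-created device as a success.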

01 Dec, 2020

2 commits

a5e74021e vxlan: Copy needed_tailroom from lowerdev

While vxlan doesn't need any extra tailroom, the lowerdev might need it. In
that case, copy it over to reduce the chance for additional (re)allocations
in the transmit path.

Signed-off-by: Sven Eckelmann
Link: https://lore.kernel.org/r/20201126125247.1047977-2-sven@narfation.org
Signed-off-by: Jakub Kicinski

Sven Eckelmann
2020-12-01 10:10:12 +0800
0a35dc41f vxlan: Add needed_headroom for lower device

It was observed that sending data via batadv over vxlan (on top of
wireguard) reduced the performance massively compared to raw ethernet or
batadv on raw ethernet. A check of perf data showed that
vxlan_build_skb was constantly calling pskb_expand_head to allocate
enough headroom for:

  min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
                 + VXLAN_HLEN + iphdr_len;

But the vxlan_config_apply only requested needed headroom for:

lowerdev->hard_header_len + VXLAN6_HEADROOM or VXLAN_HEADROOM

So it completely ignored the needed_headroom of the lower device. The first
caller of net_dev_xmit could therefore never make sure that enough headroom
was allocated for the rest of the transmit path.

Cc: Annika Wickert
Signed-off-by: Sven Eckelmann
Tested-by: Annika Wickert
Link: https://lore.kernel.org/r/20201126125247.1047977-1-sven@narfation.org
Signed-off-by: Jakub Kicinski

Sven Eckelmann
2020-12-01 10:10:12 +0800
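
The headroom accounting described in the commit above can be sketched in user space as follows. `struct lower_dev` and `vxlan_needed_headroom()` are hypothetical stand-ins, not the driver's code; the two constants are assumed to match VXLAN_HEADROOM and VXLAN6_HEADROOM in include/net/vxlan.h. The sketch only shows the idea of adding the lower device's needed_headroom to what was requested before.

```c
#include <stdbool.h>

/*
 * Assumed to mirror include/net/vxlan.h:
 * IP header + UDP header + VXLAN header + inner Ethernet header.
 */
#define VXLAN_HEADROOM  (20 + 8 + 8 + 14)
#define VXLAN6_HEADROOM (40 + 8 + 8 + 14)

/* Hypothetical, trimmed-down stand-in for the lower struct net_device. */
struct lower_dev {
	unsigned short hard_header_len;
	unsigned short needed_headroom;
};

/* Headroom the vxlan device should request from the first transmitter. */
unsigned int vxlan_needed_headroom(const struct lower_dev *lower, bool use_ipv6)
{
	/* Before the fix, only hard_header_len was taken from the lower device. */
	unsigned int headroom = lower->hard_header_len + lower->needed_headroom;

	headroom += use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM;
	return headroom;
}
```

For a hypothetical lower device with hard_header_len 0 and needed_headroom 80, this yields 130 bytes over IPv4 and 150 bytes over IPv6.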

06 Oct, 2020

1 commit

1f8dda1d2 vxlan: use dev_sw_netstats_rx_add()

use new helper for netstats settings

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2020-10-06 21:23:21 +0800

27 Sep, 2020

1 commit

435be28b0 Revert "vxlan: move encapsulation warning"

This reverts commit 546c044c9651e81a16833806feff6b369bb5de33.

Nothing prevents a user from sending frames to "external" VxLAN devices.
In fact, the kernel itself may generate ICMP chatter.

This is fine, such frames should be dropped.

The point of the "missing encapsulation" warning was that
frames with missing encap should not make it into vxlan_xmit_one().
And vxlan_xmit() drops them cleanly, so let it just do that.

Without this revert the warning is triggered by the udp_tunnel_nic.sh
test, but the minimal repro is:

$ ip link add vxlan0 type vxlan \
      group 239.1.1.1 \
      dev lo \
      dstport 1234 \
      external
$ ip li set dev vxlan0 up

[ 419.165981] vxlan0: Missing encapsulation instructions
[ 419.166551] WARNING: CPU: 0 PID: 1041 at drivers/net/vxlan.c:2889 vxlan_xmit+0x15c0/0x1fc0 [vxlan]

Signed-off-by: Jakub Kicinski
Signed-off-by: David S. Miller

Jakub Kicinski
2020-09-27 03:34:47 +0800

26 Sep, 2020

5 commits

78ec710e7 vxlan: fix vxlan_find_sock() documentation for l3mdev

Since commit aab8cc3630e32
("vxlan: add support for underlay in non-default VRF")

vxlan_find_sock() also checks if socket is assigned to the right
level 3 master device when lower device is not in the default VRF.

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2020-09-26 07:58:07 +0800
2eabcb8af vxlan: check rtnl_configure_link return code correctly

Callers of rtnl_configure_link() always check for an error code with < 0.

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2020-09-26 07:58:07 +0800
546c044c9 vxlan: move encapsulation warning

vxlan_xmit_one() was only called from vxlan_xmit() without rdst and
info was already tested. Emit warning in that function instead

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2020-09-26 07:58:07 +0800
0189399cb vxlan: add unlikely to vxlan_remcsum check

Small optimization around the check, as it is performed on every
reception.

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2020-09-26 07:58:07 +0800
2ae2904b5 vxlan: don't collect metadata if remote checksum is wrong

call vxlan_remcsum() before md filling in vxlan_rcv()

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2020-09-26 07:58:07 +0800

06 Aug, 2020

1 commit

a0dced17a Revert "vxlan: fix tos value before xmit"

This reverts commit 71130f29979c7c7956b040673e6b9d5643003176.

In commit 71130f29979c ("vxlan: fix tos value before xmit") we wanted to
make sure the tos value is filtered by RT_TOS() based on RFC1349.

   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|   PRECEDENCE    |          TOS          | MBZ |
+-----+-----+-----+-----+-----+-----+-----+-----+

But RFC1349 has been obsoleted by RFC2474. The new DSCP field is defined as:

   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|          DS FIELD, DSCP           | ECN FIELD |
+-----+-----+-----+-----+-----+-----+-----+-----+

So with

IPTOS_TOS_MASK 0x1E
RT_TOS(tos) ((tos)&IPTOS_TOS_MASK)

the first 3 bits of DSCP info will get lost.

To keep all the DSCP info in xmit, we should revert the patch and just push
all tos bits to ip_tunnel_ecn_encap(), which will handle the ECN field later.

Fixes: 71130f29979c ("vxlan: fix tos value before xmit")
Signed-off-by: Hangbin Liu
Acked-by: Guillaume Nault
Signed-off-by: David S. Miller

Hangbin Liu
2020-08-06 03:09:10 +0800
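
The bit loss described in the revert above can be checked with a small user-space sketch. The two macros are copied from the commit message; `dscp_of()` and `tos_after_rt_tos()` are hypothetical helper names added for the illustration, and the sample value 0xB8 (DSCP 46) is an assumption chosen for the example.

```c
#include <stdint.h>

/* Definitions quoted in the commit message. */
#define IPTOS_TOS_MASK 0x1E
#define RT_TOS(tos)    ((tos) & IPTOS_TOS_MASK)

/* DSCP occupies the upper six bits of the DS field (RFC 2474). */
uint8_t dscp_of(uint8_t tos)
{
	return tos >> 2;
}

/* What the reverted patch handed on to the encapsulation code. */
uint8_t tos_after_rt_tos(uint8_t tos)
{
	return RT_TOS(tos);
}
```

With a DS field of 0xB8 (binary 1011 1000, DSCP 46), RT_TOS() leaves 0x18 (binary 0001 1000), so the DSCP read back is 6: the three high-order bits are gone.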

05 Aug, 2020

2 commits

fc68c9957 vxlan: Support for PMTU discovery on directly bridged links

If the interface is a bridge or Open vSwitch port, and we can't
forward a packet because it exceeds the local PMTU estimate,
trigger an ICMP or ICMPv6 reply to the sender, using the same
interface to forward it back.

If metadata collection is enabled, reverse destination and source
addresses, so that Open vSwitch is able to match this packet against
the existing, reverse flow.

v2: Use netif_is_any_bridge_port() (David Ahern)

Signed-off-by: Stefano Brivio
Signed-off-by: David S. Miller

Stefano Brivio
2020-08-05 04:01:45 +0800
4cb47a864 tunnels: PMTU discovery support for directly bridged IP packets

It's currently possible to bridge Ethernet tunnels carrying IP
packets directly to external interfaces without assigning them
addresses and routes on the bridged network itself: this is the case
for UDP tunnels bridged with a standard bridge or by Open vSwitch.

PMTU discovery is currently broken with those configurations, because
the encapsulation effectively decreases the MTU of the link, and
while we are able to account for this using PMTU discovery on the
lower layer, we don't have a way to relay ICMP or ICMPv6 messages
needed by the sender, because we don't have valid routes to it.

On the other hand, as a tunnel endpoint, we can't fragment packets
as a general approach: this is for instance clearly forbidden for
VXLAN by RFC 7348, section 4.3:

VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may
fragment encapsulated VXLAN packets due to the larger frame size.
The destination VTEP MAY silently discard such VXLAN fragments.

The same paragraph recommends that the MTU over the physical network
accommodate encapsulations, but this isn't a practical option for
complex topologies, especially for typical Open vSwitch use cases.

Further, it states that:

Other techniques like Path MTU discovery (see [RFC1191] and
[RFC1981]) MAY be used to address this requirement as well.

Now, PMTU discovery already works for routed interfaces, we get
route exceptions created by the encapsulation device as they receive
ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
we already rebuild those messages with the appropriate MTU and route
them back to the sender.

Add the missing bits for bridged cases:

- checks in skb_tunnel_check_pmtu() to understand if it's appropriate
to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
RFC 4443 section 2.4 for ICMPv6. This function is already called by
UDP tunnels

- a new function generating those ICMP or ICMPv6 replies. We can't
reuse icmp_send() and icmp6_send() as we don't see the sender as a
valid destination. This doesn't need to be generic, as we don't
cover any other type of ICMP errors given that we only provide an
encapsulation function to the sender

While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
we might receive GSO buffers here, and the passed headroom already
includes the inner MAC length, so we don't have to account for it
a second time (that would imply three MAC headers on the wire, but
there are just two).

This issue became visible while bridging IPv6 packets with 4500 bytes
of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
bytes of encapsulation headroom, we would advertise MTU as 3950, and
we would reject fragmented IPv6 datagrams of 3958 bytes size on the
wire. We're exclusively dealing with network MTU here, though, so we
could get Ethernet frames up to 3964 octets in that case.

v2:
- moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
- split IPv4/IPv6 functions (David Ahern)

Signed-off-by: Stefano Brivio
Reviewed-by: David Ahern
Signed-off-by: David S. Miller

Stefano Brivio
2020-08-05 04:01:45 +0800
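
The MTU arithmetic in the GENEVE example of the commit above can be reproduced with a short sketch. `advertised_mtu()` and `max_inner_frame()` are hypothetical helpers, not kernel functions; ETH_HLEN is assumed to be the usual 14-byte Ethernet header length.

```c
/* Ethernet header length, assumed to match ETH_HLEN in <linux/if_ether.h>. */
#define ETH_HLEN 14

/* Network-layer MTU reported to the sender: lower PMTU minus encapsulation. */
int advertised_mtu(int pmtu, int encap_headroom)
{
	return pmtu - encap_headroom;
}

/* Largest inner Ethernet frame that still fits that network-layer MTU. */
int max_inner_frame(int mtu)
{
	return mtu + ETH_HLEN;
}
```

With a PMTU of 4000 and 50 bytes of encapsulation headroom, the advertised MTU is 3950 and the largest acceptable Ethernet frame is 3964 octets, matching the figures in the commit message.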

02 Aug, 2020

2 commits

bd0b33b24 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Resolved kernel/bpf/btf.c using instructions from merge commit
69138b34a7248d2396ab85c8652e20c0c39beaba

Signed-off-by: David S. Miller

David S. Miller
2020-08-02 16:02:12 +0800
fda2ec62c vxlan: fix memleak of fdb

When a vxlan interface is deleted, all fdbs are deleted by vxlan_flush().
vxlan_flush() flushes fdbs, but it doesn't delete the fdb that contains the
all-zeros MAC, because that one is deleted by vxlan_uninit().
But vxlan_uninit() deletes only the fdb that contains both the all-zeros MAC
and the default vni.
So the fdb that contains both the all-zeros MAC and a non-default vni
will not be deleted.

Test commands:
ip link add vxlan0 type vxlan dstport 4789 external
ip link set vxlan0 up
bridge fdb add to 00:00:00:00:00:00 dst 172.0.0.1 dev vxlan0 via lo \
src_vni 10000 self permanent
ip link del vxlan0

kmemleak reports as follows:
unreferenced object 0xffff9486b25ced88 (size 96):
comm "bridge", pid 2151, jiffies 4294701712 (age 35506.901s)
hex dump (first 32 bytes):
02 00 00 00 ac 00 00 01 40 00 09 b1 86 94 ff ff ........@.......
46 02 00 00 00 00 00 00 a7 03 00 00 12 b5 6a 6b F.............jk
backtrace:
[] vxlan_fdb_append.part.51+0x3c/0xf0 [vxlan]
[] vxlan_fdb_create+0x184/0x1a0 [vxlan]
[] vxlan_fdb_update+0x12f/0x220 [vxlan]
[] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
[] rtnl_fdb_add+0x187/0x270
[] rtnetlink_rcv_msg+0x264/0x490
[] netlink_rcv_skb+0x4a/0x110
[] netlink_unicast+0x18e/0x250
[] netlink_sendmsg+0x2e9/0x400
[] ____sys_sendmsg+0x237/0x260
[] ___sys_sendmsg+0x88/0xd0
[] __sys_sendmsg+0x4e/0x80
[] do_syscall_64+0x56/0xe0
[] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff9486b1c40080 (size 128):
comm "bridge", pid 2157, jiffies 4294701754 (age 35506.866s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 f8 dc 42 b2 86 94 ff ff ..........B.....
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
backtrace:
[] vxlan_fdb_create+0x67/0x1a0 [vxlan]
[] vxlan_fdb_update+0x12f/0x220 [vxlan]
[] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
[] rtnl_fdb_add+0x187/0x270
[] rtnetlink_rcv_msg+0x264/0x490
[] netlink_rcv_skb+0x4a/0x110
[] netlink_unicast+0x18e/0x250
[] netlink_sendmsg+0x2e9/0x400
[] ____sys_sendmsg+0x237/0x260
[] ___sys_sendmsg+0x88/0xd0
[] __sys_sendmsg+0x4e/0x80
[] do_syscall_64+0x56/0xe0
[] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode")
Signed-off-by: Taehee Yoo
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller

Taehee Yoo
2020-08-02 02:49:18 +0800

30 Jul, 2020

1 commit

b5141915b vxlan: Ensure FDB dump is performed under RCU

The commit cited below removed the RCU read-side critical section from
rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked
without RCU protection.

This results in the following warning [1] in the VXLAN driver, which
relied on the callback being invoked from an RCU read-side critical
section.

Fix this by calling rcu_read_lock() in the VXLAN driver, as already done
in the bridge driver.

[1]
WARNING: suspicious RCU usage
5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted
-----------------------------
drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by bridge/166:
#0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090

stack backtrace:
CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack+0x100/0x184
lockdep_rcu_suspicious+0x153/0x15d
vxlan_fdb_dump+0x51e/0x6d0
rtnl_fdb_dump+0x4dc/0xad0
netlink_dump+0x540/0x1090
__netlink_dump_start+0x695/0x950
rtnetlink_rcv_msg+0x802/0xbd0
netlink_rcv_skb+0x17a/0x480
rtnetlink_rcv+0x22/0x30
netlink_unicast+0x5ae/0x890
netlink_sendmsg+0x98a/0xf40
__sys_sendto+0x279/0x3b0
__x64_sys_sendto+0xe6/0x1a0
do_syscall_64+0x54/0xa0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe14fa2ade0
Code: Bad RIP value.
RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0
RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003
RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl")
Signed-off-by: Ido Schimmel
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2020-07-30 03:04:54 +0800

11 Jul, 2020

1 commit

cc4e3835e udp_tunnel: add central NIC RX port offload infrastructure

Cater to devices which:
(a) may want to sleep in the callbacks;
(b) only have IPv4 support;
(c) need all the programming to happen while the netdev is up.

Drivers attach UDP tunnel offload info struct to their netdevs,
where they declare how many UDP ports of various tunnel types
they support. Core takes care of tracking which ports to offload.

Use a fixed-size array since this matches what almost all drivers
do, and avoids complexity and uncertainty around memory allocations
in an atomic context.

Make sure that tunnel drivers don't try to replay the ports when
a new NIC netdev is registered. Automatic replays would mess up
reference counting, and will be removed completely once all drivers
are converted.

v4:
- use a #define NULL to avoid build issues with CONFIG_INET=n.

Signed-off-by: Jakub Kicinski
Signed-off-by: David S. Miller

Jakub Kicinski
2020-07-11 04:54:00 +0800

26 Jun, 2020

1 commit

b18e9834f vxlan: fix last fdb index during dump of fdb with nhid

This patch fixes last saved fdb index in fdb dump handler when
handling fdb's with nhid.

Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller

Roopa Prabhu
2020-06-26 07:12:34 +0800

11 Jun, 2020

2 commits

50cb8769f vxlan: Remove access to nexthop group struct

The vxlan driver should be using helpers to access nexthop struct
internals. Remove the open-coded check of whether the nexthop is multipath in
favor of the existing nexthop_is_multipath helper. Add a new
helper, nexthop_has_v4, to cover the need to check has_v4 in
a group.

Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Cc: Roopa Prabhu
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2020-06-11 04:20:20 +0800
ce9ac056d nexthop: Fix fdb labeling for groups

fdb nexthops are marked with a flag. For standalone nexthops, a flag was
added to the nh_info struct. For groups that flag was added to struct
nexthop when it should have been added to the group information. Fix
by removing the flag from the nexthop struct and adding a flag to nh_group
that mirrors nh_info and is really only a caching of the individual types.
Add a helper, nexthop_is_fdb, for use by the vxlan code and fixup the
internal code to use the flag from either nh_info or nh_group.

v2
- propagate fdb_nh in remove_nh_grp_entry

Fixes: 38428d68719c ("nexthop: support for fdb ecmp nexthops")
Cc: Roopa Prabhu
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2020-06-11 04:18:40 +0800

10 Jun, 2020

1 commit

845e0ebb4 net: change addr_list_lock back to static key

The dynamic key update for addr_list_lock still causes trouble;
for example, the following race condition still exists:

CPU 0:                          CPU 1:
(RCU read lock)                 (RTNL lock)
dev_mc_seq_show()               netdev_update_lockdep_key()
                                  -> lockdep_unregister_key()
 -> netif_addr_lock_bh()

because lockdep doesn't provide an API to update it atomically.
Therefore, we have to move it back to static keys and use subclass
for nest locking like before.

In commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key
changes"), I already reverted most parts of commit ab92d68fc22f
("net: core: add generic lockdep keys").

This patch reverts the rest and also part of commit f3b0a18bb6cb
("net: remove unnecessary variables and callback"). After this
patch, addr_list_lock changes back to using static keys and
subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
not have to change back to ->ndo_get_lock_subclass().

And hopefully this reduces some syzbot lockdep noise too.

Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
Cc: Taehee Yoo
Cc: Dmitry Vyukov
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

Cong Wang
2020-06-10 03:59:45 +0800

02 Jun, 2020

2 commits

03eaeda78 vxlan: fix dereference of nexthop group in nexthop update path

fix dereference of nexthop group in fdb nexthop group
update validation path.

Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Reported-by: Ido Schimmel
Suggested-by: Ido Schimmel
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller

Roopa Prabhu
2020-06-02 03:06:26 +0800
8066e6b44 vxlan: Avoid infinite loop when suppressing NS messages with invalid options

When proxy mode is enabled the vxlan device might reply to Neighbor
Solicitation (NS) messages on behalf of remote hosts.

In case the NS message includes the "Source link-layer address" option
[1], the vxlan device will use the specified address as the link-layer
destination address in its reply.

To avoid an infinite loop, break out of the options parsing loop when
encountering an option with length zero and disregard the NS message.

This is consistent with the IPv6 ndisc code and RFC 4861 which states
that "Nodes MUST silently discard an ND packet that contains an option
with length zero" [2].

[1] https://tools.ietf.org/html/rfc4861#section-4.3
[2] https://tools.ietf.org/html/rfc4861#section-4.6

Fixes: 4b29dba9c085 ("vxlan: fix nonfunctional neigh_reduce()")
Signed-off-by: Ido Schimmel
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Ido Schimmel
2020-06-02 02:08:41 +0800
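
The option walk described in the commit above can be sketched in user space as follows. This is not the driver's code: `parse_ns_options()` and its return convention are hypothetical, and only the option layout (one type octet, one length octet counted in units of 8 octets) and the type value 1 for "Source link-layer address" are taken from RFC 4861. The sketch shows why a zero length must end the walk: the cursor would otherwise never advance.

```c
#include <stddef.h>
#include <stdint.h>

/* "Source link-layer address" option type from RFC 4861. */
#define ND_OPT_SOURCE_LL_ADDR 1

/*
 * Walks the ND options in opts[0..len).
 * Returns 0 on success; *lladdr then points at the address bytes of the
 * source link-layer option, or is NULL if that option is absent.
 * Returns -1 if the options are malformed and the message must be discarded.
 */
int parse_ns_options(const uint8_t *opts, size_t len, const uint8_t **lladdr)
{
	size_t i = 0;

	*lladdr = NULL;
	while (i + 2 <= len) {
		uint8_t type = opts[i];
		/* The length field counts units of 8 octets. */
		size_t opt_len = (size_t)opts[i + 1] * 8;

		if (opt_len == 0)
			return -1;	/* i would never advance: discard */
		if (opt_len > len - i)
			return -1;	/* option runs past the buffer */
		if (type == ND_OPT_SOURCE_LL_ADDR) {
			*lladdr = &opts[i + 2];
			return 0;
		}
		i += opt_len;
	}
	return 0;
}
```

A caller would drop the NS message on -1 and otherwise use the returned address, if any, as the link-layer destination of the reply.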

31 May, 2020

2 commits

79472fe87 vxlan: few locking fixes in nexthop event handler

- remove fdb from nh_list before the rcu grace period
- protect fdb->vdev with rcu
- hold spin lock before destroying fdb

Fixes: c7cdbe2efc40 ("vxlan: support for nexthop notifiers")
Signed-off-by: Roopa Prabhu
Reviewed-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Roopa Prabhu
2020-05-31 12:47:08 +0800
72b486821 vxlan: add check to prevent use of remote ip attributes with NDA_NH_ID

NDA_NH_ID represents a remote ip or a group of remote ips.
It allows use of nexthop groups in lieu of a remote ip or a
list of remote ips supported by the fdb api.

Current code ignores the other remote ip attrs when NDA_NH_ID is
specified. In the spirit of strict checking, this commit adds a
check to explicitly return an error on incorrect usage.

Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller

Roopa Prabhu
2020-05-31 12:47:08 +0800
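
The strict check described in the commit above amounts to rejecting a request that carries both a nexthop id and remote ip attributes. A minimal sketch, assuming a hypothetical `struct fdb_req` that summarizes which attributes were present; the use of -EINVAL is also an assumption, since the commit message only says that an error is returned.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the attributes found in an fdb request. */
struct fdb_req {
	bool has_nh_id;		/* NDA_NH_ID was supplied */
	bool has_remote_attrs;	/* any remote ip attribute was supplied */
};

/* Returns 0 if the combination is acceptable, a negative errno otherwise. */
int fdb_req_validate(const struct fdb_req *req)
{
	/* A nexthop id replaces the remote ip attributes; mixing them is an error. */
	if (req->has_nh_id && req->has_remote_attrs)
		return -EINVAL;	/* assumed error code */
	return 0;
}
```

Returning an error here, rather than silently ignoring the extra attributes, tells user space that the request was contradictory.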

25 May, 2020

1 commit

06ec313ee vxlan: Do not assume RTNL is held in vxlan_fdb_info()

vxlan_fdb_info() is not always called with RTNL held or from an RCU
read-side critical section. For example, in the following call path:

vxlan_cleanup()
  vxlan_fdb_destroy()
    vxlan_fdb_notify()
      __vxlan_fdb_notify()
        vxlan_fdb_info()

The use of rtnl_dereference() can therefore result in the following
splat [1].

Fix this by dereferencing the nexthop under RCU read-side critical
section.

[1]
[May24 22:56] =============================
[ +0.004676] WARNING: suspicious RCU usage
[ +0.004614] 5.7.0-rc5-custom-16219-g201392003491 #2772 Not tainted
[ +0.007116] -----------------------------
[ +0.004657] drivers/net/vxlan.c:276 suspicious rcu_dereference_check() usage!
[ +0.008164]
other info that might help us debug this:

[ +0.009126]
rcu_scheduler_active = 2, debug_locks = 1
[ +0.007504] 5 locks held by bash/6892:
[ +0.004392] #0: ffff8881d47e3410 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: __do_execve_file.isra.27+0x392/0x23c0
[ +0.011795] #1: ffff8881d47e34b0 (&sig->exec_update_mutex){+.+.}-{3:3}, at: flush_old_exec+0x510/0x2030
[ +0.010947] #2: ffff8881a141b0b0 (ptlock_ptr(page)#2){+.+.}-{2:2}, at: unmap_page_range+0x9c0/0x2590
[ +0.010585] #3: ffff888230009d50 ((&vxlan->age_timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x800
[ +0.010192] #4: ffff888183729bc8 (&vxlan->hash_lock[h]){+.-.}-{2:2}, at: vxlan_cleanup+0x133/0x4a0
[ +0.010382]
stack backtrace:
[ +0.005103] CPU: 1 PID: 6892 Comm: bash Not tainted 5.7.0-rc5-custom-16219-g201392003491 #2772
[ +0.009675] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ +0.010155] Call Trace:
[ +0.002775]
[ +0.002313] dump_stack+0xfd/0x178
[ +0.003895] lockdep_rcu_suspicious+0x14a/0x153
[ +0.005157] vxlan_fdb_info+0xe39/0x12a0
[ +0.004775] __vxlan_fdb_notify+0xb8/0x160
[ +0.004672] vxlan_fdb_notify+0x8e/0xe0
[ +0.004370] vxlan_fdb_destroy+0x117/0x330
[ +0.004662] vxlan_cleanup+0x1aa/0x4a0
[ +0.004329] call_timer_fn+0x1c4/0x800
[ +0.004357] run_timer_softirq+0x129d/0x17e0
[ +0.004762] __do_softirq+0x24c/0xaef
[ +0.004232] irq_exit+0x167/0x190
[ +0.003767] smp_apic_timer_interrupt+0x1dd/0x6a0
[ +0.005340] apic_timer_interrupt+0xf/0x20
[ +0.004620]

Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Ido Schimmel
Reported-by: Amit Cohen
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller

Ido Schimmel
2020-05-25 10:34:11 +0800

23 May, 2020

2 commits

c7cdbe2ef vxlan: support for nexthop notifiers

The vxlan driver registers for nexthop add/del notifiers to
clean up fdb entries pointing to such nexthops.

Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller

Roopa Prabhu
2020-05-23 05:00:38 +0800
1274e1cc4 vxlan: ecmp support for mac fdb entries

Today's vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.

The new route nexthop API is perfect for this use case.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
  exclusive
- since this is a new use-case and the requirement is for ecmp
  nexthop groups, the fdb add and update path checks that the
  nexthop is really an ecmp nexthop group. This check can be relaxed
  in the future, if we want to introduce replication fdb nexthop groups
  and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
  fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
  group
- I have wrapped the switchdev offload code around the presence of
  rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: kbuild test robot
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller

Roopa Prabhu
2020-05-23 05:00:38 +0800

24 Apr, 2020

1 commit

cc8e7c69d vxlan: use the correct nlattr array in NL_SET_ERR_MSG_ATTR

IFLA_VXLAN_* attributes are in the data array, which is correctly
used when fetching the value, but not when setting the extended
ack. Because IFLA_VXLAN_MAX < IFLA_MAX, we avoid out of bounds
array accesses, but we don't provide a pointer to the invalid
attribute to userspace.

Fixes: 653ef6a3e4af ("vxlan: change vxlan_[config_]validate() to use netlink_ext_ack for error reporting")
Fixes: b4d3069783bc ("vxlan: Allow configuration of DF behaviour")
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller

Sabrina Dubroca
2020-04-24 03:39:09 +0800

19 Mar, 2020

1 commit

384d91c26 vxlan: check return value of gro_cells_init()

gro_cells_init() returns an error if memory allocation fails.
But the vxlan module doesn't check the return value of gro_cells_init().

Fixes: 58ce31cca1ff ("vxlan: GRO support at tunnel layer")
Signed-off-by: Taehee Yoo
Signed-off-by: David S. Miller

Taehee Yoo
2020-03-19 07:43:12 +0800

10 Jan, 2020

1 commit

a2d6d7ae5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

The ungrafting from PRIO bug fixes in net, when merged into net-next,
merge cleanly but create a build failure. The resolution used here is
from Petr Machata.

Signed-off-by: David S. Miller

David S. Miller
2020-01-10 04:13:43 +0800

03 Jan, 2020

2 commits

71130f299 vxlan: fix tos value before xmit

Before ip_tunnel_ecn_encap() and udp_tunnel_xmit_skb() we should filter
tos value by RT_TOS() instead of using config tos directly.

vxlan_get_route() would filter the tos to fl4.flowi4_tos but we didn't
return it back, as geneve_get_v4_rt() did. So we have to use RT_TOS()
directly in function ip_tunnel_ecn_encap().

Fixes: 206aaafcd279 ("VXLAN: Use IP Tunnels tunnel ENC encap API")
Fixes: 1400615d64cf ("vxlan: allow setting ipv6 traffic class")
Signed-off-by: Hangbin Liu
Signed-off-by: David S. Miller

Hangbin Liu
2020-01-03 08:35:48 +0800
98c814764 vxlan: Fix alignment and code style of vxlan.c

Fixed function alignment and coding style issues.

Signed-off-by: Niu Xilei
Signed-off-by: David S. Miller

Niu Xilei
2020-01-03 07:41:33 +0800

10 Dec, 2019

1 commit

c593642c8 treewide: Use sizeof_field() macro

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using the following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

    if [[ "$file" =~ $EXCLUDE_FILES ]]; then
        continue
    fi
    sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook
Signed-off-by: Kees Cook
Acked-by: David Miller # for net

Pankaj Bharadiya
2019-12-10 02:36:44 +0800
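
For readers unfamiliar with the macro named in the commit above, the following user-space sketch shows what sizeof_field() computes. The macro body is assumed to match the kernel's definition; `struct sample` and `name_size()` are hypothetical and exist only for the illustration.

```c
#include <stddef.h>

/* Size of one member of a struct type, without needing an instance. */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical structure used only to demonstrate the macro. */
struct sample {
	char name[16];
	int id;
};

size_t name_size(void)
{
	/* sizeof does not evaluate its operand, so the null pointer is never dereferenced. */
	return sizeof_field(struct sample, name);
}
```

Here name_size() returns 16, the size of the `name` array.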

05 Dec, 2019

1 commit

6c8991f41 net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup

ipv6_stub uses the ip6_dst_lookup function to allow other modules to
perform IPv6 lookups. However, this function skips the XFRM layer
entirely.

All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
which calls xfrm_lookup_route(). This patch fixes this inconsistent
behavior by switching the stub to ip6_dst_lookup_flow, which also calls
xfrm_lookup_route().

This requires some changes in all the callers, as these two functions
take different arguments and have different return types.

Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
Reported-by: Xiumei Mu
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller

Sabrina Dubroca
2019-12-05 04:27:13 +0800

13 Nov, 2019

1 commit

36fe3a61a vxlan: implement get_link_ksettings ethtool method

Similar to VLAN and similar drivers, we can forward get_link_ksettings to
the lower dev if we have one to get meaningful speed/duplex data.

Signed-off-by: Matthias Schiffer
Signed-off-by: David S. Miller

Matthias Schiffer
2019-11-13 11:52:15 +0800

03 Nov, 2019

1 commit

d31e95585 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

The only slightly tricky merge conflict was the netdevsim because the
mutex locking fix overlapped a lot of driver reload reorganization.

The rest were (relatively) trivial in nature.

Signed-off-by: David S. Miller

David S. Miller
2019-11-03 04:54:56 +0800

31 Oct, 2019

2 commits

1d7a55267 vxlan: drop "vxlan" parameter in vxlan_fdb_alloc()

This parameter has never been used.

Signed-off-by: Guillaume Nault
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller

Guillaume Nault
2019-10-31 08:41:50 +0800
c6761cf52 vxlan: fix unexpected failure of vxlan_changelink()

After commit 0ce1822c2a08 ("vxlan: add adjacent link to limit depth
level"), vxlan_changelink() could fail because of
netdev_adjacent_change_prepare().
netdev_adjacent_change_prepare() returns -EEXIST when the old lower device
and the new lower device are the same
(the old lower device is "dst->remote_dev" and the new lower device is "lowerdev").
So, before calling it, lowerdev should be set to NULL if these devices are the same.

Test command1:
ip link add dummy0 type dummy
ip link add vxlan0 type vxlan dev dummy0 dstport 4789 vni 1
ip link set vxlan0 type vxlan ttl 5
RTNETLINK answers: File exists

Reported-by: Dan Carpenter
Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level")
Signed-off-by: Taehee Yoo
Signed-off-by: David S. Miller

Taehee Yoo
2019-10-31 02:52:47 +0800

30 Oct, 2019

1 commit

eadf52cf1 vxlan: check tun_info options_len properly

This patch improves the tun_info options_len check by dropping
the skb when TUNNEL_VXLAN_OPT is set but options_len is less
than the size of vxlan_metadata. This avoids a potential out-of-bounds
access on ip_tun_info.

Fixes: ee122c79d422 ("vxlan: Flow based tunneling")
Signed-off-by: Xin Long
Signed-off-by: David S. Miller

Xin Long
2019-10-30 08:39:26 +0800
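
The length check described in the commit above can be sketched as follows. All names here are hypothetical stand-ins: `struct vxlan_md_sketch`, `struct tun_info_sketch`, the flag value and `should_drop()` are not the kernel's definitions. The sketch only shows the rule that metadata may be read when the flag is set and the option area is at least as large as the metadata structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the VXLAN metadata carried as tunnel options. */
struct vxlan_md_sketch {
	uint32_t gbp;
};

/* Hypothetical flag value meaning "options hold VXLAN metadata". */
#define TUNNEL_VXLAN_OPT_SKETCH 0x1

/* Hypothetical, trimmed-down view of the tunnel info attached to a packet. */
struct tun_info_sketch {
	uint16_t flags;
	uint8_t options_len;
};

/* True if the packet must be dropped instead of reading the metadata. */
bool should_drop(const struct tun_info_sketch *info)
{
	if (!(info->flags & TUNNEL_VXLAN_OPT_SKETCH))
		return false;	/* no metadata claimed, nothing to read */

	/* Reading the metadata from a shorter option area would go out of bounds. */
	return info->options_len < sizeof(struct vxlan_md_sketch);
}
```

A transmit path using this rule would free the packet when should_drop() returns true and read the metadata only otherwise.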