17 Jan, 2021
1 commit
-
[ Upstream commit 50c661670f6a3908c273503dfa206dfc7aa54c07 ]
For some reason ip_tunnel insist on setting the DF bit anyway when the
inner header has the DF bit set, EVEN if the tunnel was configured with
'nopmtudisc'.This means that the script added in the previous commit
cannot be made to work by adding the 'nopmtudisc' flag to the
ip tunnel configuration. Doing so breaks connectivity even for the
without-conntrack/netfilter scenario.When nopmtudisc is set, the tunnel will skip the mtu check, so no
icmp error is sent to client. Then, because inner header has DF set,
the outer header gets added with DF bit set as well.IP stack then sends an error to itself because the packet exceeds
the device MTU.Fixes: 23a3647bc4f93 ("ip_tunnels: Use skb-len to PMTU check.")
Cc: Stefano Brivio
Signed-off-by: Florian Westphal
Acked-by: Pablo Neira Ayuso
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman
01 Nov, 2020
1 commit
-
The tunnel device such as vxlan, bareudp and geneve in the lwt mode set
the outer df only based TUNNEL_DONT_FRAGMENT.
And this was also the behavior for gre device before switching to use
ip_md_tunnel_xmit in commit 962924fa2b7a ("ip_gre: Refactor collect
metatdata mode tunnel xmit to ip_md_tunnel_xmit")When the ip_gre in lwt mode xmit with ip_md_tunnel_xmi changed the rule and
make the discrepancy between handling of DF by different tunnels. So in the
ip_md_tunnel_xmit should follow the same rule like other tunnels.Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu
Link: https://lore.kernel.org/r/1604028728-31100-1-git-send-email-wenxu@ucloud.cn
Signed-off-by: Jakub Kicinski
06 Oct, 2020
1 commit
-
use new helper for netstats settings
Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller
19 Jun, 2020
1 commit
-
In the datapath, the ip_tunnel_lookup() is used and it internally uses
fallback tunnel device pointer, which is fb_tunnel_dev.
This pointer variable should be set to NULL when a fb interface is deleted.
But there is no routine to set fb_tunnel_dev pointer to NULL.
So, this pointer will be still used after interface is deleted and
it eventually results in the use-after-free problem.Test commands:
ip netns add A
ip netns add B
ip link add eth0 type veth peer name eth1
ip link set eth0 netns A
ip link set eth1 netns Bip netns exec A ip link set lo up
ip netns exec A ip link set eth0 up
ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
remote 10.0.0.2
ip netns exec A ip link set gre1 up
ip netns exec A ip a a 10.0.100.1/24 dev gre1
ip netns exec A ip a a 10.0.0.1/24 dev eth0ip netns exec B ip link set lo up
ip netns exec B ip link set eth1 up
ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
remote 10.0.0.1
ip netns exec B ip link set gre1 up
ip netns exec B ip a a 10.0.100.2/24 dev gre1
ip netns exec B ip a a 10.0.0.2/24 dev eth1
ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
ip netns del BSplat looks like:
[ 77.793450][ C3] ==================================================================
[ 77.794702][ C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0xcc4/0xf30
[ 77.795573][ C3] Read of size 4 at addr ffff888060bd9c84 by task hping3/2905
[ 77.796398][ C3]
[ 77.796664][ C3] CPU: 3 PID: 2905 Comm: hping3 Not tainted 5.8.0-rc1+ #616
[ 77.797474][ C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 77.798453][ C3] Call Trace:
[ 77.798815][ C3]
[ 77.799142][ C3] dump_stack+0x9d/0xdb
[ 77.799605][ C3] print_address_description.constprop.7+0x2cc/0x450
[ 77.800365][ C3] ? ip_tunnel_lookup+0xcc4/0xf30
[ 77.800908][ C3] ? ip_tunnel_lookup+0xcc4/0xf30
[ 77.801517][ C3] ? ip_tunnel_lookup+0xcc4/0xf30
[ 77.802145][ C3] kasan_report+0x154/0x190
[ 77.802821][ C3] ? ip_tunnel_lookup+0xcc4/0xf30
[ 77.803503][ C3] ip_tunnel_lookup+0xcc4/0xf30
[ 77.804165][ C3] __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
[ 77.804862][ C3] ? rcu_read_lock_sched_held+0xc0/0xc0
[ 77.805621][ C3] gre_rcv+0x304/0x1910 [ip_gre]
[ 77.806293][ C3] ? lock_acquire+0x1a9/0x870
[ 77.806925][ C3] ? gre_rcv+0xfe/0x354 [gre]
[ 77.807559][ C3] ? erspan_xmit+0x2e60/0x2e60 [ip_gre]
[ 77.808305][ C3] ? rcu_read_lock_sched_held+0xc0/0xc0
[ 77.809032][ C3] ? rcu_read_lock_held+0x90/0xa0
[ 77.809713][ C3] gre_rcv+0x1b8/0x354 [gre]
[ ... ]Suggested-by: Eric Dumazet
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Taehee Yoo
Signed-off-by: David S. Miller
20 May, 2020
1 commit
-
This method is used to properly allow kernel callers of the IPv4 route
management ioctls. The exsting ip_tunnel_ioctl helper is renamed to
ip_tunnel_ctl to better reflect that it doesn't directly implement ioctls
touching user memory, and is used for the guts of ndo_tunnel_ctl
implementations. A new ip_tunnel_ioctl helper is added that can be wired
up directly to the ndo_do_ioctl method and takes care of the copy to and
from userspace.Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller
30 Mar, 2020
1 commit
-
when creating a new ipip interface with no local/remote configuration,
the lookup is done with TUNNEL_NO_KEY flag, making it impossible to
match the new interface (only possible match being fallback or metada
case interface); e.g: `ip link add tunl1 type ipip dev eth0`To fix this case, adding a flag check before the key comparison so we
permit to match an interface with no local/remote config; it also avoids
breaking possible userland tools relying on TUNNEL_NO_KEY flag and
uninitialised key.context being on my side, I'm creating an extra ipip interface attached
to the physical one, and moving it to a dedicated namespace.Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: William Dauchy
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
21 Jan, 2020
1 commit
-
in the same manner as commit 690afc165bb3 ("net: ip6_gre: fix moving
ip6gre between namespaces"), fix namespace moving as it was broken since
commit 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.").
Indeed, the ip6_gre commit removed the local flag for collect_md
condition, so there is no reason to keep it for ip_gre/ip_tunnel.this patch will fix both ip_tunnel and ip_gre modules.
Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: William Dauchy
Acked-by: Nicolas Dichtel
Signed-off-by: David S. Miller
25 Dec, 2019
1 commit
-
When do tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.v5: No Change.
v4: Update commit description
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
Reviewed-by: Guillaume Nault
Tested-by: Guillaume Nault
Acked-by: David Ahern
Signed-off-by: Hangbin Liu
Signed-off-by: David S. Miller
05 Jun, 2019
1 commit
-
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of version 2 of the gnu general public license as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin street fifth floor boston ma
02110 1301 usaextracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 21 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Alexios Zavras
Reviewed-by: Allison Randal
Reviewed-by: Richard Fontana
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141334.228102212@linutronix.de
Signed-off-by: Greg Kroah-Hartman
07 Mar, 2019
1 commit
-
Naresh Kamboju noted the following oops during execution of selftest
tools/testing/selftests/bpf/test_tunnel.sh on x86_64:[ 274.120445] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000000
[ 274.128285] #PF error: [INSTR]
[ 274.131351] PGD 8000000414a0e067 P4D 8000000414a0e067 PUD 3b6334067 PMD 0
[ 274.138241] Oops: 0010 [#1] SMP PTI
[ 274.141734] CPU: 1 PID: 11464 Comm: ping Not tainted
5.0.0-rc4-next-20190129 #1
[ 274.149046] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 274.156526] RIP: 0010: (null)
[ 274.160280] Code: Bad RIP value.
[ 274.163509] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
[ 274.168726] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
[ 274.175851] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
[ 274.182974] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
[ 274.190098] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
[ 274.197222] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
[ 274.204346] FS: 00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
knlGS:0000000000000000
[ 274.212424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 274.218162] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
[ 274.225292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 274.232416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 274.239541] Call Trace:
[ 274.241988] ? tnl_update_pmtu+0x296/0x3b0
[ 274.246085] ip_md_tunnel_xmit+0x1bc/0x520
[ 274.250176] gre_fb_xmit+0x330/0x390
[ 274.253754] gre_tap_xmit+0x128/0x180
[ 274.257414] dev_hard_start_xmit+0xb7/0x300
[ 274.261598] sch_direct_xmit+0xf6/0x290
[ 274.265430] __qdisc_run+0x15d/0x5e0
[ 274.269007] __dev_queue_xmit+0x2c5/0xc00
[ 274.273011] ? dev_queue_xmit+0x10/0x20
[ 274.276842] ? eth_header+0x2b/0xc0
[ 274.280326] dev_queue_xmit+0x10/0x20
[ 274.283984] ? dev_queue_xmit+0x10/0x20
[ 274.287813] arp_xmit+0x1a/0xf0
[ 274.290952] arp_send_dst.part.19+0x46/0x60
[ 274.295138] arp_solicit+0x177/0x6b0
[ 274.298708] ? mod_timer+0x18e/0x440
[ 274.302281] neigh_probe+0x57/0x70
[ 274.305684] __neigh_event_send+0x197/0x2d0
[ 274.309862] neigh_resolve_output+0x18c/0x210
[ 274.314212] ip_finish_output2+0x257/0x690
[ 274.318304] ip_finish_output+0x219/0x340
[ 274.322314] ? ip_finish_output+0x219/0x340
[ 274.326493] ip_output+0x76/0x240
[ 274.329805] ? ip_fragment.constprop.53+0x80/0x80
[ 274.334510] ip_local_out+0x3f/0x70
[ 274.337992] ip_send_skb+0x19/0x40
[ 274.341391] ip_push_pending_frames+0x33/0x40
[ 274.345740] raw_sendmsg+0xc15/0x11d0
[ 274.349403] ? __might_fault+0x85/0x90
[ 274.353151] ? _copy_from_user+0x6b/0xa0
[ 274.357070] ? rw_copy_check_uvector+0x54/0x130
[ 274.361604] inet_sendmsg+0x42/0x1c0
[ 274.365179] ? inet_sendmsg+0x42/0x1c0
[ 274.368937] sock_sendmsg+0x3e/0x50
[ 274.372460] ___sys_sendmsg+0x26f/0x2d0
[ 274.376293] ? lock_acquire+0x95/0x190
[ 274.380043] ? __handle_mm_fault+0x7ce/0xb70
[ 274.384307] ? lock_acquire+0x95/0x190
[ 274.388053] ? __audit_syscall_entry+0xdd/0x130
[ 274.392586] ? ktime_get_coarse_real_ts64+0x64/0xc0
[ 274.397461] ? __audit_syscall_entry+0xdd/0x130
[ 274.401989] ? trace_hardirqs_on+0x4c/0x100
[ 274.406173] __sys_sendmsg+0x63/0xa0
[ 274.409744] ? __sys_sendmsg+0x63/0xa0
[ 274.413488] __x64_sys_sendmsg+0x1f/0x30
[ 274.417405] do_syscall_64+0x55/0x190
[ 274.421064] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 274.426113] RIP: 0033:0x7ff4ae0e6e87
[ 274.429686] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00
00 00 00 8b 05 ca d9 2b 00 48 63 d2 48 63 ff 85 c0 75 10 b8 2e 00 00
00 0f 05 3d 00 f0 ff ff 77 51 c3 53 48 89 f3 48 83 ec 10 48 89 7c
24 08
[ 274.448422] RSP: 002b:00007ffcd9b76db8 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 274.455978] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007ff4ae0e6e87
[ 274.463104] RDX: 0000000000000000 RSI: 00000000006092e0 RDI: 0000000000000003
[ 274.470228] RBP: 0000000000000000 R08: 00007ffcd9bc40a0 R09: 00007ffcd9bc4080
[ 274.477349] R10: 000000000000060a R11: 0000000000000246 R12: 0000000000000003
[ 274.484475] R13: 0000000000000016 R14: 00007ffcd9b77fa0 R15: 00007ffcd9b78da4
[ 274.491602] Modules linked in: cls_bpf sch_ingress iptable_filter
ip_tables algif_hash af_alg x86_pkg_temp_thermal fuse [last unloaded:
test_bpf]
[ 274.504634] CR2: 0000000000000000
[ 274.507976] ---[ end trace 196d18386545eae1 ]---
[ 274.512588] RIP: 0010: (null)
[ 274.516334] Code: Bad RIP value.
[ 274.519557] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
[ 274.524775] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
[ 274.531921] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
[ 274.539082] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
[ 274.546205] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
[ 274.553329] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
[ 274.560456] FS: 00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
knlGS:0000000000000000
[ 274.568541] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 274.574277] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
[ 274.581403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 274.588535] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 274.595658] Kernel panic - not syncing: Fatal exception in interrupt
[ 274.602046] Kernel Offset: 0x14400000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 274.612827] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---
[ 274.620387] ------------[ cut here ]------------I'm also seeing the same failure on x86_64, and it reproduces
consistently.>From poking around it looks like the skb's dst entry is being used
to calculate the mtu in:mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
...but because that dst_entry has an "ops" value set to md_dst_ops,
the various ops (including mtu) are not set:crash> struct sk_buff._skb_refdst ffff928f87447700 -x
_skb_refdst = 0xffffcd6fbf5ea590
crash> struct dst_entry.ops 0xffffcd6fbf5ea590
ops = 0xffffffffa0193800
crash> struct dst_ops.mtu 0xffffffffa0193800
mtu = 0x0
crash>I confirmed that the dst entry also has dst->input set to
dst_md_discard, so it looks like it's an entry that's been
initialized via __metadata_dst_init alright.I think the fix here is to use skb_valid_dst(skb) - it checks
for DST_METADATA also, and with that fix in place, the
problem - which was previously 100% reproducible - disappears.The below patch resolves the panic and all bpf tunnel tests pass
without incident.Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
Reported-by: Naresh Kamboju
Signed-off-by: Alan Maguire
Acked-by: Alexei Starovoitov
Tested-by: Anders Roxell
Reported-by: Nicolas Dichtel
Tested-by: Nicolas Dichtel
Acked-by: Nicolas Dichtel
Signed-off-by: David S. Miller
28 Feb, 2019
1 commit
-
Current fib_multipath_hash_policy can make hash based on the L3 or
L4. But it only work on the outer IP. So a specific tunnel always
has the same hash value. But a specific tunnel may contain so many
inner connections.This patch provide a generic multipath_hash in floi_common. It can
make a user-define hash which can mix with L3 or L4 hash.Signed-off-by: wenxu
Signed-off-by: David S. Miller
25 Feb, 2019
1 commit
-
ip l add dev tun type gretap key 1000
Non-tunnel-dst ip tunnel device can send packet through lwtunnel
This patch provide the tun_inf dst cache support for this mode.Signed-off-by: wenxu
Signed-off-by: David S. Miller
28 Jan, 2019
1 commit
27 Jan, 2019
3 commits
-
Init the gre_key from tuninfo->key.tun_id and init the mark
from the skb->mark, set the oif to zero in the collect metadata
mode.Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu
Signed-off-by: David S. Miller -
Add tnl_update_pmtu in ip_md_tunnel_xmit to dynamic modify
the pmtu which packet send through collect_metadata mode
ip tunnelSigned-off-by: wenxu
Signed-off-by: David S. Miller -
Add ip tunnel dst cache in ip_md_tunnel_xmit to make more
efficient for the route lookup.Signed-off-by: wenxu
Signed-off-by: David S. Miller
25 Jan, 2019
1 commit
-
ip l add dev tun type gretap key 1000
ip a a dev tun 10.0.0.1/24Packets with tun-id 1000 can be recived by tun dev. But packet can't
be sent through dev tun for non-tunnel-dstWith this patch: tunnel-dst can be get through lwtunnel like beflow:
ip r a 10.0.0.7 encap ip dst 172.168.0.11 dev tunSigned-off-by: wenxu
Signed-off-by: David S. Miller
02 Jan, 2019
1 commit
-
KMSAN detected read beyond end of buffer in vti and sit devices when
passing truncated packets with PF_PACKET. The issue affects additional
ip tunnel devices.Extend commit 76c0ddd8c3a6 ("ip6_tunnel: be careful when accessing the
inner header") and commit ccfec9e5cb2d ("ip_tunnel: be careful when
accessing the inner header").Move the check to a separate helper and call at the start of each
ndo_start_xmit function in net/ipv4 and net/ipv6.Minor changes:
- convert dev_kfree_skb to kfree_skb on error path,
as dev_kfree_skb calls consume_skb which is not for error paths.
- use pskb_network_may_pull even though that is pedantic here,
as the same as pskb_may_pull for devices without llheaders.
- do not cache ipv6 hdrs if used only once
(unsafe across pskb_may_pull, was more relevant to earlier patch)Reported-by: syzbot
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller
25 Sep, 2018
1 commit
-
Cong noted that we need the same checks introduced by commit 76c0ddd8c3a6
("ip6_tunnel: be careful when accessing the inner header")
even for ipv4 tunnels.Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Cong Wang
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
08 Jun, 2018
1 commit
-
By passing a limit of 2 bytes to strncat, strncat is limited to writing
fewer bytes than what it's supposed to append to the name here.Since the bounds are checked on the line above this, just remove the string
bounds checks entirely since they're unneeded.Signed-off-by: Sultan Alsawaf
Signed-off-by: David S. Miller
02 Jun, 2018
1 commit
-
After commit f6cc9c054e77, the following conf is broken (note that the
default loopback mtu is 65536, ie IP_MAX_MTU + 1):$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
add tunnel "gre0" failed: Invalid argument
$ ip l a type dummy
$ ip l s dummy1 up
$ ip l s dummy1 mtu 65535
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
add tunnel "gre0" failed: Invalid argumentdev_set_mtu() doesn't allow to set a mtu which is too large.
First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
the magic value 0xFFF8 and use IP_MAX_MTU instead.
0xFFF8 seems to be there for ages, I don't know why this value was used.With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
$ ip l s dummy1 mtu 66000
After that patch, it's also possible to bind an ip tunnel on that kind of
interface.CC: Petr Machata
CC: Ido Schimmel
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
Signed-off-by: Nicolas Dichtel
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
06 Apr, 2018
1 commit
-
Use dev_valid_name() to make sure user does not provide illegal
device name.syzbot caught the following bug :
BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
Write of size 20 at addr ffff8801ac79f810 by task syzkaller268107/4482CPU: 0 PID: 4482 Comm: syzkaller268107 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x1b9/0x29f lib/dump_stack.c:53
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
memcpy+0x37/0x50 mm/kasan/kasan.c:303
strlcpy include/linux/string.h:300 [inline]
__ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
ip_tunnel_create net/ipv4/ip_tunnel.c:352 [inline]
ip_tunnel_ioctl+0x818/0xd40 net/ipv4/ip_tunnel.c:861
ipip_tunnel_ioctl+0x1c5/0x420 net/ipv4/ipip.c:350
dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334
dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525
sock_ioctl+0x47e/0x680 net/socket.c:1015
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:500 [inline]
do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
SYSC_ioctl fs/ioctl.c:708 [inline]
SyS_ioctl+0x24/0x30 fs/ioctl.c:706
do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Signed-off-by: David S. Miller
02 Apr, 2018
1 commit
-
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c,
we had some overlapping changes:1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE -->
MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE2) In 'net-next' params->log_rq_size is renamed to be
params->log_rq_mtu_frames.3) In 'net-next' params->hard_mtu is added.
Signed-off-by: David S. Miller
29 Mar, 2018
2 commits
-
We want to use dev_set_mtu() regardless of how we calculate
the mtu value.Signed-off-by: David S. Miller
-
Steffen Klassert says:
====================
pull request (net): ipsec 2018-03-291) Fix a rcu_read_lock/rcu_read_unlock imbalance
in the error path of xfrm_local_error().
From Taehee Yoo.2) Some VTI MTU fixes. From Stefano Brivio.
3) Fix a too early overwritten skb control buffer
on xfrm transport mode.Please note that this pull request has a merge conflict
in net/ipv4/ip_tunnel.c.The conflict is between
commit f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
from the net tree and
commit 24fc79798b8d ("ip_tunnel: Clamp MTU to bounds on new link")
from the ipsec tree.
It can be solved as it is currently done in linux-next.
====================Signed-off-by: David S. Miller
24 Mar, 2018
1 commit
-
For tunnels created with IFLA_MTU, MTU of the netdevice is set by
rtnl_create_link() (called from rtnl_newlink()) before the device is
registered. However without IFLA_MTU that's not done.rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which
via ip_tunnel_newlink() calls register_netdevice(), and that emits
NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the
MTU of 0.After ip_tunnel_newlink() corrects the MTU after registering the
netdevice, but since there's no event, the listeners don't get to know
about the MTU until something else happens--such as a NETDEV_UP event.
That's not ideal.So instead of setting the MTU directly, go through dev_set_mtu(), which
takes care of distributing the necessary NETDEV_PRECHANGEMTU and
NETDEV_CHANGEMTU events.Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Petr Machata
Signed-off-by: Ido Schimmel
Signed-off-by: David S. Miller
19 Mar, 2018
1 commit
-
Otherwise, it's possible to specify invalid MTU values directly
on creation of a link (via 'ip link add'). This is already
prevented on subsequent MTU changes by commit b96f9afee4eb
("ipv4/6: use core net MTU range checking").Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Stefano Brivio
Acked-by: Sabrina Dubroca
Signed-off-by: Steffen Klassert
10 Mar, 2018
1 commit
-
fallback tunnels (like tunl0, gre0, gretap0, erspan0, sit0,
ip6tnl0, ip6gre0) are automatically created when the corresponding
module is loaded.These tunnels are also automatically created when a new network
namespace is created, at a great cost.In many cases, netns are used for isolation purposes, and these
extra network devices are a waste of resources. We are using
thousands of netns per host, and hit the netns creation/delete
bottleneck a lot. (Many thanks to Kirill for recent work on this)Add a new sysctl so that we can opt-out from this automatic creation.
Note that these tunnels are still created for the initial namespace,
to be the least intrusive for typical setups.Tested:
lpk43:~# cat add_del_unshare.sh
for i in `seq 1 40`
do
(for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
waitlpk43:~# echo 0 >/proc/sys/net/core/fb_tunnels_only_for_init_net
lpk43:~# time ./add_del_unshare.shreal 0m37.521s
user 0m0.886s
sys 7m7.084s
lpk43:~# echo 1 >/proc/sys/net/core/fb_tunnels_only_for_init_net
lpk43:~# time ./add_del_unshare.shreal 0m4.761s
user 0m0.851s
sys 1m8.343s
lpk43:~#Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
06 Mar, 2018
1 commit
-
All of the conflicts were cases of overlapping changes.
In net/core/devlink.c, we have to make care that the
resouce size_params have become a struct member rather
than a pointer to such an object.Signed-off-by: David S. Miller
28 Feb, 2018
1 commit
-
Initializing struct flowi4 is useful for drivers that need to emulate
routing decisions made by a tunnel interface. Publish the
function (appropriately renamed) so that the drivers in question don't
need to cut'n'paste it around.Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
27 Feb, 2018
1 commit
-
This reverts commit 5c38bd1b82e1f76f9fa96c1e61c9897cabf1ce45.
skb->mark contains the mark the encapsulated traffic which
can result in incorrect routing decisions being made such
as routing loops if the route chosen is via tunnel itself.
The correct method should be to use tunnel->fwmark.Signed-off-by: Thomas Winter
Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: Hideaki YOSHIFUJI
Signed-off-by: David S. Miller
29 Jan, 2018
1 commit
-
Signed-off-by: David S. Miller
26 Jan, 2018
1 commit
-
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at (null)"Let's add a helper to check if update_pmtu is available before calling it.
Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl
CC: Xin Long
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
25 Jan, 2018
1 commit
-
This allows marks set by connmark in iptables
to be used for route lookups.Signed-off-by: Thomas Winter
Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: Hideaki YOSHIFUJI
Signed-off-by: David S. Miller
14 Dec, 2017
1 commit
-
IPv4 stack reacts to changes to small MTU, by disabling itself under
RTNL.But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in igmp code where it is
assumed the mtu is suitable.Fix this by reading device mtu once and checking IPv4 minimal MTU.
This patch adds missing IPV4_MIN_MTU define, to not abuse
ETH_MIN_MTU anymore.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
20 Sep, 2017
1 commit
-
Implement exit_batch() method to dismantle more devices
per round.(rtnl_lock() ...
unregister_netdevice_many() ...
rtnl_unlock())Tested:
$ cat add_del_unshare.sh
for i in `seq 1 40`
do
(for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfoBefore patch :
$ time ./add_del_unshare.sh
net_namespace 126 282 5504 1 2 : tunables 8 4 0 : slabdata 126 282 0real 1m38.965s
user 0m0.688s
sys 0m37.017sAfter patch:
$ time ./add_del_unshare.sh
net_namespace 135 291 5504 1 2 : tunables 8 4 0 : slabdata 135 291 0real 0m22.117s
user 0m0.728s
sys 0m35.328sSigned-off-by: Eric Dumazet
Signed-off-by: David S. Miller
13 Sep, 2017
1 commit
-
In collect_md mode, if the tun dev is down, it still can call
ip_tunnel_rcv to receive on packets, and the rx statistics increase
improperly.When the md tunnel is down, it's not neccessary to increase RX drops
for the tunnel device, packets would be recieved on fallback tunnel,
and the RX drops on fallback device will be increased as expected.Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
Cc: Pravin B Shelar
Signed-off-by: Haishuang Yan
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
09 Sep, 2017
1 commit
-
ttl and tos variables are declared and assigned, but are not used in
iptunnel_xmit() function.Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Cc: Alexei Starovoitov
Signed-off-by: Haishuang Yan
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
17 Jun, 2017
1 commit
-
When ip_tunnel_rcv fails, the tun_dst won't be freed, so call
dst_release to free it in error code path.Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
Acked-by: Eric Dumazet
Acked-by: Pravin B Shelar
Tested-by: Zhang Shengju
Signed-off-by: Haishuang Yan
Signed-off-by: David S. Miller
08 Jun, 2017
1 commit
-
Network devices can allocate reasources and private memory using
netdev_ops->ndo_init(). However, the release of these resources
can occur in one of two different places.Either netdev_ops->ndo_uninit() or netdev->destructor().
The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.netdev->destructor(), on the other hand, does not run until the
netdev references all go away.Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
it is not able to invoke netdev->destructor().This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.However, this means that the resources that would normally be released
by netdev->destructor() will not be.Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.Many drivers do not try to deal with this, and instead we have leaks.
Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().Signed-off-by: David S. Miller