17 Jan, 2021

1 commit

  • [ Upstream commit 50c661670f6a3908c273503dfa206dfc7aa54c07 ]

    For some reason ip_tunnel insists on setting the DF bit anyway when the
    inner header has the DF bit set, EVEN if the tunnel was configured with
    'nopmtudisc'.

    This means that the script added in the previous commit
    cannot be made to work by adding the 'nopmtudisc' flag to the
    ip tunnel configuration. Doing so breaks connectivity even for the
    without-conntrack/netfilter scenario.

    When nopmtudisc is set, the tunnel will skip the mtu check, so no
    icmp error is sent to the client. Then, because the inner header has DF
    set, the outer header gets added with the DF bit set as well.

    The IP stack then sends an error to itself because the packet exceeds
    the device MTU.
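
    The mismatch described above can be sketched with a small model. This is
    an illustrative Python sketch, not kernel code: the function name, its
    parameters and the result strings are all hypothetical. It assumes the
    PMTU check looks only at the configured DF setting while the outer header
    also inherits the inner DF bit. The `check_uses_outer_df` switch shows one
    way to remove the mismatch; the text above does not say that this is how
    the upstream patch does it.

```python
def transmit(pkt_len, mtu, configured_df, inner_df, check_uses_outer_df):
    """Model what happens to one IPv4 packet entering the tunnel."""
    # The outer header inherits DF from the inner header (assumption taken
    # from the commit text), regardless of the tunnel configuration.
    outer_df = configured_df or inner_df
    # DF value that the PMTU check looks at.
    check_df = outer_df if check_uses_outer_df else configured_df
    if check_df and pkt_len > mtu:
        return "icmp-to-sender"   # PMTU check fires, the sender is notified
    if outer_df and pkt_len > mtu:
        return "local-error"      # oversized DF packet rejected by own stack
    return "sent"                 # fits, or may be fragmented later
```

    With `configured_df=False` (the nopmtudisc case) and an oversized inner
    packet carrying DF, the model reports "local-error" when the check ignores
    the inherited bit, and "icmp-to-sender" when it does not.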

    Fixes: 23a3647bc4f93 ("ip_tunnels: Use skb-len to PMTU check.")
    Cc: Stefano Brivio
    Signed-off-by: Florian Westphal
    Acked-by: Pablo Neira Ayuso
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     

01 Nov, 2020

1 commit

  • Tunnel devices such as vxlan, bareudp and geneve in lwt mode set
    the outer DF bit based only on TUNNEL_DONT_FRAGMENT.
    This was also the behavior of the gre device before it switched to
    ip_md_tunnel_xmit in commit 962924fa2b7a ("ip_gre: Refactor collect
    metatdata mode tunnel xmit to ip_md_tunnel_xmit").

    When ip_gre in lwt mode started transmitting through ip_md_tunnel_xmit,
    that rule changed, creating a discrepancy in how different tunnels
    handle DF. ip_md_tunnel_xmit should therefore follow the same rule as
    the other tunnels.
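
    The rule the message asks for can be written as a one-line decision. The
    sketch below is a Python model under stated assumptions: the function name
    is hypothetical, and the numeric value given to TUNNEL_DONT_FRAGMENT is a
    placeholder chosen for the model. IP_DF (0x4000) is the standard IPv4
    don't-fragment bit.

```python
IP_DF = 0x4000                  # IPv4 "don't fragment" bit in frag_off
TUNNEL_DONT_FRAGMENT = 0x0100   # placeholder flag value for this model

def outer_df_metadata_mode(tun_flags, inner_frag_off=0):
    """Outer DF depends only on the tunnel key flags.

    inner_frag_off is accepted but deliberately ignored, to show that the
    inner header's DF bit plays no role under this rule.
    """
    return IP_DF if tun_flags & TUNNEL_DONT_FRAGMENT else 0
```

    Under this rule an inner packet with DF set does not force DF on the outer
    header unless the tunnel metadata asks for it.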

    Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
    Signed-off-by: wenxu
    Link: https://lore.kernel.org/r/1604028728-31100-1-git-send-email-wenxu@ucloud.cn
    Signed-off-by: Jakub Kicinski

    wenxu
     

06 Oct, 2020

1 commit


19 Jun, 2020

1 commit

  • In the datapath, the ip_tunnel_lookup() is used and it internally uses
    fallback tunnel device pointer, which is fb_tunnel_dev.
    This pointer variable should be set to NULL when a fb interface is deleted.
    But there is no routine to set fb_tunnel_dev pointer to NULL.
    So, this pointer will be still used after interface is deleted and
    it eventually results in the use-after-free problem.
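
    The stale-pointer problem can be illustrated with a toy model. This is a
    Python sketch with hypothetical class and method names. Python cannot
    reproduce a real use-after-free, so the model only shows that the lookup
    keeps returning the deleted fallback device unless the pointer is cleared
    on teardown.

```python
class TunnelNet:
    """Per-namespace tunnel table with a fallback device (model only)."""

    def __init__(self, fallback):
        self.devices = {fallback}
        self.fb_tunnel_dev = fallback

    def lookup(self, wanted):
        # Exact match first, otherwise hand back the fallback device, if any.
        if wanted in self.devices:
            return wanted
        return self.fb_tunnel_dev

    def uninit(self, dev, clear_fallback=True):
        self.devices.discard(dev)
        if clear_fallback and dev == self.fb_tunnel_dev:
            # The step the commit message says was missing.
            self.fb_tunnel_dev = None
```

    With `clear_fallback=False` the lookup still returns the deleted device,
    which is the situation the KASAN report below corresponds to.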

    Test commands:
    ip netns add A
    ip netns add B
    ip link add eth0 type veth peer name eth1
    ip link set eth0 netns A
    ip link set eth1 netns B

    ip netns exec A ip link set lo up
    ip netns exec A ip link set eth0 up
    ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
    remote 10.0.0.2
    ip netns exec A ip link set gre1 up
    ip netns exec A ip a a 10.0.100.1/24 dev gre1
    ip netns exec A ip a a 10.0.0.1/24 dev eth0

    ip netns exec B ip link set lo up
    ip netns exec B ip link set eth1 up
    ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
    remote 10.0.0.1
    ip netns exec B ip link set gre1 up
    ip netns exec B ip a a 10.0.100.2/24 dev gre1
    ip netns exec B ip a a 10.0.0.2/24 dev eth1
    ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
    ip netns del B

    Splat looks like:
    [ 77.793450][ C3] ==================================================================
    [ 77.794702][ C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0xcc4/0xf30
    [ 77.795573][ C3] Read of size 4 at addr ffff888060bd9c84 by task hping3/2905
    [ 77.796398][ C3]
    [ 77.796664][ C3] CPU: 3 PID: 2905 Comm: hping3 Not tainted 5.8.0-rc1+ #616
    [ 77.797474][ C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 77.798453][ C3] Call Trace:
    [ 77.798815][ C3]
    [ 77.799142][ C3] dump_stack+0x9d/0xdb
    [ 77.799605][ C3] print_address_description.constprop.7+0x2cc/0x450
    [ 77.800365][ C3] ? ip_tunnel_lookup+0xcc4/0xf30
    [ 77.800908][ C3] ? ip_tunnel_lookup+0xcc4/0xf30
    [ 77.801517][ C3] ? ip_tunnel_lookup+0xcc4/0xf30
    [ 77.802145][ C3] kasan_report+0x154/0x190
    [ 77.802821][ C3] ? ip_tunnel_lookup+0xcc4/0xf30
    [ 77.803503][ C3] ip_tunnel_lookup+0xcc4/0xf30
    [ 77.804165][ C3] __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
    [ 77.804862][ C3] ? rcu_read_lock_sched_held+0xc0/0xc0
    [ 77.805621][ C3] gre_rcv+0x304/0x1910 [ip_gre]
    [ 77.806293][ C3] ? lock_acquire+0x1a9/0x870
    [ 77.806925][ C3] ? gre_rcv+0xfe/0x354 [gre]
    [ 77.807559][ C3] ? erspan_xmit+0x2e60/0x2e60 [ip_gre]
    [ 77.808305][ C3] ? rcu_read_lock_sched_held+0xc0/0xc0
    [ 77.809032][ C3] ? rcu_read_lock_held+0x90/0xa0
    [ 77.809713][ C3] gre_rcv+0x1b8/0x354 [gre]
    [ ... ]

    Suggested-by: Eric Dumazet
    Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

20 May, 2020

1 commit

  • This method is used to properly allow kernel callers of the IPv4 route
    management ioctls. The existing ip_tunnel_ioctl helper is renamed to
    ip_tunnel_ctl to better reflect that it doesn't directly implement ioctls
    touching user memory, and is used for the guts of ndo_tunnel_ctl
    implementations. A new ip_tunnel_ioctl helper is added that can be wired
    up directly to the ndo_do_ioctl method and takes care of the copy to and
    from userspace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

30 Mar, 2020

1 commit

  • When creating a new ipip interface with no local/remote configuration,
    the lookup is done with the TUNNEL_NO_KEY flag, making it impossible to
    match the new interface (the only possible matches being the fallback or
    metadata case interface); e.g.: `ip link add tunl1 type ipip dev eth0`

    To fix this case, add a flag check before the key comparison so that an
    interface with no local/remote config can be matched; this also avoids
    breaking possible userland tools relying on the TUNNEL_NO_KEY flag and
    an uninitialised key.

    For context, on my side I'm creating an extra ipip interface attached
    to the physical one and moving it to a dedicated namespace.
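
    The flag check can be sketched as follows. This is a Python model with
    assumptions: the dictionary fields mirror the names used in the message,
    the function name is hypothetical, and the numeric value of TUNNEL_NO_KEY
    is a placeholder. It only models the wildcard case (no local/remote
    address configured).

```python
TUNNEL_NO_KEY = 0x0080   # placeholder flag value for this model

def wildcard_match(tunnel, flags, key):
    """Match a tunnel that has no local/remote address configured."""
    # Only compare keys when the lookup actually carries one.
    if not (flags & TUNNEL_NO_KEY) and tunnel["i_key"] != key:
        return False
    return tunnel["saddr"] == 0 and tunnel["daddr"] == 0 and tunnel["up"]
```

    A tunnel whose key field holds an arbitrary value still matches a lookup
    done with TUNNEL_NO_KEY, while a keyed lookup keeps comparing keys.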

    Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
    Signed-off-by: William Dauchy
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    William Dauchy
     

21 Jan, 2020

1 commit

  • In the same manner as commit 690afc165bb3 ("net: ip6_gre: fix moving
    ip6gre between namespaces"), fix namespace moving as it was broken since
    commit 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.").
    Indeed, the ip6_gre commit removed the local flag for collect_md
    condition, so there is no reason to keep it for ip_gre/ip_tunnel.

    This patch fixes both the ip_tunnel and ip_gre modules.

    Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
    Signed-off-by: William Dauchy
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    William Dauchy
     

25 Dec, 2019

1 commit

  • When doing a tunnel PMTU update that ends up calling
    __ip6_rt_update_pmtu(), we should not call dst_confirm_neigh(), as there
    is no two-way communication.

    v5: No Change.
    v4: Update commit description
    v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
    v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

    Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
    Reviewed-by: Guillaume Nault
    Tested-by: Guillaume Nault
    Acked-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not write to the free
    software foundation inc 51 franklin street fifth floor boston ma
    02110 1301 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 21 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141334.228102212@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

07 Mar, 2019

1 commit

  • Naresh Kamboju noted the following oops during execution of selftest
    tools/testing/selftests/bpf/test_tunnel.sh on x86_64:

    [ 274.120445] BUG: unable to handle kernel NULL pointer dereference
    at 0000000000000000
    [ 274.128285] #PF error: [INSTR]
    [ 274.131351] PGD 8000000414a0e067 P4D 8000000414a0e067 PUD 3b6334067 PMD 0
    [ 274.138241] Oops: 0010 [#1] SMP PTI
    [ 274.141734] CPU: 1 PID: 11464 Comm: ping Not tainted
    5.0.0-rc4-next-20190129 #1
    [ 274.149046] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
    2.0b 07/27/2017
    [ 274.156526] RIP: 0010: (null)
    [ 274.160280] Code: Bad RIP value.
    [ 274.163509] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
    [ 274.168726] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
    [ 274.175851] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
    [ 274.182974] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
    [ 274.190098] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
    [ 274.197222] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
    [ 274.204346] FS: 00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
    knlGS:0000000000000000
    [ 274.212424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 274.218162] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
    [ 274.225292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 274.232416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 274.239541] Call Trace:
    [ 274.241988] ? tnl_update_pmtu+0x296/0x3b0
    [ 274.246085] ip_md_tunnel_xmit+0x1bc/0x520
    [ 274.250176] gre_fb_xmit+0x330/0x390
    [ 274.253754] gre_tap_xmit+0x128/0x180
    [ 274.257414] dev_hard_start_xmit+0xb7/0x300
    [ 274.261598] sch_direct_xmit+0xf6/0x290
    [ 274.265430] __qdisc_run+0x15d/0x5e0
    [ 274.269007] __dev_queue_xmit+0x2c5/0xc00
    [ 274.273011] ? dev_queue_xmit+0x10/0x20
    [ 274.276842] ? eth_header+0x2b/0xc0
    [ 274.280326] dev_queue_xmit+0x10/0x20
    [ 274.283984] ? dev_queue_xmit+0x10/0x20
    [ 274.287813] arp_xmit+0x1a/0xf0
    [ 274.290952] arp_send_dst.part.19+0x46/0x60
    [ 274.295138] arp_solicit+0x177/0x6b0
    [ 274.298708] ? mod_timer+0x18e/0x440
    [ 274.302281] neigh_probe+0x57/0x70
    [ 274.305684] __neigh_event_send+0x197/0x2d0
    [ 274.309862] neigh_resolve_output+0x18c/0x210
    [ 274.314212] ip_finish_output2+0x257/0x690
    [ 274.318304] ip_finish_output+0x219/0x340
    [ 274.322314] ? ip_finish_output+0x219/0x340
    [ 274.326493] ip_output+0x76/0x240
    [ 274.329805] ? ip_fragment.constprop.53+0x80/0x80
    [ 274.334510] ip_local_out+0x3f/0x70
    [ 274.337992] ip_send_skb+0x19/0x40
    [ 274.341391] ip_push_pending_frames+0x33/0x40
    [ 274.345740] raw_sendmsg+0xc15/0x11d0
    [ 274.349403] ? __might_fault+0x85/0x90
    [ 274.353151] ? _copy_from_user+0x6b/0xa0
    [ 274.357070] ? rw_copy_check_uvector+0x54/0x130
    [ 274.361604] inet_sendmsg+0x42/0x1c0
    [ 274.365179] ? inet_sendmsg+0x42/0x1c0
    [ 274.368937] sock_sendmsg+0x3e/0x50
    [ 274.372460] ___sys_sendmsg+0x26f/0x2d0
    [ 274.376293] ? lock_acquire+0x95/0x190
    [ 274.380043] ? __handle_mm_fault+0x7ce/0xb70
    [ 274.384307] ? lock_acquire+0x95/0x190
    [ 274.388053] ? __audit_syscall_entry+0xdd/0x130
    [ 274.392586] ? ktime_get_coarse_real_ts64+0x64/0xc0
    [ 274.397461] ? __audit_syscall_entry+0xdd/0x130
    [ 274.401989] ? trace_hardirqs_on+0x4c/0x100
    [ 274.406173] __sys_sendmsg+0x63/0xa0
    [ 274.409744] ? __sys_sendmsg+0x63/0xa0
    [ 274.413488] __x64_sys_sendmsg+0x1f/0x30
    [ 274.417405] do_syscall_64+0x55/0x190
    [ 274.421064] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 274.426113] RIP: 0033:0x7ff4ae0e6e87
    [ 274.429686] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00
    00 00 00 8b 05 ca d9 2b 00 48 63 d2 48 63 ff 85 c0 75 10 b8 2e 00 00
    00 0f 05 3d 00 f0 ff ff 77 51 c3 53 48 89 f3 48 83 ec 10 48 89 7c
    24 08
    [ 274.448422] RSP: 002b:00007ffcd9b76db8 EFLAGS: 00000246 ORIG_RAX:
    000000000000002e
    [ 274.455978] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007ff4ae0e6e87
    [ 274.463104] RDX: 0000000000000000 RSI: 00000000006092e0 RDI: 0000000000000003
    [ 274.470228] RBP: 0000000000000000 R08: 00007ffcd9bc40a0 R09: 00007ffcd9bc4080
    [ 274.477349] R10: 000000000000060a R11: 0000000000000246 R12: 0000000000000003
    [ 274.484475] R13: 0000000000000016 R14: 00007ffcd9b77fa0 R15: 00007ffcd9b78da4
    [ 274.491602] Modules linked in: cls_bpf sch_ingress iptable_filter
    ip_tables algif_hash af_alg x86_pkg_temp_thermal fuse [last unloaded:
    test_bpf]
    [ 274.504634] CR2: 0000000000000000
    [ 274.507976] ---[ end trace 196d18386545eae1 ]---
    [ 274.512588] RIP: 0010: (null)
    [ 274.516334] Code: Bad RIP value.
    [ 274.519557] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
    [ 274.524775] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
    [ 274.531921] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
    [ 274.539082] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
    [ 274.546205] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
    [ 274.553329] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
    [ 274.560456] FS: 00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
    knlGS:0000000000000000
    [ 274.568541] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 274.574277] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
    [ 274.581403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 274.588535] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 274.595658] Kernel panic - not syncing: Fatal exception in interrupt
    [ 274.602046] Kernel Offset: 0x14400000 from 0xffffffff81000000
    (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [ 274.612827] ---[ end Kernel panic - not syncing: Fatal exception in
    interrupt ]---
    [ 274.620387] ------------[ cut here ]------------

    I'm also seeing the same failure on x86_64, and it reproduces
    consistently.

    From poking around it looks like the skb's dst entry is being used
    to calculate the mtu in:

    mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;

    ...but because that dst_entry has an "ops" value set to md_dst_ops,
    the various ops (including mtu) are not set:

    crash> struct sk_buff._skb_refdst ffff928f87447700 -x
    _skb_refdst = 0xffffcd6fbf5ea590
    crash> struct dst_entry.ops 0xffffcd6fbf5ea590
    ops = 0xffffffffa0193800
    crash> struct dst_ops.mtu 0xffffffffa0193800
    mtu = 0x0
    crash>

    I confirmed that the dst entry also has dst->input set to
    dst_md_discard, so it looks like it's an entry that's been
    initialized via __metadata_dst_init alright.

    I think the fix here is to use skb_valid_dst(skb) - it checks
    for DST_METADATA also, and with that fix in place, the
    problem - which was previously 100% reproducible - disappears.
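
    The difference between the two checks can be modelled in a few lines.
    This is a Python sketch with hypothetical class and function names that
    reuse the kernel names from the message. It assumes a metadata dst has no
    mtu operation, as the crash output above shows.

```python
class Dst:
    """Model of a dst entry; a metadata dst carries no mtu operation."""

    def __init__(self, mtu_op=None, metadata=False):
        self.mtu_op = mtu_op
        self.metadata = metadata

def skb_valid_dst(dst):
    # A dst is usable for MTU purposes only if it exists and is not metadata.
    return dst is not None and not dst.metadata

def tunnel_mtu(dst, dev_mtu):
    # Fall back to the device MTU instead of calling a missing mtu op.
    return dst.mtu_op() if skb_valid_dst(dst) else dev_mtu
```

    A plain `dst is not None` test would go on to call the missing operation
    on a metadata dst, which is the NULL dereference in the trace.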

    The below patch resolves the panic and all bpf tunnel tests pass
    without incident.

    Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
    Reported-by: Naresh Kamboju
    Signed-off-by: Alan Maguire
    Acked-by: Alexei Starovoitov
    Tested-by: Anders Roxell
    Reported-by: Nicolas Dichtel
    Tested-by: Nicolas Dichtel
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Alan Maguire
     

28 Feb, 2019

1 commit

  • The current fib_multipath_hash_policy can hash on either L3 or
    L4, but it only works on the outer IP header. So a specific tunnel
    always has the same hash value, even though it may carry many
    inner connections.

    This patch provides a generic multipath_hash in flowi_common. It can
    hold a user-defined hash which can be mixed with the L3 or L4 hash.

    Signed-off-by: wenxu
    Signed-off-by: David S. Miller

    wenxu
     

25 Feb, 2019

1 commit


28 Jan, 2019

1 commit


27 Jan, 2019

3 commits


25 Jan, 2019

1 commit

  • ip l add dev tun type gretap key 1000
    ip a a dev tun 10.0.0.1/24

    Packets with tun-id 1000 can be received by the tun dev, but packets
    can't be sent through dev tun for a non-tunnel-dst.

    With this patch, the tunnel-dst can be obtained through lwtunnel like below:
    ip r a 10.0.0.7 encap ip dst 172.168.0.11 dev tun

    Signed-off-by: wenxu
    Signed-off-by: David S. Miller

    wenxu
     

02 Jan, 2019

1 commit

  • KMSAN detected read beyond end of buffer in vti and sit devices when
    passing truncated packets with PF_PACKET. The issue affects additional
    ip tunnel devices.

    Extend commit 76c0ddd8c3a6 ("ip6_tunnel: be careful when accessing the
    inner header") and commit ccfec9e5cb2d ("ip_tunnel: be careful when
    accessing the inner header").

    Move the check to a separate helper and call at the start of each
    ndo_start_xmit function in net/ipv4 and net/ipv6.

    Minor changes:
    - convert dev_kfree_skb to kfree_skb on error path,
    as dev_kfree_skb calls consume_skb which is not for error paths.
    - use pskb_network_may_pull even though that is pedantic here,
    as it is the same as pskb_may_pull for devices without llheaders.
    - do not cache ipv6 hdrs if used only once
    (unsafe across pskb_may_pull, was more relevant to earlier patch)
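
    The shape of such a helper can be sketched as a length check made before
    the inner header is touched. This is a Python model, not the kernel
    helper: the function names are hypothetical, and the header sizes are the
    fixed minimum IPv4 (20 bytes) and IPv6 (40 bytes) header lengths.

```python
ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD

def inner_header_len(protocol):
    """Minimum bytes needed before the inner header may be read."""
    if protocol == ETH_P_IP:
        return 20
    if protocol == ETH_P_IPV6:
        return 40
    return 0   # other protocols: no inner IP header is inspected

def inet_may_pull(protocol, packet):
    return len(packet) >= inner_header_len(protocol)

def start_xmit(protocol, packet):
    # The check runs first, so truncated packets never reach header parsing.
    if not inet_may_pull(protocol, packet):
        return "dropped"
    return "encapsulated"
```

    A 10-byte packet claiming to be IPv4 is dropped here instead of being
    read past its end.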

    Reported-by: syzbot
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

25 Sep, 2018

1 commit

  • Cong noted that we need the same checks introduced by commit 76c0ddd8c3a6
    ("ip6_tunnel: be careful when accessing the inner header")
    even for ipv4 tunnels.

    Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
    Suggested-by: Cong Wang
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

08 Jun, 2018

1 commit


02 Jun, 2018

1 commit

  • After commit f6cc9c054e77, the following conf is broken (note that the
    default loopback mtu is 65536, ie IP_MAX_MTU + 1):

    $ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
    add tunnel "gre0" failed: Invalid argument
    $ ip l a type dummy
    $ ip l s dummy1 up
    $ ip l s dummy1 mtu 65535
    $ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
    add tunnel "gre0" failed: Invalid argument

    dev_set_mtu() doesn't allow setting an mtu which is too large.
    First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
    the magic value 0xFFF8 and use IP_MAX_MTU instead.
    0xFFF8 seems to have been there for ages, I don't know why this value was used.

    With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
    $ ip l s dummy1 mtu 66000
    After that patch, it's also possible to bind an ip tunnel on that kind of
    interface.
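
    The capping arithmetic can be worked through with a small model. This is
    a Python sketch with a hypothetical function name. IP_MAX_MTU is 65535 as
    stated above and 68 is the IPv4 minimum MTU. The 24-byte overhead used in
    the examples is an assumption (a 20-byte outer IPv4 header plus a 4-byte
    base GRE header), and clamping to the minimum is part of the model rather
    than something the message states.

```python
IP_MAX_MTU = 0xFFFF    # 65535
IPV4_MIN_MTU = 68

def bind_dev_mtu(lower_mtu, overhead):
    """Tunnel MTU derived from the underlying device MTU."""
    # Cap first, so an underlying MTU above IP_MAX_MTU cannot leak through.
    mtu = min(lower_mtu, IP_MAX_MTU) - overhead
    return max(mtu, IPV4_MIN_MTU)
```

    For loopback (mtu 65536) and for a device set to mtu 66000, the model
    yields 65535 - 24 = 65511 in both cases, a value that stays within the
    range dev_set_mtu() accepts.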

    CC: Petr Machata
    CC: Ido Schimmel
    Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
    Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
    Signed-off-by: Nicolas Dichtel
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

06 Apr, 2018

1 commit

  • Use dev_valid_name() to make sure user does not provide illegal
    device name.

    syzbot caught the following bug :

    BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
    BUG: KASAN: stack-out-of-bounds in __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
    Write of size 20 at addr ffff8801ac79f810 by task syzkaller268107/4482

    CPU: 0 PID: 4482 Comm: syzkaller268107 Not tainted 4.16.0+ #1
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x1b9/0x29f lib/dump_stack.c:53
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412
    check_memory_region_inline mm/kasan/kasan.c:260 [inline]
    check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
    memcpy+0x37/0x50 mm/kasan/kasan.c:303
    strlcpy include/linux/string.h:300 [inline]
    __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
    ip_tunnel_create net/ipv4/ip_tunnel.c:352 [inline]
    ip_tunnel_ioctl+0x818/0xd40 net/ipv4/ip_tunnel.c:861
    ipip_tunnel_ioctl+0x1c5/0x420 net/ipv4/ipip.c:350
    dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334
    dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525
    sock_ioctl+0x47e/0x680 net/socket.c:1015
    vfs_ioctl fs/ioctl.c:46 [inline]
    file_ioctl fs/ioctl.c:500 [inline]
    do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
    ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
    SYSC_ioctl fs/ioctl.c:708 [inline]
    SyS_ioctl+0x24/0x30 fs/ioctl.c:706
    do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Apr, 2018

1 commit


29 Mar, 2018

2 commits

  • We want to use dev_set_mtu() regardless of how we calculate
    the mtu value.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2018-03-29

    1) Fix a rcu_read_lock/rcu_read_unlock imbalance
    in the error path of xfrm_local_error().
    From Taehee Yoo.

    2) Some VTI MTU fixes. From Stefano Brivio.

    3) Fix a too early overwritten skb control buffer
    on xfrm transport mode.

    Please note that this pull request has a merge conflict
    in net/ipv4/ip_tunnel.c.

    The conflict is between

    commit f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")

    from the net tree and

    commit 24fc79798b8d ("ip_tunnel: Clamp MTU to bounds on new link")

    from the ipsec tree.

    It can be solved as it is currently done in linux-next.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Mar, 2018

1 commit

  • For tunnels created with IFLA_MTU, MTU of the netdevice is set by
    rtnl_create_link() (called from rtnl_newlink()) before the device is
    registered. However without IFLA_MTU that's not done.

    rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which
    via ip_tunnel_newlink() calls register_netdevice(), and that emits
    NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the
    MTU of 0.

    Afterwards, ip_tunnel_newlink() corrects the MTU once the netdevice is
    registered, but since there's no event, the listeners don't get to know
    about the MTU until something else happens--such as a NETDEV_UP event.
    That's not ideal.

    So instead of setting the MTU directly, go through dev_set_mtu(), which
    takes care of distributing the necessary NETDEV_PRECHANGEMTU and
    NETDEV_CHANGEMTU events.
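
    The effect on listeners can be shown with a toy notifier. This is a
    Python model in which the class and function bodies are hypothetical
    simplifications. The event names come from the message, and each listener
    receives the event name together with the MTU visible at that moment.

```python
class Device:
    def __init__(self):
        self.mtu = 0
        self.listeners = []

    def notify(self, event):
        for callback in self.listeners:
            callback(event, self.mtu)

def dev_set_mtu(dev, new_mtu):
    dev.notify("NETDEV_PRECHANGEMTU")
    dev.mtu = new_mtu
    dev.notify("NETDEV_CHANGEMTU")

def newlink(dev, mtu, use_dev_set_mtu):
    dev.notify("NETDEV_REGISTER")   # listeners see the MTU as it is now
    if use_dev_set_mtu:
        dev_set_mtu(dev, mtu)       # change is announced to listeners
    else:
        dev.mtu = mtu               # silent assignment, nobody is told
```

    With the silent assignment a listener only ever observes MTU 0. Going
    through `dev_set_mtu` adds the two change events carrying the old and new
    values.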

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Petr Machata
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     

19 Mar, 2018

1 commit

  • Otherwise, it's possible to specify invalid MTU values directly
    on creation of a link (via 'ip link add'). This is already
    prevented on subsequent MTU changes by commit b96f9afee4eb
    ("ipv4/6: use core net MTU range checking").

    Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
    Signed-off-by: Stefano Brivio
    Acked-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Stefano Brivio
     

10 Mar, 2018

1 commit

  • fallback tunnels (like tunl0, gre0, gretap0, erspan0, sit0,
    ip6tnl0, ip6gre0) are automatically created when the corresponding
    module is loaded.

    These tunnels are also automatically created when a new network
    namespace is created, at a great cost.

    In many cases, netns are used for isolation purposes, and these
    extra network devices are a waste of resources. We are using
    thousands of netns per host, and hit the netns creation/delete
    bottleneck a lot. (Many thanks to Kirill for recent work on this)

    Add a new sysctl so that we can opt-out from this automatic creation.

    Note that these tunnels are still created for the initial namespace,
    to be the least intrusive for typical setups.

    Tested:
    lpk43:~# cat add_del_unshare.sh
    for i in `seq 1 40`
    do
    (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
    done
    wait

    lpk43:~# echo 0 >/proc/sys/net/core/fb_tunnels_only_for_init_net
    lpk43:~# time ./add_del_unshare.sh

    real 0m37.521s
    user 0m0.886s
    sys 7m7.084s
    lpk43:~# echo 1 >/proc/sys/net/core/fb_tunnels_only_for_init_net
    lpk43:~# time ./add_del_unshare.sh

    real 0m4.761s
    user 0m0.851s
    sys 1m8.343s
    lpk43:~#

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Mar, 2018

1 commit


28 Feb, 2018

1 commit

  • Initializing struct flowi4 is useful for drivers that need to emulate
    routing decisions made by a tunnel interface. Publish the
    function (appropriately renamed) so that the drivers in question don't
    need to cut'n'paste it around.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     

27 Feb, 2018

1 commit

  • This reverts commit 5c38bd1b82e1f76f9fa96c1e61c9897cabf1ce45.

    skb->mark contains the mark of the encapsulated traffic, which
    can result in incorrect routing decisions being made, such
    as routing loops if the route chosen is via the tunnel itself.
    The correct method should be to use tunnel->fwmark.

    Signed-off-by: Thomas Winter
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: Hideaki YOSHIFUJI
    Signed-off-by: David S. Miller

    Thomas Winter
     

29 Jan, 2018

1 commit


26 Jan, 2018

1 commit

  • Some dst_ops (e.g. md_dst_ops) don't set this handler. This may result in:
    "BUG: unable to handle kernel NULL pointer dereference at (null)"

    Let's add a helper to check if update_pmtu is available before calling it.

    Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
    Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
    CC: Roman Kapl
    CC: Xin Long
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

25 Jan, 2018

1 commit


14 Dec, 2017

1 commit

  • IPv4 stack reacts to changes to small MTU, by disabling itself under
    RTNL.

    But there is a window where threads not using RTNL can see a wrong
    device mtu. This can lead to surprises, in igmp code where it is
    assumed the mtu is suitable.

    Fix this by reading device mtu once and checking IPv4 minimal MTU.

    This patch adds missing IPV4_MIN_MTU define, to not abuse
    ETH_MIN_MTU anymore.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Sep, 2017

1 commit

  • Implement exit_batch() method to dismantle more devices
    per round.

    (rtnl_lock() ...
    unregister_netdevice_many() ...
    rtnl_unlock())

    Tested:
    $ cat add_del_unshare.sh
    for i in `seq 1 40`
    do
    (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
    done
    wait ; grep net_namespace /proc/slabinfo

    Before patch :
    $ time ./add_del_unshare.sh
    net_namespace 126 282 5504 1 2 : tunables 8 4 0 : slabdata 126 282 0

    real 1m38.965s
    user 0m0.688s
    sys 0m37.017s

    After patch:
    $ time ./add_del_unshare.sh
    net_namespace 135 291 5504 1 2 : tunables 8 4 0 : slabdata 135 291 0

    real 0m22.117s
    user 0m0.728s
    sys 0m35.328s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Sep, 2017

1 commit

  • In collect_md mode, if the tun dev is down, it can still call
    ip_tunnel_rcv to receive packets, and the rx statistics increase
    improperly.

    When the md tunnel is down, it's not necessary to increase RX drops
    for the tunnel device; packets would be received on the fallback tunnel,
    and the RX drops on the fallback device will be increased as expected.

    Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
    Cc: Pravin B Shelar
    Signed-off-by: Haishuang Yan
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Haishuang Yan
     

09 Sep, 2017

1 commit


17 Jun, 2017

1 commit

  • When ip_tunnel_rcv fails, the tun_dst won't be freed, so call
    dst_release to free it in error code path.

    Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
    Acked-by: Eric Dumazet
    Acked-by: Pravin B Shelar
    Tested-by: Zhang Shengju
    Signed-off-by: Haishuang Yan
    Signed-off-by: David S. Miller

    Haishuang Yan
     

08 Jun, 2017

1 commit

  • Network devices can allocate resources and private memory using
    netdev_ops->ndo_init(). However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally also does a free_netdev().

    This creates a problem for the logic in register_netdevice().
    Because all callers of register_netdevice() manage the freeing
    of the netdev, and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
    and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().
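
    The split between the two responsibilities can be sketched as a small
    state model. This is a Python sketch in which the class, its attributes
    and the `fail_after_init` switch are hypothetical; the function and field
    names follow the message. It models only who releases private resources
    and who frees the device, not real kernel behaviour.

```python
class NetDev:
    def __init__(self, needs_free_netdev=False, fail_after_init=False):
        self.needs_free_netdev = needs_free_netdev
        self.fail_after_init = fail_after_init   # simulate a late failure
        self.private_allocated = False
        self.freed = False

    def ndo_init(self):
        self.private_allocated = True

    def ndo_uninit(self):
        pass

    def priv_destructor(self):
        self.private_allocated = False   # frees private data, not the device

def free_netdev(dev):
    assert not dev.freed, "double free"
    dev.freed = True

def register_netdevice(dev):
    dev.ndo_init()
    if dev.fail_after_init:
        dev.ndo_uninit()
        dev.priv_destructor()   # safe here: it never calls free_netdev()
        return False            # the caller still owns the free_netdev() call
    return True

def unregister_netdevice(dev):
    dev.ndo_uninit()
    dev.priv_destructor()
    if dev.needs_free_netdev:
        free_netdev(dev)
```

    On a failed registration the private resources are released inside
    `register_netdevice` and the caller's single `free_netdev` call completes
    the cleanup, so nothing leaks and nothing is freed twice.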

    Signed-off-by: David S. Miller

    David S. Miller