17 Jan, 2021

1 commit

  • [ Upstream commit 94bcfdbff0c210b17b27615f4952cc6ece7d5f5f ]

    .dellink does not get called after .newlink fails,
    bareudp_newlink() must undo what bareudp_configure()
    has done if bareudp_link_config() fails.

    v2: call bareudp_dellink(), like bareudp_dev_create() does

    Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
    Link: https://lore.kernel.org/r/20210105190725.1736246-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
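
The unwind described in this commit can be sketched in user-space C. This is a minimal illustration, not the driver code: `struct toy_dev` and the `toy_*` functions are hypothetical stand-ins for the device and for bareudp_configure(), bareudp_link_config(), bareudp_dellink() and bareudp_newlink().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel's struct net_device. */
struct toy_dev {
	bool configured;  /* set by toy_configure(), cleared by toy_dellink() */
	bool fail_config; /* knob for the sketch: make toy_link_config() fail */
};

/* Plays the role of bareudp_configure(): sets up device state. */
static int toy_configure(struct toy_dev *dev)
{
	dev->configured = true;
	return 0;
}

/* Plays the role of bareudp_link_config(): may fail after configure. */
static int toy_link_config(const struct toy_dev *dev)
{
	return dev->fail_config ? -1 : 0;
}

/* Plays the role of bareudp_dellink(): undoes what configure did. */
static void toy_dellink(struct toy_dev *dev)
{
	dev->configured = false;
}

/* The delete hook is not called after a failed create, so the create
 * path has to unwind its own partial work before returning the error. */
static int toy_newlink(struct toy_dev *dev)
{
	int err;

	assert(dev != NULL);

	err = toy_configure(dev);
	if (err)
		return err;

	err = toy_link_config(dev);
	if (err)
		goto err_unconfig;

	return 0;

err_unconfig:
	toy_dellink(dev);
	return err;
}
```

With `fail_config` set, toy_newlink() returns the error and leaves `configured` false, which is the state the commit aims for after a failed .newlink.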
     

13 Jan, 2021

2 commits

  • [ Upstream commit 10ad3e998fa0c25315f27cf3002ff8b02dc31c38 ]

    bareudp6_xmit_skb() calculates min_headroom using struct iphdr,
    which is not correct for an IPv6 outer header, so a panic could
    occur. The struct ipv6hdr should be used instead.

    Test commands:
    ip netns add A
    ip netns add B
    ip link add veth0 netns A type veth peer name veth1 netns B
    ip netns exec A ip link set veth0 up
    ip netns exec A ip a a 2001:db8:0::1/64 dev veth0
    ip netns exec B ip link set veth1 up
    ip netns exec B ip a a 2001:db8:0::2/64 dev veth1

    for i in {10..1}
    do
    let A=$i-1
    ip netns exec A ip link add bareudp$i type bareudp dstport $i \
    ethertype 0x86dd
    ip netns exec A ip link set bareudp$i up
    ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev bareudp$i
    ip netns exec A ip -6 r a 2001:db8:$i::2 encap ip6 src \
    2001:db8:$A::1 dst 2001:db8:$A::2 via 2001:db8:$i::2 \
    dev bareudp$i

    ip netns exec B ip link add bareudp$i type bareudp dstport $i \
    ethertype 0x86dd
    ip netns exec B ip link set bareudp$i up
    ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev bareudp$i
    ip netns exec B ip -6 r a 2001:db8:$i::1 encap ip6 src \
    2001:db8:$A::2 dst 2001:db8:$A::1 via 2001:db8:$i::1 \
    dev bareudp$i
    done
    ip netns exec A ping 2001:db8:7::2

    Splat looks like:
    [ 66.436679][ C2] skbuff: skb_under_panic: text:ffffffff928614c8 len:454 put:14 head:ffff88810abb4000 data:ffff88810abb3ffa tail:0x1c0 end:0x3ec0 dev:veth0
    [ 66.441626][ C2] ------------[ cut here ]------------
    [ 66.443458][ C2] kernel BUG at net/core/skbuff.c:109!
    [ 66.445313][ C2] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 66.447606][ C2] CPU: 2 PID: 913 Comm: ping Not tainted 5.10.0+ #819
    [ 66.450251][ C2] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 66.453713][ C2] RIP: 0010:skb_panic+0x15d/0x15f
    [ 66.455345][ C2] Code: 98 fe 4c 8b 4c 24 10 53 8b 4d 70 45 89 e0 48 c7 c7 60 8b 78 93 41 57 41 56 41 55 48 8b 54 24 20 48 8b 74 24 28 e8 b5 40 f9 ff 0b 48 8b 6c 24 20 89 34 24 e8 08 c9 98 fe 8b 34 24 48 c7 c1 80
    [ 66.462314][ C2] RSP: 0018:ffff888119209648 EFLAGS: 00010286
    [ 66.464281][ C2] RAX: 0000000000000089 RBX: ffff888003159000 RCX: 0000000000000000
    [ 66.467216][ C2] RDX: 0000000000000089 RSI: 0000000000000008 RDI: ffffed10232412c0
    [ 66.469768][ C2] RBP: ffff88810a53d440 R08: ffffed102328018d R09: ffffed102328018d
    [ 66.472297][ C2] R10: ffff888119400c67 R11: ffffed102328018c R12: 000000000000000e
    [ 66.474833][ C2] R13: ffff88810abb3ffa R14: 00000000000001c0 R15: 0000000000003ec0
    [ 66.477361][ C2] FS: 00007f37c0c72f00(0000) GS:ffff888119200000(0000) knlGS:0000000000000000
    [ 66.480214][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 66.482296][ C2] CR2: 000055a058808570 CR3: 000000011039e002 CR4: 00000000003706e0
    [ 66.484811][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 66.487793][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 66.490424][ C2] Call Trace:
    [ 66.491469][ C2]
    [ 66.492374][ C2] ? eth_header+0x28/0x190
    [ 66.494054][ C2] ? eth_header+0x28/0x190
    [ 66.495401][ C2] skb_push.cold.99+0x22/0x22
    [ 66.496700][ C2] eth_header+0x28/0x190
    [ 66.497867][ C2] neigh_resolve_output+0x3de/0x720
    [ 66.499615][ C2] ? __neigh_update+0x7e8/0x20a0
    [ 66.501176][ C2] __neigh_update+0x8bd/0x20a0
    [ 66.502749][ C2] ndisc_update+0x34/0xc0
    [ 66.504010][ C2] ndisc_recv_na+0x8da/0xb80
    [ 66.505041][ C2] ? pndisc_redo+0x20/0x20
    [ 66.505888][ C2] ? rcu_read_lock_sched_held+0xc0/0xc0
    [ 66.506965][ C2] ndisc_rcv+0x3a0/0x470
    [ 66.507797][ C2] icmpv6_rcv+0xad9/0x1b00
    [ 66.508645][ C2] ip6_protocol_deliver_rcu+0xcd6/0x1560
    [ 66.509719][ C2] ip6_input_finish+0x5b/0xf0
    [ 66.510615][ C2] ip6_input+0xcd/0x2d0
    [ 66.511406][ C2] ? ip6_input_finish+0xf0/0xf0
    [ 66.512327][ C2] ? rcu_read_lock_held+0x91/0xa0
    [ 66.513279][ C2] ? ip6_protocol_deliver_rcu+0x1560/0x1560
    [ 66.514414][ C2] ipv6_rcv+0xe8/0x300
    [ ... ]

    Acked-by: Guillaume Nault
    Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
    Signed-off-by: Taehee Yoo
    Link: https://lore.kernel.org/r/20201228152146.24270-1-ap420073@gmail.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
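
The size mismatch behind this fix can be sketched as plain arithmetic. This is a rough user-space illustration, assuming fixed 20-byte IPv4 and 40-byte IPv6 headers and an 8-byte UDP header as the only bareudp overhead; `min_headroom()` and `headroom_shortfall()` are hypothetical helpers, and `ll_reserved` and `dst_header_len` are illustrative inputs rather than kernel values.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed header sizes in bytes (no IPv4 options, no IPv6 extension headers). */
enum {
	IPV4_HDR_LEN = 20, /* sizeof(struct iphdr) */
	IPV6_HDR_LEN = 40, /* sizeof(struct ipv6hdr) */
	UDP_HDR_LEN  = 8   /* sizeof(struct udphdr) */
};

/* Headroom to reserve before pushing the outer headers. */
static size_t min_headroom(size_t ll_reserved, size_t dst_header_len,
			   size_t outer_ip_hdr_len)
{
	assert(outer_ip_hdr_len == IPV4_HDR_LEN ||
	       outer_ip_hdr_len == IPV6_HDR_LEN);
	return ll_reserved + dst_header_len + UDP_HDR_LEN + outer_ip_hdr_len;
}

/* Bytes missing when an IPv6 transmit path sizes its headroom for IPv4. */
static size_t headroom_shortfall(size_t ll_reserved, size_t dst_header_len)
{
	return min_headroom(ll_reserved, dst_header_len, IPV6_HDR_LEN) -
	       min_headroom(ll_reserved, dst_header_len, IPV4_HDR_LEN);
}
```

Under these assumptions the shortfall is always 20 bytes, whatever the other inputs are, which is enough for a later header push to run past the start of the buffer.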
     
  • [ Upstream commit d9e44981739a96f1a468c13bbbd54ace378caf1c ]

    Like other tunneling interfaces, the bareudp doesn't need TXLOCK.
    So, it is good to set the NETIF_F_LLTX flag to improve performance and
    to avoid lockdep's false-positive warning.

    Test commands:
    ip netns add A
    ip netns add B
    ip link add veth0 netns A type veth peer name veth1 netns B
    ip netns exec A ip link set veth0 up
    ip netns exec A ip a a 10.0.0.1/24 dev veth0
    ip netns exec B ip link set veth1 up
    ip netns exec B ip a a 10.0.0.2/24 dev veth1

    for i in {2..1}
    do
    let A=$i-1
    ip netns exec A ip link add bareudp$i type bareudp \
    dstport $i ethertype ip
    ip netns exec A ip link set bareudp$i up
    ip netns exec A ip a a 10.0.$i.1/24 dev bareudp$i
    ip netns exec A ip r a 10.0.$i.2 encap ip src 10.0.$A.1 \
    dst 10.0.$A.2 via 10.0.$i.2 dev bareudp$i

    ip netns exec B ip link add bareudp$i type bareudp \
    dstport $i ethertype ip
    ip netns exec B ip link set bareudp$i up
    ip netns exec B ip a a 10.0.$i.2/24 dev bareudp$i
    ip netns exec B ip r a 10.0.$i.1 encap ip src 10.0.$A.2 \
    dst 10.0.$A.1 via 10.0.$i.1 dev bareudp$i
    done
    ip netns exec A ping 10.0.2.2

    Splat looks like:
    [ 96.992803][ T822] ============================================
    [ 96.993954][ T822] WARNING: possible recursive locking detected
    [ 96.995102][ T822] 5.10.0+ #819 Not tainted
    [ 96.995927][ T822] --------------------------------------------
    [ 96.997091][ T822] ping/822 is trying to acquire lock:
    [ 96.998083][ T822] ffff88810f753898 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
    [ 96.999813][ T822]
    [ 96.999813][ T822] but task is already holding lock:
    [ 97.001192][ T822] ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
    [ 97.002908][ T822]
    [ 97.002908][ T822] other info that might help us debug this:
    [ 97.004401][ T822] Possible unsafe locking scenario:
    [ 97.004401][ T822]
    [ 97.005784][ T822] CPU0
    [ 97.006407][ T822] ----
    [ 97.007010][ T822] lock(_xmit_NONE#2);
    [ 97.007779][ T822] lock(_xmit_NONE#2);
    [ 97.008550][ T822]
    [ 97.008550][ T822] *** DEADLOCK ***
    [ 97.008550][ T822]
    [ 97.010057][ T822] May be due to missing lock nesting notation
    [ 97.010057][ T822]
    [ 97.011594][ T822] 7 locks held by ping/822:
    [ 97.012426][ T822] #0: ffff888109a144f0 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x12f7/0x2b00
    [ 97.014191][ T822] #1: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020
    [ 97.016045][ T822] #2: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960
    [ 97.017897][ T822] #3: ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
    [ 97.019684][ T822] #4: ffffffffbce2f600 (rcu_read_lock){....}-{1:2}, at: bareudp_xmit+0x31b/0x3690 [bareudp]
    [ 97.021573][ T822] #5: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020
    [ 97.023424][ T822] #6: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960
    [ 97.025259][ T822]
    [ 97.025259][ T822] stack backtrace:
    [ 97.026349][ T822] CPU: 3 PID: 822 Comm: ping Not tainted 5.10.0+ #819
    [ 97.027609][ T822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 97.029407][ T822] Call Trace:
    [ 97.030015][ T822] dump_stack+0x99/0xcb
    [ 97.030783][ T822] __lock_acquire.cold.77+0x149/0x3a9
    [ 97.031773][ T822] ? stack_trace_save+0x81/0xa0
    [ 97.032661][ T822] ? register_lock_class+0x1910/0x1910
    [ 97.033673][ T822] ? register_lock_class+0x1910/0x1910
    [ 97.034679][ T822] ? rcu_read_lock_sched_held+0x91/0xc0
    [ 97.035697][ T822] ? rcu_read_lock_bh_held+0xa0/0xa0
    [ 97.036690][ T822] lock_acquire+0x1b2/0x730
    [ 97.037515][ T822] ? __dev_queue_xmit+0x1f52/0x2960
    [ 97.038466][ T822] ? check_flags+0x50/0x50
    [ 97.039277][ T822] ? netif_skb_features+0x296/0x9c0
    [ 97.040226][ T822] ? validate_xmit_skb+0x29/0xb10
    [ 97.041151][ T822] _raw_spin_lock+0x30/0x70
    [ 97.041977][ T822] ? __dev_queue_xmit+0x1f52/0x2960
    [ 97.042927][ T822] __dev_queue_xmit+0x1f52/0x2960
    [ 97.043852][ T822] ? netdev_core_pick_tx+0x290/0x290
    [ 97.044824][ T822] ? mark_held_locks+0xb7/0x120
    [ 97.045712][ T822] ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
    [ 97.046824][ T822] ? __local_bh_enable_ip+0xa5/0xf0
    [ 97.047771][ T822] ? ___neigh_create+0x12a8/0x1eb0
    [ 97.048710][ T822] ? trace_hardirqs_on+0x41/0x120
    [ 97.049626][ T822] ? ___neigh_create+0x12a8/0x1eb0
    [ 97.050556][ T822] ? __local_bh_enable_ip+0xa5/0xf0
    [ 97.051509][ T822] ? ___neigh_create+0x12a8/0x1eb0
    [ 97.052443][ T822] ? check_chain_key+0x244/0x5f0
    [ 97.053352][ T822] ? rcu_read_lock_bh_held+0x56/0xa0
    [ 97.054317][ T822] ? ip_finish_output2+0x6ea/0x2020
    [ 97.055263][ T822] ? pneigh_lookup+0x410/0x410
    [ 97.056135][ T822] ip_finish_output2+0x6ea/0x2020
    [ ... ]

    Acked-by: Guillaume Nault
    Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
    Signed-off-by: Taehee Yoo
    Link: https://lore.kernel.org/r/20201228152136.24215-1-ap420073@gmail.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     

06 Oct, 2020

1 commit


05 Aug, 2020

1 commit

  • It's currently possible to bridge Ethernet tunnels carrying IP
    packets directly to external interfaces without assigning them
    addresses and routes on the bridged network itself: this is the case
    for UDP tunnels bridged with a standard bridge or by Open vSwitch.

    PMTU discovery is currently broken with those configurations, because
    the encapsulation effectively decreases the MTU of the link, and
    while we are able to account for this using PMTU discovery on the
    lower layer, we don't have a way to relay ICMP or ICMPv6 messages
    needed by the sender, because we don't have valid routes to it.

    On the other hand, as a tunnel endpoint, we can't fragment packets
    as a general approach: this is for instance clearly forbidden for
    VXLAN by RFC 7348, section 4.3:

    VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may
    fragment encapsulated VXLAN packets due to the larger frame size.
    The destination VTEP MAY silently discard such VXLAN fragments.

    The same paragraph recommends that the MTU over the physical network
    accommodates for encapsulations, but this isn't a practical option for
    complex topologies, especially for typical Open vSwitch use cases.

    Further, it states that:

    Other techniques like Path MTU discovery (see [RFC1191] and
    [RFC1981]) MAY be used to address this requirement as well.

    Now, PMTU discovery already works for routed interfaces, we get
    route exceptions created by the encapsulation device as they receive
    ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
    we already rebuild those messages with the appropriate MTU and route
    them back to the sender.

    Add the missing bits for bridged cases:

    - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
    to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
    RFC 4443 section 2.4 for ICMPv6. This function is already called by
    UDP tunnels

    - a new function generating those ICMP or ICMPv6 replies. We can't
    reuse icmp_send() and icmp6_send() as we don't see the sender as a
    valid destination. This doesn't need to be generic, as we don't
    cover any other type of ICMP errors given that we only provide an
    encapsulation function to the sender

    While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
    we might receive GSO buffers here, and the passed headroom already
    includes the inner MAC length, so we don't have to account for it
    a second time (that would imply three MAC headers on the wire, but
    there are just two).

    This issue became visible while bridging IPv6 packets with 4500 bytes
    of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
    bytes of encapsulation headroom, we would advertise MTU as 3950, and
    we would reject fragmented IPv6 datagrams of 3958 bytes size on the
    wire. We're exclusively dealing with network MTU here, though, so we
    could get Ethernet frames up to 3964 octets in that case.

    v2:
    - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
    - split IPv4/IPv6 functions (David Ahern)

    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
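
The figures in the GENEVE example can be reproduced with a short calculation. This is a sketch under stated assumptions: a 14-byte Ethernet header, a 40-byte IPv6 header, an 8-byte IPv6 fragment header, and non-final fragments carrying a multiple of 8 bytes of data. The function names are hypothetical and do not correspond to skb_tunnel_check_pmtu() or any other kernel function.

```c
#include <assert.h>

enum {
	ETH_HDR_LEN       = 14, /* Ethernet header without VLAN tag */
	IPV6_HDR_LEN      = 40,
	IPV6_FRAG_HDR_LEN = 8
};

/* MTU advertised to the sender: lower-layer PMTU minus encapsulation headroom. */
static int advertised_mtu(int pmtu, int encap_headroom)
{
	assert(pmtu > encap_headroom);
	return pmtu - encap_headroom;
}

/* Largest Ethernet frame that fits a given network-layer MTU. */
static int max_frame_len(int mtu)
{
	return mtu + ETH_HDR_LEN;
}

/* On-wire size of the largest non-final IPv6 fragment for a given MTU.
 * The fragment data is rounded down to a multiple of 8 bytes. */
static int max_ipv6_fragment_frame(int mtu)
{
	int data = (mtu - IPV6_HDR_LEN - IPV6_FRAG_HDR_LEN) / 8 * 8;

	return ETH_HDR_LEN + IPV6_HDR_LEN + IPV6_FRAG_HDR_LEN + data;
}

/* Worked example with the numbers from the commit message. */
static void pmtu_example(void)
{
	int mtu = advertised_mtu(4000, 50);

	assert(mtu == 3950);
	assert(max_ipv6_fragment_frame(mtu) == 3958);
	assert(max_frame_len(mtu) == 3964);
}
```

The 3958-byte frame is below the 3964-byte limit, so a check that compares network-layer sizes only should accept it.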
     

02 Aug, 2020

1 commit


29 Jul, 2020

1 commit

  • In multiproto mode, bareudp_xmit() accepts sending multicast MPLS and
    IPv6 packets regardless of the bareudp ethertype. In practice, this
    lets an IP tunnel send multicast MPLS packets, or an MPLS tunnel send
    IPv6 packets.

    We need to restrict the test further, so that the multiproto mode only
    enables
    * IPv6 for IPv4 tunnels,
    * or multicast MPLS for unicast MPLS tunnels.

    To improve clarity, the protocol validation is moved to its own
    function, where each logical test has its own condition.

    v2: s/ntohs/htons/

    Fixes: 4b5f67232d95 ("net: Special handling for IP & MPLS.")
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
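
The restricted test described above can be sketched as a standalone predicate with one condition per logical case. This is an illustration only: `proto_valid()` and the `ET_*` constants are hypothetical names, and the values are compared in host byte order, whereas the driver works with network-order values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ethertype values in host byte order. */
enum {
	ET_IPV4    = 0x0800,
	ET_IPV6    = 0x86dd,
	ET_MPLS_UC = 0x8847,
	ET_MPLS_MC = 0x8848
};

/* Decide whether a packet of type proto may be sent on a tunnel
 * configured with tunnel_ethertype. */
static bool proto_valid(uint16_t tunnel_ethertype, bool multiproto,
			uint16_t proto)
{
	/* The configured ethertype is always accepted. */
	if (proto == tunnel_ethertype)
		return true;

	/* Anything else requires multiproto mode. */
	if (!multiproto)
		return false;

	/* Multicast MPLS is allowed on unicast MPLS tunnels. */
	if (tunnel_ethertype == ET_MPLS_UC && proto == ET_MPLS_MC)
		return true;

	/* IPv6 is allowed on IPv4 tunnels. */
	if (tunnel_ethertype == ET_IPV4 && proto == ET_IPV6)
		return true;

	return false;
}

/* The two cross-protocol cases the commit rules out. */
static void proto_valid_examples(void)
{
	assert(!proto_valid(ET_IPV4, true, ET_MPLS_MC));
	assert(!proto_valid(ET_MPLS_UC, true, ET_IPV6));
}
```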
     

22 Jul, 2020

1 commit

  • The commit fe80536acf83 ("bareudp: Added attribute to enable & disable
    rx metadata collection") breaks the original (5.7) default behavior of
    the bareudp module, which collects RX metadata on receive. It was added
    to avoid a crash in the kernel neighbour subsystem when a packet with
    metadata from bareudp is processed. But it is no longer needed, as
    commit 394de110a733 ("net: Added pointer check for
    dst->ops->neigh_lookup in dst_neigh_lookup_skb") solves this crash.

    Fixes: fe80536acf83 ("bareudp: Added attribute to enable & disable rx metadata collection")
    Signed-off-by: Martin Varghese
    Acked-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Martin Varghese
     

29 Jun, 2020

1 commit


19 Jun, 2020

1 commit


17 Jun, 2020

1 commit


08 May, 2020

1 commit

  • clang points out that building without IPv6 would lead to returning
    an uninitialized variable if a packet with family!=AF_INET is
    passed into bareudp_udp_encap_recv():

    drivers/net/bareudp.c:139:6: error: variable 'err' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
    if (family == AF_INET)
    ^~~~~~~~~~~~~~~~~
    drivers/net/bareudp.c:146:15: note: uninitialized use occurs here
    if (unlikely(err)) {
    ^~~
    include/linux/compiler.h:78:42: note: expanded from macro 'unlikely'
    # define unlikely(x) __builtin_expect(!!(x), 0)
    ^
    drivers/net/bareudp.c:139:2: note: remove the 'if' if its condition is always true
    if (family == AF_INET)
    ^~~~~~~~~~~~~~~~~~~~~~

    This cannot happen in practice, so change the condition in a way that
    gcc sees the IPv4 case as unconditionally true here.
    For consistency, change all the similar constructs in this file the
    same way, using "if (IS_ENABLED())" instead of "#if IS_ENABLED()".

    Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Nathan Chancellor
    Signed-off-by: David S. Miller

    Arnd Bergmann
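
The difference between the two constructs can be sketched outside the kernel. This is a minimal illustration, assuming a hypothetical macro `SKETCH_IPV6_ENABLED` in place of IS_ENABLED(CONFIG_IPV6); the `decap_v4()` and `decap_v6()` helpers and the `enum family` values are stand-ins, not kernel functions.

```c
#include <assert.h>

/* Hypothetical compile-time switch standing in for IS_ENABLED(CONFIG_IPV6). */
#ifndef SKETCH_IPV6_ENABLED
#define SKETCH_IPV6_ENABLED 0
#endif

enum family { FAMILY_INET = 2, FAMILY_INET6 = 10 };

/* Stand-ins for the per-family handlers; each returns a marker value. */
static int decap_v4(void) { return 4; }
static int decap_v6(void) { return 6; }

/* Returns the marker of the handler that ran. */
static int decapsulate(enum family family)
{
	int path;

	/* With IPv6 compiled out, the first operand is a constant true, so
	 * the compiler can see that the IPv4 branch is always taken and that
	 * path is always assigned. Unlike with #if, both branches are still
	 * parsed and type-checked in every configuration. */
	if (!SKETCH_IPV6_ENABLED || family == FAMILY_INET)
		path = decap_v4();
	else
		path = decap_v6();

	return path;
}
```

With the switch at 0, decapsulate() takes the IPv4 handler for every family; building with -DSKETCH_IPV6_ENABLED=1 restores the per-family choice.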
     

12 Mar, 2020

1 commit

  • Reverted commit "2baecda bareudp: remove unnecessary udp_encap_enable() in
    bareudp_socket_create()"

    An explicit call to udp_encap_enable is needed as setup_udp_tunnel_sock
    does not call udp_encap_enable if the socket is of type v6.

    The bareudp device uses a v6 socket to receive v4 & v6 traffic.

    CC: Taehee Yoo
    Fixes: 2baecda37f4e ("bareudp: remove unnecessary udp_encap_enable() in bareudp_socket_create()")
    Signed-off-by: Martin Varghese
    Signed-off-by: David S. Miller

    Martin Varghese
     

09 Mar, 2020

3 commits


25 Feb, 2020

3 commits

  • drivers/net/bareudp.c: In function 'bareudp_xmit_skb':
    drivers/net/bareudp.c:346:9: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized]
    346 | return err;
    | ^~~
    drivers/net/bareudp.c: In function 'bareudp6_xmit_skb':
    drivers/net/bareudp.c:407:9: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized]
    407 | return err;

    Reported-by: Stephen Rothwell
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Special handling is needed in the bareudp module for IP & MPLS as they
    support more than one ethertype.

    MPLS has 2 ethertypes: 0x8847 for MPLS unicast and 0x8848 for MPLS multicast.
    While decapsulating an MPLS packet from a UDP packet, the tunnel destination
    IP address is checked to determine the ethertype. The ethertype of the packet
    will be set to 0x8848 if the tunnel destination IP address is a multicast
    IP address. The ethertype of the packet will be set to 0x8847 if the
    tunnel destination IP address is a unicast IP address.

    IP has 2 ethertypes: 0x0800 for IPv4 and 0x86dd for IPv6. The version
    field of the tunnelled IP header will be checked to determine the ethertype.

    This special handling to tunnel additional ethertypes will be disabled
    by default and can be enabled using a flag called multiproto. This flag can
    be used only with ethertypes 0x8847 and 0x0800.

    Signed-off-by: Martin Varghese
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Martin Varghese
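
The ethertype selection on decapsulation can be sketched as two small functions. This is a user-space illustration under assumptions: only an IPv4 tunnel destination is handled for the MPLS case, addresses are passed in host byte order, and `ip_ethertype()`, `mpls_ethertype()` and the `ET_*` constants are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ethertype values in host byte order. */
enum {
	ET_IPV4    = 0x0800,
	ET_IPV6    = 0x86dd,
	ET_MPLS_UC = 0x8847,
	ET_MPLS_MC = 0x8848
};

/* Pick the ethertype from the version field, i.e. the high nibble of the
 * first byte of the tunnelled IP header. Returns 0 for unknown versions. */
static uint16_t ip_ethertype(uint8_t first_byte)
{
	switch (first_byte >> 4) {
	case 4:
		return ET_IPV4;
	case 6:
		return ET_IPV6;
	default:
		return 0;
	}
}

/* IPv4 multicast is 224.0.0.0/4, i.e. the top four bits are 1110. */
static bool ipv4_is_multicast(uint32_t addr_host_order)
{
	return (addr_host_order >> 28) == 0xe;
}

/* Pick the MPLS ethertype from the tunnel destination address. */
static uint16_t mpls_ethertype(uint32_t tunnel_dst_host_order)
{
	return ipv4_is_multicast(tunnel_dst_host_order) ? ET_MPLS_MC
							: ET_MPLS_UC;
}

/* Examples: 0x45 starts a typical IPv4 header, 0x60 an IPv6 header. */
static void ethertype_examples(void)
{
	assert(ip_ethertype(0x45) == ET_IPV4);
	assert(ip_ethertype(0x60) == ET_IPV6);
	assert(mpls_ethertype(0xE0000001u) == ET_MPLS_MC); /* 224.0.0.1 */
	assert(mpls_ethertype(0x0A000002u) == ET_MPLS_UC); /* 10.0.0.2 */
}
```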
     
  • The Bareudp tunnel module provides a generic L3 encapsulation
    tunnelling module for tunnelling different protocols like MPLS,
    IP, NSH etc. inside a UDP tunnel.

    Signed-off-by: Martin Varghese
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Martin Varghese