01 Apr, 2020

1 commit


31 Mar, 2020

4 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Refactor the UDP/TCP handlers slightly to allow skb_steal_sock() to make
    the determination of whether the socket is reference counted in the case
    where it is prefetched by earlier logic such as early_demux.

    Signed-off-by: Joe Stringer
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200329225342.16317-3-joe@wand.net.nz

    Joe Stringer
     
  • Add support for TPROXY via a new bpf helper, bpf_sk_assign().

    This helper requires the BPF program to discover the socket via a call
    to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
    helper takes its own reference to the socket in addition to any existing
    reference that may or may not currently be obtained for the duration of
    BPF processing. For the destination socket to receive the traffic, the
    traffic must be routed towards that socket via local route. The
    simplest example route is below, but in practice you may want to route
    traffic more narrowly (e.g. by CIDR):

    $ ip route add local default dev lo

    This patch avoids trying to introduce an extra bit into the skb->sk, as
    that would require more invasive changes to all code interacting with
    the socket to ensure that the bit is handled correctly, such as all
    error-handling cases along the path from the helper in BPF through to
    the orphan path in the input. Instead, we opt to use the destructor
    variable to switch on the prefetch of the socket.
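
    The idea of using the destructor pointer as the "prefetched" marker,
    rather than a spare bit in skb->sk, can be modelled in user space. The
    sketch below is not kernel code: struct sock_sketch, struct
    sk_buff_sketch and all *_sketch functions are hypothetical, reduced
    stand-ins chosen only to illustrate the technique.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the socket and the skb. */
struct sock_sketch {
	int refcnt;
};

struct sk_buff_sketch {
	struct sock_sketch *sk;
	void (*destructor)(struct sk_buff_sketch *skb);
};

/* Installed only by the assign path; its address doubles as the tag. */
void sock_pfree_sketch(struct sk_buff_sketch *skb)
{
	if (skb->sk)
		skb->sk->refcnt--;
	skb->sk = NULL;
}

/* Model of the helper: take an extra reference and tag the skb. */
void assign_sock_sketch(struct sk_buff_sketch *skb, struct sock_sketch *sk)
{
	sk->refcnt++;
	skb->sk = sk;
	skb->destructor = sock_pfree_sketch;
}

/* No extra bit in skb->sk: compare the destructor pointer instead. */
bool sk_is_prefetched_sketch(const struct sk_buff_sketch *skb)
{
	return skb->sk != NULL && skb->destructor == sock_pfree_sketch;
}

/* Hand the socket to the caller and drop the skb's claim on it. */
struct sock_sketch *steal_sock_sketch(struct sk_buff_sketch *skb,
				      bool *prefetched)
{
	struct sock_sketch *sk = skb->sk;

	*prefetched = sk_is_prefetched_sketch(skb);
	skb->sk = NULL;
	skb->destructor = NULL;
	return sk;
}
```

    Because the marker is the destructor itself, any path that orphans
    the skb clears the marker as a side effect, which is the property the
    paragraph above relies on.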

    Signed-off-by: Joe Stringer
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz

    Joe Stringer
     
  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2020-03-28

    1) Use kmem_cache_zalloc() instead of kmem_cache_alloc()
    in xfrm_state_alloc(). From Huang Zijiang.

    2) esp_output_fill_trailer() is the same in IPv4 and IPv6,
    so share this function to avoid code duplication.
    From Raed Salem.

    3) Add offload support for esp beet mode.
    From Xin Long.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

30 Mar, 2020

5 commits

  • This patch adds functionality to configure routes for RPL source
    routing. IPIP functionality is not implemented yet; it can be added
    later, once the cases that call for IPv6 encapsulation become
    clearer.

    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • The build_state callback of lwtunnel doesn't contain the net namespace
    structure yet. This patch will add it so we can check on specific
    address configuration at creation time of rpl source routes.

    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • This patch adds RPL source routing receive handling. It works only
    if the sysctl "rpl_seg_enabled" is set and source routing is enabled.
    The behaviour is mostly the same as for IPv6 segment routing. To handle
    compression and decompression, a rpl.c file is created which contains
    the necessary functionality. The receive handling also takes care of
    IPv6 encapsulation, as far as it is specified as a possible nexthdr in
    RFC 6554.

    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • This patch adds functionality to addrconf to check for a specific RPL
    address configuration. According to RFC 6554:

    To detect loops in the SRH, a router MUST determine if the SRH
    includes multiple addresses assigned to any interface on that
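    router. If such addresses appear more than once and are separated by
    at least one address not assigned to that router, the router MUST
    drop the packet and SHOULD send an ICMP Parameter Problem, Code 0,
    to the Source Address.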
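
    The loop condition quoted above can be sketched as a single pass over
    the segment list. This is a user-space illustration, not the addrconf
    implementation: addr_is_local() and srh_has_loop() are hypothetical
    names, and the router's own addresses are passed in as a plain array
    instead of being looked up in the kernel's address tables.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if addr equals one of the n_local addresses assigned to this router. */
bool addr_is_local(const struct in6_addr *addr,
		   const struct in6_addr *local, size_t n_local)
{
	for (size_t i = 0; i < n_local; i++)
		if (memcmp(addr, &local[i], sizeof(*addr)) == 0)
			return true;
	return false;
}

/*
 * True when local addresses appear more than once in segs[] and at least
 * one non-local address sits between two of them.
 */
bool srh_has_loop(const struct in6_addr *segs, size_t n_segs,
		  const struct in6_addr *local, size_t n_local)
{
	bool seen_local = false;	/* a local address has already appeared */
	bool separated = false;		/* a foreign address followed it */

	for (size_t i = 0; i < n_segs; i++) {
		if (addr_is_local(&segs[i], local, n_local)) {
			if (seen_local && separated)
				return true;
			seen_local = true;
		} else if (seen_local) {
			separated = true;
		}
	}
	return false;
}
```

    Two local addresses that sit next to each other do not count as a
    loop in this sketch; only a local, foreign, local pattern does.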

    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • Minor comment conflict in mac80211.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Mar, 2020

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2020-03-27

    1) Handle NETDEV_UNREGISTER for xfrm device to handle asynchronous
    unregister events cleanly. From Raed Salem.

    2) Fix vti6 tunnel inter address family TX through bpf_redirect().
    From Nicolas Dichtel.

    3) Fix length check in verify_sec_ctx_len() to avoid a
    slab-out-of-bounds. From Xin Long.

    4) Add a missing verify_sec_ctx_len check in xfrm_add_acquire
    to avoid a possible out-of-bounds access. From Xin Long.

    5) Use built-in RCU list checking of hlist_for_each_entry_rcu
    to silence false lockdep warning in __xfrm6_tunnel_spi_lookup
    when CONFIG_PROVE_RCU_LIST is enabled. From Madhuparna Bhowmik.

    6) Fix a panic on esp offload when crypto is done asynchronously.
    From Xin Long.

    7) Fix a skb memory leak in an error path of vti6_rcv.
    From Torsten Hilbrich.

    8) Fix a race that can lead to a double free in xfrm_policy_timer.
    From Xin Long.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Mar, 2020

1 commit

  • This is trivial since we already have support for the entirely
    identical (from the kernel's point of view) RDNSS, DNSSL, etc. that
    also contain opaque data that needs to be passed down to userspace
    for further processing.

    As specified in draft-ietf-6man-ra-pref64-09 (while it is still a draft,
    it is purely waiting on the RFC Editor for cleanups and publishing),
    the PREF64 option contains a lifetime and an (up to) 96-bit IPv6 prefix.

    The 8-bit identifier of the option type as assigned by the IANA is 38.

    Since we lack DNS64/NAT64/CLAT support in the kernel at the moment,
    this option should also be passed on to userland.

    See:
    https://tools.ietf.org/html/draft-ietf-6man-ra-pref64-09
    https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml#icmpv6-parameters-5
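
    Since the kernel only passes the option through, the parsing happens
    in userland. A minimal sketch of such a parser follows. The field
    layout is an assumption taken from the draft as later published in
    RFC 8781 (a 13-bit lifetime in units of 8 seconds, a 3-bit prefix
    length code, then the highest 96 bits of the prefix) and should be
    checked against the specification; parse_pref64(), struct pref64_info
    and ND_OPT_PREF64_SKETCH are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ND_OPT_PREF64_SKETCH 38	/* option type named in the commit message */

struct pref64_info {
	uint32_t lifetime_sec;	/* scaled lifetime converted to seconds */
	uint8_t prefix_len;	/* 96, 64, 56, 48, 40 or 32 */
	uint8_t prefix[12];	/* highest 96 bits of the prefix */
};

/* Parse one 16-byte PREF64 option; returns false on malformed input. */
bool parse_pref64(const uint8_t *opt, size_t len, struct pref64_info *out)
{
	/* Assumed mapping from prefix length code to prefix length. */
	static const uint8_t plc_to_len[] = { 96, 64, 56, 48, 40, 32 };
	uint16_t word;
	uint8_t plc;

	/* The length field counts units of 8 octets, so 2 means 16 bytes. */
	if (len < 16 || opt[0] != ND_OPT_PREF64_SKETCH || opt[1] != 2)
		return false;

	word = (uint16_t)((opt[2] << 8) | opt[3]);
	plc = word & 0x7;
	if (plc >= sizeof(plc_to_len))
		return false;

	out->lifetime_sec = (uint32_t)(word >> 3) * 8;
	out->prefix_len = plc_to_len[plc];
	memcpy(out->prefix, opt + 4, sizeof(out->prefix));
	return true;
}
```

    A daemon receiving the option over netlink would call this on the
    raw option bytes and then configure its CLAT or resolver accordingly.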

    Cc: Erik Kline
    Cc: Jen Linkova
    Cc: Lorenzo Colitti
    Cc: Michael Haro
    Signed-off-by: Maciej Żenczykowski
    Acked-By: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
     

26 Mar, 2020

1 commit

  • Similar to xfrm6_tunnel/transport_gso_segment(), _gso_segment()
    is added to do gso_segment for esp6 beet mode. Before calling
    inet6_offloads[proto]->callbacks.gso_segment, it needs to do:

    - Get the upper proto from ph header to get its gso_segment
    when xo->proto is IPPROTO_BEETPH.

    - Add SKB_GSO_TCPV6 to gso_type if x->sel.family != AF_INET6
    and the proto == IPPROTO_TCP, so that the current tcp ipv6
    packet can be segmented.

    - Calculate a right value for skb->transport_header and move
    skb->data to the transport header position.

    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert

    Xin Long
     

24 Mar, 2020

1 commit

  • Previous changes to the IP routing code have removed all the
    tests for the DST_HOST route flag.
    Remove the flag and all the code that sets it.

    Signed-off-by: David Laight
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    David Laight
     

16 Mar, 2020

1 commit

  • The vti6_rcv function performs some tests on the retrieved tunnel
    including checking the IP protocol, the XFRM input policy, the
    source and destination address.

    In all but one place the skb is released in the error case. When
    the input policy check fails, the network packet is leaked.

    Use the same goto label, discard, in this case to fix the problem.

    Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces")
    Signed-off-by: Torsten Hilbrich
    Reviewed-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    Torsten Hilbrich
     

15 Mar, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]
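
    To illustrate why allocations keep working, here is a small
    user-space sketch using the struct from the message. struct boo's
    contents and the foo_alloc() helper are hypothetical; the size is
    computed from the struct header plus the element size, never from
    sizeof applied to the flexible array itself.

```c
#include <stddef.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;		/* here: number of elements in array[] */
	struct boo array[];	/* flexible array member: must come last */
};

/* Allocate a zeroed struct foo with room for n trailing elements. */
struct foo *foo_alloc(size_t n)
{
	struct foo *f = calloc(1, sizeof(*f) + n * sizeof(f->array[0]));

	if (f)
		f->stuff = (int)n;
	return f;
}
```

    Moving array[] above stuff in this struct makes the compiler reject
    the declaration, which is the diagnostic the conversion is after.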

    Lastly, fix checkpatch.pl warning
    WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
    in net/bridge/netfilter/ebtables.c

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Pablo Neira Ayuso

    Gustavo A. R. Silva
     

13 Mar, 2020

2 commits

  • Minor overlapping changes, nothing serious.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Convert the various uses of fallthrough comments to fallthrough;

    Done via script
    Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

    And by hand:

    net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block
    that causes gcc to emit a warning if converted in-place.

    So move the new fallthrough; inside the containing #ifdef/#endif too.
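
    For readers unfamiliar with the pseudo-keyword, the following
    user-space sketch shows the general shape of the conversion. The
    macro definition is an approximation of the kernel's, written so that
    it also builds on compilers without the attribute; cleanup_steps_from()
    is a hypothetical example function.

```c
/* Prefer the statement attribute when the compiler offers it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Count how many cleanup steps run when starting from a given stage. */
int cleanup_steps_from(int stage)
{
	int steps = 0;

	switch (stage) {
	case 3:
		steps++;
		fallthrough;	/* was: a fall through comment */
	case 2:
		steps++;
		fallthrough;
	case 1:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

    Unlike a comment, the statement must be the last thing before the
    next case label, which is why its position relative to an
    #ifdef/#endif pair matters.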

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

12 Mar, 2020

1 commit

  • The Internet Assigned Numbers Authority (IANA) has recently assigned
    a protocol number value of 143 for Ethernet [1].

    Before this assignment, encapsulation mechanisms such as Segment Routing
    used the IPv6-NoNxt protocol number (59) to indicate that the encapsulated
    payload is an Ethernet frame.

    In this patch, we add the definition of the Ethernet protocol number to the
    kernel headers and update the SRv6 L2 tunnels to use it.

    [1] https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml

    Signed-off-by: Paolo Lungaroni
    Reviewed-by: Andrea Mayer
    Acked-by: Ahmed Abdelsalam
    Signed-off-by: David S. Miller

    Paolo Lungaroni
     

11 Mar, 2020

1 commit

  • Rafał found that on non-Ethernet interfaces, bringing the link down
    and up frequently slowly consumes memory.

    The reason is that we add the allnodes/allrouters addresses to the
    multicast list in ipv6_add_dev(). When the link goes down, we call
    ipv6_mc_down() and store all multicast addresses via mld_add_delrec().
    But when the link comes up, we don't call ipv6_mc_up() for non-Ethernet
    interfaces to remove the addresses. This makes idev->mc_tomb grow
    bigger and bigger. The call stack looks like:

    addrconf_notify(NETDEV_REGISTER)
            ipv6_add_dev
                    ipv6_dev_mc_inc(ff01::1)
                    ipv6_dev_mc_inc(ff02::1)
                    ipv6_dev_mc_inc(ff02::2)

    addrconf_notify(NETDEV_UP)
            addrconf_dev_config
                    /* Alas, we support only Ethernet autoconfiguration. */
                    return;

    addrconf_notify(NETDEV_DOWN)
            addrconf_ifdown
                    ipv6_mc_down
                            igmp6_group_dropped(ff02::2)
                                    mld_add_delrec(ff02::2)
                            igmp6_group_dropped(ff02::1)
                            igmp6_group_dropped(ff01::1)

    After investigating, I can't find a rule to disable multicast on
    non-Ethernet interfaces. In RFC2460, the link could be Ethernet, PPP, ATM,
    tunnels, etc. In IPv4, the dev type is not checked when ip_mc_up() is
    called in inetdev_event(). Even for IPv6, we don't check the dev type and
    call ipv6_add_dev() and ipv6_dev_mc_inc() after registering the device.

    So I think it's OK to fix this memory consumption by calling ipv6_mc_up()
    for non-Ethernet interfaces.

    v2: Also check IFF_MULTICAST flag to make sure the interface supports
    multicast

    Reported-by: Rafał Miłecki
    Tested-by: Rafał Miłecki
    Fixes: 74235a25c673 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels")
    Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down")
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

04 Mar, 2020

3 commits

  • The data pointers of ipv6 sysctl are set one by one which is hard to
    maintain, especially with kconfig. This patch simplifies it by using
    math to point the per net sysctls into the appropriate struct net,
    just like what we did for ipv4.
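
    The pointer arithmetic can be shown with a user-space model. All
    names below (struct netns_sketch, ctl_template, ctl_rebase() and the
    three fields) are hypothetical; the sketch only demonstrates the
    technique of rebasing template pointers from a reference instance
    onto another instance of the same struct.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace sysctl storage. */
struct netns_sketch {
	int hop_limit;
	int mtu_expires;
	int flowlabel_consistency;
};

struct ctl_entry_sketch {
	const char *name;
	void *data;
};

/* Reference instance, playing the role of the initial namespace. */
struct netns_sketch init_ns_sketch;

/* Template table: every data pointer refers to a field of init_ns_sketch. */
const struct ctl_entry_sketch ctl_template[] = {
	{ "hop_limit", &init_ns_sketch.hop_limit },
	{ "mtu_expires", &init_ns_sketch.mtu_expires },
	{ "flowlabel_consistency", &init_ns_sketch.flowlabel_consistency },
};

#define CTL_COUNT (sizeof(ctl_template) / sizeof(ctl_template[0]))

/* Copy the template and shift each pointer by the field's offset. */
void ctl_rebase(struct ctl_entry_sketch *out, struct netns_sketch *ns)
{
	for (size_t i = 0; i < CTL_COUNT; i++) {
		ptrdiff_t off = (char *)ctl_template[i].data -
				(char *)&init_ns_sketch;

		out[i] = ctl_template[i];
		out[i].data = (char *)ns + off;
	}
}
```

    Adding a new entry then means touching only the template; no
    per-entry assignment has to be kept in sync with config options.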

    Signed-off-by: Cambda Zhu
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Cambda Zhu
     
  • When we modify the peer route and change it to a new one, we should
    remove the old route first. Before the fix:

    + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2
    + ip -6 route show dev dummy1
    2001:db8::1 proto kernel metric 256 pref medium
    2001:db8::2 proto kernel metric 256 pref medium
    + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
    + ip -6 route show dev dummy1
    2001:db8::1 proto kernel metric 256 pref medium
    2001:db8::2 proto kernel metric 256 pref medium

    After the fix:
    + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
    + ip -6 route show dev dummy1
    2001:db8::1 proto kernel metric 256 pref medium
    2001:db8::3 proto kernel metric 256 pref medium

    This patch depends on the previous patch "net/ipv6: need update peer route
    when modify metric" to update the new peer route after deleting the old one.

    Signed-off-by: Hangbin Liu
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Hangbin Liu
     
  • When we modify the route metric, the peer address's route also needs
    to be updated. Before the fix:

    + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 metric 60
    + ip -6 route show dev dummy1
    2001:db8::1 proto kernel metric 60 pref medium
    2001:db8::2 proto kernel metric 60 pref medium
    + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
    + ip -6 route show dev dummy1
    2001:db8::1 proto kernel metric 61 pref medium
    2001:db8::2 proto kernel metric 60 pref medium

    After the fix:
    + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
    + ip -6 route show dev dummy1
    2001:db8::1 proto kernel metric 61 pref medium
    2001:db8::2 proto kernel metric 61 pref medium

    Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes")
    Signed-off-by: Hangbin Liu
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Hangbin Liu
     

01 Mar, 2020

1 commit

  • When we add a peer address with a metric configured, IPv4 sets the dest
    metric correctly, but IPv6 does not. E.g.

    ]# ip addr add 192.0.2.1 peer 192.0.2.2/32 dev eth1 metric 20
    ]# ip route show dev eth1
    192.0.2.2 proto kernel scope link src 192.0.2.1 metric 20
    ]# ip addr add 2001:db8::1 peer 2001:db8::2/128 dev eth1 metric 20
    ]# ip -6 route show dev eth1
    2001:db8::1 proto kernel metric 20 pref medium
    2001:db8::2 proto kernel metric 256 pref medium

    Fix this by using the configured metric instead of the default one.

    Reported-by: Jianlin Shi
    Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes")
    Reviewed-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

29 Feb, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     

28 Feb, 2020

1 commit


27 Feb, 2020

2 commits

  • hlist_for_each_entry_rcu() has built-in RCU and lock checking.

    Pass cond argument to list_for_each_entry_rcu() to silence
    false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
    by default.

    Signed-off-by: Madhuparna Bhowmik
    Signed-off-by: Steffen Klassert

    Madhuparna Bhowmik
     
  • IPV6_ADDRFORM is able to transform an IPv6 socket into an IPv4 one.
    While this operation sounds illogical, we have to support it.

    One of the things it does for a TCP socket is to switch sk->sk_prot
    to tcp_prot.

    We now have other layers playing with sk->sk_prot, so we should make
    sure not to interfere with them.

    This patch makes sure sk_prot is the default pointer for a TCP IPv6 socket.

    syzbot reported :
    BUG: kernel NULL pointer dereference, address: 0000000000000000
    PGD a0113067 P4D a0113067 PUD a8771067 PMD 0
    Oops: 0010 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 10686 Comm: syz-executor.0 Not tainted 5.6.0-rc2-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:0x0
    Code: Bad RIP value.
    RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
    RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
    RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
    R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
    R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
    FS: 00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    inet_release+0x165/0x1c0 net/ipv4/af_inet.c:427
    __sock_release net/socket.c:605 [inline]
    sock_close+0xe1/0x260 net/socket.c:1283
    __fput+0x2e4/0x740 fs/file_table.c:280
    ____fput+0x15/0x20 fs/file_table.c:313
    task_work_run+0x176/0x1b0 kernel/task_work.c:113
    tracehook_notify_resume include/linux/tracehook.h:188 [inline]
    exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
    prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
    syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
    do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x45c429
    Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f2ae75dac78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: 0000000000000000 RBX: 00007f2ae75db6d4 RCX: 000000000045c429
    RDX: 0000000000000001 RSI: 000000000000011a RDI: 0000000000000004
    RBP: 000000000076bf20 R08: 0000000000000038 R09: 0000000000000000
    R10: 0000000020000180 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 0000000000000a9d R14: 00000000004ccfb4 R15: 000000000076bf2c
    Modules linked in:
    CR2: 0000000000000000
    ---[ end trace 82567b5207e87bae ]---
    RIP: 0010:0x0
    Code: Bad RIP value.
    RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
    RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
    RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
    R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
    R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
    FS: 00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot+1938db17e275e85dc328@syzkaller.appspotmail.com
    Cc: Daniel Borkmann
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Feb, 2020

2 commits

  • The Bareudp tunnel module provides a generic L3 encapsulation
    tunnelling module for tunnelling different protocols like MPLS,
    IP, NSH, etc. inside a UDP tunnel.

    Signed-off-by: Martin Varghese
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Martin Varghese
     
  • ip6mr_for_each_table() macro uses list_for_each_entry_rcu()
    for traversing outside an RCU read side critical section
    but under the protection of rtnl_mutex. Hence add the
    corresponding lockdep expression to silence the following
    false-positive warnings:

    [ 4.319479] =============================
    [ 4.319480] WARNING: suspicious RCU usage
    [ 4.319482] 5.5.4-stable #17 Tainted: G E
    [ 4.319483] -----------------------------
    [ 4.319485] net/ipv6/ip6mr.c:1243 RCU-list traversed in non-reader section!!

    [ 4.456831] =============================
    [ 4.456832] WARNING: suspicious RCU usage
    [ 4.456834] 5.5.4-stable #17 Tainted: G E
    [ 4.456835] -----------------------------
    [ 4.456837] net/ipv6/ip6mr.c:1582 RCU-list traversed in non-reader section!!

    Signed-off-by: Amol Grover
    Signed-off-by: David S. Miller

    Amol Grover
     

21 Feb, 2020

1 commit

  • Variables declared in a switch statement before any case statements
    cannot be automatically initialized with compiler instrumentation (as
    they are not part of any execution flow). With GCC's proposed automatic
    stack variable initialization feature, this triggers a warning (and they
    don't get initialized). Clang's automatic stack variable initialization
    (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
    doesn't initialize such variables[1]. Note that these warnings (or silent
    skipping) happen before the dead-store elimination optimization phase,
    so even when the automatic initializations are later elided in favor of
    direct initializations, the warnings remain.

    To avoid these problems, move such variables into the "case" where
    they're used or lift them up into the main function body.

    net/ipv6/ip6_gre.c: In function ‘ip6gre_err’:
    net/ipv6/ip6_gre.c:440:32: warning: statement will never be executed [-Wswitch-unreachable]
    440 | struct ipv6_tlv_tnl_enc_lim *tel;
    | ^~~

    net/ipv6/ip6_tunnel.c: In function ‘ip6_tnl_err’:
    net/ipv6/ip6_tunnel.c:520:32: warning: statement will never be executed [-Wswitch-unreachable]
    520 | struct ipv6_tlv_tnl_enc_lim *tel;
    | ^~~
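
    The pattern and its fix can be reduced to a few lines. The sketch
    below is a hypothetical stand-alone function, not the tunnel code:
    rel_info_for() and its arithmetic are made up, and only the placement
    of the declaration matters.

```c
/*
 * Before (sketch): a declaration placed ahead of the first case label
 *
 *	switch (type) {
 *		int rel;	<- never reached by any execution path
 *	case 1:
 *		...
 *
 * After: the variable is lifted into the function body.
 */
int rel_info_for(int type, int code)
{
	int rel = 0;	/* reachable, so instrumentation can initialize it */

	switch (type) {
	case 1:
		rel = code + 100;
		break;
	case 2:
		rel = code * 2;
		break;
	default:
		break;
	}
	return rel;
}
```

    When a variable is needed by only one case, wrapping that case body
    in braces and declaring it there is the other option the message
    describes.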

    [1] https://bugs.llvm.org/show_bug.cgi?id=44916

    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     

19 Feb, 2020

1 commit

  • The esp fill trailer method is identical for both
    IPv6 and IPv4.

    Share the implementation for esp6 and esp to avoid
    code duplication. In addition, it could also be used
    in the code of various drivers.
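
    For orientation, an ESP trailer consists of padding, a pad-length
    byte and a next-header byte, and nothing in it depends on the IP
    version. The user-space sketch below is written from that format and
    is not a copy of the kernel helper; fill_trailer_sketch() and its
    parameter names are assumptions, and plen is expected to be at
    least 2.

```c
#include <stdint.h>
#include <string.h>

/*
 * tail   : first byte after the payload
 * tfclen : optional traffic-flow-confidentiality bytes, zero-filled
 * plen   : padding bytes plus the two trailer bytes
 * proto  : next-header value stored in the last byte
 */
void fill_trailer_sketch(uint8_t *tail, int tfclen, int plen, uint8_t proto)
{
	if (tfclen) {
		memset(tail, 0, (size_t)tfclen);
		tail += tfclen;
	}

	/* Monotonic padding: 1, 2, 3, ... */
	for (int i = 0; i < plen - 2; i++)
		tail[i] = (uint8_t)(i + 1);

	tail[plen - 2] = (uint8_t)(plen - 2);	/* pad length */
	tail[plen - 1] = proto;			/* next header */
}
```

    The same function serves both address families because the only
    protocol-specific input is the next-header value passed by the
    caller.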

    Signed-off-by: Raed Salem
    Reviewed-by: Boris Pismenny
    Reviewed-by: Saeed Mahameed
    Signed-off-by: Steffen Klassert

    Raed Salem
     

17 Feb, 2020

2 commits

  • When splitting an RTA_MULTIPATH request into multiple routes and adding the
    second and later components, we must not simply remove NLM_F_REPLACE but
    instead replace it by NLM_F_CREATE. Otherwise, it may look like the netlink
    message was malformed.

    For example,
    ip route add 2001:db8::1/128 dev dummy0
    ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 \
    nexthop via fe80::30:2 dev dummy0
    results in the following warnings:
    [ 1035.057019] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE
    [ 1035.057517] IPv6: NLM_F_CREATE should be set when creating new route

    This patch makes the nlmsg sequence look equivalent for __ip6_ins_rt() to
    what it would get if the multipath route had been added in multiple netlink
    operations:
    ip route add 2001:db8::1/128 dev dummy0
    ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0
    ip route append 2001:db8::1/128 nexthop via fe80::30:2 dev dummy0
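
    The flag handling described above can be sketched as a pure function
    on the netlink flags word. flags_for_nexthop() is a hypothetical
    helper, and the two constants are local copies whose values mirror
    NLM_F_REPLACE and NLM_F_CREATE from <linux/netlink.h>.

```c
#include <stdint.h>

/* Local copies; values mirror NLM_F_REPLACE and NLM_F_CREATE. */
#define SKETCH_NLM_F_REPLACE 0x100
#define SKETCH_NLM_F_CREATE  0x400

/* Flags to use for component idx (0-based) of a split multipath request. */
uint16_t flags_for_nexthop(uint16_t req_flags, unsigned int idx)
{
	if (idx == 0)
		return req_flags;	/* first component keeps the request's flags */

	/* Later components: swap REPLACE for CREATE instead of only clearing it. */
	req_flags &= (uint16_t)~SKETCH_NLM_F_REPLACE;
	req_flags |= SKETCH_NLM_F_CREATE;
	return req_flags;
}
```

    With only the clearing step, a "change" request would reach the
    second nexthop with neither flag set, which is what triggers the
    warnings shown in the message.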

    Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
    Signed-off-by: Benjamin Poirier
    Reviewed-by: Michal Kubecek
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Benjamin Poirier
     
  • After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no
    longer possible to replace an ECMP-able route by a non ECMP-able route.
    For example,
    ip route add 2001:db8::1/128 via fe80::1 dev dummy0
    ip route replace 2001:db8::1/128 dev dummy0
    does not work as expected.

    Tweak the replacement logic so that point 3 in the log of the above commit
    becomes:
    3. If the new route is not ECMP-able, and no matching non-ECMP-able route
    exists, replace matching ECMP-able route (if any) or add the new route.

    We can now summarize the entire replace semantics to:
    When doing a replace, prefer replacing a matching route of the same
    "ECMP-able-ness" as the replace argument. If there is no such candidate,
    fall back to the first route found.
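
    The summarized rule amounts to a two-step selection. The sketch below
    expresses it over a plain array of already-matching routes; struct
    route_sketch and pick_replace_target() are hypothetical, and the real
    logic walks the FIB node's route list rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced route entry: only the property the rule looks at. */
struct route_sketch {
	bool ecmp_able;
};

/*
 * Pick the index of the route to replace among n matching routes,
 * or -1 when there is none (the new route is then simply added).
 */
int pick_replace_target(const struct route_sketch *matches, size_t n,
			bool new_is_ecmp_able)
{
	/* Preferred: a match with the same "ECMP-able-ness". */
	for (size_t i = 0; i < n; i++)
		if (matches[i].ecmp_able == new_is_ecmp_able)
			return (int)i;

	/* Otherwise fall back to the first route found, if any. */
	return n > 0 ? 0 : -1;
}
```

    In the example from the message, the only existing match is
    ECMP-able and the new route is not, so the fallback branch selects
    it and the replace succeeds.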

    Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
    Signed-off-by: Benjamin Poirier
    Reviewed-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Benjamin Poirier
     

14 Feb, 2020

2 commits

  • With ipip, it is possible to create an extra interface explicitly
    attached to a given physical interface:

    # ip link show tunl0
    4: tunl0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    # ip link add tunl1 type ipip dev eth0
    # ip link show tunl1
    6: tunl1@eth0: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0

    But it is not possible with ip6tnl:

    # ip link show ip6tnl0
    5: ip6tnl0@NONE: mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd ::
    # ip link add ip6tnl1 type ip6tnl dev eth0
    RTNETLINK answers: File exists

    This patch aims to make it possible by adding link comparison in both
    tunnel locate and lookup functions; we also modify mtu calculation when
    attached to an interface with a lower mtu.

    This makes it possible to use x-netns communication by moving the newly
    created tunnel into a given netns.

    Signed-off-by: William Dauchy
    Reviewed-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    William Dauchy
     
  • This introduces a helper function to be called only by network drivers
    that wraps calls to icmp[v6]_send in a conntrack transformation, in case
    NAT has been used. We don't want to pollute the non-driver path, though,
    so we introduce this as a helper to be called by places that actually
    make use of this, as suggested by Florian.

    Signed-off-by: Jason A. Donenfeld
    Cc: Florian Westphal
    Signed-off-by: David S. Miller

    Jason A. Donenfeld
     

08 Feb, 2020

1 commit

  • __in6_dev_get(dev) called from inet6_set_link_af() can return NULL.

    The needed check has been recently removed, let's add it back.

    While do_setlink() does call validate_linkmsg() :
    ...
    err = validate_linkmsg(dev, tb); /* OK at this point */
    ...

    It is possible that the following call happening before the
    ->set_link_af() removes IPv6 if MTU is less than 1280 :

    if (tb[IFLA_MTU]) {
            err = dev_set_mtu_ext(dev, nla_get_u32(tb[IFLA_MTU]), extack);
            if (err < 0)
                    goto errout;
            status |= DO_SETLINK_MODIFIED;
    }
    ...

    if (tb[IFLA_AF_SPEC]) {
            ...
            err = af_ops->set_link_af(dev, af);
                    ->inet6_set_link_af() // CRASH because idev is NULL

    Please note that IPv4 is immune to the bug since inet_set_link_af() does :

    struct in_device *in_dev = __in_dev_get_rcu(dev);
    if (!in_dev)
            return -EAFNOSUPPORT;

    This problem has been mentioned in commit cf7afbfeb8ce ("rtnl: make
    link af-specific updates atomic") changelog :

    This method is not fail proof, while it is currently sufficient
    to make set_link_af() inerrable and thus 100% atomic, the
    validation function method will not be able to detect all error
    scenarios in the future, there will likely always be errors
    depending on states which are f.e. not protected by rtnl_mutex
    and thus may change between validation and setting.

    IPv6: ADDRCONF(NETDEV_CHANGE): lo: link becomes ready
    general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7]
    CPU: 0 PID: 9698 Comm: syz-executor712 Not tainted 5.5.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733
    Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00
    RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6
    RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0
    RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0
    R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000
    FS: 0000000000c46880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055f0494ca0d0 CR3: 000000009e4ac000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    do_setlink+0x2a9f/0x3720 net/core/rtnetlink.c:2754
    rtnl_group_changelink net/core/rtnetlink.c:3103 [inline]
    __rtnl_newlink+0xdd1/0x1790 net/core/rtnetlink.c:3257
    rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377
    rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438
    netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xd7/0x130 net/socket.c:672
    ____sys_sendmsg+0x753/0x880 net/socket.c:2343
    ___sys_sendmsg+0x100/0x170 net/socket.c:2397
    __sys_sendmsg+0x105/0x1d0 net/socket.c:2430
    __do_sys_sendmsg net/socket.c:2439 [inline]
    __se_sys_sendmsg net/socket.c:2437 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4402e9
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fffd62fbcf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004402e9
    RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8
    R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000401b70
    R13: 0000000000401c00 R14: 0000000000000000 R15: 0000000000000000
    Modules linked in:
    ---[ end trace cfa7664b8fdcdff3 ]---
    RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733
    Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00
    RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6
    RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0
    RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0
    R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000
    FS: 0000000000c46880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020000004 CR3: 000000009e4ac000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

    Fixes: 7dc2bccab0ee ("Validate required parameters in inet6_validate_link_af")
    Signed-off-by: Eric Dumazet
    Bisected-and-reported-by: syzbot
    Cc: Maxim Mikityanskiy
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Feb, 2020

1 commit


30 Jan, 2020

2 commits

  • If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m:

    ERROR: "mptcp_handle_ipv6_mapped" [net/ipv6/ipv6.ko] undefined!

    This does not happen if CONFIG_MPTCP_IPV6=y, as CONFIG_MPTCP_IPV6
    selects CONFIG_IPV6, and thus forces CONFIG_IPV6 builtin.

    As exporting a symbol for an empty function would be a bit wasteful, fix
    this by providing a dummy version of mptcp_handle_ipv6_mapped() for the
    CONFIG_MPTCP_IPV6=n case.

    Rename mptcp_handle_ipv6_mapped() to mptcpv6_handle_mapped(), to make it
    clear this is a pure-IPV6 function, just like mptcpv6_init().

    Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections")
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    Geert Uytterhoeven
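
The stub approach described in the commit above can be sketched in plain C. This is a minimal illustration, not the kernel patch itself: `DEMO_MPTCP_IPV6` is a hypothetical stand-in for `CONFIG_MPTCP_IPV6`, and `struct demo_sock`, `demo_mptcpv6_handle_mapped()` and `demo_ipv6_connect()` are invented names used only for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_MPTCP_IPV6 (assumption, not a real Kconfig symbol). */
#ifndef DEMO_MPTCP_IPV6
#define DEMO_MPTCP_IPV6 0
#endif

/* Invented type for the sketch; the real code operates on struct sock. */
struct demo_sock {
    bool mapped_handled;
};

#if DEMO_MPTCP_IPV6
/* Feature enabled: the helper does real work. */
static inline void demo_mptcpv6_handle_mapped(struct demo_sock *sk, bool mapped)
{
    sk->mapped_handled = mapped;
}
#else
/*
 * Feature disabled: an empty static inline stub in the header.
 * Callers still compile and link, and no symbol has to be exported
 * for a function that does nothing.
 */
static inline void demo_mptcpv6_handle_mapped(struct demo_sock *sk, bool mapped)
{
    (void)sk;
    (void)mapped;
}
#endif

/* The caller is identical in both configurations and needs no #ifdef. */
static void demo_ipv6_connect(struct demo_sock *sk, bool mapped)
{
    demo_mptcpv6_handle_mapped(sk, mapped);
}
```

Because the stub is `static inline`, the compiler can drop the call entirely when the feature is off, which is why this pattern tends to be preferred over exporting an empty function from a built-in object to a module.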
     
  • We can't deal with syncookie mode yet: the syncookie rx path creates a
    tcp reqsk, so we get an OOB access because we treat the tcp reqsk as an
    mptcp reqsk:

    TCP: SYN flooding on port 20002. Sending cookies.
    BUG: KASAN: slab-out-of-bounds in subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
    Read of size 1 at addr ffff8881167bc148 by task syz-executor099/2120
    subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
    tcp_get_cookie_sock+0xcf/0x520 net/ipv4/syncookies.c:209
    cookie_v6_check+0x15a5/0x1e90 net/ipv6/syncookies.c:252
    tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1123 [inline]
    [..]

    Bug can be reproduced via "sysctl net.ipv4.tcp_syncookies=2".

    Note that MPTCP should work with syncookies (the 4th ack would carry the
    needed state), but it appears better to sort that out in -next, so fall
    back to TCP for now.

    I removed the MPTCP ifdef for the tcp_rsk "is_mptcp" member because
    if (IS_ENABLED()) is easier to read than an "#if IS_ENABLED()/#endif" pair.

    Cc: Eric Dumazet
    Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections")
    Reported-by: Christoph Paasch
    Tested-by: Christoph Paasch
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
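
The readability point about `if (IS_ENABLED())` versus a preprocessor pair can be illustrated with a small sketch. It is a simplification under stated assumptions: `DEMO_MPTCP` is a hypothetical 0/1 macro standing in for the kernel's `IS_ENABLED(CONFIG_MPTCP)`, and `struct demo_request_sock` and `demo_classify()` are invented for this example rather than taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical compile-time switch (assumption); the kernel would use IS_ENABLED(CONFIG_MPTCP). */
#ifndef DEMO_MPTCP
#define DEMO_MPTCP 1
#endif

/* The member is always present, so no #ifdef is needed around it. */
struct demo_request_sock {
    bool is_mptcp;
};

/*
 * The condition is an ordinary C expression. When DEMO_MPTCP is 0 the
 * compiler can discard the branch as dead code, yet the code inside it
 * is still parsed and type-checked in every configuration.
 */
static const char *demo_classify(const struct demo_request_sock *req)
{
    if (DEMO_MPTCP && req->is_mptcp)
        return "mptcp";
    return "tcp";
}
```

The trade-off is that the struct member exists even when the feature is compiled out; in exchange, both configurations compile the same source text, so a build with the option disabled cannot silently rot.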