18 Oct, 2018

3 commits

  • [ Upstream commit 86f9bd1ff61c413a2a251fa736463295e4e24733 ]

    The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
    handle starting/stopping the iteration. The problem is that at some point
    during the iteration, the seq_file buffer overflows and the iteration is
    stopped. The item that was being shown via seq_printf() when the overflow
    occurred is never actually output, though. When start() is subsequently
    called to resume iterating, it returns the next item, so the item that was
    being processed when the overflow occurred never gets printed.

    Alter the meaning of the private data member "offset". Currently, apart
    from the very beginning (when it is 0), "offset" represents the next hlist
    item to be printed. After this change, "offset" always represents the
    current item.

    This is also consistent with the private data member "bucket", which
    represents the current bucket, and also the use of "pos" as defined in
    seq_file.txt:
    The pos passed to start() will always be either zero, or the most
    recent pos used in the previous session.
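    The resume rule can be sketched in userspace C. This is a hypothetical
    model, not the addrconf.c code: table, iter_state, find_from() and
    read_chunk() are invented for illustration. The iterator keeps "offset"
    pointing at the current item and advances it only after the record has
    been copied, so a record that does not fit is retried on the next read
    instead of being skipped.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table: NBUCKETS hash buckets, each holding up to 3 names
 * (NULL-terminated). It only mirrors the "bucket"/"offset" idea. */
#define NBUCKETS 3
static const char *table[NBUCKETS][3] = {
    {"a0", "a1", NULL},
    {NULL, NULL, NULL},
    {"c0", "c1", "c2"},
};

struct iter_state {
    int bucket;   /* current bucket */
    int offset;   /* index of the CURRENT item within that bucket */
};

/* Return the item at (bucket, offset), moving to later buckets if the
 * current one is exhausted; NULL once every bucket has been walked. */
static const char *find_from(struct iter_state *st)
{
    for (; st->bucket < NBUCKETS; st->bucket++, st->offset = 0) {
        if (st->offset < 3 && table[st->bucket][st->offset])
            return table[st->bucket][st->offset];
    }
    return NULL;
}

/* Emulates one read(): append "name\n" records to out until the next one
 * would overflow cap. The record that does not fit is not consumed. */
static size_t read_chunk(struct iter_state *st, char *out, size_t cap)
{
    size_t used = 0;
    const char *item;

    while ((item = find_from(st)) != NULL) {
        size_t need = strlen(item) + 1;

        if (used + need > cap)
            break;              /* overflow: st still points at item */
        memcpy(out + used, item, need - 1);
        out[used + need - 1] = '\n';
        used += need;
        st->offset++;           /* advance only after a successful show */
    }
    return used;
}
```

    Draining the iterator with a 7-byte buffer yields every record exactly
    once, across several reads.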

    Signed-off-by: Jeff Barnhill
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jeff Barnhill
     
  • [ Upstream commit a688caa34beb2fd2a92f1b6d33e40cde433ba160 ]

    In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
    directly assign the dst to the skb and set the passed-in dst to NULL to
    avoid a double free.
    However, in the error case, we free the skb and then do the stats update
    with the dst pointer passed in. This causes a use-after-free on the dst.
    Fix it by taking the RCU read lock right before the dst could get
    released, to make sure the dst does not get freed until the stats update
    is done.
    Note: we don't have this issue in IPv4 because the dst is not used for
    the stats update in v4.

    Syzkaller reported the following crash:
    BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
    BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
    Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088

    CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
    print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
    rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
    inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:631
    ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
    __sys_sendmsg+0x11d/0x280 net/socket.c:2152
    __do_sys_sendmsg net/socket.c:2161 [inline]
    __se_sys_sendmsg net/socket.c:2159 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457099
    Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
    RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
    RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000

    Allocated by task 32088:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
    dst_alloc+0xbb/0x1d0 net/core/dst.c:105
    ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
    ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
    ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
    ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
    fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
    ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
    ip6_route_output include/net/ip6_route.h:88 [inline]
    ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
    ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
    rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
    inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:631
    ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
    __sys_sendmsg+0x11d/0x280 net/socket.c:2152
    __do_sys_sendmsg net/socket.c:2161 [inline]
    __se_sys_sendmsg net/socket.c:2159 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 5356:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x83/0x290 mm/slab.c:3756
    dst_destroy+0x267/0x3c0 net/core/dst.c:141
    dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
    __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
    rcu_do_batch kernel/rcu/tree.c:2576 [inline]
    invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
    __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
    rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
    __do_softirq+0x30b/0xad8 kernel/softirq.c:292

    Fixes: 1789a640f556 ("raw: avoid two atomics in xmit")
    Signed-off-by: Wei Wang
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wei Wang
     
  • [ Upstream commit 76c0ddd8c3a683f6e2c6e60e11dc1a1558caf4bc ]

    The ip6 tunnel xmit ndo assumes that the processed skb always
    contains an ip[v6] header, but syzbot has found a way to send
    frames that fall short of this assumption, leading to the following splat:

    BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
    BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
    CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
    ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
    ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
    __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
    netdev_start_xmit include/linux/netdevice.h:4075 [inline]
    xmit_one net/core/dev.c:3026 [inline]
    dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
    __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
    dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
    packet_snd net/packet/af_packet.c:2944 [inline]
    packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
    __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
    SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
    SyS_sendmmsg+0x63/0x90 net/socket.c:2162
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x441819
    RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
    RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003
    RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510
    R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
    kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
    kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
    slab_post_alloc_hook mm/slab.h:445 [inline]
    slab_alloc_node mm/slub.c:2737 [inline]
    __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:984 [inline]
    alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
    sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
    packet_alloc_skb net/packet/af_packet.c:2803 [inline]
    packet_snd net/packet/af_packet.c:2894 [inline]
    packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
    __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
    SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
    SyS_sendmmsg+0x63/0x90 net/socket.c:2162
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    This change addresses the issue by adding the needed check before
    accessing the inner header.
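    A sketch of the kind of check involved, assuming a flat byte buffer
    rather than an skb: the inner IPv6 header is only read after confirming
    that all 40 bytes of the fixed header are present, the role
    pskb_may_pull()-style checks play in the kernel.
    inner_ipv6_hop_limit() is a hypothetical helper, not the patched function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40   /* size of the fixed IPv6 header */

/* Hypothetical helper: return the inner header's hop limit, or -1 if the
 * buffer is too short (or not IPv6) to be parsed safely. */
static int inner_ipv6_hop_limit(const uint8_t *pkt, size_t len)
{
    if (len < IPV6_HDR_LEN)
        return -1;              /* too short: drop instead of reading */
    if ((pkt[0] >> 4) != 6)
        return -1;              /* version nibble is not 6 */
    return pkt[7];              /* hop limit is byte 7 of the header */
}
```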

    The IPv4 side of the issue has apparently been present since the initial
    IPv4-over-IPv6 support, and the IPv6 side predates git history.

    Fixes: c4d3efafcc93 ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com
    Tested-by: Alexander Potapenko
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

29 Sep, 2018

3 commits

  • [ Upstream commit eb63f2964dbe36f26deac77d3016791675821ded ]

    Currently the UDPv6 early demux rx code path lacks some mandatory
    checks that are already implemented in the normal RX code path, namely
    the checksum conversion and the no_check6_rx check.

    Similar to the previous commit, we move the common processing to
    a UDPv6-specific helper and call it from both the edemux code path
    and the normal code path. With respect to UDPv4, we need to add an
    explicit check for a non-zero csum according to the no_check6_rx value.
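    The explicit check can be sketched as below. UDP over IPv6 normally
    requires a checksum, so a zero checksum field is accepted only when the
    socket has opted in via no_check6_rx. udp6_sock_model and
    udp6_accept_csum_field() are hypothetical; the real code works on the
    skb and the UDP socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical socket model holding only the relevant option. */
struct udp6_sock_model {
    bool no_check6_rx;   /* socket accepts zero-checksum UDPv6 */
};

/* Decide whether the checksum field may be accepted at all. */
static bool udp6_accept_csum_field(const struct udp6_sock_model *sk,
                                   uint16_t check)
{
    if (check == 0 && !sk->no_check6_rx)
        return false;    /* zero csum not allowed: treat as csum error */
    return true;         /* non-zero: still subject to normal verification */
}
```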

    Reported-by: Jianlin Shi
    Suggested-by: Xin Long
    Fixes: c9f2c1ae123a ("udp6: fix socket leak on early demux")
    Fixes: 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     
  • [ Upstream commit bbd6528d28c1b8e80832b3b018ec402b6f5c3215 ]

    In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
    we need to call skb_set_owner_w() before consuming original skb,
    otherwise we risk a use-after-free.

    Bring IPv6 in line with what we do in IPv4 to fix this.
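    A hypothetical model of why the order matters, assuming the owning
    socket has already been closed so that only in-flight skbs keep it alive
    through their write-memory charge (as sk_wmem_alloc does). Charging the
    new skb before consuming the original keeps the socket pinned
    throughout; the reverse order can release it first. All names are
    illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket: released once no skb charge remains. */
struct sock_model {
    int wmem_alloc;
    bool freed;
};

struct skb_model {
    struct sock_model *sk;
    int truesize;
};

/* Destructor path: uncharge the owner, possibly releasing it. */
static void consume_skb_model(struct skb_model *skb)
{
    struct sock_model *sk = skb->sk;

    skb->sk = NULL;
    if (sk != NULL) {
        sk->wmem_alloc -= skb->truesize;
        if (sk->wmem_alloc == 0)
            sk->freed = true;          /* nothing pins the socket any more */
    }
}

/* Replace old with a larger copy nskb. Returns false if nskb would be
 * attached to a socket that has already been released. */
static bool realloc_and_reown(struct skb_model *old, struct skb_model *nskb,
                              bool fixed)
{
    struct sock_model *sk = old->sk;

    if (!fixed)
        consume_skb_model(old);        /* buggy order: may release sk */
    if (sk->freed)
        return false;                  /* set_owner_w would be a UAF */
    nskb->sk = sk;                     /* skb_set_owner_w() equivalent */
    sk->wmem_alloc += nskb->truesize;
    if (fixed)
        consume_skb_model(old);        /* safe: nskb now pins sk */
    return true;
}
```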

    Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit c56cae23c6b167acc68043c683c4573b80cbcc2c ]

    When splitting a GSO segment that consists of encapsulated packets, the
    skb->mac_len of the segments can end up being set wrong, causing packet
    drops in particular when using act_mirred and ifb interfaces in
    combination with a qdisc that splits GSO packets.

    This happens because at the time skb_segment() is called, network_header
    will point to the inner header, throwing off the calculation in
    skb_reset_mac_len(). The network_header is subsequently adjusted by the
    outer IP gso_segment handlers, but they don't set the mac_len.

    Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
    gso_segment handlers, after they modify the network_header.
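    The arithmetic can be sketched with a trimmed-down header-offset struct.
    skb_reset_mac_len() sets mac_len to network_header minus mac_header, so
    the value is only right if it is recomputed after the outer handler
    moves network_header back. The offsets used (14-byte outer Ethernet
    header, inner header at byte 92) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the sk_buff header offsets. */
struct skb_hdrs {
    uint16_t mac_header;
    uint16_t network_header;
    uint16_t mac_len;
};

/* Same computation as skb_reset_mac_len(). */
static void reset_mac_len(struct skb_hdrs *s)
{
    s->mac_len = s->network_header - s->mac_header;
}
```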

    Many thanks to Eric Dumazet for his help in identifying the cause of
    the bug.

    Acked-by: Dave Taht
    Reviewed-by: Eric Dumazet
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
     

26 Sep, 2018

1 commit

  • commit f7225172f25aaf0dfd9ad65f05be8da5d6108b12 upstream.

    syzbot reported a use-after-free:

    BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
    Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555

    CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
    ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
    ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
    inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
    ...

    Allocated by task 4555:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
    dst_alloc+0xbb/0x1d0 net/core/dst.c:104
    __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
    ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
    ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
    ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
    inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
    ...

    Freed by task 4555:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
    dst_destroy+0x267/0x3c0 net/core/dst.c:140
    dst_release_immediate+0x71/0x9e net/core/dst.c:205
    fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
    __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
    ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
    inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
    ...

    The problem is that rt_last can point to a deleted route if the insert
    fails.

    One reproducer is to insert a route and then add a multipath route that
    has a duplicate nexthop, e.g.:
    $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
    $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2

    Fix by not setting rt_last until it is verified that the insert succeeded.
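    The reordering can be sketched with a toy FIB. This is a hypothetical
    model, not route.c: a duplicate gateway makes the insert fail and marks
    the route freed, and the loop stops on the first error as the error
    path would. The `fixed` flag switches between recording rt_last before
    and after the insert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route: a gateway id plus a "released" marker. */
struct route_model {
    int gw;
    int freed;
};

struct fib_model {
    int gw[8];
    int n;
};

/* Insert fails (and releases the route) on a duplicate gateway. */
static int insert_route(struct fib_model *fib, struct route_model *rt)
{
    for (int i = 0; i < fib->n; i++) {
        if (fib->gw[i] == rt->gw) {
            rt->freed = 1;             /* duplicate nexthop: route released */
            return -1;                 /* the kernel returns an errno here */
        }
    }
    if (fib->n >= 8) {
        rt->freed = 1;
        return -1;
    }
    fib->gw[fib->n++] = rt->gw;
    return 0;
}

/* Returns the route a notification would be built from. */
static struct route_model *add_multipath(struct fib_model *fib,
                                         struct route_model *rts, int n,
                                         int fixed)
{
    struct route_model *rt_last = NULL;

    for (int i = 0; i < n; i++) {
        if (!fixed)
            rt_last = &rts[i];         /* old order: set before insert */
        if (insert_route(fib, &rts[i]) < 0)
            break;                     /* error path: notify and clean up */
        if (fixed)
            rt_last = &rts[i];         /* new order: only after success */
    }
    return rt_last;
}
```

    Mirroring the reproducer (existing route via gateway 2, then appending
    nexthops 4 and 2), the old order leaves rt_last on the released route.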

    Backport Note:
    - Upstream has replaced rt6_info usage with fib6_info in 8d1c802b281
    ("net/ipv6: Flip FIB entries to fib6_info")
    - fib6_info_release was introduced upstream in 93531c674315
    ("net/ipv6: separate handling of FIB entries from dst based routes"),
    but is not present in stable kernels; 4.14.y relies on dst_release/
    ip6_rt_put/dst_release_immediate.

    Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
    Cc: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David Ahern
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Zubin Mithra
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

20 Sep, 2018

15 commits

  • commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream

    A kernel crash occurs when a defragmented packet is fragmented
    in ip_do_fragment().
    In the defragment routine, skb_orphan() is called and
    skb->ip_defrag_offset is set, but skb->sk and
    skb->ip_defrag_offset are members of the same union, so
    frag->sk is not NULL.
    Hence a crash occurs in the skb->sk check in ip_do_fragment() when
    the defragmented packet is fragmented.
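    A userspace sketch of the field sharing, using a hypothetical skb_model
    whose anonymous union mirrors the sk/ip_defrag_offset overlay. Storing a
    non-zero offset for a non-first fragment leaves a non-NULL value in sk;
    clearing sk at the end of reassembly, as v2 does, avoids that. Reading
    the other union member relies on ordinary ABI behaviour on common
    platforms.

```c
#include <assert.h>
#include <stddef.h>

struct sock;                          /* opaque, never dereferenced */

/* Hypothetical skb with the same overlay as described above. */
struct skb_model {
    union {
        struct sock *sk;
        int ip_defrag_offset;
    };
};

static void queue_fragment(struct skb_model *skb, int offset)
{
    skb->sk = NULL;                   /* skb_orphan() */
    skb->ip_defrag_offset = offset;   /* overwrites part of skb->sk */
}

static void finish_reassembly(struct skb_model *frag, int fixed)
{
    if (fixed)
        frag->sk = NULL;              /* the fix: clear the shared field */
}
```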

    test commands:
    %iptables -t nat -I POSTROUTING -j MASQUERADE
    %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000

    splat looks like:
    [ 261.069429] kernel BUG at net/ipv4/ip_output.c:636!
    [ 261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
    [ 261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
    [ 261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
    [ 261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
    [ 261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
    [ 261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
    [ 261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
    [ 261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
    [ 261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
    [ 261.174169] FS: 00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
    [ 261.183012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
    [ 261.198158] Call Trace:
    [ 261.199018] ? dst_output+0x180/0x180
    [ 261.205011] ? save_trace+0x300/0x300
    [ 261.209018] ? ip_copy_metadata+0xb00/0xb00
    [ 261.213034] ? sched_clock_local+0xd4/0x140
    [ 261.218158] ? kill_l4proto+0x120/0x120 [nf_conntrack]
    [ 261.223014] ? rt_cpu_seq_stop+0x10/0x10
    [ 261.227014] ? find_held_lock+0x39/0x1c0
    [ 261.233008] ip_finish_output+0x51d/0xb50
    [ 261.237006] ? ip_fragment.constprop.56+0x220/0x220
    [ 261.243011] ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
    [ 261.250152] ? rcu_is_watching+0x77/0x120
    [ 261.255010] ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
    [ 261.261033] ? nf_hook_slow+0xb1/0x160
    [ 261.265007] ip_output+0x1c7/0x710
    [ 261.269005] ? ip_mc_output+0x13f0/0x13f0
    [ 261.273002] ? __local_bh_enable_ip+0xe9/0x1b0
    [ 261.278152] ? ip_fragment.constprop.56+0x220/0x220
    [ 261.282996] ? nf_hook_slow+0xb1/0x160
    [ 261.287007] raw_sendmsg+0x21f9/0x4420
    [ 261.291008] ? dst_output+0x180/0x180
    [ 261.297003] ? sched_clock_cpu+0x126/0x170
    [ 261.301003] ? find_held_lock+0x39/0x1c0
    [ 261.306155] ? stop_critical_timings+0x420/0x420
    [ 261.311004] ? check_flags.part.36+0x450/0x450
    [ 261.315005] ? _raw_spin_unlock_irq+0x29/0x40
    [ 261.320995] ? _raw_spin_unlock_irq+0x29/0x40
    [ 261.326142] ? cyc2ns_read_end+0x10/0x10
    [ 261.330139] ? raw_bind+0x280/0x280
    [ 261.334138] ? sched_clock_cpu+0x126/0x170
    [ 261.338995] ? check_flags.part.36+0x450/0x450
    [ 261.342991] ? __lock_acquire+0x4500/0x4500
    [ 261.348994] ? inet_sendmsg+0x11c/0x500
    [ 261.352989] ? dst_output+0x180/0x180
    [ 261.357012] inet_sendmsg+0x11c/0x500
    [ ... ]

    v2:
    - clear skb->sk in the reassembly routine. (Eric Dumazet)

    Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
    Suggested-by: Eric Dumazet
    Signed-off-by: Taehee Yoo
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • commit bffa72cf7f9df842f0016ba03586039296b4caaf upstream

    skb->rbnode shares space with skb->next, skb->prev and skb->tstamp

    Current uses (TCP receive ofo queue and netem) need to save/restore
    tstamp, while skb->dev is either NULL (TCP) or a constant for a given
    queue (netem).

    Since we plan to use an RB tree for the TCP retransmit queue to speed up
    SACK processing with large BDP, this patch exchanges skb->dev and
    skb->tstamp.

    This saves some overhead in both TCP and netem.
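    A layout sketch with trimmed, hypothetical structs: an rb_node occupies
    three pointer-sized words, overlaying next, prev and the third word.
    Before the change that third word was tstamp, so it had to be saved and
    restored; after it, the overlaid word is dev and tstamp lies outside the
    union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three pointer-sized words, like the kernel's struct rb_node. */
struct rb_node_model {
    unsigned long parent_color;
    struct rb_node_model *right;
    struct rb_node_model *left;
};

struct net_device;

struct skb_before {                    /* tstamp overlaid by rbnode */
    union {
        struct {
            struct skb_before *next;
            struct skb_before *prev;
            int64_t tstamp;            /* clobbered while on an rbtree */
        };
        struct rb_node_model rbnode;
    };
    struct net_device *dev;
};

struct skb_after {                     /* dev and tstamp exchanged */
    union {
        struct {
            struct skb_after *next;
            struct skb_after *prev;
            struct net_device *dev;    /* NULL or constant per queue */
        };
        struct rb_node_model rbnode;
    };
    int64_t tstamp;                    /* survives rbtree insertion */
};
```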

    v2: removes the swtstamp field from struct tcp_skb_cb

    Signed-off-by: Eric Dumazet
    Cc: Soheil Hassas Yeganeh
    Cc: Wei Wang
    Cc: Willem de Bruijn
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
    Don't bother with pathological cases; they only waste cycles.
    IPv6 requires a minimum MTU of 1280 so we should never see fragments
    smaller than this (except last frag).
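    The rule can be sketched as a predicate, assuming the MF bit is given in
    host byte order and pkt_len counts from the IPv6 header onward.
    frag_is_pathological() is a hypothetical name; the last fragment (MF
    clear) may legitimately be short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IPV6_MIN_MTU 1280
#define IP6_MF 0x0001   /* "more fragments" bit of frag_off, host order here */

/* Hypothetical check: a non-last fragment smaller than the IPv6 minimum
 * MTU cannot come from a compliant sender, so drop it up front. */
static bool frag_is_pathological(uint16_t frag_off, unsigned int pkt_len)
{
    return (frag_off & IP6_MF) && pkt_len < IPV6_MIN_MTU;
}
```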

    v3: don't use awkward "-offset + len"
    v2: drop IPv4 part, which added the same check with IPV4_MIN_MTU (68).
    There were concerns that there could be even smaller frags
    generated by intermediate nodes, e.g. on radio networks.

    Cc: Peter Oskolkov
    Cc: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
    (cherry picked from commit 0ed4229b08c13c84a3c301a08defdc9e7f4467e6)
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
    Passing an integer to proc_doulongvec_minmax() is dangerous on 64-bit
    arches, since the linker might place a non-zero value next to it,
    preventing a change to ip6frag_low_thresh.

    ip6frag_low_thresh is not used anymore in the kernel, but we do not
    want to prematurely break user scripts wanting to change it.

    Since specifying a minimal value of 0 for proc_doulongvec_minmax()
    is moot, let's remove these zero values in all defrag units.
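    An illustration, with a hypothetical layout, of what a handler expecting
    an unsigned long sees when given the address of an int minimum: on LP64
    it also picks up the neighbouring 4 bytes. memcpy() is used so the
    sketch itself stays well defined.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: an int minimum of 0 with some unrelated non-zero
 * variable that the linker happened to place right after it. */
struct adjacent_vars {
    int zero_min;
    int neighbour;
};

/* Read sizeof(unsigned long) bytes starting at p, as a handler that
 * believes p points to an unsigned long would. The caller must make sure
 * that many bytes are readable. */
static unsigned long read_as_ulong(const void *p)
{
    unsigned long v;

    memcpy(&v, p, sizeof v);
    return v;
}
```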

    Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
    Signed-off-by: Eric Dumazet
    Reported-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller
    (cherry picked from commit 3d23401283e80ceb03f765842787e0e79ff598b7)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • Make it similar to IPv4 ip_expire(), and release the lock
    before calling icmp functions.
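    A minimal sketch of the pattern, with the spinlock modeled as a bool and
    icmp_send_model() as a hypothetical stand-in for the ICMP call: copy
    what is needed while holding the lock, drop it, then send.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical frag queue: 'locked' stands in for its spinlock. */
struct frag_queue_model {
    bool locked;
    int saved_len;     /* data the ICMP error needs */
};

static bool icmp_called_with_lock;

/* Records whether it was entered with the queue lock held. */
static void icmp_send_model(const struct frag_queue_model *fq, int len)
{
    icmp_called_with_lock = fq->locked;
    (void)len;
}

static void expire_model(struct frag_queue_model *fq)
{
    int len;

    fq->locked = true;       /* spin_lock() */
    len = fq->saved_len;     /* grab what ICMP needs under the lock */
    fq->locked = false;      /* spin_unlock() before calling ICMP */
    icmp_send_model(fq, len);
}
```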

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 05c0b86b9696802fd0ce5676a92a63f1b455bdf3)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
    Some users are willing to provision huge amounts of memory to be able
    to perform reassembly reasonably well under pressure.

    Current memory tracking is using one atomic_t and integers.

    Switch to atomic_long_t so that 64bit arches can use more than 2GB,
    without any cost for 32bit arches.

    Note that this patch avoids an overflow error when high_thresh is set
    to ~2GB, since this test in inet_frag_alloc() was then never true:

    if (... || frag_mem_limit(nf) > nf->high_thresh)
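    The overflow can be illustrated with plain C11 atomics (atomic_long from
    <stdatomic.h>, not the kernel's atomic_long_t). With int accounting the
    usage can never exceed a threshold near INT_MAX (about 2 GB), so the
    comparison never fires; a long counter on LP64 can. Function names are
    hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* int accounting: mem can never exceed INT_MAX, so a threshold at or
 * near INT_MAX makes this test unreachable. */
static bool over_limit_int(int mem, int high_thresh)
{
    return mem > high_thresh;
}

/* long accounting: on LP64 the counter can go far beyond 2 GB. */
static atomic_long frag_mem;

static bool over_limit_long(long high_thresh)
{
    return atomic_load(&frag_mem) > high_thresh;
}
```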

    Tested:

    $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh

    $ grep FRAG /proc/net/sockstat
    FRAG: inuse 14705885 memory 16000002880

    $ nstat -n ; sleep 1 ; nstat | grep Reas
    IpReasmReqds 3317150 0.0
    IpReasmFails 3317112 0.0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 3e67f106f619dcfaf6f4e2039599bdb69848c714)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • This function is obsolete, after rhashtable addition to inet defrag.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 2d44ed22e607f9a285b049de2263e3840673a260)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • This refactors ip_expire() since one indentation level is removed.

    Note: in the future, we should try hard to avoid the skb_clone()
    since this is a serious performance cost.
    Under DDOS, the ICMP message won't be sent because of rate limits.

    The fact that ip6_expire_frag_queue() does not use skb_clone() is
    disturbing too. Presumably IPv6 should have the same
    issue as the one we fixed in commit ec4fbd64751d
    ("inet: frag: release spinlock before calling icmp_send()")

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 399d1404be660d355192ff4df5ccc3f4159ec1e4)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()

    Also since we use rhashtable we can bring back the number of fragments
    in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
    removed in commit 434d305405ab ("inet: frag: don't account number
    of fragment queues")

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 6befe4a78b1553edb6eed3a78b4bcd9748526672)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
    Some applications still rely on IP fragmentation, and to be fair, the
    Linux reassembly unit does not work under any serious load.

    It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)

    A work queue is supposed to garbage collect items when the host is under
    memory pressure, and to do a hash rebuild, changing the seed used in hash
    computations.

    This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
    occurring every 5 seconds if the host is under fire.

    Then there is the problem of sharing this hash table for all netns.

    It is time to switch to rhashtables, and allocate one of them per netns
    to speed up netns dismantling, since this is a critical metric these days.

    Lookup is now using RCU. A followup patch will even remove
    the refcount hold/release left from prior implementation and save
    a couple of atomic operations.

    Before this patch, 16 cpus (16 RX queue NIC) could not handle more
    than 1 Mpps frags DDOS.

    After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
    of storage for the fragments (exact number depends on frags being evicted
    after timeout)

    $ grep FRAG /proc/net/sockstat
    FRAG: inuse 1966916 memory 2140004608

    A followup patch will change the limits for 64bit arches.

    Signed-off-by: Eric Dumazet
    Cc: Kirill Tkhai
    Cc: Herbert Xu
    Cc: Florian Westphal
    Cc: Jesper Dangaard Brouer
    Cc: Alexander Aring
    Cc: Stefan Schmidt
    Signed-off-by: David S. Miller
    (cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • IPv4 was changed in commit 52a773d645e9 ("net: Export ip fragment
    sysctl to unprivileged users")

    The only sysctl that is not per-netns is unused:
    ip6frag_secret_interval

    Signed-off-by: Eric Dumazet
    Cc: Nikolay Borisov
    Signed-off-by: David S. Miller
    (cherry picked from commit 18dcbe12fe9fca0ab825f7eff993060525ac2503)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • We want to call inet_frags_init() earlier.

    This is a prereq to "inet: frags: use rhashtables for reassembly units"

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 5b975bab23615cd0fdf67af6c9298eb01c4b9f61)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.
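    In the kernel, from_timer() is a thin wrapper around container_of(). The
    pattern can be sketched in userspace with hypothetical stand-ins: the
    callback receives only the embedded timer and recovers the enclosing
    structure from it.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for container_of(), which from_timer() wraps. */
#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct timer_model {
    void (*function)(struct timer_model *t);   /* gets the timer pointer */
};

struct frag_queue_model {
    int id;
    int expired;
    struct timer_model timer;                   /* embedded timer */
};

static void timer_setup_model(struct timer_model *t,
                              void (*fn)(struct timer_model *))
{
    t->function = fn;
}

static void frag_expire_model(struct timer_model *t)
{
    /* Recover the enclosing queue from the embedded timer. */
    struct frag_queue_model *fq =
        container_of_model(t, struct frag_queue_model, timer);

    fq->expired = 1;
}
```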

    Cc: Alexander Aring
    Cc: Stefan Schmidt
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: Hideaki YOSHIFUJI
    Cc: Pablo Neira Ayuso
    Cc: Jozsef Kadlecsik
    Cc: Florian Westphal
    Cc: linux-wpan@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: netfilter-devel@vger.kernel.org
    Cc: coreteam@netfilter.org
    Signed-off-by: Kees Cook
    Acked-by: Stefan Schmidt # for ieee802154
    Signed-off-by: David S. Miller
    (cherry picked from commit 78802011fbe34331bdef6f2dfb1634011f0e4c32)
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • In order to simplify the API, add a pointer to struct inet_frags.
    This will allow us to make things less complex.

    These functions no longer have a struct inet_frags parameter :

    inet_frag_destroy(struct inet_frag_queue *q /*, struct inet_frags *f */)
    inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */)
    inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */)
    inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */)
    ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 093ba72914b696521e4885756a68a3332782c8de)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • We will soon initialize one rhashtable per struct netns_frags
    in inet_frags_init_net().

    This patch changes the return value to eventually propagate an
    error.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 787bea7748a76130566f881c2342a0be4127d182)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

15 Sep, 2018

2 commits

  • [ Upstream commit da786717e0894886301ed2536843c13f9e8fd53e ]

    Roman reports that the DHCPv6 client no longer sees replies from the
    server due to the

    ip6tables -t raw -A PREROUTING -m rpfilter --invert -j DROP

    rule. We need to set the F_IFACE flag for link-local addresses, as they
    are scoped per device.
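    A sketch of the flag decision using the POSIX IN6_IS_ADDR_LINKLOCAL()
    macro. It assumes the check is on the packet's source address, the one a
    reverse-path filter looks up; LOOKUP_F_IFACE_MODEL is a hypothetical
    stand-in for the kernel's flag.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical flag: "restrict the lookup to the receiving interface". */
#define LOOKUP_F_IFACE_MODEL 0x1

static int rpfilter_lookup_flags(const struct in6_addr *saddr)
{
    int flags = 0;

    /* fe80::/10 addresses are only unique per link, so the reverse-path
     * lookup must be tied to the incoming device. */
    if (IN6_IS_ADDR_LINKLOCAL(saddr))
        flags |= LOOKUP_F_IFACE_MODEL;
    return flags;
}
```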

    Fixes: 47b7e7f82802 ("netfilter: don't set F_IFACE on ipv6 fib lookups")
    Reported-by: Roman Mamedov
    Tested-by: Roman Mamedov
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 9f2895461439fda2801a7906fb4c5fb3dbb37a0a ]

    Before the commit d6990976af7c ("vti6: fix PMTU caching and reporting
    on xmit") '!skb->ignore_df' check was always true because the function
    skb_scrub_packet() was called before it, resetting ignore_df to zero.

    In the commit, skb_scrub_packet() was moved below, and now this check
    can be false for the packet, e.g. when it is sent in two fragments,
    which prevents successful PMTU updates in such a case. Subsequent
    attempts to send the packet lead to the same tx error. Moreover, the
    vti6 initial MTU value relies on PMTU adjustments.

    This issue can be reproduced with the following LTP test script:
    udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000

    Fixes: ccd740cbc6e0 ("vti6: Add pmtu handling to vti6_xmit.")
    Signed-off-by: Alexey Kodanev
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     

05 Sep, 2018

2 commits

  • [ Upstream commit 7284fdf39a912322ce97de2d30def3c6068a418c ]

    This ought to be an omission in e6194923237 ("esp: Fix memleaks on error
    paths."). The memleak on error path in esp6_input is similar to esp_input
    of esp4.

    Fixes: e6194923237 ("esp: Fix memleaks on error paths.")
    Fixes: 3f29770723f ("ipsec: check return value of skb_to_sgvec always")
    Signed-off-by: Zhen Lei
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Zhen Lei
     
  • [ Upstream commit d6990976af7c5d8f55903bfb4289b6fb030bf754 ]

    When setting the skb->dst before doing the MTU check, the route PMTU
    caching and reporting is done on the new dst which is about to be
    released.

    Instead, PMTU handling should be done using the original dst.

    This is aligned with IPv4 VTI.
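    A hypothetical model of the ordering: remember the original dst, do the
    MTU check and PMTU update against it, and only then switch the skb to
    the new dst. dst_model, skb_model and xmit_model() are illustrative
    names.

```c
#include <assert.h>

/* Hypothetical route carrying only a cached path MTU. */
struct dst_model {
    int pmtu;
};

struct skb_model {
    struct dst_model *dst;
    int len;
};

static void xmit_model(struct skb_model *skb, struct dst_model *tunnel_dst,
                       int mtu)
{
    struct dst_model *orig = skb->dst;   /* keep the original route */

    if (skb->len > mtu)
        orig->pmtu = mtu;                /* cache/report on the original */

    skb->dst = tunnel_dst;               /* only now switch to the new dst */
}
```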

    Fixes: ccd740cbc6 ("vti6: Add pmtu handling to vti6_xmit.")
    Signed-off-by: Eyal Birger
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Eyal Birger
     

24 Aug, 2018

4 commits

  • [ Upstream commit a9ba23d48dbc6ffd08426bb10f05720e0b9f5c14 ]

    At present, ipv6_renew_options_kern() ends up calling into
    access_ok(), which is problematic if done from inside an interrupt,
    as access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures
    (x86-64 is affected). An example warning/backtrace is shown below:

    WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
    ...
    Call Trace:

    ipv6_renew_option+0xb2/0xf0
    ipv6_renew_options+0x26a/0x340
    ipv6_renew_options_kern+0x2c/0x40
    calipso_req_setattr+0x72/0xe0
    netlbl_req_setattr+0x126/0x1b0
    selinux_netlbl_inet_conn_request+0x80/0x100
    selinux_inet_conn_request+0x6d/0xb0
    security_inet_conn_request+0x32/0x50
    tcp_conn_request+0x35f/0xe00
    ? __lock_acquire+0x250/0x16c0
    ? selinux_socket_sock_rcv_skb+0x1ae/0x210
    ? tcp_rcv_state_process+0x289/0x106b
    tcp_rcv_state_process+0x289/0x106b
    ? tcp_v6_do_rcv+0x1a7/0x3c0
    tcp_v6_do_rcv+0x1a7/0x3c0
    tcp_v6_rcv+0xc82/0xcf0
    ip6_input_finish+0x10d/0x690
    ip6_input+0x45/0x1e0
    ? ip6_rcv_finish+0x1d0/0x1d0
    ipv6_rcv+0x32b/0x880
    ? ip6_make_skb+0x1e0/0x1e0
    __netif_receive_skb_core+0x6f2/0xdf0
    ? process_backlog+0x85/0x250
    ? process_backlog+0x85/0x250
    ? process_backlog+0xec/0x250
    process_backlog+0xec/0x250
    net_rx_action+0x153/0x480
    __do_softirq+0xd9/0x4f7
    do_softirq_own_stack+0x2a/0x40

    ...

    While not present in the backtrace, ipv6_renew_option() ends up calling
    access_ok() via the following chain:

    access_ok()
    _copy_from_user()
    copy_from_user()
    ipv6_renew_option()

    The fix presented in this patch is to perform the userspace copy
    earlier in the call chain such that it is only called when the option
    data is actually coming from userspace; that place is
    do_ipv6_setsockopt(). Not only does this solve the problem seen in
    the backtrace above, it also allows us to simplify the code quite a
    bit by removing ipv6_renew_options_kern() completely. We also take
    this opportunity to clean up ipv6_renew_options()/ipv6_renew_option()
    slightly.
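
    The "copy once at the entry point" structure can be sketched in
    userspace C. All names here are hypothetical: dup_user_option() stands
    in for memdup_user() (here just malloc() plus memcpy(), with no real
    user/kernel boundary), renew_option() for the cleaned-up
    ipv6_renew_option(), and setsockopt_entry() for do_ipv6_setsockopt().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical option store; the kernel uses struct ipv6_txoptions. */
struct opt_store {
    unsigned char data[64];
    size_t len;
};

/* Stand-in for memdup_user(): the only place that would read user
 * memory. Here it is a plain allocation plus copy. */
static void *dup_user_option(const void *user_buf, size_t len)
{
    void *p = malloc(len);

    if (p)
        memcpy(p, user_buf, len);
    return p;
}

/* Stand-in for ipv6_renew_option(): touches kernel memory only, so it
 * is safe from softirq callers such as the CALIPSO path. */
static int renew_option(struct opt_store *st, const void *kbuf, size_t len)
{
    if (len > sizeof(st->data))
        return -1;              /* models -EINVAL */
    memcpy(st->data, kbuf, len);
    st->len = len;
    return 0;
}

/* Stand-in for do_ipv6_setsockopt(): copy once, then pass kernel memory. */
static int setsockopt_entry(struct opt_store *st, const void *user_buf,
                            size_t len)
{
    void *kbuf = dup_user_option(user_buf, len);
    int err;

    if (!kbuf)
        return -1;              /* models -ENOMEM */
    err = renew_option(st, kbuf, len);
    free(kbuf);
    return err;
}
```

    With this split, an in-kernel caller that already holds the option
    bytes can call renew_option() directly, and no separate "_kern"
    variant is needed.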

    This patch is heavily based on a rough patch by Al Viro. I've taken
    his original patch, converted a kmemdup() call in do_ipv6_setsockopt()
    to a memdup_user() call, made better use of the e_inval jump target in
    the same function, and cleaned up the use of ipv6_renew_option() by
    ipv6_renew_options().

    CC: Al Viro
    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Paul Moore
     
  • [ Upstream commit d376bef9c29b3c65aeee4e785fffcd97ef0a9a81 ]

    nft_compat relies on xt_request_find_match to increment
    refcount of the module that provides the match/target.

    The (builtin) icmp matches didn't set the module owner, so it
    was possible to rmmod ip(6)tables while icmp extensions were still in use.
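
    Why a missing owner defeats the refcount can be shown with a toy
    model. struct module_sim, struct match_sim and get_owner() are
    hypothetical. get_owner() mimics the behaviour of try_module_get(),
    which accepts a NULL module and reports success without counting
    anything. The model assumes the fix sets the match owner to the
    providing module.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module with a usage count; unloading needs refcnt == 0. */
struct module_sim {
    int refcnt;
};

/* Hypothetical match descriptor; 'me' plays the role of the owner field. */
struct match_sim {
    const char *name;
    struct module_sim *me;
};

/* Like try_module_get(): a NULL owner succeeds but pins nothing. */
static bool get_owner(struct module_sim *m)
{
    if (m)
        m->refcnt++;
    return true;
}

static bool can_unload(const struct module_sim *m)
{
    return m->refcnt == 0;
}
```

    A lookup of a match with me == NULL "succeeds" yet leaves the module
    unloadable; once the owner is set, the same lookup blocks rmmod.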

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 6c6da92808442908287fae8ebb0ca041a52469f4 ]

    After receiving MLD queries, we update idev->mc_maxdelay with the
    max_delay from the query header. This makes later unsolicited reports
    use mc_maxdelay as their interval, which means we may send unsolicited
    reports with a long interval instead of the default configured interval.

    Also, since ipv6_mc_reset() is not called after the device comes up,
    the issue persists even after leaving the group and joining other groups.

    Fixes: fc4eba58b4c14 ("ipv6: make unsolicited report intervals configurable for mld")
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     
  • [ Upstream commit 9ce7bc036ae4cfe3393232c86e9e1fea2153c237 ]

    It is a waste of memory to use a full "struct netns_sysctl_ipv6"
    while only one pointer is really used, considering netns_sysctl_ipv6
    keeps growing.

    Also, since "struct netns_frags" has cache line alignment,
    it is better to move the frags_hdr pointer outside, otherwise
    we spend a full cache line for this pointer.

    This saves 192 bytes of memory per netns.
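
    The cache-line cost of a lone pointer next to an aligned member can be
    checked with sizeof. This sketch assumes 64-byte cache lines and uses
    hypothetical structs. aligned_frags only stands in for a cache-line
    aligned struct such as struct netns_frags, so the numbers below are
    not the 192 bytes quoted above.

```c
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines, as on x86-64 */

/* Stand-in for a cache-line-aligned struct such as struct netns_frags. */
struct aligned_frags {
    _Alignas(CACHE_LINE) long counters[4];
};

/* Pointer kept next to the aligned member: the struct inherits 64-byte
 * alignment, so its size is rounded up to the next multiple of 64. */
struct frag_with_hdr {
    struct aligned_frags frags;
    void *frags_hdr;
};

/* Pointer moved elsewhere: only the aligned member remains. */
struct frag_without_hdr {
    struct aligned_frags frags;
};
```

    On common 64-bit targets, frag_with_hdr is two cache lines while
    frag_without_hdr is one. The 8-byte pointer costs 64 bytes, 56 of
    them padding.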

    Fixes: c038a767cd69 ("ipv6: add a new namespace for nf_conntrack_reasm")
    Signed-off-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

22 Aug, 2018

1 commit

  • [ Upstream commit 82a40777de12728dedf4075453b694f0d1baee80 ]

    According to RFC791, 68 bytes is the minimum size of an IPv4 datagram
    that every device must be able to forward without further fragmentation,
    while 576 bytes is the minimum size of an IPv4 datagram that every device
    must be able to receive. So in ip6_tnl_xmit(), 68 (IPV4_MIN_MTU) is the
    right value for the IPv4 minimum MTU check.

    While at it, use max() instead of an if statement.
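
    The clamp can be sketched in userspace C. clamp_inner_mtu() and
    max_int() are hypothetical. max_int() replaces the kernel's max()
    macro, and inner_is_ipv6 stands in for checking skb->protocol. The
    constants are the protocol minimums: 68 for IPv4 (RFC 791) and 1280
    for IPv6.

```c
#include <stdbool.h>

#define IPV4_MIN_MTU 68     /* RFC 791 minimum, as named in the text */
#define IPV6_MIN_MTU 1280   /* IPv6 minimum link MTU */

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Hypothetical helper: never report a path MTU below the minimum the
 * inner packet's protocol allows. */
static int clamp_inner_mtu(int mtu, bool inner_is_ipv6)
{
    return max_int(mtu, inner_is_ipv6 ? IPV6_MIN_MTU : IPV4_MIN_MTU);
}
```

    With 68 as the IPv4 floor, an inner IPv4 path MTU of 100 is reported
    unchanged rather than being raised to 576.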

    Fixes: c9fefa08190f ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit")
    Reported-by: Sabrina Dubroca
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     

28 Jul, 2018

4 commits

  • There are two scenarios in which we restore deleted records. The first
    is when the device goes down and up (or is unmapped/remapped). In this
    scenario the new filter mode is the same as the previous one, because
    we get it from in_dev->mc_list and we do not touch it during device
    down and up.

    The other scenario is when a new socket joins a group that was just
    deleted and has not finished sending status reports. In this scenario,
    we should use the current filter mode instead of restoring the old one.
    There are 4 cases in total:

    old_socket  new_socket  before_fix  after_fix
    IN(A)       IN(A)       ALLOW(A)    ALLOW(A)
    IN(A)       EX( )       TO_IN( )    TO_EX( )
    EX( )       IN(A)       TO_EX( )    ALLOW(A)
    EX( )       EX( )       TO_EX( )    TO_EX( )
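
    The after_fix column in the table reduces to a rule that depends only
    on the new socket's filter mode. The enums and restore_report() below
    are a hypothetical encoding of that column. They model only the report
    type, not the source list A carried with it.

```c
/* Filter modes as used in the table. */
enum filter_mode { MODE_INCLUDE, MODE_EXCLUDE };

/* Report types that appear in the after_fix column. */
enum report_type { REPORT_ALLOW, REPORT_TO_EX };

/* Hypothetical helper: the report sent when a just-deleted record is
 * restored because a new socket joins the group. After the fix only the
 * current (new socket's) filter mode matters. */
static enum report_type restore_report(enum filter_mode old_mode,
                                       enum filter_mode new_mode)
{
    (void)old_mode;     /* the old mode is no longer consulted */
    return new_mode == MODE_INCLUDE ? REPORT_ALLOW : REPORT_TO_EX;
}
```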

    Fixes: 24803f38a5c0b ("igmp: do not remove igmp souce list info when set link down")
    Fixes: 1666d49e1d416 ("mld: do not remove mld souce list info when set link down")
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     
  • [ Upstream commit 24b711edfc34bc45777a3f068812b7d1ed004a5d ]

    Example setup:
    host: ip -6 addr add dev eth1 2001:db8:104::4
    where eth1 is enslaved to a VRF

    switch: ip -6 ro add 2001:db8:104::4/128 dev br1
    where br1 only has an LLA

    ping6 2001:db8:104::4
    ssh 2001:db8:104::4

    (NOTE: UDP works fine if the PKTINFO has the address set to the global
    address and ifindex set to the index of eth1, with an LLA as the
    destination).

    For ICMP, icmp6_iif needs to be updated to check if skb->dev is an
    L3 master. If it is then return the ifindex from rt6i_idev similar
    to what is done for loopback.

    For TCP, restore the original tcp_v6_iif definition which is needed in
    most places and add a new tcp_v6_iif_l3_slave that considers the
    l3_slave variability. This latter check is only needed for socket
    lookups.
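
    The ICMP part of the change can be modeled as follows. struct dev_sim,
    struct skb_sim and icmp6_iif_sim() are hypothetical. rt_dev stands in
    for the device behind rt6i_idev, and LOOPBACK_IFINDEX is assumed to be
    1, the loopback device's ifindex.

```c
#include <stdbool.h>
#include <stddef.h>

#define LOOPBACK_IFINDEX 1   /* assumed loopback ifindex */

/* Hypothetical stand-ins for net_device and the skb fields used here. */
struct dev_sim {
    int ifindex;
    bool is_l3_master;             /* e.g. a VRF device */
};

struct skb_sim {
    const struct dev_sim *dev;     /* device the packet arrived on */
    const struct dev_sim *rt_dev;  /* device of the attached route, or NULL */
};

/* For loopback or an L3 master device, prefer the ifindex of the route's
 * device; otherwise use the ingress device as-is. */
static int icmp6_iif_sim(const struct skb_sim *skb)
{
    int iif = skb->dev->ifindex;

    if ((iif == LOOPBACK_IFINDEX || skb->dev->is_l3_master) && skb->rt_dev)
        iif = skb->rt_dev->ifindex;
    return iif;
}
```

    In the example setup, a packet seen on the VRF device with a route
    through eth1 is attributed to eth1, just as loopback traffic already was.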

    Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 2efd4fca703a6707cad16ab486eaab8fc7f0fd49 ]

    Syzbot reported a read beyond the end of the skb head when returning
    IPV6_ORIGDSTADDR:

    BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
    CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
    kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
    kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
    copy_to_user include/linux/uaccess.h:184 [inline]
    put_cmsg+0x5ef/0x860 net/core/scm.c:242
    ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
    ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
    rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
    [..]

    This logic and its ipv4 counterpart read the destination port from
    the packet, i.e. the bytes up to skb_transport_offset(skb) + 4.

    With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
    packet that stores headers exactly up to skb_transport_offset(skb) in
    the head and the remainder in a frag.

    Call pskb_may_pull before accessing the pointer to ensure that it lies
    in the skb head.
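
    The pattern can be illustrated with a simplified userspace model. The
    names are hypothetical: struct pkt_sim has a fixed-size linear head
    plus one fragment, and may_pull() only mimics the idea of
    pskb_may_pull() (move bytes from the fragment into the head, or fail
    if the packet is too short). It does not reproduce its implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAD_MAX 128

/* Hypothetical packet: 'head' is the linear area, 'frag' holds the rest. */
struct pkt_sim {
    uint8_t head[HEAD_MAX];
    size_t head_len;
    const uint8_t *frag;
    size_t frag_len;
};

/* Make sure the first 'len' bytes are in the head, pulling them from the
 * fragment if needed. Returns false if the packet is too short. */
static bool may_pull(struct pkt_sim *p, size_t len)
{
    size_t need;

    if (len <= p->head_len)
        return true;
    need = len - p->head_len;
    if (need > p->frag_len || len > HEAD_MAX)
        return false;
    memcpy(p->head + p->head_len, p->frag, need);
    p->head_len += need;
    p->frag += need;
    p->frag_len -= need;
    return true;
}

/* Read the destination port (bytes 2..3 of the UDP/TCP header) only
 * after ensuring transport_off + 4 bytes are in the head. */
static int read_dst_port(struct pkt_sim *p, size_t transport_off,
                         uint16_t *port)
{
    const uint8_t *ports;

    if (!may_pull(p, transport_off + 4))
        return -1;
    ports = p->head + transport_off;
    *port = (uint16_t)((ports[2] << 8) | ports[3]);  /* network order */
    return 0;
}
```

    This mirrors the syzbot layout: a head that ends exactly at the
    transport offset, with the ports in the fragment. Without the pull,
    reading ports[2] would run past head_len.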

    Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
    Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit 3dd1c9a1270736029ffca670e9bd0265f4120600 ]

    The skb hash for locally generated ip[v6] fragments belonging
    to the same datagram can vary in several circumstances:
    * for connected UDP[v6] sockets, the first fragment gets its hash
    via skb_set_owner_w()/skb_set_hash_from_sk()
    * for unconnected UDPv6 sockets, the first fragment can get
    its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
    auto_flowlabel is enabled

    For the following frags the hash is usually computed via
    skb_get_hash().
    The above can cause out-of-order (OoO) delivery for unconnected UDPv6
    sockets: in that scenario the egress tx queue can be selected on a
    per-packet basis via the skb hash.
    It may also fool flow-oriented schedulers into placing fragments
    belonging to the same datagram in different flows.

    Fix the issue by copying the skb hash from the head frag into
    the others at fragmentation time.
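
    A minimal model of the fix: struct frag_sim, copy_hash_to_frags() and
    pick_queue() are hypothetical. pick_queue() is a plain modulo choice
    used only for illustration; real tx queue selection scales the hash
    differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hash-related sk_buff fields. */
struct frag_sim {
    uint32_t hash;
    bool l4_hash;
    bool sw_hash;
};

/* Copy the head fragment's hash state into every following fragment. */
static void copy_hash_to_frags(struct frag_sim *frags, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        frags[i].hash = frags[0].hash;
        frags[i].l4_hash = frags[0].l4_hash;
        frags[i].sw_hash = frags[0].sw_hash;
    }
}

/* Illustrative hash-based tx queue choice. */
static unsigned int pick_queue(const struct frag_sim *f, unsigned int nqueues)
{
    return f->hash % nqueues;
}
```

    Once every fragment carries the head's hash, any hash-based choice
    (tx queue, scheduler flow) lands all fragments of a datagram in the
    same place, matching the "after" perf output.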

    Before this commit:
    perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8"
    netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n &
    perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1
    perf script
    probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0
    probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0

    After this commit:
    probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
    probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0

    Fixes: b73c3d0e4f0e ("net: Save TX flow hash in sock and set in skbuf on xmit")
    Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
    Signed-off-by: Paolo Abeni
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

25 Jul, 2018

2 commits

  • [ Upstream commit e66515999b627368892ccc9b3a13a506f2ea1357 ]

    Commit adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)")
    added enhanced DAD with a nonce length of 6 bytes. However, RFC7527
    doesn't specify the length of the nonce, other than being 6 + 8*k bytes,
    with integer k >= 0 (RFC3971 5.3.2). The current implementation simply
    assumes that the nonce will always be 6 bytes, but other systems are
    free to choose different sizes.

    If another system sends a nonce of a different length but with the same
    6-byte prefix, it shouldn't be considered the same nonce. Thus, check
    that the length of the received nonce is the same as the length we sent.
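
    A sketch of the length-aware comparison, assuming the standard ND
    option layout (type byte, length byte in units of 8 octets, then the
    nonce), so a nonce occupies length * 8 - 2 bytes. nonce_matches() and
    NONCE_LEN are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NONCE_LEN 6   /* the nonce length we send, per the text */

/* opt points at an ND Nonce option: type, length (8-octet units), nonce.
 * Returns true only if the received nonce is exactly our 6-byte nonce. */
static bool nonce_matches(const uint8_t *opt, size_t opt_bytes,
                          const uint8_t sent[NONCE_LEN])
{
    size_t nonce_len;

    if (opt_bytes < 2 || opt[1] == 0 || (size_t)opt[1] * 8 > opt_bytes)
        return false;                 /* malformed option */
    nonce_len = (size_t)opt[1] * 8 - 2;
    if (nonce_len != NONCE_LEN)
        return false;                 /* same prefix, different length */
    return memcmp(opt + 2, sent, NONCE_LEN) == 0;
}
```

    The scapy script builds exactly the rejected case: it bumps the
    length byte by one and appends 8 bytes, leaving the first 6 nonce
    bytes unchanged.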

    Ugly scapy test script running on veth0:

    def loop():
        pkt=sniff(iface="veth0", filter="icmp6", count=1)
        pkt = pkt[0]
        b = bytearray(pkt[Raw].load)
        b[1] += 1
        b += b'\xde\xad\xbe\xef\xde\xad\xbe\xef'
        pkt[Raw].load = bytes(b)
        pkt[IPv6].plen += 8
        # fixup checksum after modifying the payload
        pkt[IPv6].payload.cksum -= 0x3b44
        if pkt[IPv6].payload.cksum < 0:
            pkt[IPv6].payload.cksum += 0xffff
        sendp(pkt, iface="veth0")

    This should result in DAD failure for any address added to veth0's peer,
    but is currently ignored.

    Fixes: adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Stefano Brivio
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit 83ed7d1fe2d2d4a11b30660dec20168bb473d9c1 ]

    My randconfig builds came across an old missing dependency for ILA:

    ERROR: "dst_cache_set_ip6" [net/ipv6/ila/ila.ko] undefined!
    ERROR: "dst_cache_get" [net/ipv6/ila/ila.ko] undefined!
    ERROR: "dst_cache_init" [net/ipv6/ila/ila.ko] undefined!
    ERROR: "dst_cache_destroy" [net/ipv6/ila/ila.ko] undefined!

    We almost never run into this by accident because randconfig builds
    end up selecting DST_CACHE from some other tunnel protocol, and this
    one appears to be the only one missing the explicit 'select'.

    From all I can tell, this problem first appeared in linux-4.9
    when dst_cache support got added to ILA.

    Fixes: 79ff2fc31e0f ("ila: Cache a route to translated address")
    Cc: Tom Herbert
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     

22 Jul, 2018

3 commits

  • commit 84379c9afe011020e797e3f50a662b08a6355dcf upstream.

    Eric Dumazet reports:
    Here is a reproducer of an annoying bug detected by syzkaller on our production kernel
    [..]
    ./b78305423 enable_conntrack
    Then :
    sleep 60
    dmesg | tail -10
    [ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2

    The reproducer sends an IPv6 fragment that hits nfct defrag via the
    LOCAL_OUT hook. The skb gets queued until the frag timer expires,
    i.e. for 1 minute.

    Normally nf_conntrack_reasm gets called during prerouting, so the skb
    has no dst yet, which might explain why this wasn't spotted earlier.

    Reported-by: Eric Dumazet
    Reported-by: John Sperbeck
    Signed-off-by: Florian Westphal
    Tested-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 8c43bd1706885ba1acfa88da02bc60a2ec16f68c ]

    Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups
    need to fail if dev_match is not true. Currently, a packet to a given port
    can match a socket bound to a device when it should not. In the VRF case,
    this causes the lookup to hit a VRF socket and not a global socket
    resulting in a response trying to go through the VRF when it should not.
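
    The device part of the lookup rule can be sketched as a scoring
    function. struct sock_sim, dev_score() and the score values are
    hypothetical. Only the rule from the text is modeled: a socket bound
    to a device must match either the ingress device (dif) or the L3
    slave device (sdif), otherwise it is not a candidate.

```c
#include <stdbool.h>

/* Hypothetical listening socket: bound_dev_if == 0 means "not bound". */
struct sock_sim {
    int bound_dev_if;
};

/* Device component of a lookup score. dif is the ingress device, sdif the
 * L3 slave device (0 if none). Returns -1 when the socket must not match. */
static int dev_score(const struct sock_sim *sk, int dif, int sdif)
{
    int score = 1;

    if (sk->bound_dev_if) {
        bool dev_match = sk->bound_dev_if == dif || sk->bound_dev_if == sdif;

        if (!dev_match)
            return -1;   /* a bound socket never matches another device */
        score += 4;      /* prefer device-bound sockets when they match */
    }
    return score;
}
```

    With this rule, a packet arriving outside the VRF can only reach the
    unbound (global) socket, so the reply is not routed through the VRF.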

    Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups")
    Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups")
    Reported-by: Lou Berger
    Diagnosed-by: Renato Westphal
    Tested-by: Renato Westphal
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit fc9c2029e37c3ae9efc28bf47045e0b87e09660c ]

    The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags,
    not 'gfp_t'. So don't pass GFP_KERNEL to it.

    Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
    Signed-off-by: Eric Biggers
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers