07 Feb, 2019

2 commits

  • [ Upstream commit ef489749aae508e6f17886775c075f12ff919fb1 ]

    skb->cb may contain data from previous layers (in an observed case
    IPv4 with L3 Master Device). In the observed scenario, the data in
    IPCB(skb)->frags was misinterpreted as IP6CB(skb)->frag_max_size,
    which eventually caused an unexpected IPv6 fragmentation in ip6_fragment()
    through ip6_finish_output().

    This patch clears IP6CB(skb), which potentially contains garbage data,
    on the SRH ip4ip6 encapsulation.

    Fixes: 32d99d0b6702 ("ipv6: sr: add support for ip4ip6 encapsulation")
    Signed-off-by: Yohei Kanemaru
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yohei Kanemaru
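    The aliasing described in this commit can be modelled in user space.
    The sketch below is an illustration only: the two structs are
    hypothetical stand-ins for the IPv4 and IPv6 views of skb->cb, not the
    kernel's real layouts, and all toy_* names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_CB_SIZE 48	/* skb->cb is a 48-byte scratch area */

/* Hypothetical IPv4-layer view of the control buffer. */
struct toy_v4_cb {
	uint32_t iif;
	uint16_t flags;
	uint16_t frags;
};

/* Hypothetical IPv6-layer view of the same bytes. */
struct toy_v6_cb {
	uint32_t iif;
	uint16_t flags;
	uint16_t frag_max_size;
};

/* An earlier layer stores its own data in the shared buffer. */
void toy_v4_layer_write(unsigned char *cb, uint16_t frags)
{
	struct toy_v4_cb v4 = { .iif = 1, .flags = 0, .frags = frags };

	memcpy(cb, &v4, sizeof(v4));
}

/* The IPv6 layer reads frag_max_size through its own view. */
uint16_t toy_v6_frag_max_size(const unsigned char *cb)
{
	struct toy_v6_cb v6;

	memcpy(&v6, cb, sizeof(v6));
	return v6.frag_max_size;
}

/* Model of the fix: wipe the IPv6 view before the IPv6 layer relies on it. */
void toy_v6_clear_cb(unsigned char *cb)
{
	assert(sizeof(struct toy_v6_cb) <= TOY_CB_SIZE);
	memset(cb, 0, sizeof(struct toy_v6_cb));
}
```

    Without the clear, whatever the previous layer left in the overlapping
    bytes is read back as a fragment size limit; after it, the field is zero.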
     
  • [ Upstream commit c5ee066333ebc322a24a00a743ed941a0c68617e ]

    IPv6 does not consider if the socket is bound to a device when binding
    to an address. The result is that a socket can be bound to eth0 and then
    bound to the address of eth1. If the device is a VRF, the result is that
    a socket can only be bound to an address in the default VRF.

    Resolve by considering the device if sk_bound_dev_if is set.

    This problem exists from the beginning of git history.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

26 Jan, 2019

2 commits

  • [ Upstream commit d4a7e9bb74b5aaf07b89f6531c080b1130bdf019 ]

    I realized the last patch calls dev_get_by_index_rcu in a branch not
    holding the rcu lock. Add the calls to rcu_read_lock and rcu_read_unlock.

    Fixes: ec90ad334986 ("ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit ec90ad334986fa5856d11dd272f7f22fa86c55c4 ]

    Similar to c5ee066333eb ("ipv6: Consider sk_bound_dev_if when binding a
    socket to an address"), binding a socket to v4 mapped addresses needs to
    consider if the socket is bound to a device.

    This problem also exists from the beginning of git history.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

23 Jan, 2019

2 commits

  • [ Upstream commit 4a06fa67c4da20148803525151845276cdb995c1 ]

    Commit 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call
    pskb_may_pull") avoided a read beyond the end of the skb linear
    segment by calling pskb_may_pull.

    That function can trigger a BUG_ON in pskb_expand_head if the skb is
    shared, which it is when peeking. It can also return ENOMEM.

    Avoid both by switching to safer skb_header_pointer.

    Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
    Reported-by: syzbot
    Suggested-by: Eric Dumazet
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
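    Why skb_header_pointer is the safer choice can be seen in a user-space
    model. This is a sketch under assumptions: struct toy_pkt and
    toy_header_pointer() are hypothetical and only mimic the idea of
    returning a direct pointer when the bytes are linear and copying into a
    caller-supplied buffer otherwise. The packet is never reallocated, so
    there is nothing that can fail with ENOMEM or trip over a shared buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-part packet: a linear head plus one paged fragment. */
struct toy_pkt {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/*
 * Return a pointer to len bytes at offset. If they sit entirely in the
 * linear head, point straight at them; otherwise copy them into buf.
 * The packet itself is left untouched.
 */
const void *toy_header_pointer(const struct toy_pkt *p, size_t offset,
			       size_t len, void *buf)
{
	size_t total = p->head_len + p->frag_len;
	size_t from_head;

	if (offset > total || len > total - offset)
		return NULL;			/* request runs past the packet */
	if (offset <= p->head_len && len <= p->head_len - offset)
		return p->head + offset;	/* fast path: no copy */

	assert(buf != NULL);
	from_head = offset < p->head_len ? p->head_len - offset : 0;
	if (from_head)
		memcpy(buf, p->head + offset, from_head);
	memcpy((unsigned char *)buf + from_head,
	       p->frag + (offset + from_head - p->head_len),
	       len - from_head);
	return buf;
}
```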
     
  • [ Upstream commit 7d033c9f6a7fd3821af75620a0257db87c2b552a ]

    This patch makes sure the flow label in the IPv6 header
    forged in ipv6_local_error() is initialized.

    BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
    CPU: 1 PID: 24675 Comm: syz-executor1 Not tainted 4.20.0-rc7+ #4
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x173/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
    kmsan_internal_check_memory+0x455/0xb00 mm/kmsan/kmsan.c:675
    kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601
    _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
    copy_to_user include/linux/uaccess.h:177 [inline]
    move_addr_to_user+0x2e9/0x4f0 net/socket.c:227
    ___sys_recvmsg+0x5d7/0x1140 net/socket.c:2284
    __sys_recvmsg net/socket.c:2327 [inline]
    __do_sys_recvmsg net/socket.c:2337 [inline]
    __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
    __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x457ec9
    Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f8750c06c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9
    RDX: 0000000000002000 RSI: 0000000020000400 RDI: 0000000000000005
    RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8750c076d4
    R13: 00000000004c4a60 R14: 00000000004d8140 R15: 00000000ffffffff

    Uninit was stored to memory at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
    kmsan_save_stack mm/kmsan/kmsan.c:219 [inline]
    kmsan_internal_chain_origin+0x134/0x230 mm/kmsan/kmsan.c:439
    __msan_chain_origin+0x70/0xe0 mm/kmsan/kmsan_instr.c:200
    ipv6_recv_error+0x1e3f/0x1eb0 net/ipv6/datagram.c:475
    udpv6_recvmsg+0x398/0x2ab0 net/ipv6/udp.c:335
    inet_recvmsg+0x4fb/0x600 net/ipv4/af_inet.c:830
    sock_recvmsg_nosec net/socket.c:794 [inline]
    sock_recvmsg+0x1d1/0x230 net/socket.c:801
    ___sys_recvmsg+0x4d5/0x1140 net/socket.c:2278
    __sys_recvmsg net/socket.c:2327 [inline]
    __do_sys_recvmsg net/socket.c:2337 [inline]
    __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
    __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
    kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
    kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
    kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2759 [inline]
    __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
    __kmalloc_reserve net/core/skbuff.c:137 [inline]
    __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
    alloc_skb include/linux/skbuff.h:998 [inline]
    ipv6_local_error+0x1a7/0x9e0 net/ipv6/datagram.c:334
    __ip6_append_data+0x129f/0x4fd0 net/ipv6/ip6_output.c:1311
    ip6_make_skb+0x6cc/0xcf0 net/ipv6/ip6_output.c:1775
    udpv6_sendmsg+0x3f8e/0x45d0 net/ipv6/udp.c:1384
    inet_sendmsg+0x54a/0x720 net/ipv4/af_inet.c:798
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    __sys_sendto+0x8c4/0xac0 net/socket.c:1788
    __do_sys_sendto net/socket.c:1800 [inline]
    __se_sys_sendto+0x107/0x130 net/socket.c:1796
    __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    Bytes 4-7 of 28 are uninitialized
    Memory access of size 28 starts at ffff8881937bfce0
    Data copied to user address 0000000020000000

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

10 Jan, 2019

3 commits

  • [ Upstream commit cbb49697d5512ce9e61b45ce75d3ee43d7ea5524 ]

    xfrm6_policy_check() might have re-allocated skb->head, we need
    to reload ipv6 header pointer.

    syzbot reported:

    BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
    Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304

    CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:

    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x244/0x39d lib/dump_stack.c:113
    print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
    __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
    __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
    ipv6_addr_type include/net/ipv6.h:403 [inline]
    ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727
    ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757
    vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321
    xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
    ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
    ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
    IPVS: ftp: loaded support on port[0] = 21
    ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
    dst_input include/net/dst.h:450 [inline]
    ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
    __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
    process_backlog+0x24e/0x7a0 net/core/dev.c:5923
    napi_poll net/core/dev.c:6346 [inline]
    net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
    __do_softirq+0x308/0xb7e kernel/softirq.c:292
    do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027

    do_softirq.part.14+0x126/0x160 kernel/softirq.c:337
    do_softirq+0x19/0x20 kernel/softirq.c:340
    netif_rx_ni+0x521/0x860 net/core/dev.c:4569
    dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84
    ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727
    ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152
    NF_HOOK_COND include/linux/netfilter.h:278 [inline]
    ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171
    dst_output include/net/dst.h:444 [inline]
    ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176
    ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727
    ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747
    rawv6_push_pending_frames net/ipv6/raw.c:615 [inline]
    rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945
    kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: ''
    kobject: 'queues' (0000000089e6eea2): kobject_uevent_env
    inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
    kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop!
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:631
    sock_write_iter+0x35e/0x5c0 net/socket.c:900
    call_write_iter include/linux/fs.h:1857 [inline]
    new_sync_write fs/read_write.c:474 [inline]
    __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
    kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues'
    kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env
    vfs_write+0x1fc/0x560 fs/read_write.c:549
    ksys_write+0x101/0x260 fs/read_write.c:598
    kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0'
    __do_sys_write fs/read_write.c:610 [inline]
    __se_sys_write fs/read_write.c:607 [inline]
    __x64_sys_write+0x73/0xb0 fs/read_write.c:607
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues'
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457669
    Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669
    RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003
    kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env
    RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4
    R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff

    Allocated by task 1304:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
    __do_kmalloc_node mm/slab.c:3684 [inline]
    __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698
    __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140
    __alloc_skb+0x155/0x760 net/core/skbuff.c:208
    kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0'
    alloc_skb include/linux/skbuff.h:1011 [inline]
    __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450
    ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619
    rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938
    inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:631
    ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116
    __sys_sendmsg+0x11d/0x280 net/socket.c:2154
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg net/socket.c:2161 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices'

    Freed by task 1304:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kfree+0xcf/0x230 mm/slab.c:3817
    skb_free_head+0x93/0xb0 net/core/skbuff.c:553
    pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498
    __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896
    pskb_may_pull include/linux/skbuff.h:2188 [inline]
    _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150
    __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272
    kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env
    __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322
    __xfrm_policy_check2 include/net/xfrm.h:1170 [inline]
    xfrm_policy_check include/net/xfrm.h:1175 [inline]
    xfrm6_policy_check include/net/xfrm.h:1185 [inline]
    vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316
    xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
    ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
    ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
    ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
    dst_input include/net/dst.h:450 [inline]
    ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
    __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
    process_backlog+0x24e/0x7a0 net/core/dev.c:5923
    kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0'
    napi_poll net/core/dev.c:6346 [inline]
    net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
    __do_softirq+0x308/0xb7e kernel/softirq.c:292

    The buggy address belongs to the object at ffff888191b8cac0
    which belongs to the cache kmalloc-512 of size 512
    The buggy address is located 176 bytes inside of
    512-byte region [ffff888191b8cac0, ffff888191b8ccc0)
    The buggy address belongs to the page:
    page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0
    flags: 0x2fffc0000000200(slab)
    raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940
    raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000
    page dumped because: kasan: bad access detected
    kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: ''

    Memory state around the buggy address:
    ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
    >ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

    Fixes: 0d3c703a9d17 ("ipv6: Cleanup IPv6 tunnel receive path")
    Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces")
    Signed-off-by: Eric Dumazet
    Cc: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
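    The pattern behind this fix, re-deriving a header pointer after any call
    that may reallocate the buffer, can be sketched in user space. The
    toy_* names are hypothetical; toy_expand_head() always moves the data,
    which models the worst case of a policy check that pulls data into the
    head.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_buf {
	unsigned char *head;
	size_t len;
};

/* Stand-in for an operation that may reallocate the head: it always does. */
int toy_expand_head(struct toy_buf *b, size_t extra)
{
	unsigned char *n = malloc(b->len + extra);

	if (!n)
		return -1;
	memcpy(n, b->head, b->len);
	memset(n + b->len, 0, extra);
	free(b->head);			/* pointers into the old head are now stale */
	b->head = n;
	b->len += extra;
	return 0;
}

/* Safe pattern: take the header pointer only after the call that may move it. */
int toy_first_byte_after_check(struct toy_buf *b)
{
	const unsigned char *hdr;

	assert(b->len > 0);
	if (toy_expand_head(b, 16))
		return -1;
	hdr = b->head;			/* reload instead of reusing an earlier pointer */
	return hdr[0];
}
```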
     
  • [ Upstream commit fb24274546310872eeeaf3d1d53799d8414aa0f2 ]

    syzbot reported the use of uninitialized udp6_addr::sin6_scope_id.
    We can just set ::sin6_scope_id to zero, as tunnels are unlikely
    to use an IPv6 address that needs a scope id and there is no
    interface to bind in this context.

    For net-next, it looks different as we have cfg->bind_ifindex there
    so we can probably call ipv6_iface_scope_id().

    Same for ::sin6_flowinfo, tunnels don't use it.

    Fixes: 8024e02879dd ("udp: Add udp_sock_create for UDP tunnels to open listener socket")
    Reported-by: syzbot+c56449ed3652e6720f30@syzkaller.appspotmail.com
    Cc: Jon Maloy
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 69d2c86766da2ded2b70281f1bf242cb0d58a778 ]

    vr.mifi is indirectly controlled by user-space, hence leading to
    a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    net/ipv6/ip6mr.c:1845 ip6mr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
    net/ipv6/ip6mr.c:1919 ip6mr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)

    Fix this by sanitizing vr.mifi before using it to index mrt->vif_table.

    Notice that given that speculation windows are large, the policy is
    to kill the speculation on the first load and not worry if it can be
    completed with a dependent load/store [1].

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
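    In the kernel this kind of sanitizing is done with array_index_nospec().
    The user-space sketch below shows the branch-free clamp such a helper is
    built on; the toy_* names are hypothetical, and the code assumes an
    arithmetic right shift of negative signed values (implementation-defined
    in ISO C, the behaviour of GCC and Clang).

```c
#include <assert.h>
#include <limits.h>

/*
 * All-ones when index < size, zero otherwise, computed without a branch
 * so the result cannot be mispredicted.
 */
unsigned long toy_index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* In-range indexes pass through unchanged; out-of-range ones become 0. */
unsigned long toy_index_nospec(unsigned long index, unsigned long size)
{
	assert(size > 0 && size <= (unsigned long)LONG_MAX);
	return index & toy_index_mask(index, size);
}
```

    The bounds check itself still has to be done by the caller; the clamp
    only guarantees that a speculated out-of-range index reads element 0
    instead of attacker-chosen memory.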
     

17 Dec, 2018

4 commits

  • [ Upstream commit 508b09046c0f21678652fb66fd1e9959d55591d2 ]

    When ip6_route_me_harder is invoked, it resets outgoing interface of:
    - link-local scoped packets sent by neighbor discovery
    - multicast packets sent by MLD host
    - multicast packets sent by MLD proxy daemon that sets outgoing
    interface through IPV6_PKTINFO ipi6_ifindex

    Link-local and multicast packets must keep their original oif after
    ip6_route_me_harder is called.

    Signed-off-by: Alin Nastac
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Alin Nastac
     
  • [ Upstream commit 1b4e5ad5d6b9f15cd0b5121f86d4719165958417 ]

    In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
    initialization.

    Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
    Signed-off-by: Shmulik Ladkani
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Shmulik Ladkani
     
  • [ Upstream commit 66033f47ca60294a95fc85ec3a3cc909dab7b765 ]

    Even if we send an IPv6 packet without options, MAX_HEADER might not be
    enough to account for the additional headroom required by alignment of
    hardware headers.

    On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
    sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
    100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
    bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().

    Those would be enough to append our 14 bytes header, but we're going to
    align that to 16 bytes, and write 2 bytes out of the allocated slab in
    neigh_hh_output().

    KASan says:

    [ 264.967848] ==================================================================
    [ 264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
    [ 264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
    [ 264.967870]
    [ 264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
    [ 264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
    [ 264.967887] Call Trace:
    [ 264.967896] ([] show_stack+0x56/0xa0)
    [ 264.967903] [] dump_stack+0x23c/0x290
    [ 264.967912] [] print_address_description+0xf4/0x290
    [ 264.967919] [] kasan_report+0x13c/0x240
    [ 264.967927] [] ip6_finish_output2+0x1aec/0x1c70
    [ 264.967935] [] ip6_finish_output+0x430/0x7f0
    [ 264.967943] [] ip6_output+0x1f4/0x580
    [ 264.967953] [] ip6_xmit+0xfea/0x1ce8
    [ 264.967963] [] inet6_csk_xmit+0x282/0x3f8
    [ 264.968033] [] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
    [ 264.968037] [] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
    [ 264.968041] [] dev_hard_start_xmit+0x268/0x928
    [ 264.968069] [] sch_direct_xmit+0x7ae/0x1350
    [ 264.968071] [] __dev_queue_xmit+0x2b7c/0x3478
    [ 264.968075] [] ip_finish_output2+0xce2/0x11a0
    [ 264.968078] [] ip_finish_output+0x56c/0x8c8
    [ 264.968081] [] ip_output+0x226/0x4c0
    [ 264.968083] [] __ip_queue_xmit+0x894/0x1938
    [ 264.968100] [] sctp_packet_transmit+0x29d4/0x3648 [sctp]
    [ 264.968116] [] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
    [ 264.968131] [] sctp_outq_flush+0x22e/0x7d8 [sctp]
    [ 264.968146] [] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
    [ 264.968161] [] sctp_do_sm+0x222/0x648 [sctp]
    [ 264.968177] [] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
    [ 264.968192] [] __sctp_connect+0x830/0xc20 [sctp]
    [ 264.968208] [] sctp_inet_connect+0x2e6/0x378 [sctp]
    [ 264.968212] [] __sys_connect+0x21a/0x450
    [ 264.968215] [] sys_socketcall+0x3d0/0xb08
    [ 264.968218] [] system_call+0x2a2/0x2c0

    [...]

    Just like ip_finish_output2() does for IPv4, check that we have enough
    headroom in ip6_xmit(), and reallocate it if we don't.

    This issue is older than git history.

    Reported-by: Jianlin Shi
    Signed-off-by: Stefano Brivio
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Stefano Brivio
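    The arithmetic in this commit message (a 14-byte header aligned to 16
    bytes with only 14 bytes of headroom left) can be checked with a small
    sketch. The TOY_* macros mirror the usual round-up-to-16 idiom and
    toy_headroom_shortfall() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

#define TOY_HH_DATA_MOD 16
/* Round len up to the next multiple of TOY_HH_DATA_MOD. */
#define TOY_HH_DATA_ALIGN(len) \
	(((len) + (TOY_HH_DATA_MOD - 1)) & ~(TOY_HH_DATA_MOD - 1))

/* Bytes missing from the headroom once hh_len is rounded up; 0 if it fits. */
unsigned int toy_headroom_shortfall(unsigned int headroom, unsigned int hh_len)
{
	unsigned int needed = TOY_HH_DATA_ALIGN(hh_len);

	assert(needed >= hh_len);
	return needed > headroom ? needed - headroom : 0;
}
```

    With 14 bytes of headroom and a 14-byte hardware header the shortfall
    is 2 bytes, which matches the 2 bytes written outside the slab.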
     
  • [ Upstream commit ebaf39e6032faf77218220707fc3fa22487784e0 ]

    The *_frag_reasm() functions are susceptible to miscalculating the byte
    count of packet fragments in case the truesize of a head buffer changes.
    The truesize member may be changed by the call to skb_unclone(), leaving
    the fragment memory limit counter unbalanced even if all fragments are
    processed. This miscalculation goes unnoticed as long as the network
    namespace which holds the counter is not destroyed.

    Should an attempt be made to destroy a network namespace that holds an
    unbalanced fragment memory limit counter, the cleanup of the namespace
    never finishes. The thread handling the cleanup gets stuck in
    inet_frags_exit_net() waiting for the percpu counter to reach zero. The
    thread is usually in running state with a stacktrace similar to:

    PID: 1073 TASK: ffff880626711440 CPU: 1 COMMAND: "kworker/u48:4"
    #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
    #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
    #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
    #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
    #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
    #10 [ffff880621563e38] process_one_work at ffffffff81096f14

    It is not possible to create new network namespaces, and processes
    that call unshare() end up being stuck in uninterruptible sleep state
    waiting to acquire the net_mutex.

    The bug was observed in the IPv6 netfilter code by Per Sundstrom.
    I thank him for his analysis of the problem. The parts of this patch
    that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.

    Signed-off-by: Jiri Wiesner
    Reported-by: Per Sundstrom
    Acked-by: Peter Oskolkov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jiri Wiesner
     

23 Nov, 2018

1 commit

  • [ Upstream commit 7ddacfa564870cdd97275fd87decb6174abc6380 ]

    Preethi reported that PMTU discovery for UDP/raw applications is not
    working in the presence of VRF when the socket is not bound to a device.
    The problem is that ip6_sk_update_pmtu does not consider the L3 domain
    of the skb device if the socket is not bound. Update the function to
    set oif to the L3 master device if relevant.

    Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack")
    Reported-by: Preethi Ramachandra
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

04 Nov, 2018

9 commits

  • [ Upstream commit d4d576f5ab7edcb757bb33e6a5600666a0b1232d ]

    Commit 058214a4d1df ("ip6_tun: Add infrastructure for doing
    encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
    the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
    Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.

    As long as the option didn't actually end up in generated packets, this
    wasn't an issue. Then commit 89a23c8b528b ("ip6_tunnel: Fix missing tunnel
    encapsulation limit option") fixed sending of this option, and the
    resulting layout, e.g. for FoU, is:

    .-------------------.------------.----------.-------------------.----- - -
    | Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
    '-------------------'------------'----------'-------------------'----- - -

    Needless to say, FoU and GUE (at least) won't work over IPv6. The option
    is appended by default, and I couldn't find a way to disable it with the
    current iproute2.

    Turn this into a more reasonable:

    .-------------------.----------.------------.-------------------.----- - -
    | Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
    '-------------------'----------'------------'-------------------'----- - -

    With this, and with 84dad55951b0 ("udp6: fix encap return code for
    resubmitting"), FoU and GUE work again over IPv6.

    Fixes: 058214a4d1df ("ip6_tun: Add infrastructure for doing encapsulation")
    Signed-off-by: Stefano Brivio
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Stefano Brivio
     
  • [ Upstream commit 84dad55951b0d009372ec21760b650634246e144 ]

    The commit eb63f2964dbe ("udp6: add missing checks on edumux packet
    processing") used the same return code convention of the ipv4 counterpart,
    but ipv6 uses the opposite one: positive values mean resubmit.

    This change addresses the issue, using positive return value for
    resubmitting. Also update the related comment, which was broken, too.

    Fixes: eb63f2964dbe ("udp6: add missing checks on edumux packet processing")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     
  • [ Upstream commit db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 ]

    Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
    incorrect for any packet that has an incorrect checksum value.

    udp4/6_csum_init() will both make a call to
    __skb_checksum_validate_complete() to initialize/validate the csum
    field when receiving a CHECKSUM_COMPLETE packet. When this packet
    fails validation, skb->csum will be overwritten with the pseudoheader
    checksum so the packet can be fully validated by software, but the
    skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
    the stack can later warn the user about their hardware spewing bad
    checksums. Unfortunately, leaving the SKB in this state can cause
    problems later on in the checksum calculation.

    Since the packet is still marked as CHECKSUM_COMPLETE,
    udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
    from skb->csum instead of adding it, leaving us with a garbage value
    in that field. Once we try to copy the packet to userspace in the
    udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
    to checksum the packet data and add it in the garbage skb->csum value
    to perform our final validation check.

    Since the value we're validating is not the proper checksum, it's possible
    that the folded value could come out to 0, causing us not to drop the
    packet. Instead, we believe that the packet was checksummed incorrectly
    by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
    to warn the user with netdev_rx_csum_fault(skb->dev);

    Unfortunately, since this is the UDP path, skb->dev has been overwritten
    by skb->dev_scratch and is no longer a valid pointer, so we end up
    reading invalid memory.

    This patch addresses this problem in two ways:
    1) Do not use the dev pointer when calling netdev_rx_csum_fault()
    from skb_copy_and_csum_datagram_msg(). Since this gets called
    from the UDP path where skb->dev has been overwritten, we have
    no way of knowing if the pointer is still valid. Also for the
    sake of consistency with the other uses of
    netdev_rx_csum_fault(), don't attempt to call it if the
    packet was checksummed by software.

    2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
    If we receive a packet that's CHECKSUM_COMPLETE that fails
    verification (i.e. skb->csum_valid == 0), check who performed
    the calculation. It's possible that the checksum was done in
    software by the network stack earlier (such as Netfilter's
    CONNTRACK module), and if that says the checksum is bad,
    we can drop the packet immediately instead of waiting until
    we try and copy it to userspace. Otherwise, we need to
    mark the SKB as CHECKSUM_NONE, since the skb->csum field
    no longer contains the full packet checksum after the
    call to __skb_checksum_validate_complete().

    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Fixes: c84d949057ca ("udp: copy skb->truesize in the first cache line")
    Cc: Sam Kumar
    Cc: Eric Dumazet
    Signed-off-by: Sean Tranchetti
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sean Tranchetti
     
  • [ Upstream commit 4ba4c566ba8448a05e6257e0b98a21f1a0d55315 ]

    The loop wants to skip previously dumped addresses, so loops until
    current index >= saved index. If the message fills it wants to save
    the index for the next address to dump - i.e., the one that did not
    fit in the current message.

    Currently, it is incrementing the index counter before comparing to the
    saved index, and then the saved index is off by 1 - it assumes the
    current address is going to fit in the message.

    Change the index handling to increment only after a successful dump.

    Fixes: 502a2ffd7376a ("ipv6: convert idev_list to list macros")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
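    The corrected index handling can be sketched as a resumable dump loop.
    This is a user-space model with hypothetical names (toy_dump, struct
    toy_dump_cb); the message is reduced to a fixed-capacity array, and the
    saved index plays the role described in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dump state carried between calls (the saved index). */
struct toy_dump_cb {
	size_t s_idx;
};

/*
 * Copy entries into msg (capacity cap), skipping those dumped by earlier
 * calls. Returns the number written and leaves cb->s_idx at the first
 * entry that has not been dumped yet.
 */
size_t toy_dump(const int *addrs, size_t n, int *msg, size_t cap,
		struct toy_dump_cb *cb)
{
	size_t idx = 0, written = 0;

	assert(msg != NULL || cap == 0);
	for (size_t i = 0; i < n; i++) {
		if (idx < cb->s_idx)
			goto next;	/* dumped by an earlier call */
		if (written == cap)
			break;		/* message full: idx still names this entry */
		msg[written++] = addrs[i];
next:
		idx++;			/* advance only after a skip or a successful dump */
	}
	cb->s_idx = idx;
	return written;
}
```

    Because the index advances only after an entry has been skipped or
    written, the entry that did not fit is the first one emitted by the next
    call, so nothing is lost or repeated across messages.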
     
  • [ Upstream commit ee1abcf689353f36d9322231b4320926096bdee0 ]

    Commit a61bbcf28a8c ("[NET]: Store skb->timestamp as offset to a base
    timestamp") introduces a neighbour control buffer and zeroes it out in
    ndisc_rcv(), as ndisc_recv_ns() uses it.

    Commit f2776ff04722 ("[IPV6]: Fix address/interface handling in UDP and
    DCCP, according to the scoping architecture.") introduces the usage of the
    IPv6 control buffer in protocol error handlers (e.g. inet6_iif() in
    present-day __udp6_lib_err()).

    Now, with commit b94f1c0904da ("ipv6: Use icmpv6_notify() to propagate
    redirect, instead of rt6_redirect()."), we call protocol error handlers
    from ndisc_redirect_rcv(), after the control buffer is already stolen and
    some parts are already zeroed out. This implies that inet6_iif() on this
    path will always return zero.

    This gives unexpected results on UDP socket lookup in __udp6_lib_err(), as
    we might actually need to match sockets for a given interface.

    Instead of always claiming the control buffer in ndisc_rcv(), do that only
    when needed.

    Fixes: b94f1c0904da ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().")
    Signed-off-by: Stefano Brivio
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Stefano Brivio
     
  • [ Upstream commit dc012f3628eaecfb5ba68404a5c30ef501daf63d ]

    syzbot found a use-after-free in inet6_mc_check [1]

    The problem here is that inet6_mc_check() uses rcu
    and read_lock(&iml->sflock)

    So the fact that ip6_mc_leave_src() is called under RTNL
    and the socket lock does not help us, we need to acquire
    iml->sflock in write mode.

    In the future, we should convert all this stuff to RCU.

    [1]
    BUG: KASAN: use-after-free in ipv6_addr_equal include/net/ipv6.h:521 [inline]
    BUG: KASAN: use-after-free in inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
    Read of size 8 at addr ffff8801ce7f2510 by task syz-executor0/22432

    CPU: 1 PID: 22432 Comm: syz-executor0 Not tainted 4.19.0-rc7+ #280
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
    print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    ipv6_addr_equal include/net/ipv6.h:521 [inline]
    inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
    __raw_v6_lookup+0x320/0x3f0 net/ipv6/raw.c:98
    ipv6_raw_deliver net/ipv6/raw.c:183 [inline]
    raw6_local_deliver+0x3d3/0xcb0 net/ipv6/raw.c:240
    ip6_input_finish+0x467/0x1aa0 net/ipv6/ip6_input.c:345
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:426
    ip6_mc_input+0x48a/0xd20 net/ipv6/ip6_input.c:503
    dst_input include/net/dst.h:450 [inline]
    ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ipv6_rcv+0x120/0x640 net/ipv6/ip6_input.c:271
    __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
    netif_receive_skb_internal+0x12c/0x620 net/core/dev.c:5126
    napi_frags_finish net/core/dev.c:5664 [inline]
    napi_gro_frags+0x75a/0xc90 net/core/dev.c:5737
    tun_get_user+0x3189/0x4250 drivers/net/tun.c:1923
    tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1968
    call_write_iter include/linux/fs.h:1808 [inline]
    do_iter_readv_writev+0x8b0/0xa80 fs/read_write.c:680
    do_iter_write+0x185/0x5f0 fs/read_write.c:959
    vfs_writev+0x1f1/0x360 fs/read_write.c:1004
    do_writev+0x11a/0x310 fs/read_write.c:1039
    __do_sys_writev fs/read_write.c:1112 [inline]
    __se_sys_writev fs/read_write.c:1109 [inline]
    __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457421
    Code: 75 14 b8 14 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 b5 fb ff c3 48 83 ec 08 e8 1a 2d 00 00 48 89 04 24 b8 14 00 00 00 0f 05 8b 3c 24 48 89 c2 e8 63 2d 00 00 48 89 d0 48 83 c4 08 48 3d 01
    RSP: 002b:00007f2d30ecaba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
    RAX: ffffffffffffffda RBX: 000000000000003e RCX: 0000000000457421
    RDX: 0000000000000001 RSI: 00007f2d30ecabf0 RDI: 00000000000000f0
    RBP: 0000000020000500 R08: 00000000000000f0 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000293 R12: 00007f2d30ecb6d4
    R13: 00000000004c4890 R14: 00000000004d7b90 R15: 00000000ffffffff

    Allocated by task 22437:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
    __do_kmalloc mm/slab.c:3718 [inline]
    __kmalloc+0x14e/0x760 mm/slab.c:3727
    kmalloc include/linux/slab.h:518 [inline]
    sock_kmalloc+0x15a/0x1f0 net/core/sock.c:1983
    ip6_mc_source+0x14dd/0x1960 net/ipv6/mcast.c:427
    do_ipv6_setsockopt.isra.9+0x3afb/0x45d0 net/ipv6/ipv6_sockglue.c:743
    ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:933
    rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1069
    sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3038
    __sys_setsockopt+0x1ba/0x3c0 net/socket.c:1902
    __do_sys_setsockopt net/socket.c:1913 [inline]
    __se_sys_setsockopt net/socket.c:1910 [inline]
    __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1910
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 22430:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kfree+0xcf/0x230 mm/slab.c:3813
    __sock_kfree_s net/core/sock.c:2004 [inline]
    sock_kfree_s+0x29/0x60 net/core/sock.c:2010
    ip6_mc_leave_src+0x11a/0x1d0 net/ipv6/mcast.c:2448
    __ipv6_sock_mc_close+0x20b/0x4e0 net/ipv6/mcast.c:310
    ipv6_sock_mc_close+0x158/0x1d0 net/ipv6/mcast.c:328
    inet6_release+0x40/0x70 net/ipv6/af_inet6.c:452
    __sock_release+0xd7/0x250 net/socket.c:579
    sock_close+0x19/0x20 net/socket.c:1141
    __fput+0x385/0xa30 fs/file_table.c:278
    ____fput+0x15/0x20 fs/file_table.c:309
    task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
    tracehook_notify_resume include/linux/tracehook.h:193 [inline]
    exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
    prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
    syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
    do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8801ce7f2500
    which belongs to the cache kmalloc-192 of size 192
    The buggy address is located 16 bytes inside of
    192-byte region [ffff8801ce7f2500, ffff8801ce7f25c0)
    The buggy address belongs to the page:
    page:ffffea000739fc80 count:1 mapcount:0 mapping:ffff8801da800040 index:0x0
    flags: 0x2fffc0000000100(slab)
    raw: 02fffc0000000100 ffffea0006f6e548 ffffea000737b948 ffff8801da800040
    raw: 0000000000000000 ffff8801ce7f2000 0000000100000010 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801ce7f2400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801ce7f2480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    >ffff8801ce7f2500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8801ce7f2580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    ffff8801ce7f2600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • This reverts commit 28c74ff85efd192aeca9005499ca50c24d795f61.

    From Florian Westphal:

    It causes kernel crash for locally generated ipv6 fragments
    when netfilter ipv6 defragmentation is used.

    The faulty commit is not essential for -stable, it only
    delays netns teardown for longer than needed when that netns
    still has ipv6 frags queued. Much better than crash :-/

    Signed-off-by: Sasha Levin

    Sasha Levin
     
  • [ Upstream commit bfc0698bebcb16d19ecfc89574ad4d696955e5d3 ]

    A policy may have been set up with multiple transforms (e.g., ESP
    and ipcomp). In this situation, the ingress IPsec processing
    iterates in xfrm_input() and applies each transform in turn,
    processing the nexthdr to find any additional xfrm that may apply.

    This patch resets the transport header back to network header
    only after the last transformation so that subsequent xfrms
    can find the correct transport header.

    Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
    Suggested-by: Steffen Klassert
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Sowmini Varadhan
     
  • [ Upstream commit 215ab0f021c9fea3c18b75e7d522400ee6a49990 ]

    After commit d6990976af7c5d8f55903bfb4289b6fb030bf754 ("vti6: fix PMTU caching
    and reporting on xmit"), some too big skbs might be potentially passed down to
    __xfrm6_output, causing it to fail to transmit but not free the skb, causing a
    leak of skb, and consequently a leak of dst references.

    After running pmtu.sh, that shows as failure to unregister devices in a namespace:

    [ 311.397671] unregister_netdevice: waiting for veth_b to become free. Usage count = 1

    The fix is to call kfree_skb in case of transmit failures.

    Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Thadeu Lima de Souza Cascardo
     

18 Oct, 2018

3 commits

  • [ Upstream commit 86f9bd1ff61c413a2a251fa736463295e4e24733 ]

    The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
    handle starting/stopping the iteration. The problem is that at some point
    during the iteration, an overflow is detected and the process is
    subsequently stopped. The item being shown via seq_printf() when the
    overflow occurs is not actually shown, though. When start() is
    subsequently called to resume iterating, it returns the next item, and
    thus the item that was being processed when the overflow occurred never
    gets printed.

    Alter the meaning of the private data member "offset". Currently, when it
    is not 0 (which only happens at the very beginning), "offset" represents
    the next hlist item to be printed. After this change, "offset" always
    represents the current item.

    This is also consistent with the private data member "bucket", which
    represents the current bucket, and also the use of "pos" as defined in
    seq_file.txt:
    The pos passed to start() will always be either zero, or the most
    recent pos used in the previous session.

    Signed-off-by: Jeff Barnhill
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jeff Barnhill
     
  • [ Upstream commit a688caa34beb2fd2a92f1b6d33e40cde433ba160 ]

    In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
    directly assign the dst to skb and set passed in dst to NULL to avoid
    double free.
    However, in error case, we free skb and then do stats update with the
    dst pointer passed in. This causes use-after-free on the dst.
    Fix it by taking rcu read lock right before dst could get released to
    make sure dst does not get freed until the stats update is done.
    Note: we don't have this issue in ipv4 because dst is not used for stats
    update in v4.

    Syzkaller reported following crash:
    BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
    BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
    Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088

    CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
    print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
    rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
    inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:631
    ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
    __sys_sendmsg+0x11d/0x280 net/socket.c:2152
    __do_sys_sendmsg net/socket.c:2161 [inline]
    __se_sys_sendmsg net/socket.c:2159 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457099
    Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
    RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
    RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000

    Allocated by task 32088:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
    dst_alloc+0xbb/0x1d0 net/core/dst.c:105
    ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
    ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
    ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
    ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
    fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
    ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
    ip6_route_output include/net/ip6_route.h:88 [inline]
    ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
    ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
    rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
    inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:631
    ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
    __sys_sendmsg+0x11d/0x280 net/socket.c:2152
    __do_sys_sendmsg net/socket.c:2161 [inline]
    __se_sys_sendmsg net/socket.c:2159 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 5356:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x83/0x290 mm/slab.c:3756
    dst_destroy+0x267/0x3c0 net/core/dst.c:141
    dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
    __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
    rcu_do_batch kernel/rcu/tree.c:2576 [inline]
    invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
    __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
    rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
    __do_softirq+0x30b/0xad8 kernel/softirq.c:292

    Fixes: 1789a640f556 ("raw: avoid two atomics in xmit")
    Signed-off-by: Wei Wang
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wei Wang
     
  • [ Upstream commit 76c0ddd8c3a683f6e2c6e60e11dc1a1558caf4bc ]

    The ip6 tunnel xmit ndo assumes that the processed skb always
    contains an ip[v6] header, but syzbot has found a way to send
    frames that fall short of this assumption, leading to the following splat:

    BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
    BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
    CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
    ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
    ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
    __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
    netdev_start_xmit include/linux/netdevice.h:4075 [inline]
    xmit_one net/core/dev.c:3026 [inline]
    dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
    __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
    dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
    packet_snd net/packet/af_packet.c:2944 [inline]
    packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
    __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
    SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
    SyS_sendmmsg+0x63/0x90 net/socket.c:2162
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x441819
    RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
    RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003
    RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510
    R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
    kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
    kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
    slab_post_alloc_hook mm/slab.h:445 [inline]
    slab_alloc_node mm/slub.c:2737 [inline]
    __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:984 [inline]
    alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
    sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
    packet_alloc_skb net/packet/af_packet.c:2803 [inline]
    packet_snd net/packet/af_packet.c:2894 [inline]
    packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969
    sock_sendmsg_nosec net/socket.c:630 [inline]
    sock_sendmsg net/socket.c:640 [inline]
    ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
    __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
    SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
    SyS_sendmmsg+0x63/0x90 net/socket.c:2162
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    This change addresses the issue by adding the needed check before
    accessing the inner header.

    The ipv4 side of the issue is apparently there since the ipv4 over ipv6
    initial support, and the ipv6 side predates git history.

    Fixes: c4d3efafcc93 ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com
    Tested-by: Alexander Potapenko
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

29 Sep, 2018

3 commits

  • [ Upstream commit eb63f2964dbe36f26deac77d3016791675821ded ]

    Currently the UDPv6 early demux rx code path lacks some mandatory
    checks, already implemented into the normal RX code path - namely
    the checksum conversion and no_check6_rx check.

    Similar to the previous commit, we move the common processing to
    a UDPv6-specific helper and call it from both the edemux code path
    and the normal code path. Compared with UDPv4, we need to add an
    explicit check for non-zero csum according to the no_check6_rx value.

    Reported-by: Jianlin Shi
    Suggested-by: Xin Long
    Fixes: c9f2c1ae123a ("udp6: fix socket leak on early demux")
    Fixes: 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     
  • [ Upstream commit bbd6528d28c1b8e80832b3b018ec402b6f5c3215 ]

    In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
    we need to call skb_set_owner_w() before consuming original skb,
    otherwise we risk a use-after-free.

    Bring IPv6 in line with what we do in IPv4 to fix this.

    Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit c56cae23c6b167acc68043c683c4573b80cbcc2c ]

    When splitting a GSO segment that consists of encapsulated packets, the
    skb->mac_len of the segments can end up being set wrong, causing packet
    drops in particular when using act_mirred and ifb interfaces in
    combination with a qdisc that splits GSO packets.

    This happens because at the time skb_segment() is called, network_header
    will point to the inner header, throwing off the calculation in
    skb_reset_mac_len(). The network_header is subsequently adjusted by the
    outer IP gso_segment handlers, but they don't set the mac_len.

    Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
    gso_segment handlers, after they modify the network_header.
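
    The stale mac_len can be illustrated with a small userspace model. This
    is a sketch under assumptions: struct toy_skb and both helpers are
    hypothetical, and the offsets in the usage note (14-byte Ethernet header,
    40-byte outer IPv6 header) are example values only. The subtraction in
    toy_reset_mac_len() mirrors what skb_reset_mac_len() computes.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, reduced view of the header offsets kept in an skb. */
    struct toy_skb {
        uint16_t mac_header;        /* offset of the link-layer header */
        uint16_t network_header;    /* offset of the current network header */
        uint16_t mac_len;           /* cached link-layer header length */
    };

    /* mac_len is derived from wherever network_header points right now. */
    static void toy_reset_mac_len(struct toy_skb *skb)
    {
        assert(skb != NULL && skb->network_header >= skb->mac_header);
        skb->mac_len = (uint16_t)(skb->network_header - skb->mac_header);
    }

    /*
     * Stand-in for an outer IP gso_segment handler: it moves network_header
     * back to the outer header. With fixed != 0 it also recomputes mac_len,
     * which is the behaviour the patch adds.
     */
    static void toy_outer_gso_segment(struct toy_skb *skb, uint16_t outer_off,
                                      int fixed)
    {
        skb->network_header = outer_off;
        if (fixed)
            toy_reset_mac_len(skb);
    }
    ```

    With mac_header = 0 and network_header = 54 (inner header) at segment
    time, toy_reset_mac_len() caches 54. The unfixed handler leaves that
    value in place after moving network_header to 14; the fixed one
    recomputes it as 14.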

    Many thanks to Eric Dumazet for his help in identifying the cause of
    the bug.

    Acked-by: Dave Taht
    Reviewed-by: Eric Dumazet
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
     

26 Sep, 2018

1 commit

  • commit f7225172f25aaf0dfd9ad65f05be8da5d6108b12 upstream.

    syzbot reported a use-after-free:

    BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
    Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555

    CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
    ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
    ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
    inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
    ...

    Allocated by task 4555:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
    dst_alloc+0xbb/0x1d0 net/core/dst.c:104
    __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
    ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
    ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
    ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
    inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
    ...

    Freed by task 4555:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
    dst_destroy+0x267/0x3c0 net/core/dst.c:140
    dst_release_immediate+0x71/0x9e net/core/dst.c:205
    fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
    __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
    ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
    inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
    ...

    The problem is that rt_last can point to a deleted route if the insert
    fails.

    One reproducer is to insert a route and then add a multipath route that
    has a duplicate nexthop, e.g.:
    $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
    $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2

    Fix by not setting rt_last until it is verified that the insert succeeded.

    Backport Note:
    - Upstream has replaced rt6_info usage with fib6_info in 8d1c802b281
    ("net/ipv6: Flip FIB entries to fib6_info")
    - fib6_info_release was introduced upstream in 93531c674315
    ("net/ipv6: separate handling of FIB entries from dst based routes"),
    but is not present in stable kernels; 4.14.y relies on dst_release/
    ip6_rt_put/dst_release_immediate.

    Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
    Cc: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David Ahern
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Zubin Mithra
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

20 Sep, 2018

10 commits

  • commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream

    A kernel crash occurs when a defragmented packet is fragmented
    in ip_do_fragment().
    In the defragment routine, skb_orphan() is called and
    skb->ip_defrag_offset is set, but skb->sk and
    skb->ip_defrag_offset are members of the same union, so
    frag->sk is not NULL.
    Hence the crash occurs in the skb->sk check in ip_do_fragment() when
    a defragmented packet is fragmented.
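
    The aliasing can be reproduced in a few lines of userspace C. This is a
    sketch, not the kernel layout: struct toy_skb and its helpers are
    hypothetical, and the byte-wise check assumes that a NULL pointer is
    represented as all-zero bytes, which holds on mainstream platforms.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    struct sock;                        /* opaque stand-in */

    /* Hypothetical skb fragment: both members share the same storage. */
    struct toy_skb {
        union {
            struct sock *sk;
            int ip_defrag_offset;
        };
    };

    /* State after the defragment routine: orphaned, then offset stored. */
    static void toy_store_offset(struct toy_skb *skb, int offset)
    {
        assert(skb != NULL);
        skb->sk = NULL;                 /* what orphaning leaves behind */
        skb->ip_defrag_offset = offset; /* overwrites part of the same bytes */
    }

    /* Returns 1 if the storage behind sk holds any non-zero byte. */
    static int toy_sk_looks_set(const struct toy_skb *skb)
    {
        unsigned char bytes[sizeof(skb->sk)];

        memcpy(bytes, &skb->sk, sizeof(bytes));
        for (size_t i = 0; i < sizeof(bytes); i++)
            if (bytes[i])
                return 1;
        return 0;
    }

    /* The v2 approach from the commit message: clear sk at reassembly. */
    static void toy_clear_sk(struct toy_skb *skb)
    {
        skb->sk = NULL;
    }
    ```

    After toy_store_offset(&skb, 1480) the sk member no longer reads as
    NULL, even though no socket was ever attached. Calling toy_clear_sk()
    before the packet reaches a check on sk restores the expected state.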

    test commands:
    %iptables -t nat -I POSTROUTING -j MASQUERADE
    %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000

    splat looks like:
    [ 261.069429] kernel BUG at net/ipv4/ip_output.c:636!
    [ 261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
    [ 261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
    [ 261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
    [ 261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
    [ 261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
    [ 261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
    [ 261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
    [ 261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
    [ 261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
    [ 261.174169] FS: 00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
    [ 261.183012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
    [ 261.198158] Call Trace:
    [ 261.199018] ? dst_output+0x180/0x180
    [ 261.205011] ? save_trace+0x300/0x300
    [ 261.209018] ? ip_copy_metadata+0xb00/0xb00
    [ 261.213034] ? sched_clock_local+0xd4/0x140
    [ 261.218158] ? kill_l4proto+0x120/0x120 [nf_conntrack]
    [ 261.223014] ? rt_cpu_seq_stop+0x10/0x10
    [ 261.227014] ? find_held_lock+0x39/0x1c0
    [ 261.233008] ip_finish_output+0x51d/0xb50
    [ 261.237006] ? ip_fragment.constprop.56+0x220/0x220
    [ 261.243011] ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
    [ 261.250152] ? rcu_is_watching+0x77/0x120
    [ 261.255010] ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
    [ 261.261033] ? nf_hook_slow+0xb1/0x160
    [ 261.265007] ip_output+0x1c7/0x710
    [ 261.269005] ? ip_mc_output+0x13f0/0x13f0
    [ 261.273002] ? __local_bh_enable_ip+0xe9/0x1b0
    [ 261.278152] ? ip_fragment.constprop.56+0x220/0x220
    [ 261.282996] ? nf_hook_slow+0xb1/0x160
    [ 261.287007] raw_sendmsg+0x21f9/0x4420
    [ 261.291008] ? dst_output+0x180/0x180
    [ 261.297003] ? sched_clock_cpu+0x126/0x170
    [ 261.301003] ? find_held_lock+0x39/0x1c0
    [ 261.306155] ? stop_critical_timings+0x420/0x420
    [ 261.311004] ? check_flags.part.36+0x450/0x450
    [ 261.315005] ? _raw_spin_unlock_irq+0x29/0x40
    [ 261.320995] ? _raw_spin_unlock_irq+0x29/0x40
    [ 261.326142] ? cyc2ns_read_end+0x10/0x10
    [ 261.330139] ? raw_bind+0x280/0x280
    [ 261.334138] ? sched_clock_cpu+0x126/0x170
    [ 261.338995] ? check_flags.part.36+0x450/0x450
    [ 261.342991] ? __lock_acquire+0x4500/0x4500
    [ 261.348994] ? inet_sendmsg+0x11c/0x500
    [ 261.352989] ? dst_output+0x180/0x180
    [ 261.357012] inet_sendmsg+0x11c/0x500
    [ ... ]

    v2:
    - clear skb->sk at reassembly routine. (Eric Dumazet)

    Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
    Suggested-by: Eric Dumazet
    Signed-off-by: Taehee Yoo
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • commit bffa72cf7f9df842f0016ba03586039296b4caaf upstream

    skb->rbnode shares space with skb->next, skb->prev and skb->tstamp

    Current uses (TCP receive ofo queue and netem) need to save/restore
    tstamp, while skb->dev is either NULL (TCP) or a constant for a given
    queue (netem).

    Since we plan to use an RB tree for the TCP retransmit queue to speed up SACK
    processing with large BDP, this patch exchanges skb->dev and
    skb->tstamp.

    This saves some overhead in both TCP and netem.

    v2: removes the swtstamp field from struct tcp_skb_cb

    Signed-off-by: Eric Dumazet
    Cc: Soheil Hassas Yeganeh
    Cc: Wei Wang
    Cc: Willem de Bruijn
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • don't bother with pathological cases, they only waste cycles.
    IPv6 requires a minimum MTU of 1280 so we should never see fragments
    smaller than this (except last frag).

    v3: don't use awkward "-offset + len"
    v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
    There were concerns that there could be even smaller frags
    generated by intermediate nodes, e.g. on radio networks.

    Cc: Peter Oskolkov
    Cc: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
    (cherry picked from commit 0ed4229b08c13c84a3c301a08defdc9e7f4467e6)
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
    Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
    since the linker might place a non-zero value next to it, preventing a
    change to ip6frag_low_thresh.
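
    The width mismatch can be shown with a userspace sketch. Everything here
    is hypothetical (struct toy_sysctls, toy_read_as_ulong()); it only
    models a reader that expects an unsigned long being pointed at an int
    that has some other value stored right after it.

    ```c
    #include <assert.h>
    #include <string.h>

    /* Hypothetical layout: an int minimum followed by unrelated data. */
    struct toy_sysctls {
        int zero;           /* intended lower bound */
        int neighbour;      /* whatever happens to be placed next to it */
    };

    /*
     * Read sizeof(unsigned long) bytes starting at `zero`, the way a handler
     * that expects an unsigned long bound would.
     */
    static unsigned long toy_read_as_ulong(const struct toy_sysctls *s)
    {
        unsigned long v;

        assert(sizeof(*s) >= sizeof(v));
        memcpy(&v, s, sizeof(v));
        return v;
    }
    ```

    Where long is wider than int, a non-zero neighbour makes the value read
    back non-zero, so the effective minimum is no longer 0 and smaller
    writes would be rejected. Where long and int have the same width, the
    read stays at 0.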

    ip6frag_low_thresh is not used anymore in the kernel, but we do not
    want to prematurely break user scripts wanting to change it.

    Since specifying a minimal value of 0 for proc_doulongvec_minmax()
    is moot, let's remove these zero values in all defrag units.

    Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
    Signed-off-by: Eric Dumazet
    Reported-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller
    (cherry picked from commit 3d23401283e80ceb03f765842787e0e79ff598b7)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • Make it similar to IPv4 ip_expire(), and release the lock
    before calling icmp functions.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 05c0b86b9696802fd0ce5676a92a63f1b455bdf3)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • Some users are willing to provision huge amounts of memory to be able
    to perform reassembly reasonably well under pressure.

    Current memory tracking is using one atomic_t and integers.

    Switch to atomic_long_t so that 64bit arches can use more than 2GB,
    without any cost for 32bit arches.

    Note that this patch avoids an overflow error, if high_thresh was set
    to ~2GB, since this test in inet_frag_alloc() was never true:

    if (... || frag_mem_limit(nf) > nf->high_thresh)
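
    A minimal sketch of why the comparison could never fire with 32-bit
    accounting, and how a 64-bit counter changes that. The function names
    are hypothetical and the code only models the comparison, not the
    kernel's atomic counters.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit accounting: mem cannot exceed INT_MAX, so with a threshold
 * at INT_MAX (~2GB) this is false for every representable mem. */
static bool over_limit_int(int mem, int high_thresh)
{
    return mem > high_thresh;
}

/* 64-bit accounting: the counter can grow past 2GB, so the same
 * comparison becomes meaningful again. */
static bool over_limit_long(int64_t mem, int64_t high_thresh)
{
    return mem > high_thresh;
}
```

    Fed the numbers from the test run below, over_limit_long() reports
    memory 16000002880 as above a threshold of 16000000000.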

    Tested:

    $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh

    $ grep FRAG /proc/net/sockstat
    FRAG: inuse 14705885 memory 16000002880

    $ nstat -n ; sleep 1 ; nstat | grep Reas
    IpReasmReqds 3317150 0.0
    IpReasmFails 3317112 0.0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 3e67f106f619dcfaf6f4e2039599bdb69848c714)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • This function is obsolete, after rhashtable addition to inet defrag.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 2d44ed22e607f9a285b049de2263e3840673a260)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • This refactors ip_expire() since one indentation level is removed.

    Note: in the future, we should try hard to avoid the skb_clone()
    since this is a serious performance cost.
    Under DDOS, the ICMP message won't be sent because of rate limits.

    The fact that ip6_expire_frag_queue() does not use skb_clone() is
    disturbing too. Presumably IPv6 should have the same
    issue as the one we fixed in commit ec4fbd64751d
    ("inet: frag: release spinlock before calling icmp_send()").

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 399d1404be660d355192ff4df5ccc3f4159ec1e4)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()

    Also since we use rhashtable we can bring back the number of fragments
    in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
    removed in commit 434d305405ab ("inet: frag: don't account number
    of fragment queues")

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    (cherry picked from commit 6befe4a78b1553edb6eed3a78b4bcd9748526672)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • Some applications still rely on IP fragmentation, and to be fair, the Linux
    reassembly unit does not work under any serious load.

    It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
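
    For scale, the upper bound implied by these two figures can be worked
    out directly. The constant names below are hypothetical labels for the
    numbers quoted above.

```c
#define FRAG_HASH_BUCKETS_SKETCH 1024u  /* buckets in the static table */
#define FRAG_MAX_DEPTH_SKETCH     128u  /* maximum items per bucket */

/* Upper bound on the number of queues a single static table can hold. */
static unsigned int old_frag_queue_cap(void)
{
    return FRAG_HASH_BUCKETS_SKETCH * FRAG_MAX_DEPTH_SKETCH;
}
```

    That gives 1024 * 128 = 131072 entries at most, and reaching that
    bound would mean walking 128-item chains on every lookup.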

    A work queue is supposed to garbage collect items when the host is under
    memory pressure and to do a hash rebuild, changing the seed used in hash
    computations.

    This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
    occurring every 5 seconds if host is under fire.

    Then there is the problem of sharing this hash table for all netns.

    It is time to switch to rhashtables, and allocate one of them per netns
    to speed up netns dismantle, since this is a critical metric these days.

    Lookup is now using RCU. A followup patch will even remove
    the refcount hold/release left from prior implementation and save
    a couple of atomic operations.

    Before this patch, 16 cpus (16 RX queue NIC) could not handle more
    than 1 Mpps frags DDOS.

    After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
    of storage for the fragments (exact number depends on frags being evicted
    after timeout)

    $ grep FRAG /proc/net/sockstat
    FRAG: inuse 1966916 memory 2140004608

    A followup patch will change the limits for 64bit arches.

    Signed-off-by: Eric Dumazet
    Cc: Kirill Tkhai
    Cc: Herbert Xu
    Cc: Florian Westphal
    Cc: Jesper Dangaard Brouer
    Cc: Alexander Aring
    Cc: Stefan Schmidt
    Signed-off-by: David S. Miller
    (cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9)
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet