18 Oct, 2018

1 commit

  • [ Upstream commit 9d2f67e43b73e8af7438be219b66a5de0cfa8bd9 ]

    When we use a raw socket as the vhost backend, a packet from virtio with
    GSO offloading information cannot be sent out: it fails later validation
    in the xmit path because we did not set the correct skb->protocol, which
    is then used to look up the GSO function.

    To fix this, we set this field according to the virtio hdr information.

    Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
    Signed-off-by: Jianfeng Tan
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jianfeng Tan
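
As a rough illustration of "set this field according to the virtio hdr information", the userspace sketch below maps a GSO type to an ethertype. The helper name `proto_from_gso_type` and the local `GSO_*` / `ETHERTYPE_*` macros are assumptions made for this example (the numeric values follow the virtio-net and Ethernet UAPI definitions); it is not the code of the upstream patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values follow the virtio-net and Ethernet UAPI headers; they are
 * redefined locally so the sketch builds without kernel headers. */
#define GSO_NONE  0x00
#define GSO_TCPV4 0x01
#define GSO_UDP   0x03
#define GSO_TCPV6 0x04
#define GSO_ECN   0x80 /* flag bit OR-ed into the GSO type */

#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_IPV6 0x86DD

/* The ECN flag must not collide with any of the type values above. */
static_assert((GSO_TCPV6 & GSO_ECN) == 0, "ECN bit overlaps a GSO type");

/* Hypothetical helper: return the ethertype (host byte order) implied by
 * the GSO type, or 0 when the header gives no hint. */
uint16_t proto_from_gso_type(uint8_t gso_type)
{
    switch (gso_type & ~GSO_ECN) {
    case GSO_TCPV4:
    case GSO_UDP:
        return ETHERTYPE_IPV4;
    case GSO_TCPV6:
        return ETHERTYPE_IPV6;
    default:
        return 0;
    }
}
```

A caller would store the result (converted to network byte order) in the packet's protocol field only when it is non-zero.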
     

24 Aug, 2018

2 commits

  • commit 4576cd469d980317c4edd9173f8b694aa71ea3a3 upstream.

    TPACKET_V3 stores variable length frames in fixed length blocks.
    Blocks must be able to store a block header, optional private space
    and at least one minimum sized frame.

    Frames, even for a zero snaplen packet, store metadata headers and
    optional reserved space.

    In the block size bounds check, ensure that the frame of the
    chosen configuration fits. This includes sockaddr_ll and optional
    tp_reserve.

    Syzbot was able to construct a ring with insufficient room for the
    sockaddr_ll in the header of a zero-length frame, triggering an
    out-of-bounds write in dev_parse_header.

    Convert the comparison to less than, as zero is a valid snap len.
    This matches the test for minimum tp_frame_size immediately below.

    Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
    Fixes: eb73190f4fbe ("net/packet: refine check for priv area size")
    Reported-by: syzbot
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
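
The block-size bounds check described in this commit can be sketched in userspace as follows. The struct `ring_cfg` and its field names are hypothetical simplifications, not the kernel's request structures; the point is only the shape of the comparison: a block must hold its header, the private area, and one frame header plus reserve, and an exact fit is accepted because a zero snap length is valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ring parameters (not the kernel's structs). */
struct ring_cfg {
    uint32_t block_size;    /* requested block size */
    uint32_t block_hdr_len; /* fixed block header */
    uint32_t priv_len;      /* optional private area */
    uint32_t frame_hdr_len; /* per-frame metadata, including sockaddr_ll */
    uint32_t reserve;       /* optional reserved space per frame */
};

/* True when one minimum-sized (zero snaplen) frame fits in a block. */
bool block_fits_one_frame(const struct ring_cfg *c)
{
    assert(c);
    /* Sum in 64 bits so a huge priv_len cannot wrap the total. */
    uint64_t min_frame = (uint64_t)c->frame_hdr_len + c->reserve;
    uint64_t needed = (uint64_t)c->block_hdr_len + c->priv_len + min_frame;
    /* Reject only when strictly too small: an exact fit is valid. */
    return (uint64_t)c->block_size >= needed;
}
```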
     
  • [ Upstream commit 993675a3100b16a4c80dfd70cbcde8ea7127b31d ]

    If variable length link layer headers result in a packet shorter
    than dev->hard_header_len, reset the network header offset. Else
    skb->mac_len may exceed skb->len after skb_mac_reset_len.

    packet_sendmsg_spkt already has similar logic.

    Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     

22 Jul, 2018

1 commit

  • [ Upstream commit 945d015ee0c3095d2290e845565a23dedfd8027c ]

    We should put copy_skb in receive_queue only after
    a successful call to virtio_net_hdr_from_skb().

    syzbot report:

    BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline]
    BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline]
    BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
    Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553

    CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    __skb_unlink include/linux/skbuff.h:1843 [inline]
    __skb_dequeue include/linux/skbuff.h:1863 [inline]
    skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
    skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852
    packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331
    packet_release+0x630/0xd90 net/packet/af_packet.c:2991
    __sock_release+0xd7/0x260 net/socket.c:603
    sock_close+0x19/0x20 net/socket.c:1186
    __fput+0x35b/0x8b0 fs/file_table.c:209
    ____fput+0x15/0x20 fs/file_table.c:243
    task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
    exit_task_work include/linux/task_work.h:22 [inline]
    do_exit+0x1b08/0x2750 kernel/exit.c:865
    do_group_exit+0x177/0x440 kernel/exit.c:968
    __do_sys_exit_group kernel/exit.c:979 [inline]
    __se_sys_exit_group kernel/exit.c:977 [inline]
    __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4448e9
    Code: Bad RIP value.
    RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9
    RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001
    RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000
    R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0
    R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000

    Allocated by task 4553:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
    skb_clone+0x1f5/0x500 net/core/skbuff.c:1282
    tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221
    deliver_skb net/core/dev.c:1925 [inline]
    deliver_ptype_list_skb net/core/dev.c:1940 [inline]
    __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
    netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
    netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
    tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
    tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
    tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
    call_write_iter include/linux/fs.h:1795 [inline]
    new_sync_write fs/read_write.c:474 [inline]
    __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
    vfs_write+0x1f8/0x560 fs/read_write.c:549
    ksys_write+0x101/0x260 fs/read_write.c:598
    __do_sys_write fs/read_write.c:610 [inline]
    __se_sys_write fs/read_write.c:607 [inline]
    __x64_sys_write+0x73/0xb0 fs/read_write.c:607
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 4553:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
    kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
    __kfree_skb net/core/skbuff.c:642 [inline]
    kfree_skb+0x1a5/0x580 net/core/skbuff.c:659
    tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385
    deliver_skb net/core/dev.c:1925 [inline]
    deliver_ptype_list_skb net/core/dev.c:1940 [inline]
    __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
    netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
    netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
    tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
    tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
    tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
    call_write_iter include/linux/fs.h:1795 [inline]
    new_sync_write fs/read_write.c:474 [inline]
    __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
    vfs_write+0x1f8/0x560 fs/read_write.c:549
    ksys_write+0x101/0x260 fs/read_write.c:598
    __do_sys_write fs/read_write.c:610 [inline]
    __se_sys_write fs/read_write.c:607 [inline]
    __x64_sys_write+0x73/0xb0 fs/read_write.c:607
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8801b044ecc0
    which belongs to the cache skbuff_head_cache of size 232
    The buggy address is located 0 bytes inside of
    232-byte region [ffff8801b044ecc0, ffff8801b044eda8)
    The buggy address belongs to the page:
    page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0
    flags: 0x2fffc0000000100(slab)
    raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0
    raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
    >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
    ^
    ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc

    Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

26 Jun, 2018

1 commit

  • [ Upstream commit fd3a88625844907151737fc3b4201676effa6d27 ]

    Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
    to communicate packet metadata to userspace.

    For skbuffs with vlan, the first two return the packet as it may have
    existed on the wire, inserting the VLAN tag in the user buffer. Then
    virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.

    Commit f09e2249c4f5 ("macvtap: restore vlan header on user read")
    added this feature to macvtap. Commit 3ce9b20f1971 ("macvtap: Fix
    csum_start when VLAN tags are present") then fixed up csum_start.

    Virtio, packet and uml do not insert the vlan header in the user
    buffer.

    When introducing virtio_net_hdr_from_skb to deduplicate filling in
    the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
    applied uniformly, breaking csum offset for packets with vlan on
    virtio and packet.

    Make insertion of VLAN_HLEN optional. Convert the callers to pass it
    when needed.

    Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
    Fixes: 1276f24eeef2 ("packet: use common code for virtio_net_hdr and skb GSO conversion")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
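
A minimal sketch of "make insertion of VLAN_HLEN optional", assuming a hypothetical helper `csum_start_for_user` and a local `VLAN_HDR_LEN` constant of 4 bytes (the size of one 802.1Q tag). The offset is shifted only when the skb carries a tag and the caller actually inserts that tag into the user buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_HDR_LEN 4 /* size of one 802.1Q tag */

/* Hypothetical helper: checksum start offset as reported to userspace. */
uint16_t csum_start_for_user(uint16_t csum_start_in_skb,
                             bool skb_has_vlan_tag,
                             bool caller_inserts_tag)
{
    uint16_t vlan_len =
        (skb_has_vlan_tag && caller_inserts_tag) ? VLAN_HDR_LEN : 0;

    /* The adjusted offset must still fit the 16-bit header field. */
    assert(csum_start_in_skb <= UINT16_MAX - vlan_len);
    return (uint16_t)(csum_start_in_skb + vlan_len);
}
```

In this model, callers such as tun and tap would pass `caller_inserts_tag = true`, while virtio, packet and uml vector would pass `false`.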
     

12 Jun, 2018

2 commits

  • [ Upstream commit 9aad13b087ab0a588cd68259de618f100053360e ]

    Commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link
    layer allocation") ensures that packet_snd always starts writing
    the link layer header in reserved headroom allocated for this
    purpose.

    This is needed because packets may be shorter than hard_header_len,
    in which case the space up to hard_header_len may be zeroed. But
    that necessary padding is not accounted for in skb->len.

    The fix, however, is buggy. It calls skb_push, which grows skb->len
    when moving skb->data back. But in this case packet length should not
    change.

    Instead, call skb_reserve, which moves both skb->data and skb->tail
    back, without changing length.

    Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation")
    Reported-by: Tariq Toukan
    Signed-off-by: Willem de Bruijn
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
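
The difference between the two calls can be shown with a toy model. `struct toy_skb`, `toy_push` and `toy_unreserve` are hypothetical stand-ins that track only offsets; they mimic the bookkeeping of skb_push and of skb_reserve with a negative argument, and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: offsets into one linear buffer. */
struct toy_skb {
    size_t data; /* offset of the first byte of packet data */
    size_t tail; /* offset one past the last byte of packet data */
    size_t len;  /* packet length as seen by the stack */
};

/* Like skb_push: expose n bytes of headroom as packet data, so len grows. */
void toy_push(struct toy_skb *skb, size_t n)
{
    assert(n <= skb->data);
    skb->data -= n;
    skb->len += n;
}

/* Like skb_reserve(skb, -n): slide data and tail back together,
 * leaving len untouched. */
void toy_unreserve(struct toy_skb *skb, size_t n)
{
    assert(n <= skb->data);
    skb->data -= n;
    skb->tail -= n;
}
```

Starting from an empty buffer with 116 bytes of headroom, `toy_push` leaves `len == 116` while `toy_unreserve` leaves `len == 0`, which is the behaviour the commit message asks for.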
     
  • [ Upstream commit eb73190f4fbeedf762394e92d6a4ec9ace684c88 ]

    syzbot was able to trick af_packet again [1]

    Various commits tried to address the problem in the past,
    but failed to take into account V3 header size.

    [1]

    tpacket_rcv: packet too big, clamped from 72 to 4294967224. macoff=96
    BUG: KASAN: use-after-free in prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline]
    BUG: KASAN: use-after-free in prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039
    Write of size 2 at addr ffff8801cb62000e by task kworker/1:2/2106

    CPU: 1 PID: 2106 Comm: kworker/1:2 Not tainted 4.17.0-rc7+ #77
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: ipv6_addrconf addrconf_dad_work
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_store2_noabort+0x17/0x20 mm/kasan/report.c:436
    prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline]
    prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039
    __packet_lookup_frame_in_block net/packet/af_packet.c:1094 [inline]
    packet_current_rx_frame net/packet/af_packet.c:1117 [inline]
    tpacket_rcv+0x1866/0x3340 net/packet/af_packet.c:2282
    dev_queue_xmit_nit+0x891/0xb90 net/core/dev.c:2018
    xmit_one net/core/dev.c:3049 [inline]
    dev_hard_start_xmit+0x16b/0xc10 net/core/dev.c:3069
    __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3617
    neigh_resolve_output+0x679/0xad0 net/core/neighbour.c:1358
    neigh_output include/net/neighbour.h:482 [inline]
    ip6_finish_output2+0xc9c/0x2810 net/ipv6/ip6_output.c:120
    ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154
    NF_HOOK_COND include/linux/netfilter.h:277 [inline]
    ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171
    dst_output include/net/dst.h:444 [inline]
    NF_HOOK include/linux/netfilter.h:288 [inline]
    ndisc_send_skb+0x100d/0x1570 net/ipv6/ndisc.c:491
    ndisc_send_ns+0x3c1/0x8d0 net/ipv6/ndisc.c:633
    addrconf_dad_work+0xbef/0x1340 net/ipv6/addrconf.c:4033
    process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
    worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
    kthread+0x345/0x410 kernel/kthread.c:240
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

    The buggy address belongs to the page:
    page:ffffea00072d8800 count:0 mapcount:-127 mapping:0000000000000000 index:0xffff8801cb620e80
    flags: 0x2fffc0000000000()
    raw: 02fffc0000000000 0000000000000000 ffff8801cb620e80 00000000ffffff80
    raw: ffffea00072e3820 ffffea0007132d20 0000000000000002 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801cb61ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff8801cb61ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    >ffff8801cb620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    ^
    ffff8801cb620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    ffff8801cb620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

    Fixes: 2b6867c2ce76 ("net/packet: fix overflow in check for priv area size")
    Fixes: dc808110bb62 ("packet: handle too big packets for PACKET_V3")
    Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
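
The value 4294967224 in the log line above is what -72 looks like as an unsigned 32-bit number (2^32 - 72). The sketch below reproduces that wrap with macoff=96 from the log and a hypothetical remaining space of 24 bytes; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy shape: unsigned subtraction wraps when offset exceeds space. */
uint32_t room_unchecked(uint32_t space, uint32_t offset)
{
    return space - offset;
}

/* Checked shape: clamp to zero instead of wrapping. */
uint32_t room_checked(uint32_t space, uint32_t offset)
{
    uint32_t room = space > offset ? space - offset : 0;

    assert(room <= space); /* the result can never exceed the input */
    return room;
}
```

With `space = 24` and `offset = 96`, the unchecked version yields 4294967224 and the checked version yields 0.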
     

25 May, 2018

1 commit

  • [ Upstream commit b84bbaf7a6c8cca24f8acf25a2c8e46913a947ba ]

    Packet sockets allow construction of packets shorter than
    dev->hard_header_len to accommodate protocols with variable length
    link layer headers. These packets are padded to dev->hard_header_len,
    because some device drivers interpret that as a minimum packet size.

    packet_snd reserves dev->hard_header_len bytes on allocation.
    SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that
    link layer headers are stored in the reserved range. SOCK_RAW sockets
    do the same in tpacket_snd, but not in packet_snd.

    Syzbot was able to send a zero byte packet to a device with a massive
    116B link layer header, causing padding to cross over into skb_shinfo.
    Fix this by writing from the start of the llheader reserved range also
    in the case of packet_snd/SOCK_RAW.

    Update skb_set_network_header to the new offset. This also corrects
    it for SOCK_DGRAM, where it incorrectly double counted reserve due to
    the skb_push in dev_hard_header.

    Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
    Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     

29 Apr, 2018

2 commits

  • [ Upstream commit a6361f0ca4b25460f2cdf3235ebe8115f622901e ]

    Updates to the bitfields in struct packet_sock are not atomic.
    Serialize these read-modify-write cycles.

    Move po->running into a separate variable. Its writes are protected by
    po->bind_lock (except for one startup case at packet_create). Also
    replace a textual precondition warning with lockdep annotation.

    All others are set only in packet_setsockopt. Serialize these
    updates by holding the socket lock. Analogous to other field updates,
    also hold the lock when testing whether a ring is active (pg_vec).

    Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg")
    Reported-by: DaeRyong Jeong
    Reported-by: Byoungyoung Lee
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit 5171b37d959641bbc619781caf62e61f7b940871 ]

    In order to remove the race caught by syzbot [1], we need
    to lock the socket before using po->tp_version as this could
    change under us otherwise.

    This means lock_sock() and release_sock() must be done by
    packet_set_ring() callers.

    [1] :
    BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
    CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
    packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
    packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662
    SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
    SyS_setsockopt+0x76/0xa0 net/socket.c:1828
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x449099
    RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099
    RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003
    RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000
    R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
    R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001

    Local variable description: ----req_u@packet_setsockopt
    Variable was created at:
    packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612
    SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849

    Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

17 Dec, 2017

2 commits

  • [ Upstream commit 15fe076edea787807a7cdc168df832544b58eba6 ]

    syzbot reported crashes [1] and provided a C repro easing bug hunting.

    When/if packet_do_bind() calls __unregister_prot_hook() and releases
    po->bind_lock, another thread can run packet_notifier() and process an
    NETDEV_UP event.

    This calls register_prot_hook() and hooks the socket again right before
    the first thread is able to re-acquire po->bind_lock.

    Fix this issue by temporarily setting po->num to 0, as suggested by
    David Miller.

    [1]
    dev_remove_pack: ffff8801bf16fa80 not found
    ------------[ cut here ]------------
    kernel BUG at net/core/dev.c:7945! ( BUG_ON(!list_empty(&dev->ptype_all)); )
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    device syz0 entered promiscuous mode
    CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    task: ffff8801cc57a500 task.stack: ffff8801cc588000
    RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
    RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
    RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
    RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
    device syz0 entered promiscuous mode
    RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
    R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
    FS: 0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
    tun_detach drivers/net/tun.c:670 [inline]
    tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
    __fput+0x333/0x7f0 fs/file_table.c:210
    ____fput+0x15/0x20 fs/file_table.c:244
    task_work_run+0x199/0x270 kernel/task_work.c:113
    exit_task_work include/linux/task_work.h:22 [inline]
    do_exit+0x9bb/0x1ae0 kernel/exit.c:865
    do_group_exit+0x149/0x400 kernel/exit.c:968
    SYSC_exit_group kernel/exit.c:979 [inline]
    SyS_exit_group+0x1d/0x20 kernel/exit.c:977
    entry_SYSCALL_64_fastpath+0x1f/0x96
    RIP: 0033:0x44ad19

    Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Francesco Ruggeri
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • syzkaller found a race condition in fanout_demux_rollover() while removing
    a packet socket from a fanout group.

    po->rollover is read and operated on during packet_rcv_fanout(), via
    fanout_demux_rollover(), but the pointer is currently cleared before the
    synchronization in packet_release(). It is safer to delay the cleanup
    until after synchronize_net() has been called, ensuring all calls to
    packet_rcv_fanout() for this socket have finished.

    To further simplify synchronization around the rollover structure, set
    po->rollover in fanout_add() only if there are no errors. This removes
    the need for rcu in the struct and in the call to
    packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...).

    Crashing stack trace:
    fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392
    packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487
    dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953
    xmit_one net/core/dev.c:2975 [inline]
    dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995
    __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3509
    neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379
    neigh_output include/net/neighbour.h:482 [inline]
    ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
    ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
    NF_HOOK_COND include/linux/netfilter.h:239 [inline]
    ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
    dst_output include/net/dst.h:459 [inline]
    NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250
    mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660
    mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072
    mld_send_initial_cr net/ipv6/mcast.c:2056 [inline]
    ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079
    addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039
    addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971
    process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113
    worker_thread+0x223/0x1990 kernel/workqueue.c:2247
    kthread+0x35e/0x430 kernel/kthread.c:231
    ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432

    Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")
    Fixes: 509c7a1ecc860 ("packet: avoid panic in packet_getsockopt()")
    Reported-by: syzbot
    Signed-off-by: Mike Maloney
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mike Maloney
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

21 Oct, 2017

1 commit

  • syzkaller got crashes in packet_getsockopt() processing
    PACKET_ROLLOVER_STATS command while another thread was managing
    to change po->rollover

    Using RCU will fix this bug. We might later add proper RCU annotations
    for sparse sake.

    In v2: I replaced kfree(rollover) in fanout_add() with the kfree_rcu()
    variant, as spotted by John.

    Fixes: a9b6391814d5 ("packet: rollover statistics")
    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Cc: John Sperbeck
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Sep, 2017

2 commits

  • Packet socket option po->has_vnet_hdr can be updated concurrently with
    other operations if no ring is attached.

    Do not test the option twice in packet_snd, as the value may change in
    between calls. A race on setsockopt disable may cause a packet > mtu
    to be sent without having GSO options set.

    Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.")
    Signed-off-by: Willem de Bruijn
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Willem de Bruijn
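
The "do not test the option twice" rule amounts to taking one snapshot of the flag and deriving every later decision from it. The sketch below uses C11 atomics in userspace; `struct toy_sock`, `struct send_plan`, `plan_send` and `TOY_VNET_HDR_LEN` are hypothetical names chosen for the example (10 bytes is the size of the basic struct virtio_net_hdr).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_VNET_HDR_LEN 10 /* size of the basic virtio_net_hdr */

/* Hypothetical socket state: the flag may be flipped by another thread. */
struct toy_sock {
    atomic_bool has_vnet_hdr;
};

struct send_plan {
    size_t hdr_len;   /* bytes of vnet header expected from the user */
    bool gso_allowed; /* whether a packet larger than the MTU is acceptable */
};

struct send_plan plan_send(struct toy_sock *sk)
{
    /* Read the flag exactly once; both fields use the same snapshot. */
    bool vnet = atomic_load(&sk->has_vnet_hdr);
    struct send_plan p = { vnet ? TOY_VNET_HDR_LEN : 0, vnet };

    /* The two decisions can never disagree, whatever other threads do. */
    assert((p.hdr_len != 0) == p.gso_allowed);
    return p;
}
```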
     
  • Once a socket has po->fanout set, it remains a member of the group
    until it is destroyed. The prot_hook must be constant and identical
    across sockets in the group.

    If fanout_add races with packet_do_bind between the test of po->fanout
    and taking the lock, the bind call may make type or dev inconsistent
    with that of the fanout group.

    Hold po->bind_lock when testing po->fanout to avoid this race.

    I had to introduce artificial delay (local_bh_enable) to actually
    observe the race.

    Fixes: dc99f600698d ("packet: Add fanout support.")
    Signed-off-by: Willem de Bruijn
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

21 Sep, 2017

1 commit

  • Packet socket bind operations must hold the po->bind_lock. This keeps
    po->running consistent with whether the socket is actually on a ptype
    list to receive packets.

    fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
    binds the fanout object to receive through packet_rcv_fanout.

    Make it hold the po->bind_lock when testing po->running and rebinding.
    Else, it can race with other rebind operations, such as that in
    packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
    can result in a socket being added to a fanout group twice, causing
    use-after-free KASAN bug reports, among others.

    Reported independently by both trinity and syzkaller.
    Verified that the syzkaller reproducer passes after this patch.

    Fixes: dc99f600698d ("packet: Add fanout support.")
    Reported-by: nixioaming
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

02 Sep, 2017

1 commit


30 Aug, 2017

1 commit


11 Aug, 2017

2 commits


02 Aug, 2017

1 commit


25 Jul, 2017

1 commit

  • There are multiple reports showing we have a use-after-free in
    the timer prb_retire_rx_blk_timer_expired(), where we use struct
    tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
    free_pg_vec().

    The interesting part is it is not freed via packet_release() but
    via packet_setsockopt(), which means we are not closing the socket.
    Looking into the big and fat function packet_set_ring(), this could
    happen if we satisfy the following conditions:

    1. closing == 0, not on packet_release() path
    2. req->tp_block_nr == 0, we don't allocate a new pg_vec
    3. rx_ring->pg_vec is already set as V3, which means we already called
    packet_set_ring() with req->tp_block_nr > 0 previously
    4. req->tp_frame_nr == 0, pass sanity check
    5. po->mapped == 0, never called mmap()

    In this scenario we are clearing the old rx_ring->pg_vec, so we need
    to free this pg_vec, but we don't stop the timer on this path because
    of closing==0.

    The timer has to be stopped as long as we need to free pg_vec, therefore
    the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.

    Thanks to liujian for testing different fixes.

    Reported-by: alexander.levin@verizon.com
    Reported-by: Dave Jones
    Reported-by: liujian (CE)
    Tested-by: liujian (CE)
    Cc: Ding Tianhong
    Cc: Willem de Bruijn
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
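
The change of condition described above can be modelled as two predicates. The names `stop_timer_old`, `stop_timer_new` and the `toy_version` enum are assumptions for this sketch; `old_pg_vec` stands for the ring memory that is about to be freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_version { TOY_V1 = 0, TOY_V2 = 1, TOY_V3 = 2 };

/* The predicates below rely on V3 comparing greater than V2. */
static_assert(TOY_V3 > TOY_V2, "V3 must order after V2");

/* Old shape: the timer is stopped only on the close path. */
bool stop_timer_old(bool closing, const void *old_pg_vec, enum toy_version v)
{
    (void)old_pg_vec;
    return closing && v > TOY_V2;
}

/* New shape: the timer is stopped whenever ring memory will be freed. */
bool stop_timer_new(bool closing, const void *old_pg_vec, enum toy_version v)
{
    (void)closing;
    return old_pg_vec != NULL && v > TOY_V2;
}
```

For the setsockopt scenario in the list (closing == 0, an existing V3 ring), the old predicate returns false and the new one returns true.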
     

21 Jul, 2017

1 commit


20 Jul, 2017

1 commit

  • This patch removes the definition of PGV_FROM_VMALLOC from af_packet.c.
    The PGV_FROM_VMALLOC definition was already removed by
    commit 441c793a5650 ("net: cleanup unused macros in net directory"),
    and its usage was removed even before by commit c56b4d90123b
    ("af_packet: remove pgv.flags"); but it was added back by mistake later on,
    in commit f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation").

    Signed-off-by: Rami Rosen
    Signed-off-by: David S. Miller

    Rosen, Rami
     

14 Jul, 2017

1 commit

  • When PACKET_QDISC_BYPASS is not used, Tx queue selection will be done
    before the packet is enqueued, taking into account any mappings set by
    a queuing discipline such as mqprio without hardware offloading. This
    selection may be affected by a previously saved queue_mapping, either on
    the Rx path, or done before the packet reaches the device, as is
    currently the case for AF_PACKET.

    In order for queue selection to work as expected when using traffic
    control, there can't be another selection done before that point is
    reached, so move the call to packet_pick_tx_queue to
    packet_direct_xmit, leaving the default xmit path as it was before
    PACKET_QDISC_BYPASS was introduced.

    A forward declaration of packet_pick_tx_queue() is introduced to avoid
    the need to reorder the functions within the file.

    Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option")
    Signed-off-by: Iván Briano
    Signed-off-by: David S. Miller

    Iván Briano
     

01 Jul, 2017

3 commits

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    This patch uses refcount_inc_not_zero() instead of
    atomic_inc_not_zero_hint() due to the absence of a _hint()
    version of the refcount API. If the _hint() version must
    be used, we might need to revisit the API.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
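
    The idea of a counter that refuses to wrap can be sketched in
    userspace with C11 atomics. This is a simplified model, not the
    kernel's refcount_t implementation: struct model_refcount and
    model_refcount_inc_not_zero() are hypothetical names, and pinning the
    counter at UINT_MAX is an assumption made for the illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model type; the kernel's refcount_t is implemented differently. */
struct model_refcount {
    atomic_uint refs;
};

/*
 * Take a reference unless the count is already zero (object being freed).
 * Once saturated, the count stays pinned at UINT_MAX instead of wrapping
 * to 0, so the object is leaked rather than freed while still in use.
 */
static bool model_refcount_inc_not_zero(struct model_refcount *r)
{
    unsigned int old;

    assert(r != NULL);
    old = atomic_load(&r->refs);
    do {
        if (old == 0)
            return false;   /* too late: no new references allowed */
        if (old == UINT_MAX)
            return true;    /* saturated: leave the count untouched */
        /* On failure the CAS reloads 'old' and the checks run again. */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));

    return true;
}
```

    A plain atomic increment would turn UINT_MAX into 0 at the saturation
    point, which is the overflow the commit message warns about.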
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

11 Jun, 2017

1 commit


26 May, 2017

1 commit


16 May, 2017

1 commit


27 Apr, 2017

1 commit


26 Apr, 2017

1 commit

  • If getsockopt() is called with PACKET_HDRLEN and optlen < 4, |val|
    remains uninitialized and the syscall may behave differently
    depending on its value, and even copy garbage to userspace on certain
    architectures. To fix this we now return -EINVAL if optlen is too small.

    This bug has been detected with KMSAN.

    Signed-off-by: Alexander Potapenko
    Signed-off-by: David S. Miller

    Alexander Potapenko
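
    The length check can be modelled in userspace as follows. This is a
    sketch, not the kernel function: model_packet_hdrlen() is a
    hypothetical name, and the clamping of optlen and the version switch
    around the check are assumptions about the surrounding logic that the
    commit message does not spell out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/if_packet.h>

/*
 * Hypothetical userspace model of the PACKET_HDRLEN length check.
 * 'optval' carries the TPACKET version on input; on success '*out'
 * receives the size of the matching frame header.
 */
static int model_packet_hdrlen(const void *optval, size_t optlen, int *out)
{
    int val;

    assert(optval != NULL && out != NULL);
    if (optlen > sizeof(int))
        optlen = sizeof(int);
    if (optlen < sizeof(int))
        return -EINVAL;             /* the check described above */

    memcpy(&val, optval, optlen);   /* val is fully initialized here */
    switch (val) {
    case TPACKET_V1: *out = (int)sizeof(struct tpacket_hdr);  break;
    case TPACKET_V2: *out = (int)sizeof(struct tpacket2_hdr); break;
    case TPACKET_V3: *out = (int)sizeof(struct tpacket3_hdr); break;
    default:         return -EINVAL;
    }
    return 0;
}
```

    Without the early return, a short optlen would copy only part of val
    and the switch would then branch on leftover stack contents.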
     

25 Apr, 2017

1 commit

  • Fanout uses a per net global namespace. A process that intends to create
    a new fanout group can accidentally join an existing group. It is not
    possible to detect this.

    Add socket option PACKET_FANOUT_FLAG_UNIQUEID. When specified the
    supplied fanout group id must be set to 0, and the kernel chooses an id
    that is not already in use. This is an ephemeral flag so that
    other sockets can be added to this group using setsockopt, but NOT
    specifying this flag. The current getsockopt(..., PACKET_FANOUT, ...)
    can be used to retrieve the new group id.

    We assume that there are not a lot of fanout groups and that this is not
    a high frequency call.

    The method assigns ids starting at zero and increases until it finds an
    unused id. It keeps track of the last assigned id, and uses it as a
    starting point to find new ids.

    Signed-off-by: Mike Maloney
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Mike Maloney
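
    The id search described above can be sketched as a linear scan that
    resumes from the last assigned id. This is a userspace model only:
    struct model_fanout_ids and model_fanout_new_id() are hypothetical
    names, and the 16-bit id space and the boolean in-use table are
    assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_IDS 65536u    /* assumption: group ids fit in 16 bits */

/* Hypothetical model of one namespace's fanout group ids. */
struct model_fanout_ids {
    bool in_use[MODEL_MAX_IDS];
    uint16_t next_id;           /* where the next search starts */
};

/*
 * Find an unused id, scanning upward (with wraparound) from next_id.
 * Returns true and stores the id in *out, or false if every id is taken.
 */
static bool model_fanout_new_id(struct model_fanout_ids *s, uint16_t *out)
{
    uint16_t id;

    assert(s != NULL && out != NULL);
    id = s->next_id;
    do {
        if (!s->in_use[id]) {
            s->in_use[id] = true;
            s->next_id = (uint16_t)(id + 1);
            *out = id;
            return true;
        }
        id++;                   /* uint16_t wraps from 65535 back to 0 */
    } while (id != s->next_id);

    return false;
}
```

    A linear scan is acceptable here under the commit's stated assumption
    that there are few fanout groups and the call is infrequent.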
     

31 Mar, 2017

3 commits


02 Mar, 2017

1 commit

  • KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
    uninitialized memory in packet_bind_spkt():
    Acked-by: Eric Dumazet

    ==================================================================
    BUG: KMSAN: use of unitialized memory
    CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
    01/01/2011
    0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
    ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
    0000000000000000 0000000000000092 00000000ec400911 0000000000000002
    Call Trace:
    [< inline >] __dump_stack lib/dump_stack.c:15
    [] dump_stack+0x238/0x290 lib/dump_stack.c:51
    [] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
    [] __msan_warning+0x5b/0xb0
    mm/kmsan/kmsan_instr.c:424
    [< inline >] strlen lib/string.c:484
    [] strlcpy+0x9d/0x200 lib/string.c:144
    [] packet_bind_spkt+0x144/0x230
    net/packet/af_packet.c:3132
    [] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
    [] SyS_bind+0x82/0xa0 net/socket.c:1356
    [] entry_SYSCALL_64_fastpath+0x13/0x8f
    arch/x86/entry/entry_64.o:?
    chained origin: 00000000eba00911
    [] save_stack_trace+0x27/0x50
    arch/x86/kernel/stacktrace.c:67
    [< inline >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
    [< inline >] kmsan_save_stack mm/kmsan/kmsan.c:334
    [] kmsan_internal_chain_origin+0x118/0x1e0
    mm/kmsan/kmsan.c:527
    [] __msan_set_alloca_origin4+0xc3/0x130
    mm/kmsan/kmsan_instr.c:380
    [] SYSC_bind+0x129/0x5f0 net/socket.c:1356
    [] SyS_bind+0x82/0xa0 net/socket.c:1356
    [] entry_SYSCALL_64_fastpath+0x13/0x8f
    arch/x86/entry/entry_64.o:?
    origin description: ----address@SYSC_bind (origin=00000000eb400911)
    ==================================================================
    (the line numbers are relative to 4.8-rc6, but the bug persists
    upstream)

    The report is triggered when I run the following program as root:

    =====================================
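    #include <string.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <net/ethernet.h>

    int main() {
      struct sockaddr addr;
      /* Fill sa_data with non-zero bytes so it has no terminating NUL. */
      memset(&addr, 0xff, sizeof(addr));
      addr.sa_family = AF_PACKET;
      int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
      bind(fd, &addr, sizeof(addr));
      return 0;
    }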
    =====================================

    This happens because addr.sa_data copied from the userspace is not
    zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
    results in calling strlen() on the kernel copy of that non-terminated
    buffer.

    Signed-off-by: Alexander Potapenko
    Signed-off-by: David S. Miller

    Alexander Potapenko
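
    One way to avoid running a string function over a non-terminated
    sa_data is to copy it into a buffer one byte larger and terminate the
    copy explicitly. The sketch below shows that pattern in userspace.
    model_copy_sa_data() is a hypothetical helper, and the commit message
    above does not state that this is the exact change made in
    packet_bind_spkt().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: copy sa_data into 'name' and terminate the copy,
 * so later string functions never read past the end of the buffer.
 * 'name_len' must be at least sizeof(addr->sa_data) + 1.
 */
static void model_copy_sa_data(const struct sockaddr *addr,
                               char *name, size_t name_len)
{
    assert(addr != NULL && name != NULL);
    assert(name_len > sizeof(addr->sa_data));
    memcpy(name, addr->sa_data, sizeof(addr->sa_data));
    name[sizeof(addr->sa_data)] = '\0';
}
```

    Because the source length is fixed by the type, memcpy() with an
    explicit terminator never has to search the input for a NUL byte.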
     

20 Feb, 2017

1 commit