03 Apr, 2019

2 commits

  • [ Upstream commit a4dc6a49156b1f8d6e17251ffda17c9e6a5db78a ]

    When using fanouts with AF_PACKET, the demux functions such as
    fanout_demux_cpu will return an index in the fanout socket array, which
    corresponds to the selected socket.

    The ordering of this array depends on the order the sockets were added
    to a given fanout group, so for FANOUT_CPU this means sockets are bound
    to cpus in the order they are configured, which is OK.

    However, when stopping then restarting the interface these sockets are
    bound to, the sockets are reassigned to the fanout group in the reverse
    order, due to the fact that they were inserted at the head of the
    interface's AF_PACKET socket list.

    This means that traffic that was directed to the first socket in the
    fanout group is now directed to the last one after an interface restart.

    In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
    socket that used to receive traffic from the last CPU after an interface
    restart.

    This commit introduces a helper to add a socket at the tail of a list,
    then uses it to register AF_PACKET sockets.

    Note that this changes the order in which sockets are listed in /proc and
    with sock_diag.
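
The reordering described above can be reproduced in a small user-space sketch. Everything here is hypothetical (`struct sock_list`, `add_head`, `add_tail`, `rebuild_fanout`); it only models a list that is walked front to back when the interface comes back up, and is not the kernel's data structure.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOCKS 8

/* Hypothetical stand-in for an interface's AF_PACKET socket list. */
struct sock_list {
    int ids[MAX_SOCKS];
    size_t len;
};

/* Insert at the head: the newest entry ends up first. */
static void add_head(struct sock_list *l, int id)
{
    assert(l->len < MAX_SOCKS);
    for (size_t i = l->len; i > 0; i--)
        l->ids[i] = l->ids[i - 1];
    l->ids[0] = id;
    l->len++;
}

/* Insert at the tail: entries keep their registration order. */
static void add_tail(struct sock_list *l, int id)
{
    assert(l->len < MAX_SOCKS);
    l->ids[l->len++] = id;
}

/*
 * Model an interface restart: walk the socket list front to back and
 * re-append each socket to a fresh fanout array.
 */
static void rebuild_fanout(const struct sock_list *l, int *fanout)
{
    for (size_t i = 0; i < l->len; i++)
        fanout[i] = l->ids[i];
}
```

With sockets 0, 1, 2 registered in that order, rebuilding from the head-inserted list yields 2, 1, 0, while rebuilding from the tail-inserted list yields 0, 1, 2, which is the ordering the fanout demux functions rely on.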

    Fixes: dc99f600698d ("packet: Add fanout support")
    Signed-off-by: Maxime Chevallier
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Maxime Chevallier
     
  • [ Upstream commit 398f0132c14754fcd03c1c4f8e7176d001ce8ea1 ]

    Since commit fc62814d690c ("net/packet: fix 4gb buffer limit due to overflow check")
    one can now allocate packet ring buffers >= UINT_MAX. However, syzkaller
    found that this triggers a warning:

    [ 21.100000] WARNING: CPU: 2 PID: 2075 at mm/page_alloc.c:4584 __alloc_pages_nod0
    [ 21.101490] Modules linked in:
    [ 21.101921] CPU: 2 PID: 2075 Comm: syz-executor.0 Not tainted 5.0.0 #146
    [ 21.102784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
    [ 21.103887] RIP: 0010:__alloc_pages_nodemask+0x2a0/0x630
    [ 21.104640] Code: fe ff ff 65 48 8b 04 25 c0 de 01 00 48 05 90 0f 00 00 41 bd 01 00 00 00 48 89 44 24 48 e9 9c fe 3
    [ 21.107121] RSP: 0018:ffff88805e1cf920 EFLAGS: 00010246
    [ 21.107819] RAX: 0000000000000000 RBX: ffffffff85a488a0 RCX: 0000000000000000
    [ 21.108753] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000
    [ 21.109699] RBP: 1ffff1100bc39f28 R08: ffffed100bcefb67 R09: ffffed100bcefb67
    [ 21.110646] R10: 0000000000000001 R11: ffffed100bcefb66 R12: 000000000000000d
    [ 21.111623] R13: 0000000000000000 R14: ffff88805e77d888 R15: 000000000000000d
    [ 21.112552] FS: 00007f7c7de05700(0000) GS:ffff88806d100000(0000) knlGS:0000000000000000
    [ 21.113612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 21.114405] CR2: 000000000065c000 CR3: 000000005e58e006 CR4: 00000000001606e0
    [ 21.115367] Call Trace:
    [ 21.115705] ? __alloc_pages_slowpath+0x21c0/0x21c0
    [ 21.116362] alloc_pages_current+0xac/0x1e0
    [ 21.116923] kmalloc_order+0x18/0x70
    [ 21.117393] kmalloc_order_trace+0x18/0x110
    [ 21.117949] packet_set_ring+0x9d5/0x1770
    [ 21.118524] ? packet_rcv_spkt+0x440/0x440
    [ 21.119094] ? lock_downgrade+0x620/0x620
    [ 21.119646] ? __might_fault+0x177/0x1b0
    [ 21.120177] packet_setsockopt+0x981/0x2940
    [ 21.120753] ? __fget+0x2fb/0x4b0
    [ 21.121209] ? packet_release+0xab0/0xab0
    [ 21.121740] ? sock_has_perm+0x1cd/0x260
    [ 21.122297] ? selinux_secmark_relabel_packet+0xd0/0xd0
    [ 21.123013] ? __fget+0x324/0x4b0
    [ 21.123451] ? selinux_netlbl_socket_setsockopt+0x101/0x320
    [ 21.124186] ? selinux_netlbl_sock_rcv_skb+0x3a0/0x3a0
    [ 21.124908] ? __lock_acquire+0x529/0x3200
    [ 21.125453] ? selinux_socket_setsockopt+0x5d/0x70
    [ 21.126075] ? __sys_setsockopt+0x131/0x210
    [ 21.126533] ? packet_release+0xab0/0xab0
    [ 21.127004] __sys_setsockopt+0x131/0x210
    [ 21.127449] ? kernel_accept+0x2f0/0x2f0
    [ 21.127911] ? ret_from_fork+0x8/0x50
    [ 21.128313] ? do_raw_spin_lock+0x11b/0x280
    [ 21.128800] __x64_sys_setsockopt+0xba/0x150
    [ 21.129271] ? lockdep_hardirqs_on+0x37f/0x560
    [ 21.129769] do_syscall_64+0x9f/0x450
    [ 21.130182] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    We should allocate with __GFP_NOWARN to handle this.

    Cc: Kal Conley
    Cc: Andrey Konovalov
    Fixes: fc62814d690c ("net/packet: fix 4gb buffer limit due to overflow check")
    Signed-off-by: Christoph Paasch
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Christoph Paasch
     

27 Feb, 2019

1 commit

  • [ Upstream commit fc62814d690cf62189854464f4bd07457d5e9e50 ]

    When calculating rb->frames_per_block * req->tp_block_nr the result
    can overflow. Check it for overflow without limiting the total buffer
    size to UINT_MAX.

    This change fixes support for packet ring buffers >= UINT_MAX.
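
One way to check such a product for overflow without capping the total size is to divide before multiplying. The following is a minimal user-space sketch under that assumption; `frame_count_ok` and its parameters are hypothetical names, not the kernel's code.

```c
#include <assert.h>
#include <limits.h>

/*
 * Returns 1 when frames_per_block * block_nr fits in an unsigned int
 * and equals frame_nr, 0 otherwise. The division-based test rejects a
 * wrapping product without limiting the total buffer size in bytes.
 */
static int frame_count_ok(unsigned int frames_per_block,
                          unsigned int block_nr,
                          unsigned int frame_nr)
{
    assert(block_nr != 0);  /* precondition of this sketch */

    if (frames_per_block > UINT_MAX / block_nr)
        return 0;           /* the product would wrap */

    return frames_per_block * block_nr == frame_nr;
}
```

A pair such as 65536 * 65536 wraps to 0 on a 32-bit unsigned int, so a naive equality test against `frame_nr == 0` would pass; the division check rejects it first.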

    Fixes: 8f8d28e4d6d8 ("net/packet: fix overflow in check for tp_frame_nr")
    Signed-off-by: Kal Conley
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Kal Conley
     

23 Feb, 2019

1 commit

  • [ Upstream commit 88a8121dc1d3d0dbddd411b79ed236b6b6ea415c ]

    Since commit cb9f1b783850, scapy (which uses an AF_PACKET socket in
    SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel:

    Here is an example of the setup:
    $ ip link set ntfp2 up
    $ ip addr add 10.125.0.1/24 dev ntfp2
    $ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2
    $ ip addr add fd00:cafe:cafe::1/128 dev tun1
    $ ip link set dev tun1 up
    $ ip route add fd00:200::/64 dev tun1
    $ scapy
    >>> p = []
    >>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest()
    >>> send(p, count=1, inter=0.1)
    >>> quit()
    $ ip -s link ls dev tun1 | grep -A1 "TX.*errors"
    TX: bytes packets errors dropped carrier collsns
    0 0 1 0 0 0

    The problem is that the network offset is set to the hard_header_len of the
    output device (tun1, ie 14 + 20) and in our case, because the packet is
    small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes
    (ipv6 header) starting from the network offset).
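
The arithmetic behind this failure can be written out as a short sketch. `can_pull` is a hypothetical helper that only models the length check; it is not `pskb_inet_may_pull()` itself, and the offset of 0 in the second case is an illustrative assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 when `need` bytes are available at `offset` in a packet of
 * `len` bytes, 0 otherwise. Ordered to avoid unsigned underflow.
 */
static int can_pull(size_t len, size_t offset, size_t need)
{
    return offset <= len && need <= len - offset;
}

/* The numbers quoted in the commit message above. */
static void can_pull_demo(void)
{
    const size_t pkt_len = 48;              /* IPv6 header (40) + ICMPv6 echo (8) */
    const size_t hard_header_len = 14 + 20; /* as stated for tun1 */
    const size_t ipv6_hdr_len = 40;

    /* Offset at hard_header_len: only 14 bytes remain, so the pull fails. */
    assert(!can_pull(pkt_len, hard_header_len, ipv6_hdr_len));

    /* If the offset pointed at the start of the data, the header would fit. */
    assert(can_pull(pkt_len, 0, ipv6_hdr_len));
}
```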

    More generally, this problem affects devices with a variable hard header
    length. To avoid an overly intrusive patch in the current release, this
    patch proposes an (ugly) workaround. It has to be cleaned up in net-next.

    Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=993675a3100b1
    Link: http://patchwork.ozlabs.org/patch/1024489/
    Fixes: cb9f1b783850 ("ip: validate header length on virtual device xmit")
    CC: Willem de Bruijn
    CC: Maxim Mikityanskiy
    Signed-off-by: Nicolas Dichtel
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Nicolas Dichtel
     

23 Jan, 2019

1 commit

  • [ Upstream commit d972f3dce8d161e2142da0ab1ef25df00e2f21a9 ]

    'dev' is non-NULL when the addr_len check triggers, so the code must jump
    to a label that does the dev_put; otherwise dev will have a leaked
    refcount.

    This bug makes the ib_ipoib module impossible to unload when using
    systemd-network as it triggers this check on InfiniBand links.
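
The pattern at issue is the usual goto-based cleanup: once a reference is taken, every failing check has to leave through the label that drops it. A minimal user-space sketch, assuming a hypothetical `struct fake_dev` with a plain integer refcount:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted device. */
struct fake_dev {
    int refcnt;
    size_t addr_len;
};

static struct fake_dev *fake_dev_get(struct fake_dev *d)
{
    d->refcnt++;
    return d;
}

static void fake_dev_put(struct fake_dev *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* Send-path sketch: validate the address length while holding a reference. */
static int send_sketch(struct fake_dev *d, size_t addr_len_given)
{
    struct fake_dev *dev = fake_dev_get(d);
    int err = -EINVAL;

    if (addr_len_given < dev->addr_len)
        goto out_put;   /* a plain "return err" here would leak the reference */

    err = 0;
out_put:
    fake_dev_put(dev);
    return err;
}
```

After either outcome the refcount returns to its starting value, which is what allows the owning module to be unloaded.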

    Fixes: 99137b7888f4 ("packet: validate address length")
    Reported-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jason Gunthorpe
     

10 Jan, 2019

2 commits

  • [ Upstream commit 6b8d95f1795c42161dc0984b6863e95d6acf24ed ]

    Validate packet socket address length if a length is given. Zero
    length is equivalent to not setting an address.
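
The rule stated above can be sketched as a single predicate. `addr_len_ok` and its parameters are hypothetical names for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 when the supplied address length is acceptable for a device
 * whose link-layer address is dev_addr_len bytes long.
 */
static int addr_len_ok(size_t given_len, size_t dev_addr_len)
{
    if (given_len == 0)
        return 1;   /* treated the same as passing no address */

    return given_len >= dev_addr_len;
}

static void addr_len_demo(void)
{
    assert(addr_len_ok(0, 6));   /* no address given */
    assert(addr_len_ok(6, 6));   /* long enough */
    assert(!addr_len_ok(4, 6));  /* a length was given, but it is too short */
}
```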

    Fixes: 99137b7888f4 ("packet: validate address length")
    Reported-by: Ido Schimmel
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit 99137b7888f4058087895d035d81c6b2d31015c5 ]

    Packet sockets with SOCK_DGRAM may pass an address for use in
    dev_hard_header. Ensure that it is of sufficient length.

    Reported-by: syzbot
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     

06 Dec, 2018

1 commit

  • [ Upstream commit 5cd8d46ea1562be80063f53c7c6a5f40224de623 ]

    tpacket_snd sends packets with user pages linked into skb frags. It
    notifies that pages can be reused when the skb is released by setting
    skb->destructor to tpacket_destruct_skb.

    This can cause data corruption if the skb is orphaned (e.g., on
    transmit through veth) or cloned (e.g., on mirror to another psock).

    Create a kernel-private copy of data in these cases, same as tun/tap
    zerocopy transmission. Reuse that infrastructure: mark the skb as
    SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).

    Unlike other zerocopy packets, do not set shinfo destructor_arg to
    struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
    when the original skb is released and a timestamp is recorded. Do not
    change this timestamp behavior. The ubuf_info->callback is not needed
    anyway, as no zerocopy notification is expected.

    Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
    resulting value is not a valid ubuf_info pointer, nor a valid
    tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
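
Setting the low bit of a pointer to mark it is a common tagging technique. The sketch below shows the idea in user space; the function names are hypothetical and merely echo the helper names mentioned above, and the code assumes the pointer is at least 2-byte aligned so that bit 0 is otherwise unused.

```c
#include <assert.h>
#include <stdint.h>

/* Tag a pointer as "not a uarg" by setting its lowest bit. */
static void *mark_nouarg(void *val)
{
    /* Assumption: val is at least 2-byte aligned, so bit 0 is free. */
    assert(((uintptr_t)val & 0x1u) == 0);
    return (void *)((uintptr_t)val | 0x1u);
}

/* Test whether a stored value carries the tag. */
static int is_nouarg(const void *arg)
{
    return (int)((uintptr_t)arg & 0x1u);
}

/* Recover the original pointer by clearing the tag bit. */
static void *get_nouarg(void *arg)
{
    return (void *)((uintptr_t)arg & ~(uintptr_t)0x1u);
}
```

Because the tagged value has its low bit set, it can never be mistaken for an aligned structure pointer, and the original address is recovered by masking the bit off again.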

    The fix relies on features introduced in commit 52267790ef52 ("sock:
    add MSG_ZEROCOPY"), so can be backported as is only to 4.14.

    Tested with `./in_netns.sh ./txring_overwrite` from
    http://github.com/wdebruij/kerneltools/tests

    Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
    Reported-by: Anand H. Krishnan
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     

05 Oct, 2018

1 commit

  • When we use a raw socket as the vhost backend, a packet from virtio with
    GSO offload information cannot be sent out: it fails later validation in
    the xmit path, because we did not set the correct skb->protocol, which is
    further used for looking up the gso function.

    To fix this, we set this field according to virtio hdr information.
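
A mapping from the virtio header's GSO type to an EtherType could look like the sketch below. The constants are defined locally so the example stands alone; their values mirror the Linux UAPI definitions (`VIRTIO_NET_HDR_GSO_*`, `ETH_P_IP`, `ETH_P_IPV6`). The function name and the choice to return 0 for unknown types are assumptions of this sketch, and the value is returned in host byte order for simplicity.

```c
#include <assert.h>

/* Local copies of the UAPI values, renamed to avoid clashing with headers. */
#define SK_GSO_TCPV4  1
#define SK_GSO_UDP    3
#define SK_GSO_TCPV6  4
#define SK_GSO_ECN    0x80   /* flag bit, masked off before the lookup */

#define SK_ETH_P_IP    0x0800
#define SK_ETH_P_IPV6  0x86DD

/* Derive the L3 protocol from the GSO type; 0 means "leave unset". */
static unsigned int proto_from_gso_type(unsigned int gso_type)
{
    switch (gso_type & ~(unsigned int)SK_GSO_ECN) {
    case SK_GSO_TCPV4:
    case SK_GSO_UDP:
        return SK_ETH_P_IP;
    case SK_GSO_TCPV6:
        return SK_ETH_P_IPV6;
    default:
        return 0;
    }
}

static void proto_demo(void)
{
    assert(proto_from_gso_type(SK_GSO_TCPV4) == SK_ETH_P_IP);
    assert(proto_from_gso_type(SK_GSO_TCPV6 | SK_GSO_ECN) == SK_ETH_P_IPV6);
    assert(proto_from_gso_type(0) == 0);
}
```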

    Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
    Signed-off-by: Jianfeng Tan
    Signed-off-by: David S. Miller

    Jianfeng Tan
     

01 Sep, 2018

1 commit

  • This reverts commit 71e41286203c017d24f041a7cd71abea7ca7b1e0.

    mmap()/munmap() cannot be backed by kmalloced pages:

    We fault in:

    VM_BUG_ON_PAGE(PageSlab(page), page);

    unmap_single_vma+0x8a/0x110
    unmap_vmas+0x4b/0x90
    unmap_region+0xc9/0x140
    do_munmap+0x274/0x360
    vm_munmap+0x81/0xc0
    SyS_munmap+0x2b/0x40
    do_syscall_64+0x13e/0x1c0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    Fixes: 71e41286203c ("packet: switch kvzalloc to allocate memory")
    Signed-off-by: Eric Dumazet
    Reported-by: John Sperbeck
    Bisected-by: John Sperbeck
    Cc: Zhang Yu
    Cc: Li RongQing
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Aug, 2018

1 commit

  • The patch includes the following changes:

    * Use modern kvzalloc()/kvfree() instead of custom allocations.

    * Remove the order argument from alloc_pg_vec; it can be derived from req.

    * Remove the order argument from free_pg_vec; free_pg_vec now uses
    kvfree, which does not need an order argument.

    * Remove pg_vec_order from struct packet_ring_buffer; there is no longer
    any need to save/restore 'order'.

    * Remove the variable 'order' from packet_set_ring; it is now unused.

    Signed-off-by: Zhang Yu
    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
     

10 Aug, 2018

1 commit


07 Aug, 2018

1 commit

  • TPACKET_V3 stores variable length frames in fixed length blocks.
    Blocks must be able to store a block header, optional private space
    and at least one minimum sized frame.

    Frames, even for a zero snaplen packet, store metadata headers and
    optional reserved space.

    In the block size bounds check, ensure that the frame of the
    chosen configuration fits. This includes sockaddr_ll and optional
    tp_reserve.

    Syzbot was able to construct a ring with insufficient room for the
    sockaddr_ll in the header of a zero-length frame, triggering an
    out-of-bounds write in dev_parse_header.

    Convert the comparison to less than, as zero is a valid snap len.
    This matches the test for minimum tp_frame_size immediately below.
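
The bounds check described above can be sketched as follows. All parameter names and the sizes used in the demo are hypothetical; the point is only that the block must hold its header, the private area, and one minimum-sized frame (frame headers plus reserve), and that a block exactly that large is still valid.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 when a block of block_size bytes can hold the block header,
 * the private area and one minimum-sized frame. Sums are computed in
 * 64 bits so that four 32-bit inputs cannot wrap.
 */
static int block_size_ok(uint32_t block_size, uint32_t block_hdr_len,
                         uint32_t priv_len, uint32_t frame_hdr_len,
                         uint32_t reserve)
{
    uint64_t min_frame_size = (uint64_t)frame_hdr_len + reserve;
    uint64_t needed = (uint64_t)block_hdr_len + priv_len + min_frame_size;

    /* Strict "less than": a zero-snaplen frame needs no payload room. */
    return !(block_size < needed);
}

static void block_size_demo(void)
{
    /* Hypothetical sizes: 48-byte block header, 68-byte frame headers. */
    assert(block_size_ok(4096, 48, 0, 68, 0));
    assert(block_size_ok(116, 48, 0, 68, 0));      /* exact fit is valid */
    assert(!block_size_ok(115, 48, 0, 68, 0));
    assert(!block_size_ok(4096, 48, 4000, 68, 0)); /* priv area leaves no room */
}
```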

    Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
    Fixes: eb73190f4fbe ("net/packet: refine check for priv area size")
    Reported-by: syzbot
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

05 Aug, 2018

1 commit


21 Jul, 2018

1 commit


13 Jul, 2018

1 commit

  • If variable length link layer headers result in a packet shorter
    than dev->hard_header_len, reset the network header offset. Else
    skb->mac_len may exceed skb->len after skb_mac_reset_len.

    packet_sendmsg_spkt already has similar logic.

    Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

10 Jul, 2018

2 commits

  • For most of these calls we can just pass NULL through to the fallback
    function as the sb_dev. The only cases where we cannot are the cases where
    we might be dealing with either an upper device or a driver that would
    have configured things to support an sb_dev itself.

    The only driver that has any significant change in this patch set should be
    ixgbe as we can drop the redundant functionality that existed in both the
    ndo_select_queue function and the fallback function that was passed through
    to us.

    Signed-off-by: Alexander Duyck
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher

    Alexander Duyck
     
  • This patch adds a generic version of the ndo_select_queue functions for
    either returning 0 or selecting a queue based on the processor ID. This is
    generally meant to just reduce the number of functions we have to change
    in the future when we have to deal with ndo_select_queue changes.
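
The two generic behaviours mentioned above can be illustrated with a short sketch. The function names are hypothetical, and reducing the processor ID modulo the number of TX queues is an assumption about how an ID is mapped to a valid queue index.

```c
#include <assert.h>

/* Always pick the first TX queue. */
static unsigned int pick_tx_zero(void)
{
    return 0;
}

/* Pick a TX queue from the processor ID, wrapped to the queue count. */
static unsigned int pick_tx_cpu_id(unsigned int cpu_id,
                                   unsigned int num_tx_queues)
{
    assert(num_tx_queues > 0);  /* precondition of this sketch */
    return cpu_id % num_tx_queues;
}
```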

    Signed-off-by: Alexander Duyck
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher

    Alexander Duyck
     

07 Jul, 2018

1 commit

  • Initialize the cookie in one location to reduce code duplication and
    avoid bugs from inconsistent initialization, such as that fixed in
    commit 9887cba19978 ("ip: limit use of gso_size to udp").

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

04 Jul, 2018

1 commit


29 Jun, 2018

1 commit

  • The poll() changes were not well thought out, and completely
    unexplained. They also caused a huge performance regression, because
    "->poll()" was no longer a trivial file operation that just called down
    to the underlying file operations, but instead did at least two indirect
    calls.

    Indirect calls are sadly slow now with the Spectre mitigation, but the
    performance problem could at least be largely mitigated by changing the
    "->get_poll_head()" operation to just have a per-file-descriptor pointer
    to the poll head instead. That gets rid of one of the new indirections.

    But that doesn't fix the new complexity that is completely unwarranted
    for the regular case. The (undocumented) reason for the poll() changes
    was some alleged AIO poll race fixing, but we don't make the common case
    slower and more complex for some uncommon special case, so this all
    really needs way more explanations and most likely a fundamental
    redesign.

    [ This revert is a revert of about 30 different commits, not reverted
    individually because that would just be unnecessarily messy - Linus ]

    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

22 Jun, 2018

1 commit

  • We should put copy_skb in receive_queue only after
    a successful call to virtio_net_hdr_from_skb().

    syzbot report:

    BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline]
    BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline]
    BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
    Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553

    CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    __skb_unlink include/linux/skbuff.h:1843 [inline]
    __skb_dequeue include/linux/skbuff.h:1863 [inline]
    skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
    skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852
    packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331
    packet_release+0x630/0xd90 net/packet/af_packet.c:2991
    __sock_release+0xd7/0x260 net/socket.c:603
    sock_close+0x19/0x20 net/socket.c:1186
    __fput+0x35b/0x8b0 fs/file_table.c:209
    ____fput+0x15/0x20 fs/file_table.c:243
    task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
    exit_task_work include/linux/task_work.h:22 [inline]
    do_exit+0x1b08/0x2750 kernel/exit.c:865
    do_group_exit+0x177/0x440 kernel/exit.c:968
    __do_sys_exit_group kernel/exit.c:979 [inline]
    __se_sys_exit_group kernel/exit.c:977 [inline]
    __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4448e9
    Code: Bad RIP value.
    RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9
    RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001
    RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000
    R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0
    R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000

    Allocated by task 4553:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
    skb_clone+0x1f5/0x500 net/core/skbuff.c:1282
    tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221
    deliver_skb net/core/dev.c:1925 [inline]
    deliver_ptype_list_skb net/core/dev.c:1940 [inline]
    __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
    netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
    netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
    tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
    tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
    tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
    call_write_iter include/linux/fs.h:1795 [inline]
    new_sync_write fs/read_write.c:474 [inline]
    __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
    vfs_write+0x1f8/0x560 fs/read_write.c:549
    ksys_write+0x101/0x260 fs/read_write.c:598
    __do_sys_write fs/read_write.c:610 [inline]
    __se_sys_write fs/read_write.c:607 [inline]
    __x64_sys_write+0x73/0xb0 fs/read_write.c:607
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 4553:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
    kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
    __kfree_skb net/core/skbuff.c:642 [inline]
    kfree_skb+0x1a5/0x580 net/core/skbuff.c:659
    tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385
    deliver_skb net/core/dev.c:1925 [inline]
    deliver_ptype_list_skb net/core/dev.c:1940 [inline]
    __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
    netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
    netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
    tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
    tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
    tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
    call_write_iter include/linux/fs.h:1795 [inline]
    new_sync_write fs/read_write.c:474 [inline]
    __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
    vfs_write+0x1f8/0x560 fs/read_write.c:549
    ksys_write+0x101/0x260 fs/read_write.c:598
    __do_sys_write fs/read_write.c:610 [inline]
    __se_sys_write fs/read_write.c:607 [inline]
    __x64_sys_write+0x73/0xb0 fs/read_write.c:607
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8801b044ecc0
    which belongs to the cache skbuff_head_cache of size 232
    The buggy address is located 0 bytes inside of
    232-byte region [ffff8801b044ecc0, ffff8801b044eda8)
    The buggy address belongs to the page:
    page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0
    flags: 0x2fffc0000000100(slab)
    raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0
    raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
    >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
    ^
    ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc

    Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Jun, 2018

1 commit

  • The vzalloc() function has no 2-factor argument form, so multiplication
    factors need to be wrapped in array_size(). This patch replaces cases of:

    vzalloc(a * b)

    with:
    vzalloc(array_size(a, b))

    as well as handling cases of:

    vzalloc(a * b * c)

    with:

    vzalloc(array3_size(a, b, c))

    This does, however, attempt to ignore constant size factors like:

    vzalloc(4 * 1024)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.
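
The reason for wrapping the factors is that a plain multiplication can wrap to a small value and under-allocate, whereas the kernel's array_size() helpers saturate to SIZE_MAX on overflow so the allocation fails instead. A minimal user-space sketch of that behaviour, using hypothetical `_sketch` names and a portable division-based overflow test:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores a * b in *out and returns 0, or returns 1 if it would overflow. */
static int mul_overflows(size_t a, size_t b, size_t *out)
{
    if (b != 0 && a > SIZE_MAX / b)
        return 1;
    *out = a * b;
    return 0;
}

/* Two-factor size: saturates to SIZE_MAX on overflow. */
static size_t array_size_sketch(size_t a, size_t b)
{
    size_t bytes;

    return mul_overflows(a, b, &bytes) ? SIZE_MAX : bytes;
}

/* Three-factor size: saturates if either multiplication overflows. */
static size_t array3_size_sketch(size_t a, size_t b, size_t c)
{
    size_t bytes;

    if (mul_overflows(a, b, &bytes) || mul_overflows(bytes, c, &bytes))
        return SIZE_MAX;
    return bytes;
}

static void array_size_demo(void)
{
    assert(array_size_sketch(4, 1024) == 4096);
    assert(array_size_sketch(SIZE_MAX, 2) == SIZE_MAX);
    assert(array3_size_sketch(2, 3, 4) == 24);
    assert(array3_size_sketch(SIZE_MAX / 2, 3, 1) == SIZE_MAX);
}
```

An allocator asked for SIZE_MAX bytes cannot satisfy the request, so an overflowing size turns into an allocation failure rather than a too-small buffer.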

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    vzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    vzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    vzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    vzalloc(
    - sizeof(TYPE) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT_ID
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT_ID
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    vzalloc(
    - SIZE * COUNT
    + array_size(COUNT, SIZE)
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    vzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    vzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    vzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    vzalloc(C1 * C2 * C3, ...)
    |
    vzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants.
    @@
    expression E1, E2;
    constant C1, C2;
    @@

    (
    vzalloc(C1 * C2, ...)
    |
    vzalloc(
    - E1 * E2
    + array_size(E1, E2)
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

08 Jun, 2018

1 commit

  • Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
    to communicate packet metadata to userspace.

    For skbuffs with vlan, the first two return the packet as it may have
    existed on the wire, inserting the VLAN tag in the user buffer. Then
    virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.

    Commit f09e2249c4f5 ("macvtap: restore vlan header on user read")
    added this feature to macvtap. Commit 3ce9b20f1971 ("macvtap: Fix
    csum_start when VLAN tags are present") then fixed up csum_start.

    Virtio, packet and uml do not insert the vlan header in the user
    buffer.

    When introducing virtio_net_hdr_from_skb to deduplicate filling in
    the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
    applied uniformly, breaking csum offset for packets with vlan on
    virtio and packet.

    Make insertion of VLAN_HLEN optional. Convert the callers to pass it
    when needed.
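
The adjustment can be sketched as a helper that takes the VLAN header length as an explicit argument, so that only callers which insert the tag into the user buffer pass a non-zero value. The function name and the demo offsets are hypothetical; VLAN_HLEN is 4 bytes, the length of an 802.1Q tag.

```c
#include <assert.h>

#define VLAN_HLEN 4  /* length of an 802.1Q tag in bytes */

/*
 * csum_start as seen by userspace: the checksum start offset in the
 * packet, shifted by vlan_hlen when the tag is inserted in the buffer.
 */
static unsigned int csum_start_for_user(unsigned int checksum_start_offset,
                                        unsigned int vlan_hlen)
{
    return checksum_start_offset + vlan_hlen;
}

static void csum_start_demo(void)
{
    /* Hypothetical offset: Ethernet (14) + IPv4 (20) headers. */
    const unsigned int off = 14 + 20;

    /* Caller that inserts the VLAN tag in the user buffer. */
    assert(csum_start_for_user(off, VLAN_HLEN) == 38);

    /* Caller that does not insert the tag passes 0. */
    assert(csum_start_for_user(off, 0) == 34);
}
```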

    Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
    Fixes: 1276f24eeef2 ("packet: use common code for virtio_net_hdr and skb GSO conversion")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

07 Jun, 2018

1 commit

  • Pull networking updates from David Miller:

    1) Add Maglev hashing scheduler to IPVS, from Inju Song.

    2) Lots of new TC subsystem tests from Roman Mashak.

    3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

    4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

    5) Add ttl inherit support to vxlan, from Hangbin Liu.

    6) Properly separate ipv6 routes into their logically independent
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

    7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

    8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

    9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

    10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

    11) Plumb extack down into fib_rules, from Roopa Prabhu.

    12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

    13) Add UDP GSO support, from Willem de Bruijn.

    14) Add documentation for eBPF helpers, from Quentin Monnet.

    15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

    16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

    17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

    18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

    19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

    20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

    21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

    22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

    23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

    24) Add TCP SACK compression, from Eric Dumazet.

    25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

    26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

    27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

    28) Add lots of forwarding selftests, from Petr Machata.

    29) Add generic network device failover driver, from Sridhar Samudrala.

    * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
    strparser: Add __strp_unpause and use it in ktls.
    rxrpc: Fix terminal retransmission connection ID to include the channel
    net: hns3: Optimize PF CMDQ interrupt switching process
    net: hns3: Fix for VF mailbox receiving unknown message
    net: hns3: Fix for VF mailbox cannot receiving PF response
    bnx2x: use the right constant
    Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
    net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
    enic: fix UDP rss bits
    netdev-FAQ: clarify DaveM's position for stable backports
    rtnetlink: validate attributes in do_setlink()
    mlxsw: Add extack messages for port_{un, }split failures
    netdevsim: Add extack error message for devlink reload
    devlink: Add extack to reload and port_{un, }split operations
    net: metrics: add proper netlink validation
    ipmr: fix error path when ipmr_new_table fails
    ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
    net: hns3: remove unused hclgevf_cfg_func_mta_filter
    netfilter: provide udp*_lib_lookup for nf_tproxy
    qed*: Utilize FW 8.37.2.0
    ...

    Linus Torvalds
     

05 Jun, 2018

2 commits

  • Pull aio updates from Al Viro:
    "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.

    The only thing I'm holding back for a day or so is Adam's aio ioprio -
    his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
    but let it sit in -next for decency sake..."

    * 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    aio: sanitize the limit checking in io_submit(2)
    aio: fold do_io_submit() into callers
    aio: shift copyin of iocb into io_submit_one()
    aio_read_events_ring(): make a bit more readable
    aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
    aio: take list removal to (some) callers of aio_complete()
    aio: add missing break for the IOCB_CMD_FDSYNC case
    random: convert to ->poll_mask
    timerfd: convert to ->poll_mask
    eventfd: switch to ->poll_mask
    pipe: convert to ->poll_mask
    crypto: af_alg: convert to ->poll_mask
    net/rxrpc: convert to ->poll_mask
    net/iucv: convert to ->poll_mask
    net/phonet: convert to ->poll_mask
    net/nfc: convert to ->poll_mask
    net/caif: convert to ->poll_mask
    net/bluetooth: convert to ->poll_mask
    net/sctp: convert to ->poll_mask
    net/tipc: convert to ->poll_mask
    ...

    Linus Torvalds
     
  • Pull procfs updates from Al Viro:
    "Christoph's proc_create_... cleanups series"

    * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits)
    xfs, proc: hide unused xfs procfs helpers
    isdn/gigaset: add back gigaset_procinfo assignment
    proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
    tty: replace ->proc_fops with ->proc_show
    ide: replace ->proc_fops with ->proc_show
    ide: remove ide_driver_proc_write
    isdn: replace ->proc_fops with ->proc_show
    atm: switch to proc_create_seq_private
    atm: simplify procfs code
    bluetooth: switch to proc_create_seq_data
    netfilter/x_tables: switch to proc_create_seq_private
    netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
    neigh: switch to proc_create_seq_data
    hostap: switch to proc_create_{seq,single}_data
    bonding: switch to proc_create_seq_data
    rtc/proc: switch to proc_create_single_data
    drbd: switch to proc_create_single
    resource: switch to proc_create_seq_data
    staging/rtl8192u: simplify procfs code
    jfs: simplify procfs code
    ...

    Linus Torvalds
     

04 Jun, 2018

1 commit

  • syzbot was able to trick af_packet again [1]

    Various commits tried to address the problem in the past,
    but failed to take into account V3 header size.

    [1]

    tpacket_rcv: packet too big, clamped from 72 to 4294967224. macoff=96
    BUG: KASAN: use-after-free in prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline]
    BUG: KASAN: use-after-free in prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039
    Write of size 2 at addr ffff8801cb62000e by task kworker/1:2/2106

    CPU: 1 PID: 2106 Comm: kworker/1:2 Not tainted 4.17.0-rc7+ #77
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: ipv6_addrconf addrconf_dad_work
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_store2_noabort+0x17/0x20 mm/kasan/report.c:436
    prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline]
    prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039
    __packet_lookup_frame_in_block net/packet/af_packet.c:1094 [inline]
    packet_current_rx_frame net/packet/af_packet.c:1117 [inline]
    tpacket_rcv+0x1866/0x3340 net/packet/af_packet.c:2282
    dev_queue_xmit_nit+0x891/0xb90 net/core/dev.c:2018
    xmit_one net/core/dev.c:3049 [inline]
    dev_hard_start_xmit+0x16b/0xc10 net/core/dev.c:3069
    __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3617
    neigh_resolve_output+0x679/0xad0 net/core/neighbour.c:1358
    neigh_output include/net/neighbour.h:482 [inline]
    ip6_finish_output2+0xc9c/0x2810 net/ipv6/ip6_output.c:120
    ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154
    NF_HOOK_COND include/linux/netfilter.h:277 [inline]
    ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171
    dst_output include/net/dst.h:444 [inline]
    NF_HOOK include/linux/netfilter.h:288 [inline]
    ndisc_send_skb+0x100d/0x1570 net/ipv6/ndisc.c:491
    ndisc_send_ns+0x3c1/0x8d0 net/ipv6/ndisc.c:633
    addrconf_dad_work+0xbef/0x1340 net/ipv6/addrconf.c:4033
    process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
    worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
    kthread+0x345/0x410 kernel/kthread.c:240
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

    The buggy address belongs to the page:
    page:ffffea00072d8800 count:0 mapcount:-127 mapping:0000000000000000 index:0xffff8801cb620e80
    flags: 0x2fffc0000000000()
    raw: 02fffc0000000000 0000000000000000 ffff8801cb620e80 00000000ffffff80
    raw: ffffea00072e3820 ffffea0007132d20 0000000000000002 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801cb61ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff8801cb61ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    >ffff8801cb620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    ^
    ffff8801cb620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    ffff8801cb620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

    Fixes: 2b6867c2ce76 ("net/packet: fix overflow in check for priv area size")
    Fixes: dc808110bb62 ("packet: handle too big packets for PACKET_V3")
    Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
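
The "clamped from 72 to 4294967224" line in the log above is what unsigned 32-bit wraparound looks like: 4294967224 equals 2^32 - 72. The sketch below reproduces that number under an assumption: a frame limit of 24 bytes is inferred from the log (96 - 72) and is not stated in the commit. The function names are hypothetical, and the guarded variant is only one illustrative way to avoid the wrap, not the change made by the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Naive clamp: shrink snaplen so that macoff + snaplen fits in max_frame_len.
 * If max_frame_len < macoff, the unsigned subtraction wraps around.
 */
static uint32_t toy_clamp_naive(uint32_t snaplen, uint32_t macoff,
                                uint32_t max_frame_len)
{
    if (macoff + snaplen > max_frame_len)
        return max_frame_len - macoff;
    return snaplen;
}

/* Guarded clamp: no payload fits when the header offset alone exceeds the limit. */
static uint32_t toy_clamp_checked(uint32_t snaplen, uint32_t macoff,
                                  uint32_t max_frame_len)
{
    uint32_t room;

    if (macoff >= max_frame_len)
        return 0;
    room = max_frame_len - macoff;
    if (snaplen > room)
        snaplen = room;
    assert(snaplen <= room); /* macoff + snaplen now fits in the frame */
    return snaplen;
}
```

Calling toy_clamp_naive(72, 96, 24) yields 4294967224, matching the logged value, while toy_clamp_checked(72, 96, 24) yields 0.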
     

27 May, 2018

1 commit


26 May, 2018

1 commit


25 May, 2018

1 commit

  • Commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link
    layer allocation") ensures that packet_snd always starts writing
    the link layer header in reserved headroom allocated for this
    purpose.

    This is needed because packets may be shorter than hard_header_len,
    in which case the space up to hard_header_len may be zeroed. But
    that necessary padding is not accounted for in skb->len.

    The fix, however, is buggy. It calls skb_push, which grows skb->len
    when moving skb->data back. But in this case packet length should not
    change.

    Instead, call skb_reserve, which moves both skb->data and skb->tail
    back, without changing length.

    Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation")
    Reported-by: Tariq Toukan
    Signed-off-by: Willem de Bruijn
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Willem de Bruijn
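
The difference between the two calls can be modelled with offsets into a buffer. This is a toy sketch: struct toy_skb, toy_push() and toy_reserve() are hypothetical stand-ins that mimic only the pointer and length arithmetic of skb_push and skb_reserve, and the 14-byte headroom is an illustrative value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of the sk_buff fields, as offsets into one buffer. */
struct toy_skb {
    size_t data; /* offset of the first byte of packet data */
    size_t tail; /* offset one past the last byte of packet data */
    size_t len;  /* packet length */
};

/* Like skb_push(): expose n more bytes at the front, so the length grows. */
static void toy_push(struct toy_skb *skb, size_t n)
{
    assert(skb->data >= n); /* needs n bytes of headroom */
    skb->data -= n;
    skb->len += n;
}

/* Like skb_reserve() with a signed delta: shift data and tail together. */
static void toy_reserve(struct toy_skb *skb, long delta)
{
    assert(delta >= 0 || skb->data >= (size_t)(-delta));
    skb->data = (size_t)((long)skb->data + delta);
    skb->tail = (size_t)((long)skb->tail + delta);
    /* len is deliberately left unchanged */
}
```

Starting from an empty buffer with data = tail = 14 and len = 0, toy_push(&skb, 14) leaves len at 14, whereas toy_reserve(&skb, -14) moves both offsets to 0 and leaves len at 0.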
     

22 May, 2018

1 commit

  • S390 bpf_jit.S is removed in net-next and had changes in 'net';
    since that code isn't used any more, take the removal.

    TLS data structures split the TX and RX components in 'net-next',
    put the new struct members from the bug fix in 'net' into the RX
    part.

    The 'net-next' tree had some reworking of how the ERSPAN code works in
    the GRE tunneling code, overlapping with a one-line headroom
    calculation fix in 'net'.

    Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
    that read the prog members via READ_ONCE() into local variables
    before using them.

    Signed-off-by: David S. Miller

    David S. Miller
     

16 May, 2018

1 commit

  • Variants of proc_create{,_data} that directly take a struct seq_operations
    and deal with network namespaces in ->open and ->release. All callers of
    proc_create + seq_open_net converted over, and seq_{open,release}_net are
    removed entirely.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     

14 May, 2018

1 commit

  • Packet sockets allow construction of packets shorter than
    dev->hard_header_len to accommodate protocols with variable length
    link layer headers. These packets are padded to dev->hard_header_len,
    because some device drivers interpret that as a minimum packet size.

    packet_snd reserves dev->hard_header_len bytes on allocation.
    SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that
    link layer headers are stored in the reserved range. SOCK_RAW sockets
    do the same in tpacket_snd, but not in packet_snd.

    Syzbot was able to send a zero byte packet to a device with a massive
    116B link layer header, causing padding to cross over into skb_shinfo.
    Fix this by writing from the start of the llheader reserved range also
    in the case of packet_snd/SOCK_RAW.

    Update skb_set_network_header to the new offset. This also corrects
    it for SOCK_DGRAM, where it incorrectly double counted reserve due to
    the skb_push in dev_hard_header.

    Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
    Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
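
Why the starting offset matters can be checked with simple arithmetic. The sketch below is a simplified model that ignores tailroom and alignment; toy_padding_fits() is a hypothetical helper, and the buffer is assumed to hold exactly the reserved header range plus the payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Does zero-padding a short packet up to hard_header_len stay inside the
 * buffer? 'start' is the offset at which the link layer header begins.
 */
static bool toy_padding_fits(size_t buf_size, size_t start, size_t len,
                             size_t hard_header_len)
{
    size_t end;

    assert(start <= buf_size);
    end = start + (len < hard_header_len ? hard_header_len : len);
    return end <= buf_size;
}
```

For a zero byte packet and a 116 byte header, the modelled buffer is 116 bytes. Writing from offset 116 (past the reserved range) would pad up to offset 232 and overrun it, while writing from offset 0 (the start of the reserved range) ends exactly at 116.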
     

04 May, 2018

1 commit


25 Apr, 2018

1 commit

  • Updates to the bitfields in struct packet_sock are not atomic.
    Serialize these read-modify-write cycles.

    Move po->running into a separate variable. Its writes are protected by
    po->bind_lock (except for one startup case at packet_create). Also
    replace a textual precondition warning with lockdep annotation.

    All others are set only in packet_setsockopt. Serialize these
    updates by holding the socket lock. Analogous to other field updates,
    also hold the lock when testing whether a ring is active (pg_vec).

    Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg")
    Reported-by: DaeRyong Jeong
    Reported-by: Byoungyoung Lee
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
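
The lost update that unserialized read-modify-write cycles can cause is easy to replay deterministically. The sketch below does not use real threads or the packet_sock layout: the flag names and functions are hypothetical, and the "race" is a fixed interleaving written out by hand.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RUNNING (1u << 0) /* hypothetical flag bits sharing one word */
#define TOY_AUXDATA (1u << 1)

/* The "modify" step of a read-modify-write, applied to a snapshot of the word. */
static uint32_t toy_set_bit(uint32_t snapshot, uint32_t bit)
{
    return snapshot | bit;
}

/*
 * Replay an unlucky interleaving of two unserialized writers, A and B,
 * that each set a different bit in the same word.
 */
static uint32_t toy_racy_interleaving(uint32_t word)
{
    uint32_t a_snapshot = word; /* A reads */
    uint32_t b_snapshot = word; /* B reads before A has written */

    word = toy_set_bit(a_snapshot, TOY_RUNNING); /* A writes */
    word = toy_set_bit(b_snapshot, TOY_AUXDATA); /* B writes, erasing A's bit */
    return word;
}

/* With the updates serialized, B reads only after A has written. */
static uint32_t toy_serialized(uint32_t word)
{
    word = toy_set_bit(word, TOY_RUNNING);
    word = toy_set_bit(word, TOY_AUXDATA);
    assert((word & (TOY_RUNNING | TOY_AUXDATA)) == (TOY_RUNNING | TOY_AUXDATA));
    return word;
}
```

Starting from 0, the racy replay ends with only TOY_AUXDATA set, while the serialized version ends with both bits set. Adjacent bitfields in a struct behave like the bits of this word, which is why their updates need a common lock.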
     

16 Apr, 2018

1 commit

  • In order to remove the race caught by syzbot [1], we need
    to lock the socket before using po->tp_version as this could
    change under us otherwise.

    This means lock_sock() and release_sock() must be done by
    packet_set_ring() callers.

    [1] :
    BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
    CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
    packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
    packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662
    SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
    SyS_setsockopt+0x76/0xa0 net/socket.c:1828
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x449099
    RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099
    RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003
    RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000
    R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
    R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001

    Local variable description: ----req_u@packet_setsockopt
    Variable was created at:
    packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612
    SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849

    Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Mar, 2018

1 commit


13 Feb, 2018

2 commits

  • These pernet_operations just create and destroy a /proc entry,
    and other operations do not touch it.

    Also, nobody else is interested in a foreign net::packet::sklist.

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrei Vagin
    Signed-off-by: David S. Miller

    Kirill Tkhai
     
  • Changes since v1:
    Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

    Before:
    All these functions either return a negative error indicator,
    or store the length of the sockaddr into the "int *socklen" parameter
    and return zero on success.

    The "int *socklen" parameter is awkward. For example, if the caller does
    not care, it still needs to provide on-stack storage for the value
    it does not need.

    None of the many FOO_getname() functions of various protocols
    ever used the old value of *socklen. They always just overwrite it.

    This change drops this parameter, and makes all these functions, on success,
    return the length of the sockaddr. It's always >= 0 and can be distinguished
    from an error.

    Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

    rpc_sockname() lost "int buflen" parameter, since its only use was
    to be passed to kernel_getsockname() as &buflen and subsequently
    not used in any way.

    Userspace API is not changed.

        text    data     bss      dec     hex filename
    30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
    30108109 2633612  873672 33615393 200ee21 vmlinux.o

    Signed-off-by: Denys Vlasenko
    CC: David S. Miller
    CC: linux-kernel@vger.kernel.org
    CC: netdev@vger.kernel.org
    CC: linux-bluetooth@vger.kernel.org
    CC: linux-decnet-user@lists.sourceforge.net
    CC: linux-wireless@vger.kernel.org
    CC: linux-rdma@vger.kernel.org
    CC: linux-sctp@vger.kernel.org
    CC: linux-nfs@vger.kernel.org
    CC: linux-x25@vger.kernel.org
    Signed-off-by: David S. Miller

    Denys Vlasenko
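
The calling convention change can be illustrated with a toy pair of functions. Everything here is hypothetical (struct toy_addr, the toy_getname_* functions, the 'connected' flag); the sketch only shows the shape of the old and new return conventions and the "if (err < 0)" test that callers switch to.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical toy address type; not a real kernel or libc structure. */
struct toy_addr {
    char bytes[16];
};

/* Old style: 0 or a negative error, with the length stored through an out-parameter. */
static int toy_getname_old(int connected, struct toy_addr *addr, int *socklen)
{
    if (!connected)
        return -ENOTCONN;
    memset(addr, 0, sizeof(*addr));
    *socklen = (int)sizeof(*addr);
    return 0;
}

/* New style: a negative error, or the address length as the return value. */
static int toy_getname_new(int connected, struct toy_addr *addr)
{
    if (!connected)
        return -ENOTCONN;
    memset(addr, 0, sizeof(*addr));
    return (int)sizeof(*addr);
}

/* A caller that only cares about success now tests "< 0" instead of "!= 0". */
static int toy_caller(int connected)
{
    struct toy_addr addr;
    int err = toy_getname_new(connected, &addr);

    if (err < 0)
        return err;
    assert(err == (int)sizeof(addr));
    return 0;
}
```

A caller that does not need the length no longer has to declare a dummy int just to satisfy the signature, which is the awkwardness the message points out.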