05 Aug, 2020

6 commits

  • [ Upstream commit 4f47e8ab6ab796b5380f74866fa5287aca4dcc58 ]

    In commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
    'priority' was used to make a policy unique, which allowed duplicate
    policies that differ only in 'priority' to be added. Userland does not
    expect this, as Tobias reported for strongSwan.

    To fix this duplicate-policy issue, and also the issue addressed by
    commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
    this patch changes add/del/get/update on the user interfaces to look up
    a policy by both mark and mask:

    mark.v == pol->mark.v && mark.m == pol->mark.m

    and keeps the check:

    (mark & pol->mark.m) == pol->mark.v

    for the tx/rx path only.

    This is because userland expects an exact mark and mask match when
    managing policies.

    v1->v2:
    - make xfrm_policy_mark_match inline and fix the changelog as
    Tobias suggested.

    Fixes: 295fae568885 ("xfrm: Allow user space manipulation of SPD mark")
    Fixes: ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
    Reported-by: Tobias Brunner
    Tested-by: Tobias Brunner
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Xin Long
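    A minimal userspace sketch of the two predicates described above.
    `struct mark` is a hypothetical stand-in for a policy's mark/mask pair,
    and the function names are illustrative, not kernel API. A policy
    stored with mask 0xf0 is not found by a management request that uses
    mask 0xff, even though a packet marked 0x1f still matches it on the
    data path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a policy mark: value v, mask m. */
struct mark {
    uint32_t v;
    uint32_t m;
};

/* Management path (add/del/get/update): the user-supplied mark and
 * mask must equal the stored ones exactly. */
static bool mark_exact_match(struct mark user, struct mark pol)
{
    return user.v == pol.v && user.m == pol.m;
}

/* tx/rx path: a packet mark matches if, after masking with the
 * policy's mask, it equals the policy's value. */
static bool mark_masked_match(uint32_t pkt_mark, struct mark pol)
{
    return (pkt_mark & pol.m) == pol.v;
}
```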
     
  • commit 8999dc89497ab1c80d0718828e838c7cd5f6bffe upstream.

    x25_disconnect() should check for NULL before calling x25_neigh_put();
    otherwise it may cause a null-ptr-deref like this one:

    #include <sys/socket.h>
    #include <unistd.h>

    /* Create an X.25 socket and close it without ever connecting it. */
    int main(void) {
        int sck_x25;
        sck_x25 = socket(AF_X25, SOCK_SEQPACKET, 0);
        close(sck_x25);
        return 0;
    }

    BUG: kernel NULL pointer dereference, address: 00000000000000d8
    CPU: 0 PID: 4817 Comm: t2 Not tainted 5.7.0-rc3+ #159
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-
    RIP: 0010:x25_disconnect+0x91/0xe0
    Call Trace:
    x25_release+0x18a/0x1b0
    __sock_release+0x3d/0xc0
    sock_close+0x13/0x20
    __fput+0x107/0x270
    ____fput+0x9/0x10
    task_work_run+0x6d/0xb0
    exit_to_usermode_loop+0x102/0x110
    do_syscall_64+0x23c/0x260
    entry_SYSCALL_64_after_hwframe+0x49/0xb3

    Reported-by: syzbot+6db548b615e5aeefdce2@syzkaller.appspotmail.com
    Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    YueHaibing
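    A sketch of the pattern both x25 fixes rely on: drop the neighbour
    reference taken at connect time, but only if one was taken, and clear
    the pointer so a repeated disconnect does nothing. The structs and
    functions are hypothetical userspace stand-ins for x25_neigh, the x25
    socket and x25_neigh_put(); locking is omitted.

```c
#include <stddef.h>

/* Hypothetical stand-ins for x25_neigh and the x25 socket. */
struct neigh {
    int refcnt;
};

struct x25_sock_sketch {
    struct neigh *neighbour; /* NULL if the socket never connected */
};

static void neigh_put(struct neigh *nb)
{
    nb->refcnt--; /* dereferences nb: must not be called with NULL */
}

/* Drop the reference taken at connect time, if there is one; a socket
 * created and closed without connect() has none. */
static void disconnect_sketch(struct x25_sock_sketch *x25)
{
    if (x25->neighbour) {
        neigh_put(x25->neighbour);
        x25->neighbour = NULL;
    }
}
```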
     
  • commit 4becb7ee5b3d2829ed7b9261a245a77d5b7de902 upstream.

    x25_connect() invokes x25_get_neigh(), which returns a reference to the
    specified x25_neigh object in "x25->neighbour" with an increased refcnt.

    When x25_connect() succeeds and returns, the reference is still held by
    "x25->neighbour", so the refcount should be decreased in
    x25_disconnect() to keep it balanced.

    The reference counting issue happens in x25_disconnect(), which forgets
    to decrease the refcnt increased by x25_get_neigh() in x25_connect(),
    causing a refcnt leak.

    Fix this issue by calling x25_neigh_put() before x25_disconnect()
    returns.

    Signed-off-by: Xiyu Yang
    Signed-off-by: Xin Tan
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xiyu Yang
     
  • commit bbc8a99e952226c585ac17477a85ef1194501762 upstream.

    rds_notify_queue_get() is potentially copying uninitialized kernel stack
    memory to userspace since the compiler may leave a 4-byte hole at the end
    of `cmsg`.

    In 2016 we tried to fix this issue by doing `= { 0 };` on `cmsg`, which
    unfortunately does not always initialize that 4-byte hole. Fix it by using
    memset() instead.

    Cc: stable@vger.kernel.org
    Fixes: f037590fff30 ("rds: fix a leak of kernel memory")
    Fixes: bdbe6fbc6a2f ("RDS: recv.c")
    Suggested-by: Dan Carpenter
    Signed-off-by: Peilin Ye
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Peilin Ye
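    A sketch of why memset() is preferred here. The struct assumes a
    u64 + s32 layout like the notification copied to userspace; on common
    64-bit ABIs it is padded to 16 bytes, and only memset() is guaranteed
    to clear that trailing hole. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: on common 64-bit ABIs the compiler pads this to
 * 16 bytes, leaving a 4-byte hole after `status`. */
struct notify_sketch {
    uint64_t user_token;
    int32_t status;
};

/* Fill a message destined for userspace. memset() clears every byte,
 * padding included; `= { 0 }` only guarantees the named members. */
static void fill_notify(struct notify_sketch *out, uint64_t token, int32_t status)
{
    memset(out, 0, sizeof(*out));
    out->user_token = token;
    out->status = status;
}
```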
     
  • commit 74d6a5d5662975aed7f25952f62efbb6f6dadd29 upstream.

    p9_read_work and p9_fd_cancelled may be called concurrently.
    In some cases, req->req_list may be deleted by both p9_read_work
    and p9_fd_cancelled.

    We can fix it by ignoring replies associated with a cancelled
    request, and by ignoring the cancellation if the reply was already
    received before the lock was taken.

    Link: http://lkml.kernel.org/r/20200612090833.36149-1-wanghai38@huawei.com
    Fixes: 60ff779c4abb ("9p: client: remove unused code and any reference to "cancelled" function")
    Cc: # v3.12+
    Reported-by: syzbot+77a25acfa0382e06ab23@syzkaller.appspotmail.com
    Signed-off-by: Wang Hai
    Signed-off-by: Dominique Martinet
    Signed-off-by: Greg Kroah-Hartman

    Wang Hai
     
  • [ Upstream commit f45db2b909c7e76f35850e78f017221f30282b8e ]

    The domain table should be empty at module unload. If it isn't, there is
    a bug somewhere, so check for that and report it.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651
    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Sasha Levin

    Sasha Levin
     

01 Aug, 2020

15 commits

  • [ Upstream commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace ]

    Currently, SO_REUSEPORT does not work well if connected sockets are in a
    UDP reuseport group.

    In that case, reuseport_has_conns() returns true and the result of
    reuseport_select_sock() is discarded. Also, unconnected sockets have the
    same score, so only the first unconnected socket in udp_hslot
    receives all packets sent to unconnected sockets.

    So, the result of reuseport_select_sock() should be used for load
    balancing.

    The noteworthy point is that the unconnected sockets placed after
    connected sockets in sock_reuseport.socks will receive more packets than
    others because of the algorithm in reuseport_select_sock().

    index | connected | reciprocal_scale | result
    ------+-----------+------------------+-------
        0 | no        |              20% |    40%
        1 | no        |              20% |    20%
        2 | yes       |              20% |     0%
        3 | no        |              20% |    40%
        4 | yes       |              20% |     0%

    If most of the sockets are connected, this can be a problem, but it still
    works better than now.

    Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
    CC: Willem de Bruijn
    Reviewed-by: Benjamin Herrenschmidt
    Signed-off-by: Kuniyuki Iwashima
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Kuniyuki Iwashima
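    A sketch that reproduces the table, assuming reuseport_select_sock()
    hashes to a slot with reciprocal_scale() and then walks forward,
    wrapping around, past connected sockets. `reciprocal_scale()` below
    uses the same arithmetic as the kernel helper; `select_unconnected()`
    is a hypothetical name. Slots 2 and 4 are connected, so hashes landing
    there move on to slots 3 and 0, which is why those two slots get 40%
    each.

```c
#include <stdbool.h>
#include <stdint.h>

/* Maps a 32-bit hash uniformly onto [0, n) without a division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t n)
{
    return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Start at the hashed slot and walk forward, wrapping around, until an
 * unconnected socket is found. Returns -1 if all are connected. */
static int select_unconnected(const bool *connected, uint32_t n, uint32_t hash)
{
    uint32_t start = reciprocal_scale(hash, n);
    uint32_t i = start;

    while (connected[i]) {
        i = (i + 1) % n;
        if (i == start)
            return -1;
    }
    return (int)i;
}
```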
     
  • [ Upstream commit f2b2c55e512879a05456eaf5de4d1ed2f7757509 ]

    If an unconnected socket in a UDP reuseport group connect()s, has_conns is
    set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all
    sockets in udp_hslot looking for the connected socket with the highest
    score.

    However, when the number of sockets bound to the port exceeds max_socks,
    reuseport_grow() resets has_conns to 0. This can cause
    udp[46]_lib_lookup2() to return without scanning all sockets, so packets
    sent to connected sockets may be delivered to unconnected sockets.

    Therefore, reuseport_grow() should copy has_conns.

    Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
    CC: Willem de Bruijn
    Reviewed-by: Benjamin Herrenschmidt
    Signed-off-by: Kuniyuki Iwashima
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Kuniyuki Iwashima
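    A simplified sketch of growing a reuseport group. `struct reuse_group`
    and `grow_group()` are hypothetical; the relevant part is that every
    field the lookup relies on, has_conns included, is carried over to the
    larger allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified reuseport group: a socket array plus the
 * has_conns hint that tells lookups to scan for connected sockets. */
struct reuse_group {
    unsigned int max_socks;
    unsigned int num_socks;
    unsigned char has_conns;
    int *socks;
};

/* Allocate a group twice as large and copy the old state into it. */
static struct reuse_group *grow_group(const struct reuse_group *old)
{
    struct reuse_group *g = malloc(sizeof(*g));
    if (!g)
        return NULL;
    g->max_socks = old->max_socks * 2;
    g->socks = calloc(g->max_socks, sizeof(*g->socks));
    if (!g->socks) {
        free(g);
        return NULL;
    }
    memcpy(g->socks, old->socks, old->num_socks * sizeof(*old->socks));
    g->num_socks = old->num_socks;
    /* The line the fix adds: a freshly zeroed group would otherwise
     * start with has_conns == 0. */
    g->has_conns = old->has_conns;
    return g;
}
```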
     
  • [ Upstream commit 3ecdda3e9ad837cf9cb41b6faa11b1af3a5abc0c ]

    When a stream is added with stream reconf, the new stream is initially
    in the CLOSED state, but new outgoing chunks can still be enqueued to it.
    Once the confirmation arrives from the peer, the state changes to OPEN.

    However, if the peer denies the request, the stream must be rolled back.
    When doing that, only the stream outcnt is restored, and the chunks
    already in the new stream are not purged, so these chunks can still be
    dequeued in sctp_outq_dequeue_data().

    As the stream is still CLOSED, the chunk will be put back at the head of
    the queue by sctp_outq_head_data(). This chunk will never be sent out,
    and the chunks after it can never be dequeued. The assoc will be 'hung'
    in a dead loop of trying to send this chunk.

    To fix it, this patch purges the chunks already in the new
    stream by calling sctp_stream_shrink_out() when the addstream reconf
    fails.

    Fixes: 11ae76e67a17 ("sctp: implement receiver-side procedures for the Reconf Response Parameter")
    Reported-by: Ying Xu
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit 8f13399db22f909a35735bf8ae2f932e0c8f0e30 ]

    It's not necessary to walk outq->out_chunk_list with list_for_each
    when the new outcnt >= the old outcnt, as no chunk with a sid higher
    than (new outcnt - 1) can exist in the outqueue.

    While at it, also move the list_for_each code into a new function,
    sctp_stream_shrink_out(), which will be used in the next patch.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
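    A sketch of the shrink step shared by both sctp patches, using an array
    in place of outq->out_chunk_list. `struct chunk` and `shrink_out()` are
    hypothetical. Chunks whose stream id is no longer valid are dropped,
    and the walk is skipped when the stream count does not shrink.

```c
#include <stddef.h>

/* Hypothetical queued chunk: only the stream id matters here. */
struct chunk {
    unsigned int sid;
};

/* Compact q[] to keep only chunks with sid < new_outcnt and return the
 * number kept. If the stream count does not shrink, no queued chunk can
 * have sid >= new_outcnt, so the walk is skipped. */
static size_t shrink_out(struct chunk *q, size_t len,
                         unsigned int old_outcnt, unsigned int new_outcnt)
{
    size_t kept = 0;

    if (new_outcnt >= old_outcnt)
        return len;

    for (size_t i = 0; i < len; i++) {
        if (q[i].sid < new_outcnt)
            q[kept++] = q[i];
    }
    return kept;
}
```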
     
  • [ Upstream commit 17ad73e941b71f3bec7523ea4e9cbc3752461c2d ]

    We recently added some bounds checking in ax25_connect() and
    ax25_sendmsg(), and so we removed the AX25_MAX_DIGIS checks because
    they were no longer required.

    Unfortunately, I believe they are required to prevent integer overflows
    so I have added them back.

    Fixes: 8885bb0621f0 ("AX.25: Prevent out-of-bounds read in ax25_sendmsg()")
    Fixes: 2f2a7ffad5c6 ("AX.25: Fix out-of-bounds read in ax25_connect()")
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
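    A sketch of why the upper bound matters, for the case where
    digipeaters are supplied. The sizes below are assumptions standing in
    for the real header definitions. Without the range check, ndigis = -1
    converts to SIZE_MAX, the product wraps, and the required length
    computes to 16 + SIZE_MAX * 7, which wraps to 9, so a short addr_len
    would pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes; the real values come from the AX.25 headers. */
#define SKETCH_MAX_DIGIS 8
#define SKETCH_BASE_LEN  16 /* stands in for sizeof(struct sockaddr_ax25) */
#define SKETCH_DIGI_LEN  7  /* stands in for sizeof(ax25_address) */

/* Validate a user-supplied digipeater count against addr_len. The
 * range check runs first so the size computation cannot wrap. */
static bool digis_ok(int ndigis, size_t addr_len)
{
    if (ndigis < 1 || ndigis > SKETCH_MAX_DIGIS)
        return false;
    return addr_len >= SKETCH_BASE_LEN + (size_t)ndigis * SKETCH_DIGI_LEN;
}
```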
     
  • [ Upstream commit 76be93fc0702322179bb0ea87295d820ee46ad14 ]

    Previously, TLP could send multiple probes of new data in one
    flight. This happens when the sender is cwnd limited. After the
    initial TLP containing new data is sent, the sender receives another
    ACK that acks part of the inflight data. It may re-arm another TLP
    timer to send more if no further ACK returns before the next TLP
    timeout (PTO) expires. In theory, the sender could send a large number
    of TLPs until the send queue is depleted. This only happens if the
    sender sees such an irregular, uncommon ACK pattern, but it is
    undesirable behavior, especially during congestion.

    The original TLP design restricts TLP to one probe per inflight, as
    published in "Reducing Web Latency: the Virtue of Gentle Aggression",
    SIGCOMM 2013. This patch changes TLP to send at most one probe
    per inflight.

    Note that if the sender is app-limited, TLP retransmits old data
    and does not have this issue.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yuchung Cheng
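    A simplified sketch of the "at most one probe per flight" rule. The
    field name tlp_high_seq mirrors the kernel's, but the logic here is an
    assumption for illustration. It ignores sequence-number wraparound,
    retransmission of old data and timer details, and refuses to send a
    new probe until an ACK covers the outstanding one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender state; tlp_high_seq != 0 means a probe is
 * outstanding. */
struct sender {
    uint32_t snd_nxt;
    uint32_t tlp_high_seq;
    unsigned int probes_sent;
};

/* PTO fired: send a probe of new data unless one is already in flight. */
static bool on_pto(struct sender *s, uint32_t probe_len)
{
    if (s->tlp_high_seq)
        return false; /* at most one TLP per flight: just re-arm */
    s->snd_nxt += probe_len;
    s->tlp_high_seq = s->snd_nxt;
    s->probes_sent++;
    return true;
}

/* A partial ACK leaves the probe outstanding; a covering ACK clears it. */
static void on_ack(struct sender *s, uint32_t ack)
{
    if (s->tlp_high_seq && ack >= s->tlp_high_seq)
        s->tlp_high_seq = 0;
}
```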
     
  • [ Upstream commit 639f181f0ee20d3249dbc55f740f0167267180f0 ]

    rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as
    rxrpc_recvmsg() having indicated ENODATA because there was nothing for
    it to read.

    Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read
    as this particular error doesn't get stored in ->sk_err by the networking
    core.

    Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive
    errors (there's no way for it to report which call, if any, the error was
    caused by).

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • [ Upstream commit cebb69754f37d68e1355a5e726fdac317bcda302 ]

    When register_vlan_dev(), called from vlan_newlink(), fails, it may
    return an error with dev->reg_state = NETREG_UNREGISTERED.
    rtnl_newlink() should free the memory, but currently it only frees
    memory whose state is NETREG_UNINITIALIZED.

    BUG: memory leak
    unreferenced object 0xffff8881051de000 (size 4096):
    comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s)
    hex dump (first 32 bytes):
    76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00 vlan2...........
    00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00 .E(.............
    backtrace:
    kmalloc_node include/linux/slab.h:578 [inline]
    kvmalloc_node+0x33/0xd0 mm/util.c:574
    kvmalloc include/linux/mm.h:753 [inline]
    kvzalloc include/linux/mm.h:761 [inline]
    alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929
    rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067
    __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329
    rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397
    rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460
    netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469
    netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329
    netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0x109/0x140 net/socket.c:672
    ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352
    ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406
    __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439
    do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak")
    Reported-by: Hulk Robot
    Signed-off-by: Weilong Chen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Weilong Chen
     
  • [ Upstream commit af9f691f0f5bdd1ade65a7b84927639882d7c3e5 ]

    We have to detach sock from socket in qrtr_release();
    otherwise skb->sk may still reference this socket
    when the skb is released in tun->queue. In particular,
    sk->sk_wq still points to &sock->wq, which leads to
    a UAF.

    Reported-and-tested-by: syzbot+6720d64f31c081c2f708@syzkaller.appspotmail.com
    Fixes: 28fb4e59a47d ("net: qrtr: Expose tunneling endpoint to user space")
    Cc: Bjorn Andersson
    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit b0a422772fec29811e293c7c0e6f991c0fd9241d ]

    We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is
    checked.

    Fixes: b2bf1e2659b1 ("[UDP]: Clean up for IS_UDPLITE macro")
    Signed-off-by: Miaohe Lin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Miaohe Lin
     
  • [ Upstream commit 9bb5fbea59f36a589ef886292549ca4052fe676c ]

    When I cat 'tx_timeout' via sysfs, it displays as follows. It's better to
    add a newline for easier reading.

    root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
    0root@syzkaller:~#

    Signed-off-by: Xiongfeng Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xiongfeng Wang
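    A sketch of the fix as plain C: format the value with a trailing
    newline so that `cat` output ends cleanly. `show_tx_timeout()` is a
    hypothetical helper, not the kernel's sysfs show callback.

```c
#include <stdio.h>

/* Format the timeout followed by a newline; returns the number of
 * characters written (excluding the terminating NUL). */
static int show_tx_timeout(char *buf, size_t size, unsigned long trans_timeout)
{
    return snprintf(buf, size, "%lu\n", trans_timeout);
}
```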
     
  • [ Upstream commit 46ef5b89ec0ecf290d74c4aee844f063933c4da4 ]

    KASAN reports a null-ptr-deref error when register_netdev() fails:

    KASAN: null-ptr-deref in range [0x00000000000003c0-0x00000000000003c7]
    CPU: 2 PID: 422 Comm: ip Not tainted 5.8.0-rc4+ #12
    Call Trace:
    ip6gre_init_net+0x4ab/0x580
    ? ip6gre_tunnel_uninit+0x3f0/0x3f0
    ops_init+0xa8/0x3c0
    setup_net+0x2de/0x7e0
    ? rcu_read_lock_bh_held+0xb0/0xb0
    ? ops_init+0x3c0/0x3c0
    ? kasan_unpoison_shadow+0x33/0x40
    ? __kasan_kmalloc.constprop.0+0xc2/0xd0
    copy_net_ns+0x27d/0x530
    create_new_namespaces+0x382/0xa30
    unshare_nsproxy_namespaces+0xa1/0x1d0
    ksys_unshare+0x39c/0x780
    ? walk_process_tree+0x2a0/0x2a0
    ? trace_hardirqs_on+0x4a/0x1b0
    ? _raw_spin_unlock_irq+0x1f/0x30
    ? syscall_trace_enter+0x1a7/0x330
    ? do_syscall_64+0x1c/0xa0
    __x64_sys_unshare+0x2d/0x40
    do_syscall_64+0x56/0xa0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    ip6gre_tunnel_uninit() has set 'ign->fb_tunnel_dev' to NULL, so a later
    access to ign->fb_tunnel_dev causes a null-ptr-deref. Fix it by saving
    'ign->fb_tunnel_dev' to the local variable ndev.

    Fixes: dafabb6590cb ("ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()")
    Reported-by: Hulk Robot
    Signed-off-by: Wei Yongjun
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wei Yongjun
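    A sketch of the pattern used in the fix: copy the field into a local
    before calling something that clears it. The structs and functions are
    hypothetical stand-ins; `uninit_cb()` models ip6gre_tunnel_uninit()
    setting fb_tunnel_dev to NULL.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the device and the per-netns data. */
struct netdev_sketch {
    int registered;
};

struct ign_sketch {
    struct netdev_sketch *fb_tunnel_dev;
};

static void uninit_cb(struct ign_sketch *ign)
{
    ign->fb_tunnel_dev = NULL;
}

/* Error path: save the pointer first, so cleanup after uninit_cb()
 * uses a valid device instead of re-reading the cleared field. */
static struct netdev_sketch *register_failed(struct ign_sketch *ign)
{
    struct netdev_sketch *ndev = ign->fb_tunnel_dev;

    uninit_cb(ign);       /* clears ign->fb_tunnel_dev */
    ndev->registered = 0; /* safe: uses the saved pointer */
    return ndev;          /* caller would free it */
}
```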
     
  • [ Upstream commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3 ]

    IRQs are disabled when freeing skbs in the input queue.
    Use the IRQ safe variant to free skbs here.

    Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
    Signed-off-by: Subash Abhinov Kasiviswanathan
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Subash Abhinov Kasiviswanathan
     
  • [ Upstream commit 8885bb0621f01a6c82be60a91e5fc0f6e2f71186 ]

    Checks on `addr_len` and `usax->sax25_ndigis` are insufficient.
    ax25_sendmsg() can go out of bounds when `usax->sax25_ndigis` equals 7
    or 8. Fix it.

    It is safe to remove `usax->sax25_ndigis > AX25_MAX_DIGIS`, since
    `addr_len` is guaranteed to be less than or equal to
    `sizeof(struct full_sockaddr_ax25)`.

    Signed-off-by: Peilin Ye
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Peilin Ye
     
  • [ Upstream commit 2f2a7ffad5c6cbf3d438e813cfdc88230e185ba6 ]

    Checks on `addr_len` and `fsa->fsa_ax25.sax25_ndigis` are insufficient.
    ax25_connect() can go out of bounds when `fsa->fsa_ax25.sax25_ndigis`
    equals 7 or 8. Fix it.

    This issue has been reported as a KMSAN uninit-value bug, because in such
    a case, ax25_connect() reaches into the uninitialized portion of the
    `struct sockaddr_storage` statically allocated in __sys_connect().

    It is safe to remove `fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS` because
    `addr_len` is guaranteed to be less than or equal to
    `sizeof(struct full_sockaddr_ax25)`.

    Reported-by: syzbot+c82752228ed975b0a623@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?id=55ef9d629f3b3d7d70b69558015b63b48d01af66
    Signed-off-by: Peilin Ye
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Peilin Ye
     

29 Jul, 2020

3 commits

  • [ Upstream commit 8210e344ccb798c672ab237b1a4f241bda08909b ]

    sync_thread_backup only checks whether sk_receive_queue is empty. There
    is a situation in which the connection entries cannot be synced: when
    sk_receive_queue is empty and sk_rmem_alloc is larger than sk_rcvbuf,
    the sync packets are dropped in __udp_enqueue_schedule_skb, because
    the packets in reader_queue are not read, so the rmem is not
    reclaimed.

    Here I add a check of whether the reader_queue of the udp sock is
    empty to solve this problem.

    Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception")
    Reported-by: zhouxudong
    Signed-off-by: guodeqing
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    guodeqing
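    A sketch of the corrected loop condition. UDP keeps a second queue
    (reader_queue) that sk_receive_queue is spliced into, so the backup
    thread must treat the socket as having pending data while either queue
    is non-empty. The struct and function are hypothetical; queue lengths
    stand in for the real skb queues.

```c
#include <stdbool.h>

/* Hypothetical view of a UDP socket's two receive queues. */
struct udp_queues_sketch {
    unsigned int receive_queue_len;
    unsigned int reader_queue_len;
};

/* Keep reading while either queue holds packets; checking only the
 * receive queue misses packets already moved to reader_queue, whose
 * memory is then never reclaimed. */
static bool has_pending(const struct udp_queues_sketch *q)
{
    return q->receive_queue_len != 0 || q->reader_queue_len != 0;
}
```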
     
  • [ Upstream commit f961134a612c793d5901a93d85a29337c74af978 ]

    Commit 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free
    on the_virtio_vsock") starts to use RCU to protect 'the_virtio_vsock'
    pointer, but we forgot to annotate it.

    This patch adds the annotation to fix the following sparse errors:

    net/vmw_vsock/virtio_transport.c:73:17: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:171:17: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:207:17: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:561:13: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:612:9: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:631:9: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock *

    Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
    Reported-by: Michael S. Tsirkin
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Stefano Garzarella
     
  • [ Upstream commit 0b467b63870d9c05c81456aa9bfee894ab2db3b6 ]

    Without this patch, eapol frames cannot be received in mesh
    mode when 802.1X should be used. Initially only an MGTK is
    defined, which is found and set as rx->key when there are
    no other keys set. ieee80211_drop_unencrypted would then
    drop these eapol frames, as they are data frames without
    encryption and some rx->key exists.

    Fix this by differentiating between mesh eapol frames and
    other data frames with existing rx->key. Allow mesh
    eapol frames only if they are for our vif address.

    With this patch in place, ieee80211_rx_h_mesh_fwding continues
    after the ieee80211_drop_unencrypted check and notices that
    these eapol frames have to be delivered locally, as they should be.

    Signed-off-by: Markus Theil
    Link: https://lore.kernel.org/r/20200625104214.50319-1-markus.theil@tu-ilmenau.de
    [small code cleanups]
    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin

    Markus Theil
     

22 Jul, 2020

16 commits

  • commit 2f3fead62144002557f322c2a7c15e1255df0653 upstream.

    Currently target_copy() is used only for sending linger pings, so
    this doesn't come up, but generally omitting recovery_deletes can
    result in unneeded resends (force_resend in calc_target()).

    Fixes: ae78dd8139ce ("libceph: make RECOVERY_DELETES feature create a new interval")
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • [ Upstream commit 912288442cb2f431bf3c8cb097a5de83bc6dbac1 ]

    Currently the header size calculations are using an assignment
    operator instead of a += operator when accumulating the header
    size leading to incorrect sizes. Fix this by using the correct
    operator.

    Addresses-Coverity: ("Unused value")
    Fixes: 302d3deb2068 ("xprtrdma: Prevent inline overflow")
    Signed-off-by: Colin Ian King
    Reviewed-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Sasha Levin

    Colin Ian King
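    A sketch of the bug class: accumulating a size with `=` keeps only the
    last term, while `+=` sums them. `header_size()` and the part sizes
    used with it are hypothetical.

```c
#include <stddef.h>

/* Sum the sizes of all header parts. */
static size_t header_size(const size_t *parts, size_t nparts)
{
    size_t size = 0;

    for (size_t i = 0; i < nparts; i++)
        size += parts[i]; /* not: size = parts[i]; */
    return size;
}
```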
     
  • [ Upstream commit 0da7536fb47f51df89ccfcb1fa09f249d9accec5 ]

    When no full socket is available, skbs are sent over a per-netns
    control socket. Its sk_mark is temporarily adjusted to match that
    of the real (request or timewait) socket or to reflect an incoming
    skb, so that the outgoing skb inherits this in __ip_make_skb.

    Introduction of the socket cookie mark field broke this. Now the
    skb mark is set through the cookie and cork:

    # init sockc.mark from sk_mark or cmsg
    ip_append_data
    ip_setup_cork # convert sockc.mark to cork mark
    ip_push_pending_frames
    ip_finish_skb
    __ip_make_skb # set skb->mark to cork mark

    But I missed these special control sockets. Update all callers of
    __ip(6)_make_skb that were originally missed.

    For IPv6, the same two icmp(v6) paths are affected. The third
    case is not, as commit 92e55f412cff ("tcp: don't annotate
    mark on control socket from tcp_v6_send_response()") replaced
    the ctl_sk->sk_mark with passing the mark field directly as a
    function argument. That commit predates the commit that
    introduced the bug.

    Fixes: c6af0c227a22 ("ip: support SO_MARK cmsg")
    Signed-off-by: Willem de Bruijn
    Reported-by: Martin KaFai Lau
    Reviewed-by: Martin KaFai Lau
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

    When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
    copied, so the cgroup refcnt must be taken too. And, unlike the
    sk_alloc() path, sock_update_netprioidx() is not called here.
    Therefore, it is safe and necessary to grab the cgroup refcnt
    even when cgroup_sk_alloc is disabled.

    sk_clone_lock() runs in BH context anyway, where the in_interrupt()
    check would terminate this function. And for sk_alloc(),
    skcd->val is always zero. So it's safe to factor out the code
    to make it more readable.

    The global variable 'cgroup_sk_alloc_disabled' is used to determine
    whether to take these reference counts. It is impossible to make
    the reference counting correct unless we save this bit of information
    in skcd->val. So, add a new bit there to record whether the socket
    has already taken the reference counts. This obviously relies on
    kmalloc() to align cgroup pointers to at least 4 bytes;
    ARCH_KMALLOC_MINALIGN is certainly larger than that.

    This bug seems to have existed since the beginning. Commit
    d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
    tried to fix it, but not completely. It seemed hard to trigger until
    the recent commit 090e28b229af
    ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

    Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
    Reported-by: Cameron Berkenpas
    Reported-by: Peter Geis
    Reported-by: Lu Fengqi
    Reported-by: Daniël Sonck
    Reported-by: Zhang Qiang
    Tested-by: Cameron Berkenpas
    Tested-by: Peter Geis
    Tested-by: Thomas Lamprecht
    Cc: Daniel Borkmann
    Cc: Zefan Li
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
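    A sketch of storing a flag in the low bits of an aligned pointer, as
    described for skcd->val. With at least 4-byte alignment, bits 0 and 1
    of the address are always clear; using bit 1 here is an illustrative
    choice, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bit; requires the pointee to be 4-byte aligned. */
#define REF_BIT 0x2UL

static uintptr_t pack(void *ptr, bool ref_taken)
{
    return (uintptr_t)ptr | (ref_taken ? REF_BIT : 0);
}

static void *unpack_ptr(uintptr_t val)
{
    return (void *)(val & ~(uintptr_t)REF_BIT);
}

static bool unpack_ref_taken(uintptr_t val)
{
    return val & REF_BIT;
}
```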
     
  • [ Upstream commit 1ca0fafd73c5268e8fc4b997094b8bb2bfe8deea ]

    This essentially reverts commit 721230326891 ("tcp: md5: reject TCP_MD5SIG
    or TCP_MD5SIG_EXT on established sockets")

    Mathieu reported that many vendors' BGP implementations can
    actually switch TCP MD5 on established flows.

    Quoting Mathieu :
    Here is a list of a few network vendors along with their behavior
    with respect to TCP MD5:

    - Cisco: Allows for password to be changed, but within the hold-down
    timer (~180 seconds).
    - Juniper: When password is initially set on active connection it will
    reset, but after that any subsequent password changes no network
    resets.
    - Nokia: No notes on if they flap the tcp connection or not.
    - Ericsson/RedBack: Allows for 2 passwords (old/new) to co-exist until
    both sides are ok with new passwords.
    - Meta-Switch: Expects the password to be set before a connection is
    attempted, but no further info on whether they reset the TCP
    connection on a change.
    - Avaya: Disable the neighbor, then set password, then re-enable.
    - Zebos: Would normally allow the change when socket connected.

    We can revert my prior change because commit 9424e2e7ad93 ("tcp: md5: fix potential
    overestimation of TCP option space") removed the leak of 4 kernel bytes to
    the wire that was the main reason for my patch.

    While doing my investigations, I found a bug when an MD5 key is changed, leading
    to these commits that stable teams want to consider before backporting this revert:

    Commit 6a2febec338d ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
    Commit e6ced831ef11 ("tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers")

    Fixes: 721230326891 ("tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets")
    Signed-off-by: Eric Dumazet
    Reported-by: Mathieu Desnoyers
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit e6ced831ef11a2a06e8d00aad9d4fc05b610bf38 ]

    My prior fix went a bit too far, according to Herbert and Mathieu.

    Since we accept that concurrent TCP MD5 lookups might see inconsistent
    keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb()

    Clearing all key->key[] is needed to avoid possible KMSAN reports,
    if key->keylen is increased. Since tcp_md5_do_add() is not fast path,
    using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler.

    data_race() was added in linux-5.8 and will prevent KCSAN reports;
    it can safely be removed in stable backports if data_race() is
    not yet backported.

    v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add()

    Fixes: 6a2febec338d ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
    Signed-off-by: Eric Dumazet
    Cc: Mathieu Desnoyers
    Cc: Herbert Xu
    Cc: Marco Elver
    Reviewed-by: Mathieu Desnoyers
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit e114e1e8ac9d31f25b9dd873bab5d80c1fc482ca ]

    Whenever cookie_init_timestamp() has been used to encode
    ECN, SACK, WSCALE options, we cannot remove the TS option in the SYNACK.

    Otherwise, tcp_synack_options() will still advertise options like WSCALE
    that we cannot deduce later when receiving the packet from the client
    to complete the 3WHS.

    Note that modern Linux TCP stacks won't use MD5+TS+SACK in a SYN packet,
    but we cannot know for sure that all TCP stacks have the same logic.

    Before the fix a tcpdump would exhibit this wrong exchange :

    10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0
    10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 1400,nop,nop,sackOK,nop,wscale 8], length 0
    10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0
    10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12
    10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0

    After this patch the exchange looks saner :

    11:59:59.882990 IP C > S: Flags [S], seq 517075944, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508483 ecr 0,nop,wscale 8], length 0
    11:59:59.883002 IP S > C: Flags [S.], seq 1902939253, ack 517075945, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508479 ecr 1751508483,nop,wscale 8], length 0
    11:59:59.883012 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 0
    11:59:59.883114 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 12
    11:59:59.883122 IP S > C: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508483], length 0
    11:59:59.883152 IP S > C: Flags [P.], seq 1:13, ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508483], length 12
    11:59:59.883170 IP C > S: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508484], length 0

    Of course, no SACK block can ever be added later, since the MD5 and TS
    options together leave no room for one, but nothing should break.
    Technically, we could remove the 4 nops included in the MD5+TS options,
    but again some stacks could break when seeing unconventional alignment.

    Fixes: 4957faade11b ("TCPCT part 1g: Responder Cookie => Initiator")
    Signed-off-by: Eric Dumazet
    Cc: Florian Westphal
    Cc: Mathieu Desnoyers
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 6a2febec338df7e7699a52d00b2e1207dcf65b28 ]

    MD5 keys are read under RCU protection, and tcp_md5_do_add()
    might update a prior key in place.

    A typical RCU update would allocate a new piece of memory.
    In this case only key->key and key->keylen might be updated,
    and we do not care whether an incoming packet sees the old key,
    the new one, or some intermediate value, since changing the key
    on a live flow is known to be problematic anyway.

    We only want to make sure that, if key->keylen is changed,
    CPUs in tcp_md5_hash_key() won't try to use uninitialized data,
    or crash because key->keylen was read twice to feed sg_init_one()
    and ahash_request_set_crypt().
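    The "read the length once" idea can be sketched in userspace. This is
    an illustration under assumptions, not the patch: C11 relaxed atomics
    stand in for the kernel's READ_ONCE()/WRITE_ONCE(), the struct and
    function names are hypothetical, and MAX_KEYLEN = 80 is assumed to
    match TCP_MD5SIG_MAXKEYLEN. The key buffer is fixed-size and
    zero-initialized, so a reader that pairs a new length with old bytes
    still reads initialized memory, and both consumers of the length use
    the same local snapshot.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_KEYLEN 80   /* assumption: matches TCP_MD5SIG_MAXKEYLEN */

/* Hypothetical model of an MD5 key that may be updated in place. */
struct md5_key {
	_Atomic uint8_t keylen;
	uint8_t key[MAX_KEYLEN];   /* fixed size, zero-initialized */
};

/* Writer: copy the new bytes, then store the new length. With relaxed
 * ordering a reader may see the new length with old bytes; that is
 * tolerated because the buffer is always fully initialized. */
void key_update(struct md5_key *k, const uint8_t *newkey, uint8_t len)
{
	for (uint8_t i = 0; i < len; i++)
		k->key[i] = newkey[i];
	atomic_store_explicit(&k->keylen, len, memory_order_relaxed);
}

/* Reader: load keylen exactly once and feed that single value to every
 * consumer (stand-ins for sg_init_one() and ahash_request_set_crypt()).
 * Returns a toy checksum instead of an MD5 digest. */
uint32_t key_hash(struct md5_key *k)
{
	uint8_t len = atomic_load_explicit(&k->keylen, memory_order_relaxed);
	uint32_t sum = 0;

	for (uint8_t i = 0; i < len; i++)   /* first consumer of len */
		sum += k->key[i];
	return sum + len;                    /* second consumer, same len */
}
```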

    Fixes: 9ea88a153001 ("tcp: md5: check md5 signature without socket lock")
    Signed-off-by: Eric Dumazet
    Cc: Mathieu Desnoyers
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit ce69e563b325f620863830c246a8698ccea52048 ]

    syzkaller found its way into setsockopt with TCP_CONGESTION "cdg".
    tcp_cdg_init() does a kcalloc to store the gradients. As sk_clone_lock
    just copies all the memory, the allocated pointer will be copied as
    well if the app called setsockopt(..., TCP_CONGESTION) on the listener.
    If the socket is then destroyed before congestion control has been
    properly initialized (through a call to tcp_init_transfer), we
    end up freeing memory that does not belong to that particular
    socket, opening the door to a double-free:

    [ 11.413102] ==================================================================
    [ 11.414181] BUG: KASAN: double-free or invalid-free in tcp_cleanup_congestion_control+0x58/0xd0
    [ 11.415329]
    [ 11.415560] CPU: 3 PID: 4884 Comm: syz-executor.5 Not tainted 5.8.0-rc2 #80
    [ 11.416544] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
    [ 11.418148] Call Trace:
    [ 11.418534]
    [ 11.418834] dump_stack+0x7d/0xb0
    [ 11.419297] print_address_description.constprop.0+0x1a/0x210
    [ 11.422079] kasan_report_invalid_free+0x51/0x80
    [ 11.423433] __kasan_slab_free+0x15e/0x170
    [ 11.424761] kfree+0x8c/0x230
    [ 11.425157] tcp_cleanup_congestion_control+0x58/0xd0
    [ 11.425872] tcp_v4_destroy_sock+0x57/0x5a0
    [ 11.426493] inet_csk_destroy_sock+0x153/0x2c0
    [ 11.427093] tcp_v4_syn_recv_sock+0xb29/0x1100
    [ 11.427731] tcp_get_cookie_sock+0xc3/0x4a0
    [ 11.429457] cookie_v4_check+0x13d0/0x2500
    [ 11.433189] tcp_v4_do_rcv+0x60e/0x780
    [ 11.433727] tcp_v4_rcv+0x2869/0x2e10
    [ 11.437143] ip_protocol_deliver_rcu+0x23/0x190
    [ 11.437810] ip_local_deliver+0x294/0x350
    [ 11.439566] __netif_receive_skb_one_core+0x15d/0x1a0
    [ 11.441995] process_backlog+0x1b1/0x6b0
    [ 11.443148] net_rx_action+0x37e/0xc40
    [ 11.445361] __do_softirq+0x18c/0x61a
    [ 11.445881] asm_call_on_stack+0x12/0x20
    [ 11.446409]
    [ 11.446716] do_softirq_own_stack+0x34/0x40
    [ 11.447259] do_softirq.part.0+0x26/0x30
    [ 11.447827] __local_bh_enable_ip+0x46/0x50
    [ 11.448406] ip_finish_output2+0x60f/0x1bc0
    [ 11.450109] __ip_queue_xmit+0x71c/0x1b60
    [ 11.451861] __tcp_transmit_skb+0x1727/0x3bb0
    [ 11.453789] tcp_rcv_state_process+0x3070/0x4d3a
    [ 11.456810] tcp_v4_do_rcv+0x2ad/0x780
    [ 11.457995] __release_sock+0x14b/0x2c0
    [ 11.458529] release_sock+0x4a/0x170
    [ 11.459005] __inet_stream_connect+0x467/0xc80
    [ 11.461435] inet_stream_connect+0x4e/0xa0
    [ 11.462043] __sys_connect+0x204/0x270
    [ 11.465515] __x64_sys_connect+0x6a/0xb0
    [ 11.466088] do_syscall_64+0x3e/0x70
    [ 11.466617] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 11.467341] RIP: 0033:0x7f56046dc469
    [ 11.467844] Code: Bad RIP value.
    [ 11.468282] RSP: 002b:00007f5604dccdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
    [ 11.469326] RAX: ffffffffffffffda RBX: 000000000068bf00 RCX: 00007f56046dc469
    [ 11.470379] RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000004
    [ 11.471311] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
    [ 11.472286] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    [ 11.473341] R13: 000000000041427c R14: 00007f5604dcd5c0 R15: 0000000000000003
    [ 11.474321]
    [ 11.474527] Allocated by task 4884:
    [ 11.475031] save_stack+0x1b/0x40
    [ 11.475548] __kasan_kmalloc.constprop.0+0xc2/0xd0
    [ 11.476182] tcp_cdg_init+0xf0/0x150
    [ 11.476744] tcp_init_congestion_control+0x9b/0x3a0
    [ 11.477435] tcp_set_congestion_control+0x270/0x32f
    [ 11.478088] do_tcp_setsockopt.isra.0+0x521/0x1a00
    [ 11.478744] __sys_setsockopt+0xff/0x1e0
    [ 11.479259] __x64_sys_setsockopt+0xb5/0x150
    [ 11.479895] do_syscall_64+0x3e/0x70
    [ 11.480395] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 11.481097]
    [ 11.481321] Freed by task 4872:
    [ 11.481783] save_stack+0x1b/0x40
    [ 11.482230] __kasan_slab_free+0x12c/0x170
    [ 11.482839] kfree+0x8c/0x230
    [ 11.483240] tcp_cleanup_congestion_control+0x58/0xd0
    [ 11.483948] tcp_v4_destroy_sock+0x57/0x5a0
    [ 11.484502] inet_csk_destroy_sock+0x153/0x2c0
    [ 11.485144] tcp_close+0x932/0xfe0
    [ 11.485642] inet_release+0xc1/0x1c0
    [ 11.486131] __sock_release+0xc0/0x270
    [ 11.486697] sock_close+0xc/0x10
    [ 11.487145] __fput+0x277/0x780
    [ 11.487632] task_work_run+0xeb/0x180
    [ 11.488118] __prepare_exit_to_usermode+0x15a/0x160
    [ 11.488834] do_syscall_64+0x4a/0x70
    [ 11.489326] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Wei Wang fixed part of these CDG-malloc issues with commit c12014440750
    ("tcp: memset ca_priv data to 0 properly").

    This patch fixes the listener scenario: we make sure that listeners
    setting the congestion control through setsockopt won't initialize it
    (thus CDG never allocates on listeners). For those who call connect()
    with AF_UNSPEC to reuse a socket, tcp_disconnect() is changed to clean
    up afterwards.
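    A state test of roughly this shape is enough to keep listeners (and
    closed sockets) from running the congestion-control init hook at
    setsockopt time. This is a sketch, not the patch itself: the state
    numbers follow include/net/tcp_states.h, while ca_init_now() is a
    hypothetical name for wherever the kernel makes this decision.

```c
#include <stdbool.h>

/* TCP state numbers as used by Linux (include/net/tcp_states.h). */
enum {
	TCP_ESTABLISHED = 1,
	TCP_SYN_SENT,
	TCP_SYN_RECV,
	TCP_FIN_WAIT1,
	TCP_FIN_WAIT2,
	TCP_TIME_WAIT,
	TCP_CLOSE,
	TCP_CLOSE_WAIT,
	TCP_LAST_ACK,
	TCP_LISTEN,
	TCP_CLOSING,
};

#define TCPF_CLOSE  (1 << TCP_CLOSE)
#define TCPF_LISTEN (1 << TCP_LISTEN)

/* Should setsockopt(TCP_CONGESTION) run the algorithm's init hook
 * (which, for CDG, allocates memory) right away? Closed sockets and
 * listeners defer it, so a listener never owns CA-private memory that
 * sk_clone_lock() could copy into its children. */
bool ca_init_now(int state)
{
	return !((1 << state) & (TCPF_CLOSE | TCPF_LISTEN));
}
```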

    (The issue can be reproduced at least down to v4.4.x.)

    Cc: Wei Wang
    Cc: Eric Dumazet
    Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
    Signed-off-by: Christoph Paasch
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Christoph Paasch
     
  • [ Upstream commit ba3bb0e76ccd464bb66665a1941fabe55dadb3ba ]

    Whenever tcp_try_rmem_schedule() returns an error, we are in
    trouble and should make sure to wake up readers so that they
    can drain socket queues and eventually make room.
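    The wakeup rule can be modeled in a few lines. The sketch assumes, as
    the Fixes tag suggests, that with SO_RCVLOWAT a reader is normally
    woken only once the available bytes reach the low-water mark; when
    receive memory cannot be reserved, the segment is dropped and the
    reader is woken regardless. All names here are hypothetical.

```c
#include <stdbool.h>

/* Normal rule: wake the reader once avail reaches rcvlowat. */
bool lowat_wakeup(int avail, int rcvlowat)
{
	return avail >= rcvlowat;
}

/* Queue one segment of `len` bytes; returns true if the reader is
 * woken. If memory could not be reserved (rmem_ok == false), the
 * segment is dropped but the reader is woken unconditionally, so it
 * can drain the queue even though avail < rcvlowat. */
bool queue_segment(int *avail, int len, int rcvlowat, bool rmem_ok)
{
	if (!rmem_ok)
		return true;
	*avail += len;
	return lowat_wakeup(*avail, rcvlowat);
}
```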

    Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]

    There are a couple of places in net/sched/ that check skb->protocol and act
    on the value there. However, in the presence of VLAN tags, the value stored
    in skb->protocol can differ depending on whether VLAN acceleration is
    enabled. The commit quoted in the Fixes tag below fixed the users of
    skb->protocol to use a helper that will always see the VLAN ethertype.

    However, most of the callers don't actually handle the VLAN ethertype, but
    expect to find the IP header type in the protocol field. This means that
    things like changing the ECN field, or parsing diffserv values, stop
    working if there's a VLAN tag, or if there are multiple nested VLAN
    tags (QinQ).

    To fix this, change the helper to take an argument that indicates whether
    the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
    make sure to skip all of them, so behaviour is consistent even in QinQ
    mode.

    To make the helper usable from the ECN code, move it to if_vlan.h instead
    of pkt_sched.h.
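    The "skip all VLAN tags" behaviour can be sketched on a raw Ethernet
    frame in userspace. This is an approximation, not skb_protocol()
    itself: it ignores VLAN acceleration (where the kernel returns
    skb->vlan_proto when a tag was stripped into the skb), returns the
    EtherType in host byte order, and frame_protocol() is a hypothetical
    name. A truncated tag stops the walk and reports the last EtherType
    seen, similar in spirit to the kernel loop breaking when
    skb_header_pointer() fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100   /* C-tag */
#define ETH_P_8021AD 0x88A8   /* S-tag (QinQ outer tag) */
#define ETH_HLEN     14
#define VLAN_HLEN    4

static bool eth_type_vlan(uint16_t type)
{
	return type == ETH_P_8021Q || type == ETH_P_8021AD;
}

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Return the outer EtherType, or with skip_vlan walk every VLAN tag
 * (QinQ included) down to the encapsulated protocol. */
uint16_t frame_protocol(const uint8_t *frame, size_t len, bool skip_vlan)
{
	size_t off = ETH_HLEN;
	uint16_t proto;

	if (len < ETH_HLEN)
		return 0;
	proto = get_be16(frame + 12);
	if (!skip_vlan)
		return proto;

	while (eth_type_vlan(proto)) {
		/* VLAN header: 2 bytes TCI, then 2 bytes inner EtherType. */
		if (off + VLAN_HLEN > len)
			break;
		proto = get_be16(frame + off + 2);
		off += VLAN_HLEN;
	}
	return proto;
}
```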

    v3:
    - Remove empty lines
    - Move vlan variable definitions inside loop in skb_protocol()
    - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
    bpf_skb_ecn_set_ce()

    v2:
    - Use eth_type_vlan() helper in skb_protocol()
    - Also fix code that reads skb->protocol directly
    - Change a couple of 'if/else if' statements to switch constructs to avoid
    calling the helper twice

    Reported-by: Ilya Ponetayev
    Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
     
  • [ Upstream commit 306381aec7c2b5a658eebca008c8a1b666536cba ]

    When tcf_block_get() fails inside atm_tc_init(),
    atm_tc_put() is called to release the qdisc p->link.q.
    But flow->ref prevents it from doing so: atm_tc_put() only
    releases the flow once decrementing flow->ref yields zero, and
    flow->ref is still zero at that point.

    Fix this by moving the p->link.ref initialization before
    tcf_block_get().
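    The ordering problem can be reproduced with a tiny refcount model.
    The sketch assumes a put helper that decrements and releases only
    when the result is zero, as described above; struct flow, flow_put()
    and error_path_releases() are hypothetical stand-ins for the sch_atm
    code, and the failing call is simulated rather than invoked.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct atm_flow_data. */
struct flow {
	int ref;
	bool released;
};

/* Decrement; release only when the count reaches exactly zero. */
static void flow_put(struct flow *f)
{
	if (--f->ref)
		return;          /* 0 - 1 == -1 is non-zero: nothing freed */
	f->released = true;  /* stand-in for destroying p->link.q */
}

/* Simulate atm_tc_init() hitting the tcf_block_get() error path.
 * ref_first selects the fixed ordering (reference taken before the
 * call that can fail). Returns whether the qdisc was released. */
bool error_path_releases(struct flow *f, bool ref_first)
{
	f->ref = 0;
	f->released = false;
	if (ref_first)
		f->ref = 1;
	/* tcf_block_get() fails here, so the error path puts the flow. */
	flow_put(f);
	return f->released;
}
```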

    Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
    Reported-and-tested-by: syzbot+d411cff6ab29cc2c311b@syzkaller.appspotmail.com
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit a9b1110162357689a34992d5c925852948e5b9fd ]

    syzbot was able to trigger a bug by tricking AF_LLC with
    a nonsensical addr->sllc_arphrd.

    It seems clear LLC requires an Ethernet device.

    Back in commit abf9d537fea2 ("llc: add support for SO_BINDTODEVICE")
    Octavian Purdila added the possibility for applications to use a zero
    value for sllc_arphrd; convert it to ARPHRD_ETHER so as not to cause
    regressions in existing applications.
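    The bind-time rule described above amounts to a two-step check,
    sketched here in userspace. struct llc_addr_model is a hypothetical
    stand-in for struct sockaddr_llc, ARPHRD_ETHER = 1 is taken from
    <linux/if_arp.h>, and returning -EINVAL for other hardware types is
    an assumption about the error code.

```c
#include <errno.h>
#include <stdint.h>

#define ARPHRD_ETHER 1   /* value from <linux/if_arp.h> */

/* Hypothetical stand-in for struct sockaddr_llc. */
struct llc_addr_model {
	uint16_t sllc_arphrd;
};

/* Treat 0 as Ethernet for backward compatibility with applications
 * relying on commit abf9d537fea2, and reject anything else that is
 * not Ethernet. */
int llc_check_arphrd(struct llc_addr_model *addr)
{
	if (!addr->sllc_arphrd)
		addr->sllc_arphrd = ARPHRD_ETHER;
	if (addr->sllc_arphrd != ARPHRD_ETHER)
		return -EINVAL;
	return 0;
}
```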

    BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:199 [inline]
    BUG: KASAN: use-after-free in list_empty include/linux/list.h:268 [inline]
    BUG: KASAN: use-after-free in waitqueue_active include/linux/wait.h:126 [inline]
    BUG: KASAN: use-after-free in wq_has_sleeper include/linux/wait.h:160 [inline]
    BUG: KASAN: use-after-free in skwq_has_sleeper include/net/sock.h:2092 [inline]
    BUG: KASAN: use-after-free in sock_def_write_space+0x642/0x670 net/core/sock.c:2813
    Read of size 8 at addr ffff88801e0b4078 by task ksoftirqd/3/27

    CPU: 3 PID: 27 Comm: ksoftirqd/3 Not tainted 5.5.0-rc1-syzkaller #0
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x197/0x210 lib/dump_stack.c:118
    print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
    __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
    kasan_report+0x12/0x20 mm/kasan/common.c:639
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
    __read_once_size include/linux/compiler.h:199 [inline]
    list_empty include/linux/list.h:268 [inline]
    waitqueue_active include/linux/wait.h:126 [inline]
    wq_has_sleeper include/linux/wait.h:160 [inline]
    skwq_has_sleeper include/net/sock.h:2092 [inline]
    sock_def_write_space+0x642/0x670 net/core/sock.c:2813
    sock_wfree+0x1e1/0x260 net/core/sock.c:1958
    skb_release_head_state+0xeb/0x260 net/core/skbuff.c:652
    skb_release_all+0x16/0x60 net/core/skbuff.c:663
    __kfree_skb net/core/skbuff.c:679 [inline]
    consume_skb net/core/skbuff.c:838 [inline]
    consume_skb+0xfb/0x410 net/core/skbuff.c:832
    __dev_kfree_skb_any+0xa4/0xd0 net/core/dev.c:2967
    dev_kfree_skb_any include/linux/netdevice.h:3650 [inline]
    e1000_unmap_and_free_tx_resource.isra.0+0x21b/0x3a0 drivers/net/ethernet/intel/e1000/e1000_main.c:1963
    e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3854 [inline]
    e1000_clean+0x4cc/0x1d10 drivers/net/ethernet/intel/e1000/e1000_main.c:3796
    napi_poll net/core/dev.c:6532 [inline]
    net_rx_action+0x508/0x1120 net/core/dev.c:6600
    __do_softirq+0x262/0x98c kernel/softirq.c:292
    run_ksoftirqd kernel/softirq.c:603 [inline]
    run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595
    smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165
    kthread+0x361/0x430 kernel/kthread.c:255
    ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

    Allocated by task 8247:
    save_stack+0x23/0x90 mm/kasan/common.c:72
    set_track mm/kasan/common.c:80 [inline]
    __kasan_kmalloc mm/kasan/common.c:513 [inline]
    __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
    kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521
    slab_post_alloc_hook mm/slab.h:584 [inline]
    slab_alloc mm/slab.c:3320 [inline]
    kmem_cache_alloc+0x121/0x710 mm/slab.c:3484
    sock_alloc_inode+0x1c/0x1d0 net/socket.c:240
    alloc_inode+0x68/0x1e0 fs/inode.c:230
    new_inode_pseudo+0x19/0xf0 fs/inode.c:919
    sock_alloc+0x41/0x270 net/socket.c:560
    __sock_create+0xc2/0x730 net/socket.c:1384
    sock_create net/socket.c:1471 [inline]
    __sys_socket+0x103/0x220 net/socket.c:1513
    __do_sys_socket net/socket.c:1522 [inline]
    __se_sys_socket net/socket.c:1520 [inline]
    __ia32_sys_socket+0x73/0xb0 net/socket.c:1520
    do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline]
    do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408
    entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139

    Freed by task 17:
    save_stack+0x23/0x90 mm/kasan/common.c:72
    set_track mm/kasan/common.c:80 [inline]
    kasan_set_free_info mm/kasan/common.c:335 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
    kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
    __cache_free mm/slab.c:3426 [inline]
    kmem_cache_free+0x86/0x320 mm/slab.c:3694
    sock_free_inode+0x20/0x30 net/socket.c:261
    i_callback+0x44/0x80 fs/inode.c:219
    __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
    rcu_do_batch kernel/rcu/tree.c:2183 [inline]
    rcu_core+0x570/0x1540 kernel/rcu/tree.c:2408
    rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2417
    __do_softirq+0x262/0x98c kernel/softirq.c:292

    The buggy address belongs to the object at ffff88801e0b4000
    which belongs to the cache sock_inode_cache of size 1152
    The buggy address is located 120 bytes inside of
    1152-byte region [ffff88801e0b4000, ffff88801e0b4480)
    The buggy address belongs to the page:
    page:ffffea0000782d00 refcount:1 mapcount:0 mapping:ffff88807aa59c40 index:0xffff88801e0b4ffd
    raw: 00fffe0000000200 ffffea00008e6c88 ffffea0000782d48 ffff88807aa59c40
    raw: ffff88801e0b4ffd ffff88801e0b4000 0000000100000003 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff88801e0b3f00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
    ffff88801e0b3f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    >ffff88801e0b4000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff88801e0b4080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff88801e0b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

    Fixes: abf9d537fea2 ("llc: add support for SO_BINDTODEVICE")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 27d53323664c549b5bb2dfaaf6f7ad6e0376a64e ]

    In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set
    skb's dst. However, it will eventually call inet6_csk_xmit() or
    ip_queue_xmit() where skb's dst will be overwritten by:

    skb_dst_set_noref(skb, dst);

    without releasing the old dst in skb. This causes a dst/dev refcnt leak:

    unregister_netdevice: waiting for eth0 to become free. Usage count = 1

    This can be reproduced by simply running:

    # modprobe l2tp_eth && modprobe l2tp_ip
    # sh ./tools/testing/selftests/net/l2tp.sh

    So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst
    should be dropped. This patch fixes it by removing skb_dst_set()
    from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core().
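    The leak follows from plain reference counting, sketched below as a
    userspace model. All types and helpers are hypothetical stand-ins:
    dst_set_ref() plays skb_dst_set() (attach and own a reference),
    dst_set_noref() plays skb_dst_set_noref() (overwrite without
    releasing the old reference), and dst_drop() plays skb_dst_drop().
    xmit_leftover() returns the reference count left behind after one
    transmit, before and after the fix.

```c
#include <stddef.h>

struct dst_model {
	int refcnt;
};

struct skb_model {
	struct dst_model *dst;
	int dst_is_ref;   /* 1 if the skb owns a reference on dst */
};

/* skb_dst_set() stand-in: take a reference and attach. */
static void dst_set_ref(struct skb_model *skb, struct dst_model *d)
{
	d->refcnt++;
	skb->dst = d;
	skb->dst_is_ref = 1;
}

/* skb_dst_set_noref() stand-in: attach without a reference and
 * without releasing whatever was attached before. */
static void dst_set_noref(struct skb_model *skb, struct dst_model *d)
{
	skb->dst = d;
	skb->dst_is_ref = 0;
}

/* skb_dst_drop() stand-in: release the reference if one is owned. */
static void dst_drop(struct skb_model *skb)
{
	if (skb->dst && skb->dst_is_ref)
		skb->dst->refcnt--;
	skb->dst = NULL;
	skb->dst_is_ref = 0;
}

/* Refcount left on the dst after transmitting one skb. */
int xmit_leftover(int fixed)
{
	struct dst_model d = { .refcnt = 0 };
	struct skb_model skb = { NULL, 0 };

	if (!fixed)
		dst_set_ref(&skb, &d);   /* old l2tp_xmit_skb() */
	else
		dst_drop(&skb);          /* new l2tp_xmit_core() */
	dst_set_noref(&skb, &d);     /* ip_queue_xmit()/inet6_csk_xmit() */
	dst_drop(&skb);              /* skb freed after transmit */
	return d.refcnt;
}
```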

    Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
    Reported-by: Hangbin Liu
    Signed-off-by: Xin Long
    Acked-by: James Chapman
    Tested-by: James Chapman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit aea23c323d89836bcdcee67e49def997ffca043b ]

    Thomas reported a regression with IPv6 and anycast using the following
    reproducer:

    echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
    ip -6 a add fc12::1/16 dev lo
    sleep 2
    echo "pinging lo"
    ping6 -c 2 fc12::

    The conversion of addrconf_f6i_alloc to use ip6_route_info_create missed
    the use of fib6_is_reject, which checks addresses added to the loopback
    interface and sets the REJECT flag as needed. Update fib6_is_reject for
    loopback checks to handle RTF_ANYCAST addresses.
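    The updated predicate can be sketched as a standalone function. This
    is an illustration, not the kernel source: the flag values are copied
    from the uapi headers as we recall them and should be treated as
    assumptions (the logic does not depend on the exact bits), and
    is_reject() is a hypothetical name. A route on a loopback device is
    treated as a reject route unless it is for the loopback address
    itself or is a local route or, after this fix, an anycast route.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values; only their distinctness matters here. */
#define RTF_REJECT         0x0200u
#define RTF_ANYCAST        0x00100000u
#define RTF_LOCAL          0x80000000u
#define IFF_LOOPBACK       0x8u
#define IPV6_ADDR_LOOPBACK 0x0010u

/* dev_flags is 0 when there is no device. */
bool is_reject(uint32_t flags, uint32_t dev_flags, uint32_t addr_type)
{
	if (flags & RTF_REJECT)
		return true;
	return (dev_flags & IFF_LOOPBACK) &&
	       !(addr_type & IPV6_ADDR_LOOPBACK) &&
	       !(flags & (RTF_ANYCAST | RTF_LOCAL));
}
```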

    Fixes: c7a1ce397ada ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create")
    Reported-by: thomas.gambier@nexedi.com
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 34fe5a1cf95c3f114068fc16d919c9cf4b00e428 ]

    Brian reported a crash in IPv6 code when using rpfilter with a setup
    running FRR and external nexthop objects. The root cause of the crash
    is fib6_select_path setting fib6_nh in the result to NULL because of
    an improper check for nexthop objects.

    More specifically, rpfilter invokes ip6_route_lookup with flowi6_oif
    set, causing fib6_select_path to be called with have_oif_match set.
    fib6_select_path has an early check on have_oif_match and jumps to the
    out label, which presumes a builtin fib6_nh. This path is invalid for
    nexthop objects; for external nexthops, fib6_select_path needs to just
    return if the fib6_nh has already been set in the result; otherwise, it
    returns after the call to nexthop_path_fib6_result. Update the check
    on have_oif_match not to bail out on external nexthops.

    Update selftests for this problem.

    Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info")
    Reported-by: Brian Rak
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern