Eric Lee / smarc-fsl-linux-kernel

05 Aug, 2020

6 commits

0307da686 xfrm: policy: match with both mark and mask on user interfaces ... Browse Code »

[ Upstream commit 4f47e8ab6ab796b5380f74866fa5287aca4dcc58 ]

In commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
it would take 'priority' to make a policy unique, and allow duplicated
policies with different 'priority' to be added, which is not expected
by userland, as Tobias reported in strongswan.

To fix this duplicated policies issue, and also fix the issue in
commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
when doing add/del/get/update on user interfaces, this patch is to change
to look up a policy with both mark and mask by doing:

mark.v == pol->mark.v && mark.m == pol->mark.m

and leave the check:

(mark & pol->mark.m) == pol->mark.v

for tx/rx path only.

As the userland expects an exact mark and mask match to manage policies.

v1->v2:
- make xfrm_policy_mark_match inline and fix the changelog as
Tobias suggested.

Fixes: 295fae568885 ("xfrm: Allow user space manipulation of SPD mark")
Fixes: ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
Reported-by: Tobias Brunner
Tested-by: Tobias Brunner
Signed-off-by: Xin Long
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin

Xin Long
2020-08-05 15:59:44 +0800
bbb13adb0 net/x25: Fix null-ptr-deref in x25_disconnect ... Browse Code »

commit 8999dc89497ab1c80d0718828e838c7cd5f6bffe upstream.

We should check null before do x25_neigh_put in x25_disconnect,
otherwise may cause null-ptr-deref like this:

#include
#include

int main() {
int sck_x25;
sck_x25 = socket(AF_X25, SOCK_SEQPACKET, 0);
close(sck_x25);
return 0;
}

BUG: kernel NULL pointer dereference, address: 00000000000000d8
CPU: 0 PID: 4817 Comm: t2 Not tainted 5.7.0-rc3+ #159
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-
RIP: 0010:x25_disconnect+0x91/0xe0
Call Trace:
x25_release+0x18a/0x1b0
__sock_release+0x3d/0xc0
sock_close+0x13/0x20
__fput+0x107/0x270
____fput+0x9/0x10
task_work_run+0x6d/0xb0
exit_to_usermode_loop+0x102/0x110
do_syscall_64+0x23c/0x260
entry_SYSCALL_64_after_hwframe+0x49/0xb3

Reported-by: syzbot+6db548b615e5aeefdce2@syzkaller.appspotmail.com
Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Signed-off-by: YueHaibing
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

YueHaibing
2020-08-05 15:59:44 +0800
69cd304cf net/x25: Fix x25_neigh refcnt leak when x25 disconnect ... Browse Code »

commit 4becb7ee5b3d2829ed7b9261a245a77d5b7de902 upstream.

x25_connect() invokes x25_get_neigh(), which returns a reference of the
specified x25_neigh object to "x25->neighbour" with increased refcnt.

When x25 connect success and returns, the reference still be hold by
"x25->neighbour", so the refcount should be decreased in
x25_disconnect() to keep refcount balanced.

The reference counting issue happens in x25_disconnect(), which forgets
to decrease the refcnt increased by x25_get_neigh() in x25_connect(),
causing a refcnt leak.

Fix this issue by calling x25_neigh_put() before x25_disconnect()
returns.

Signed-off-by: Xiyu Yang
Signed-off-by: Xin Tan
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Xiyu Yang
2020-08-05 15:59:44 +0800
2ec69499b rds: Prevent kernel-infoleak in rds_notify_queue_get() ... Browse Code »

commit bbc8a99e952226c585ac17477a85ef1194501762 upstream.

rds_notify_queue_get() is potentially copying uninitialized kernel stack
memory to userspace since the compiler may leave a 4-byte hole at the end
of `cmsg`.

In 2016 we tried to fix this issue by doing `= { 0 };` on `cmsg`, which
unfortunately does not always initialize that 4-byte hole. Fix it by using
memset() instead.

Cc: stable@vger.kernel.org
Fixes: f037590fff30 ("rds: fix a leak of kernel memory")
Fixes: bdbe6fbc6a2f ("RDS: recv.c")
Suggested-by: Dan Carpenter
Signed-off-by: Peilin Ye
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Peilin Ye
2020-08-05 15:59:44 +0800
d3472f74d 9p/trans_fd: Fix concurrency del of req_list in p9_fd_cancelled/p9_read_work ... Browse Code »

commit 74d6a5d5662975aed7f25952f62efbb6f6dadd29 upstream.

p9_read_work and p9_fd_cancelled may be called concurrently.
In some cases, req->req_list may be deleted by both p9_read_work
and p9_fd_cancelled.

We can fix it by ignoring replies associated with a cancelled
request and ignoring cancelled request if message has been received
before lock.

Link: http://lkml.kernel.org/r/20200612090833.36149-1-wanghai38@huawei.com
Fixes: 60ff779c4abb ("9p: client: remove unused code and any reference to "cancelled" function")
Cc: # v3.12+
Reported-by: syzbot+77a25acfa0382e06ab23@syzkaller.appspotmail.com
Signed-off-by: Wang Hai
Signed-off-by: Dominique Martinet
Signed-off-by: Greg Kroah-Hartman

Wang Hai
2020-08-05 15:59:42 +0800
98cef10fb sunrpc: check that domain table is empty at module unload. ... Browse Code »

[ Upstream commit f45db2b909c7e76f35850e78f017221f30282b8e ]

The domain table should be empty at module unload. If it isn't there is
a bug somewhere. So check and report.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651
Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Sasha Levin

Sasha Levin
2020-08-05 15:59:41 +0800

01 Aug, 2020

15 commits

df89c1ee0 udp: Improve load balancing for SO_REUSEPORT. ... Browse Code »

[ Upstream commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace ]

Currently, SO_REUSEPORT does not work well if connected sockets are in a
UDP reuseport group.

Then reuseport_has_conns() returns true and the result of
reuseport_select_sock() is discarded. Also, unconnected sockets have the
same score, hence only does the first unconnected socket in udp_hslot
always receive all packets sent to unconnected sockets.

So, the result of reuseport_select_sock() should be used for load
balancing.

The noteworthy point is that the unconnected sockets placed after
connected sockets in sock_reuseport.socks will receive more packets than
others because of the algorithm in reuseport_select_sock().

index | connected | reciprocal_scale | result
---------------------------------------------
0 | no | 20% | 40%
1 | no | 20% | 20%
2 | yes | 20% | 0%
3 | no | 20% | 40%
4 | yes | 20% | 0%

If most of the sockets are connected, this can be a problem, but it still
works better than now.

Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn
Reviewed-by: Benjamin Herrenschmidt
Signed-off-by: Kuniyuki Iwashima
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Kuniyuki Iwashima
2020-08-01 00:39:31 +0800
6735c126d udp: Copy has_conns in reuseport_grow(). ... Browse Code »

[ Upstream commit f2b2c55e512879a05456eaf5de4d1ed2f7757509 ]

If an unconnected socket in a UDP reuseport group connect()s, has_conns is
set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all
sockets in udp_hslot looking for the connected socket with the highest
score.

However, when the number of sockets bound to the port exceeds max_socks,
reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2()
to return without scanning all sockets, resulting in that packets sent to
connected sockets may be distributed to unconnected sockets.

Therefore, reuseport_grow() should copy has_conns.

Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn
Reviewed-by: Benjamin Herrenschmidt
Signed-off-by: Kuniyuki Iwashima
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Kuniyuki Iwashima
2020-08-01 00:39:31 +0800
86512c693 sctp: shrink stream outq when fails to do addstream reconf ... Browse Code »

[ Upstream commit 3ecdda3e9ad837cf9cb41b6faa11b1af3a5abc0c ]

When adding a stream with stream reconf, the new stream firstly is in
CLOSED state but new out chunks can still be enqueued. Then once gets
the confirmation from the peer, the state will change to OPEN.

However, if the peer denies, it needs to roll back the stream. But when
doing that, it only sets the stream outcnt back, and the chunks already
in the new stream don't get purged. It caused these chunks can still be
dequeued in sctp_outq_dequeue_data().

As its stream is still in CLOSE, the chunk will be enqueued to the head
again by sctp_outq_head_data(). This chunk will never be sent out, and
the chunks after it can never be dequeued. The assoc will be 'hung' in
a dead loop of sending this chunk.

To fix it, this patch is to purge these chunks already in the new
stream by calling sctp_stream_shrink_out() when failing to do the
addstream reconf.

Fixes: 11ae76e67a17 ("sctp: implement receiver-side procedures for the Reconf Response Parameter")
Reported-by: Ying Xu
Signed-off-by: Xin Long
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Xin Long
2020-08-01 00:39:31 +0800
46e7c7efc sctp: shrink stream outq only when new outcnt < old outcnt ... Browse Code »

[ Upstream commit 8f13399db22f909a35735bf8ae2f932e0c8f0e30 ]

It's not necessary to go list_for_each for outq->out_chunk_list
when new outcnt >= old outcnt, as no chunk with higher sid than
new (outcnt - 1) exists in the outqueue.

While at it, also move the list_for_each code in a new function
sctp_stream_shrink_out(), which will be used in the next patch.

Signed-off-by: Xin Long
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Xin Long
2020-08-01 00:39:31 +0800
bbf6af4a9 AX.25: Prevent integer overflows in connect and sendmsg ... Browse Code »

[ Upstream commit 17ad73e941b71f3bec7523ea4e9cbc3752461c2d ]

We recently added some bounds checking in ax25_connect() and
ax25_sendmsg() and we so we removed the AX25_MAX_DIGIS checks because
they were no longer required.

Unfortunately, I believe they are required to prevent integer overflows
so I have added them back.

Fixes: 8885bb0621f0 ("AX.25: Prevent out-of-bounds read in ax25_sendmsg()")
Fixes: 2f2a7ffad5c6 ("AX.25: Fix out-of-bounds read in ax25_connect()")
Signed-off-by: Dan Carpenter
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Dan Carpenter
2020-08-01 00:39:31 +0800
182ffc664 tcp: allow at most one TLP probe per flight ... Browse Code »

[ Upstream commit 76be93fc0702322179bb0ea87295d820ee46ad14 ]

Previously TLP may send multiple probes of new data in one
flight. This happens when the sender is cwnd limited. After the
initial TLP containing new data is sent, the sender receives another
ACK that acks partial inflight. It may re-arm another TLP timer
to send more, if no further ACK returns before the next TLP timeout
(PTO) expires. The sender may send in theory a large amount of TLP
until send queue is depleted. This only happens if the sender sees
such irregular uncommon ACK pattern. But it is generally undesirable
behavior during congestion especially.

The original TLP design restrict only one TLP probe per inflight as
published in "Reducing Web Latency: the Virtue of Gentle Aggression",
SIGCOMM 2013. This patch changes TLP to send at most one probe
per inflight.

Note that if the sender is app-limited, TLP retransmits old data
and did not have this issue.

Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Yuchung Cheng
2020-08-01 00:39:31 +0800
e2f904fd7 rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA ... Browse Code »

[ Upstream commit 639f181f0ee20d3249dbc55f740f0167267180f0 ]

rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as if
rxrpc_recvmsg() indicating ENODATA if there's nothing for it to read.

Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read
as this particular error doesn't get stored in ->sk_err by the networking
core.

Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive
errors (there's no way for it to report which call, if any, the error was
caused by).

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David Howells
2020-08-01 00:39:31 +0800
01c928350 rtnetlink: Fix memory(net_device) leak when ->newlink fails ... Browse Code »

[ Upstream commit cebb69754f37d68e1355a5e726fdac317bcda302 ]

When vlan_newlink call register_vlan_dev fails, it might return error
with dev->reg_state = NETREG_UNREGISTERED. The rtnl_newlink should
free the memory. But currently rtnl_newlink only free the memory which
state is NETREG_UNINITIALIZED.

BUG: memory leak
unreferenced object 0xffff8881051de000 (size 4096):
comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s)
hex dump (first 32 bytes):
76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00 vlan2...........
00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00 .E(.............
backtrace:
[] kmalloc_node include/linux/slab.h:578 [inline]
[] kvmalloc_node+0x33/0xd0 mm/util.c:574
[] kvmalloc include/linux/mm.h:753 [inline]
[] kvzalloc include/linux/mm.h:761 [inline]
[] alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929
[] rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067
[] __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329
[] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397
[] rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460
[] netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469
[] netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
[] netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329
[] netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918
[] sock_sendmsg_nosec net/socket.c:652 [inline]
[] sock_sendmsg+0x109/0x140 net/socket.c:672
[] ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352
[] ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406
[] __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439
[] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak")
Reported-by: Hulk Robot
Signed-off-by: Weilong Chen
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Weilong Chen
2020-08-01 00:39:30 +0800
b7d3d6df7 qrtr: orphan socket in qrtr_release() ... Browse Code »

[ Upstream commit af9f691f0f5bdd1ade65a7b84927639882d7c3e5 ]

We have to detach sock from socket in qrtr_release(),
otherwise skb->sk may still reference to this socket
when the skb is released in tun->queue, particularly
sk->sk_wq still points to &sock->wq, which leads to
a UAF.

Reported-and-tested-by: syzbot+6720d64f31c081c2f708@syzkaller.appspotmail.com
Fixes: 28fb4e59a47d ("net: qrtr: Expose tunneling endpoint to user space")
Cc: Bjorn Andersson
Cc: Eric Dumazet
Signed-off-by: Cong Wang
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Cong Wang
2020-08-01 00:39:30 +0800
2bf797a86 net: udp: Fix wrong clean up for IS_UDPLITE macro ... Browse Code »

[ Upstream commit b0a422772fec29811e293c7c0e6f991c0fd9241d ]

We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is
checked.

Fixes: b2bf1e2659b1 ("[UDP]: Clean up for IS_UDPLITE macro")
Signed-off-by: Miaohe Lin
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Miaohe Lin
2020-08-01 00:39:30 +0800
274b40b6d net-sysfs: add a newline when printing 'tx_timeout' by sysfs ... Browse Code »

[ Upstream commit 9bb5fbea59f36a589ef886292549ca4052fe676c ]

When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.

root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#

Signed-off-by: Xiongfeng Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Xiongfeng Wang
2020-08-01 00:39:30 +0800
8d9f13dd4 ip6_gre: fix null-ptr-deref in ip6gre_init_net() ... Browse Code »

[ Upstream commit 46ef5b89ec0ecf290d74c4aee844f063933c4da4 ]

KASAN report null-ptr-deref error when register_netdev() failed:

KASAN: null-ptr-deref in range [0x00000000000003c0-0x00000000000003c7]
CPU: 2 PID: 422 Comm: ip Not tainted 5.8.0-rc4+ #12
Call Trace:
ip6gre_init_net+0x4ab/0x580
? ip6gre_tunnel_uninit+0x3f0/0x3f0
ops_init+0xa8/0x3c0
setup_net+0x2de/0x7e0
? rcu_read_lock_bh_held+0xb0/0xb0
? ops_init+0x3c0/0x3c0
? kasan_unpoison_shadow+0x33/0x40
? __kasan_kmalloc.constprop.0+0xc2/0xd0
copy_net_ns+0x27d/0x530
create_new_namespaces+0x382/0xa30
unshare_nsproxy_namespaces+0xa1/0x1d0
ksys_unshare+0x39c/0x780
? walk_process_tree+0x2a0/0x2a0
? trace_hardirqs_on+0x4a/0x1b0
? _raw_spin_unlock_irq+0x1f/0x30
? syscall_trace_enter+0x1a7/0x330
? do_syscall_64+0x1c/0xa0
__x64_sys_unshare+0x2d/0x40
do_syscall_64+0x56/0xa0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

ip6gre_tunnel_uninit() has set 'ign->fb_tunnel_dev' to NULL, later
access to ign->fb_tunnel_dev cause null-ptr-deref. Fix it by saving
'ign->fb_tunnel_dev' to local variable ndev.

Fixes: dafabb6590cb ("ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()")
Reported-by: Hulk Robot
Signed-off-by: Wei Yongjun
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Wei Yongjun
2020-08-01 00:39:30 +0800
d109acd58 dev: Defer free of skbs in flush_backlog ... Browse Code »

[ Upstream commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3 ]

IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.

Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Subash Abhinov Kasiviswanathan
2020-08-01 00:39:30 +0800
52aeeec1a AX.25: Prevent out-of-bounds read in ax25_sendmsg() ... Browse Code »

[ Upstream commit 8885bb0621f01a6c82be60a91e5fc0f6e2f71186 ]

Checks on `addr_len` and `usax->sax25_ndigis` are insufficient.
ax25_sendmsg() can go out of bounds when `usax->sax25_ndigis` equals to 7
or 8. Fix it.

It is safe to remove `usax->sax25_ndigis > AX25_MAX_DIGIS`, since
`addr_len` is guaranteed to be less than or equal to
`sizeof(struct full_sockaddr_ax25)`

Signed-off-by: Peilin Ye
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Peilin Ye
2020-08-01 00:39:29 +0800
2f1624faf AX.25: Fix out-of-bounds read in ax25_connect() ... Browse Code »

[ Upstream commit 2f2a7ffad5c6cbf3d438e813cfdc88230e185ba6 ]

Checks on `addr_len` and `fsa->fsa_ax25.sax25_ndigis` are insufficient.
ax25_connect() can go out of bounds when `fsa->fsa_ax25.sax25_ndigis`
equals to 7 or 8. Fix it.

This issue has been reported as a KMSAN uninit-value bug, because in such
a case, ax25_connect() reaches into the uninitialized portion of the
`struct sockaddr_storage` statically allocated in __sys_connect().

It is safe to remove `fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS` because
`addr_len` is guaranteed to be less than or equal to
`sizeof(struct full_sockaddr_ax25)`.

Reported-by: syzbot+c82752228ed975b0a623@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=55ef9d629f3b3d7d70b69558015b63b48d01af66
Signed-off-by: Peilin Ye
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Peilin Ye
2020-08-01 00:39:29 +0800

29 Jul, 2020

3 commits

eb2c32de1 ipvs: fix the connection sync failed in some cases ... Browse Code »

[ Upstream commit 8210e344ccb798c672ab237b1a4f241bda08909b ]

The sync_thread_backup only checks sk_receive_queue is empty or not,
there is a situation which cannot sync the connection entries when
sk_receive_queue is empty and sk_rmem_alloc is larger than sk_rcvbuf,
the sync packets are dropped in __udp_enqueue_schedule_skb, this is
because the packets in reader_queue is not read, so the rmem is
not reclaimed.

Here I add the check of whether the reader_queue of the udp sock is
empty or not to solve this problem.

Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception")
Reported-by: zhouxudong
Signed-off-by: guodeqing
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin

guodeqing
2020-07-29 16:18:34 +0800
ad49d7666 vsock/virtio: annotate 'the_virtio_vsock' RCU pointer ... Browse Code »

[ Upstream commit f961134a612c793d5901a93d85a29337c74af978 ]

Commit 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free
on the_virtio_vsock") starts to use RCU to protect 'the_virtio_vsock'
pointer, but we forgot to annotate it.

This patch adds the annotation to fix the following sparse errors:

net/vmw_vsock/virtio_transport.c:73:17: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:171:17: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:207:17: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:561:13: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:612:9: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock *
net/vmw_vsock/virtio_transport.c:631:9: error: incompatible types in comparison expression (different address spaces):
net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock [noderef] __rcu *
net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock *

Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
Reported-by: Michael S. Tsirkin
Signed-off-by: Stefano Garzarella
Reviewed-by: Stefan Hajnoczi
Acked-by: Michael S. Tsirkin
Signed-off-by: Jakub Kicinski
Signed-off-by: Sasha Levin

Stefano Garzarella
2020-07-29 16:18:31 +0800
65c835ebe mac80211: allow rx of mesh eapol frames with default rx key ... Browse Code »

[ Upstream commit 0b467b63870d9c05c81456aa9bfee894ab2db3b6 ]

Without this patch, eapol frames cannot be received in mesh
mode, when 802.1X should be used. Initially only a MGTK is
defined, which is found and set as rx->key, when there are
no other keys set. ieee80211_drop_unencrypted would then
drop these eapol frames, as they are data frames without
encryption and there exists some rx->key.

Fix this by differentiating between mesh eapol frames and
other data frames with existing rx->key. Allow mesh mesh
eapol frames only if they are for our vif address.

With this patch in-place, ieee80211_rx_h_mesh_fwding continues
after the ieee80211_drop_unencrypted check and notices, that
these eapol frames have to be delivered locally, as they should.

Signed-off-by: Markus Theil
Link: https://lore.kernel.org/r/20200625104214.50319-1-markus.theil@tu-ilmenau.de
[small code cleanups]
Signed-off-by: Johannes Berg
Signed-off-by: Sasha Levin

Markus Theil
2020-07-29 16:18:26 +0800

22 Jul, 2020

16 commits

07253d24c libceph: don't omit recovery_deletes in target_copy() ... Browse Code »

commit 2f3fead62144002557f322c2a7c15e1255df0653 upstream.

Currently target_copy() is used only for sending linger pings, so
this doesn't come up, but generally omitting recovery_deletes can
result in unneeded resends (force_resend in calc_target()).

Fixes: ae78dd8139ce ("libceph: make RECOVERY_DELETES feature create a new interval")
Signed-off-by: Ilya Dryomov
Reviewed-by: Jeff Layton
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2020-07-22 15:33:17 +0800
c8a4452da xprtrdma: fix incorrect header size calculations ... Browse Code »

[ Upstream commit 912288442cb2f431bf3c8cb097a5de83bc6dbac1 ]

Currently the header size calculations are using an assignment
operator instead of a += operator when accumulating the header
size leading to incorrect sizes. Fix this by using the correct
operator.

Addresses-Coverity: ("Unused value")
Fixes: 302d3deb2068 ("xprtrdma: Prevent inline overflow")
Signed-off-by: Colin Ian King
Reviewed-by: Chuck Lever
Signed-off-by: Anna Schumaker
Signed-off-by: Sasha Levin

Colin Ian King
2020-07-22 15:33:03 +0800
36d60eba8 ip: Fix SO_MARK in RST, ACK and ICMP packets ... Browse Code »

[ Upstream commit 0da7536fb47f51df89ccfcb1fa09f249d9accec5 ]

When no full socket is available, skbs are sent over a per-netns
control socket. Its sk_mark is temporarily adjusted to match that
of the real (request or timewait) socket or to reflect an incoming
skb, so that the outgoing skb inherits this in __ip_make_skb.

Introduction of the socket cookie mark field broke this. Now the
skb is set through the cookie and cork:

# init sockc.mark from sk_mark or cmsg
ip_append_data
ip_setup_cork # convert sockc.mark to cork mark
ip_push_pending_frames
ip_finish_skb
__ip_make_skb # set skb->mark to cork mark

But I missed these special control sockets. Update all callers of
__ip(6)_make_skb that were originally missed.

For IPv6, the same two icmp(v6) paths are affected. The third
case is not, as commit 92e55f412cff ("tcp: don't annotate
mark on control socket from tcp_v6_send_response()") replaced
the ctl_sk->sk_mark with passing the mark field directly as a
function argument. That commit predates the commit that
introduced the bug.

Fixes: c6af0c227a22 ("ip: support SO_MARK cmsg")
Signed-off-by: Willem de Bruijn
Reported-by: Martin KaFai Lau
Reviewed-by: Martin KaFai Lau
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Willem de Bruijn
2020-07-22 15:32:50 +0800
94886c86e cgroup: fix cgroup_sk_alloc() for sk_clone_lock() ... Browse Code »

[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas
Reported-by: Peter Geis
Reported-by: Lu Fengqi
Reported-by: Daniël Sonck
Reported-by: Zhang Qiang
Tested-by: Cameron Berkenpas
Tested-by: Peter Geis
Tested-by: Thomas Lamprecht
Cc: Daniel Borkmann
Cc: Zefan Li
Cc: Tejun Heo
Cc: Roman Gushchin
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Cong Wang
2020-07-22 15:32:49 +0800
171644727 tcp: md5: allow changing MD5 keys in all socket states ... Browse Code »

[ Upstream commit 1ca0fafd73c5268e8fc4b997094b8bb2bfe8deea ]

This essentially reverts commit 721230326891 ("tcp: md5: reject TCP_MD5SIG
or TCP_MD5SIG_EXT on established sockets")

Mathieu reported that many vendors BGP implementations can
actually switch TCP MD5 on established flows.

Quoting Mathieu :
Here is a list of a few network vendors along with their behavior
with respect to TCP MD5:

- Cisco: Allows for password to be changed, but within the hold-down
timer (~180 seconds).
- Juniper: When password is initially set on active connection it will
reset, but after that any subsequent password changes no network
resets.
- Nokia: No notes on if they flap the tcp connection or not.
- Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until
both sides are ok with new passwords.
- Meta-Switch: Expects the password to be set before a connection is
attempted, but no further info on whether they reset the TCP
connection on a change.
- Avaya: Disable the neighbor, then set password, then re-enable.
- Zebos: Would normally allow the change when socket connected.

We can revert my prior change because commit 9424e2e7ad93 ("tcp: md5: fix potential
overestimation of TCP option space") removed the leak of 4 kernel bytes to
the wire that was the main reason for my patch.

While doing my investigations, I found a bug when a MD5 key is changed, leading
to these commits that stable teams want to consider before backporting this revert :

Commit 6a2febec338d ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
Commit e6ced831ef11 ("tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers")

Fixes: 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets"
Signed-off-by: Eric Dumazet
Reported-by: Mathieu Desnoyers
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2020-07-22 15:32:49 +0800
8ee263bd1 tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers ... Browse Code »

[ Upstream commit e6ced831ef11a2a06e8d00aad9d4fc05b610bf38 ]

My prior fix went a bit too far, according to Herbert and Mathieu.

Since we accept that concurrent TCP MD5 lookups might see inconsistent
keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb()

Clearing all key->key[] is needed to avoid possible KMSAN reports,
if key->keylen is increased. Since tcp_md5_do_add() is not fast path,
using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler.

data_race() was added in linux-5.8 and will prevent KCSAN reports,
this can safely be removed in stable backports, if data_race() is
not yet backported.

v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add()

Fixes: 6a2febec338d ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
Signed-off-by: Eric Dumazet
Cc: Mathieu Desnoyers
Cc: Herbert Xu
Cc: Marco Elver
Reviewed-by: Mathieu Desnoyers
Acked-by: Herbert Xu
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2020-07-22 15:32:49 +0800
f40c3a843 tcp: md5: do not send silly options in SYNCOOKIES ... Browse Code »

[ Upstream commit e114e1e8ac9d31f25b9dd873bab5d80c1fc482ca ]

Whenever cookie_init_timestamp() has been used to encode
ECN,SACK,WSCALE options, we can not remove the TS option in the SYNACK.

Otherwise, tcp_synack_options() will still advertize options like WSCALE
that we can not deduce later when receiving the packet from the client
to complete 3WHS.

Note that modern linux TCP stacks wont use MD5+TS+SACK in a SYN packet,
but we can not know for sure that all TCP stacks have the same logic.

Before the fix a tcpdump would exhibit this wrong exchange :

10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0
10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 1400,nop,nop,sackOK,nop,wscale 8], length 0
10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0
10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12
10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0

After this patch the exchange looks saner :

11:59:59.882990 IP C > S: Flags [S], seq 517075944, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508483 ecr 0,nop,wscale 8], length 0
11:59:59.883002 IP S > C: Flags [S.], seq 1902939253, ack 517075945, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508479 ecr 1751508483,nop,wscale 8], length 0
11:59:59.883012 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 0
11:59:59.883114 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 12
11:59:59.883122 IP S > C: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508483], length 0
11:59:59.883152 IP S > C: Flags [P.], seq 1:13, ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508483], length 12
11:59:59.883170 IP C > S: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508484], length 0

Of course, no SACK block will ever be added later, but nothing should break.
Technically, we could remove the 4 nops included in MD5+TS options,
but again some stacks could break seeing not conventional alignment.

Fixes: 4957faade11b ("TCPCT part 1g: Responder Cookie => Initiator")
Signed-off-by: Eric Dumazet
Cc: Florian Westphal
Cc: Mathieu Desnoyers
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2020-07-22 15:32:49 +0800
1c8bad567 tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key() ... Browse Code »

[ Upstream commit 6a2febec338df7e7699a52d00b2e1207dcf65b28 ]

MD5 keys are read with RCU protection, and tcp_md5_do_add()
might update in-place a prior key.

Normally, typical RCU updates would allocate a new piece
of memory. In this case only key->key and key->keylen might
be updated, and we do not care if an incoming packet could
see the old key, the new one, or some intermediate value,
since changing the key on a live flow is known to be problematic
anyway.

We only want to make sure that in the case key->keylen
is changed, cpus in tcp_md5_hash_key() wont try to use
uninitialized data, or crash because key->keylen was
read twice to feed sg_init_one() and ahash_request_set_crypt()

Fixes: 9ea88a153001 ("tcp: md5: check md5 signature without socket lock")
Signed-off-by: Eric Dumazet
Cc: Mathieu Desnoyers
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2020-07-22 15:32:48 +0800
f52293aef tcp: make sure listeners don't initialize congestion-control state ... Browse Code »

[ Upstream commit ce69e563b325f620863830c246a8698ccea52048 ]

syzkaller found its way into setsockopt with TCP_CONGESTION "cdg".
tcp_cdg_init() does a kcalloc to store the gradients. As sk_clone_lock
just copies all the memory, the allocated pointer will be copied as
well, if the app called setsockopt(..., TCP_CONGESTION) on the listener.
If now the socket will be destroyed before the congestion-control
has properly been initialized (through a call to tcp_init_transfer), we
will end up freeing memory that does not belong to that particular
socket, opening the door to a double-free:

[ 11.413102] ==================================================================
[ 11.414181] BUG: KASAN: double-free or invalid-free in tcp_cleanup_congestion_control+0x58/0xd0
[ 11.415329]
[ 11.415560] CPU: 3 PID: 4884 Comm: syz-executor.5 Not tainted 5.8.0-rc2 #80
[ 11.416544] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 11.418148] Call Trace:
[ 11.418534]
[ 11.418834] dump_stack+0x7d/0xb0
[ 11.419297] print_address_description.constprop.0+0x1a/0x210
[ 11.422079] kasan_report_invalid_free+0x51/0x80
[ 11.423433] __kasan_slab_free+0x15e/0x170
[ 11.424761] kfree+0x8c/0x230
[ 11.425157] tcp_cleanup_congestion_control+0x58/0xd0
[ 11.425872] tcp_v4_destroy_sock+0x57/0x5a0
[ 11.426493] inet_csk_destroy_sock+0x153/0x2c0
[ 11.427093] tcp_v4_syn_recv_sock+0xb29/0x1100
[ 11.427731] tcp_get_cookie_sock+0xc3/0x4a0
[ 11.429457] cookie_v4_check+0x13d0/0x2500
[ 11.433189] tcp_v4_do_rcv+0x60e/0x780
[ 11.433727] tcp_v4_rcv+0x2869/0x2e10
[ 11.437143] ip_protocol_deliver_rcu+0x23/0x190
[ 11.437810] ip_local_deliver+0x294/0x350
[ 11.439566] __netif_receive_skb_one_core+0x15d/0x1a0
[ 11.441995] process_backlog+0x1b1/0x6b0
[ 11.443148] net_rx_action+0x37e/0xc40
[ 11.445361] __do_softirq+0x18c/0x61a
[ 11.445881] asm_call_on_stack+0x12/0x20
[ 11.446409]
[ 11.446716] do_softirq_own_stack+0x34/0x40
[ 11.447259] do_softirq.part.0+0x26/0x30
[ 11.447827] __local_bh_enable_ip+0x46/0x50
[ 11.448406] ip_finish_output2+0x60f/0x1bc0
[ 11.450109] __ip_queue_xmit+0x71c/0x1b60
[ 11.451861] __tcp_transmit_skb+0x1727/0x3bb0
[ 11.453789] tcp_rcv_state_process+0x3070/0x4d3a
[ 11.456810] tcp_v4_do_rcv+0x2ad/0x780
[ 11.457995] __release_sock+0x14b/0x2c0
[ 11.458529] release_sock+0x4a/0x170
[ 11.459005] __inet_stream_connect+0x467/0xc80
[ 11.461435] inet_stream_connect+0x4e/0xa0
[ 11.462043] __sys_connect+0x204/0x270
[ 11.465515] __x64_sys_connect+0x6a/0xb0
[ 11.466088] do_syscall_64+0x3e/0x70
[ 11.466617] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 11.467341] RIP: 0033:0x7f56046dc469
[ 11.467844] Code: Bad RIP value.
[ 11.468282] RSP: 002b:00007f5604dccdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
[ 11.469326] RAX: ffffffffffffffda RBX: 000000000068bf00 RCX: 00007f56046dc469
[ 11.470379] RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000004
[ 11.471311] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
[ 11.472286] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 11.473341] R13: 000000000041427c R14: 00007f5604dcd5c0 R15: 0000000000000003
[ 11.474321]
[ 11.474527] Allocated by task 4884:
[ 11.475031] save_stack+0x1b/0x40
[ 11.475548] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 11.476182] tcp_cdg_init+0xf0/0x150
[ 11.476744] tcp_init_congestion_control+0x9b/0x3a0
[ 11.477435] tcp_set_congestion_control+0x270/0x32f
[ 11.478088] do_tcp_setsockopt.isra.0+0x521/0x1a00
[ 11.478744] __sys_setsockopt+0xff/0x1e0
[ 11.479259] __x64_sys_setsockopt+0xb5/0x150
[ 11.479895] do_syscall_64+0x3e/0x70
[ 11.480395] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 11.481097]
[ 11.481321] Freed by task 4872:
[ 11.481783] save_stack+0x1b/0x40
[ 11.482230] __kasan_slab_free+0x12c/0x170
[ 11.482839] kfree+0x8c/0x230
[ 11.483240] tcp_cleanup_congestion_control+0x58/0xd0
[ 11.483948] tcp_v4_destroy_sock+0x57/0x5a0
[ 11.484502] inet_csk_destroy_sock+0x153/0x2c0
[ 11.485144] tcp_close+0x932/0xfe0
[ 11.485642] inet_release+0xc1/0x1c0
[ 11.486131] __sock_release+0xc0/0x270
[ 11.486697] sock_close+0xc/0x10
[ 11.487145] __fput+0x277/0x780
[ 11.487632] task_work_run+0xeb/0x180
[ 11.488118] __prepare_exit_to_usermode+0x15a/0x160
[ 11.488834] do_syscall_64+0x4a/0x70
[ 11.489326] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Wei Wang fixed a part of these CDG-malloc issues with commit c12014440750
("tcp: memset ca_priv data to 0 properly").

This patch here fixes the listener-scenario: We make sure that listeners
setting the congestion-control through setsockopt won't initialize it
(thus CDG never allocates on listeners). For those who use AF_UNSPEC to
reuse a socket, tcp_disconnect() is changed to cleanup afterwards.

(The issue can be reproduced at least down to v4.4.x.)

Cc: Wei Wang
Cc: Eric Dumazet
Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Signed-off-by: Christoph Paasch
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Christoph Paasch
2020-07-22 15:32:48 +0800
7eec9f331 tcp: fix SO_RCVLOWAT possible hangs under high mem pressure ... Browse Code »

[ Upstream commit ba3bb0e76ccd464bb66665a1941fabe55dadb3ba ]

Whenever tcp_try_rmem_schedule() returns an error, we are under
trouble and should make sure to wakeup readers so that they
can drain socket queues and eventually make room.

Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2020-07-22 15:32:48 +0800
9b7fd81cf sched: consistently handle layer3 header accesses in the presence of VLANs ... Browse Code »

[ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]

There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.

However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).

To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.

To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.

v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
bpf_skb_ecn_set_ce()

v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
calling the helper twice

Reported-by: Ilya Ponetayev
Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Toke Høiland-Jørgensen
2020-07-22 15:32:48 +0800
edbde451b net_sched: fix a memory leak in atm_tc_init() ... Browse Code »

[ Upstream commit 306381aec7c2b5a658eebca008c8a1b666536cba ]

When tcf_block_get() fails inside atm_tc_init(),
atm_tc_put() is called to release the qdisc p->link.q.
But the flow->ref prevents it to do so, as the flow->ref
is still zero.

Fix this by moving the p->link.ref initialization before
tcf_block_get().

Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Reported-and-tested-by: syzbot+d411cff6ab29cc2c311b@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim
Cc: Jiri Pirko
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Cong Wang
2020-07-22 15:32:48 +0800
a70a66773 llc: make sure applications use ARPHRD_ETHER ... Browse Code »

[ Upstream commit a9b1110162357689a34992d5c925852948e5b9fd ]

syzbot was to trigger a bug by tricking AF_LLC with
non sensible addr->sllc_arphrd

It seems clear LLC requires an Ethernet device.

Back in commit abf9d537fea2 ("llc: add support for SO_BINDTODEVICE")
Octavian Purdila added possibility for application to use a zero
value for sllc_arphrd, convert it to ARPHRD_ETHER to not cause
regressions on existing applications.

BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:199 [inline]
BUG: KASAN: use-after-free in list_empty include/linux/list.h:268 [inline]
BUG: KASAN: use-after-free in waitqueue_active include/linux/wait.h:126 [inline]
BUG: KASAN: use-after-free in wq_has_sleeper include/linux/wait.h:160 [inline]
BUG: KASAN: use-after-free in skwq_has_sleeper include/net/sock.h:2092 [inline]
BUG: KASAN: use-after-free in sock_def_write_space+0x642/0x670 net/core/sock.c:2813
Read of size 8 at addr ffff88801e0b4078 by task ksoftirqd/3/27

CPU: 3 PID: 27 Comm: ksoftirqd/3 Not tainted 5.5.0-rc1-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
__kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
kasan_report+0x12/0x20 mm/kasan/common.c:639
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
__read_once_size include/linux/compiler.h:199 [inline]
list_empty include/linux/list.h:268 [inline]
waitqueue_active include/linux/wait.h:126 [inline]
wq_has_sleeper include/linux/wait.h:160 [inline]
skwq_has_sleeper include/net/sock.h:2092 [inline]
sock_def_write_space+0x642/0x670 net/core/sock.c:2813
sock_wfree+0x1e1/0x260 net/core/sock.c:1958
skb_release_head_state+0xeb/0x260 net/core/skbuff.c:652
skb_release_all+0x16/0x60 net/core/skbuff.c:663
__kfree_skb net/core/skbuff.c:679 [inline]
consume_skb net/core/skbuff.c:838 [inline]
consume_skb+0xfb/0x410 net/core/skbuff.c:832
__dev_kfree_skb_any+0xa4/0xd0 net/core/dev.c:2967
dev_kfree_skb_any include/linux/netdevice.h:3650 [inline]
e1000_unmap_and_free_tx_resource.isra.0+0x21b/0x3a0 drivers/net/ethernet/intel/e1000/e1000_main.c:1963
e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3854 [inline]
e1000_clean+0x4cc/0x1d10 drivers/net/ethernet/intel/e1000/e1000_main.c:3796
napi_poll net/core/dev.c:6532 [inline]
net_rx_action+0x508/0x1120 net/core/dev.c:6600
__do_softirq+0x262/0x98c kernel/softirq.c:292
run_ksoftirqd kernel/softirq.c:603 [inline]
run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595
smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 8247:
save_stack+0x23/0x90 mm/kasan/common.c:72
set_track mm/kasan/common.c:80 [inline]
__kasan_kmalloc mm/kasan/common.c:513 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521
slab_post_alloc_hook mm/slab.h:584 [inline]
slab_alloc mm/slab.c:3320 [inline]
kmem_cache_alloc+0x121/0x710 mm/slab.c:3484
sock_alloc_inode+0x1c/0x1d0 net/socket.c:240
alloc_inode+0x68/0x1e0 fs/inode.c:230
new_inode_pseudo+0x19/0xf0 fs/inode.c:919
sock_alloc+0x41/0x270 net/socket.c:560
__sock_create+0xc2/0x730 net/socket.c:1384
sock_create net/socket.c:1471 [inline]
__sys_socket+0x103/0x220 net/socket.c:1513
__do_sys_socket net/socket.c:1522 [inline]
__se_sys_socket net/socket.c:1520 [inline]
__ia32_sys_socket+0x73/0xb0 net/socket.c:1520
do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline]
do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139

Freed by task 17:
save_stack+0x23/0x90 mm/kasan/common.c:72
set_track mm/kasan/common.c:80 [inline]
kasan_set_free_info mm/kasan/common.c:335 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
__cache_free mm/slab.c:3426 [inline]
kmem_cache_free+0x86/0x320 mm/slab.c:3694
sock_free_inode+0x20/0x30 net/socket.c:261
i_callback+0x44/0x80 fs/inode.c:219
__rcu_reclaim kernel/rcu/rcu.h:222 [inline]
rcu_do_batch kernel/rcu/tree.c:2183 [inline]
rcu_core+0x570/0x1540 kernel/rcu/tree.c:2408
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2417
__do_softirq+0x262/0x98c kernel/softirq.c:292

The buggy address belongs to the object at ffff88801e0b4000
which belongs to the cache sock_inode_cache of size 1152
The buggy address is located 120 bytes inside of
1152-byte region [ffff88801e0b4000, ffff88801e0b4480)
The buggy address belongs to the page:
page:ffffea0000782d00 refcount:1 mapcount:0 mapping:ffff88807aa59c40 index:0xffff88801e0b4ffd
raw: 00fffe0000000200 ffffea00008e6c88 ffffea0000782d48 ffff88807aa59c40
raw: ffff88801e0b4ffd ffff88801e0b4000 0000000100000003 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88801e0b3f00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
ffff88801e0b3f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88801e0b4000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88801e0b4080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88801e0b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: abf9d537fea2 ("llc: add support for SO_BINDTODEVICE")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2020-07-22 15:32:47 +0800
73e42f4d2 l2tp: remove skb_dst_set() from l2tp_xmit_skb() ... Browse Code »

[ Upstream commit 27d53323664c549b5bb2dfaaf6f7ad6e0376a64e ]

In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set
skb's dst. However, it will eventually call inet6_csk_xmit() or
ip_queue_xmit() where skb's dst will be overwritten by:

skb_dst_set_noref(skb, dst);

without releasing the old dst in skb. Then it causes dst/dev refcnt leak:

unregister_netdevice: waiting for eth0 to become free. Usage count = 1

This can be reproduced by simply running:

# modprobe l2tp_eth && modprobe l2tp_ip
# sh ./tools/testing/selftests/net/l2tp.sh

So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst
should be dropped. This patch is to fix it by removing skb_dst_set()
from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core().

Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
Reported-by: Hangbin Liu
Signed-off-by: Xin Long
Acked-by: James Chapman
Tested-by: James Chapman
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Xin Long
2020-07-22 15:32:47 +0800
f8646548e ipv6: Fix use of anycast address with loopback ... Browse Code »

[ Upstream commit aea23c323d89836bcdcee67e49def997ffca043b ]

Thomas reported a regression with IPv6 and anycast using the following
reproducer:

echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
ip -6 a add fc12::1/16 dev lo
sleep 2
echo "pinging lo"
ping6 -c 2 fc12::

The conversion of addrconf_f6i_alloc to use ip6_route_info_create missed
the use of fib6_is_reject which checks addresses added to the loopback
interface and sets the REJECT flag as needed. Update fib6_is_reject for
loopback checks to handle RTF_ANYCAST addresses.

Fixes: c7a1ce397ada ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create")
Reported-by: thomas.gambier@nexedi.com
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David Ahern
2020-07-22 15:32:47 +0800
75270f819 ipv6: fib6_select_path can not use out path for nexthop objects ... Browse Code »

[ Upstream commit 34fe5a1cf95c3f114068fc16d919c9cf4b00e428 ]

Brian reported a crash in IPv6 code when using rpfilter with a setup
running FRR and external nexthop objects. The root cause of the crash
is fib6_select_path setting fib6_nh in the result to NULL because of
an improper check for nexthop objects.

More specifically, rpfilter invokes ip6_route_lookup with flowi6_oif
set causing fib6_select_path to be called with have_oif_match set.
fib6_select_path has early check on have_oif_match and jumps to the
out label which presumes a builtin fib6_nh. This path is invalid for
nexthop objects; for external nexthops fib6_select_path needs to just
return if the fib6_nh has already been set in the result otherwise it
returns after the call to nexthop_path_fib6_result. Update the check
on have_oif_match to not bail on external nexthops.

Update selftests for this problem.

Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info")
Reported-by: Brian Rak
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David Ahern
2020-07-22 15:32:47 +0800