Eric Lee / smarc-fsl-linux-kernel

23 Dec, 2019

1 commit

78bac77b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
including adding a missing ipv6 match description.

2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
Bhat.

3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.

4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.

5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.

6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
Chaignon.

7) Multicast MAC limit test is off by one in qede, from Manish Chopra.

8) Fix established socket lookup race when socket goes from
TCP_ESTABLISHED to TCP_LISTEN, because there is no intervening
RCU grace period. From Eric Dumazet.

9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.

10) Fix active backup transition after link failure in bonding, from
Mahesh Bandewar.

11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.

12) Fix wrong interface passed to ->mac_link_up(), from Russell King.

13) Fix DSA egress flooding settings in b53, from Florian Fainelli.

14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.

15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.

16) Reject invalid MTU values in stmmac, from Jose Abreu.

17) Fix refcount leak in error path of u32 classifier, from Davide
Caratti.

18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
Kaseorg.

19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.

20) Disable hardware GRO when XDP is attached to qede, from Manish
Chopra.

21) Since we encode state in the low pointer bits, dst metrics must be
at least 4 byte aligned, which is not necessarily true on m68k. Add
annotations to fix this, from Geert Uytterhoeven.
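
Item 21 depends on pointer tagging: if an object is at least 4-byte
aligned, the two low bits of its address are always zero and can carry
flags. A minimal userspace sketch, assuming a hypothetical toy_metrics
type and flag names (not the kernel's dst_metrics code), and the
GCC/Clang aligned attribute:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a metrics block; not the kernel structure. */
struct toy_metrics {
	uint32_t values[4];
} __attribute__((aligned(4)));	/* force at least 4-byte alignment */

/* Hypothetical flag bits stored in the two low pointer bits. */
#define TOY_FLAG_READ_ONLY	((uintptr_t)0x1)
#define TOY_FLAG_REFCOUNTED	((uintptr_t)0x2)
#define TOY_FLAG_MASK		((uintptr_t)0x3)

/* Pack a pointer and up to two flag bits into a single word. */
static inline uintptr_t toy_metrics_pack(const struct toy_metrics *m,
					 uintptr_t flags)
{
	return (uintptr_t)m | (flags & TOY_FLAG_MASK);
}

/* Recover the pointer by masking the flag bits off again. */
static inline struct toy_metrics *toy_metrics_ptr(uintptr_t word)
{
	return (struct toy_metrics *)(word & ~TOY_FLAG_MASK);
}

static inline bool toy_metrics_read_only(uintptr_t word)
{
	return (word & TOY_FLAG_READ_ONLY) != 0;
}
```

Without the alignment guarantee, an object could sit at an address whose
low bits are non-zero, and masking would then return a different pointer.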

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
sfc: Include XDP packet headroom in buffer step size.
sfc: fix channel allocation with brute force
net: dst: Force 4-byte alignment of dst_metrics
selftests: pmtu: fix init mtu value in description
hv_netvsc: Fix unwanted rx_table reset
net: phy: ensure that phy IDs are correctly typed
mod_devicetable: fix PHY module format
qede: Disable hardware gro when xdp prog is installed
net: ena: fix issues in setting interrupt moderation params in ethtool
net: ena: fix default tx interrupt moderation interval
net/smc: unregister ib devices in reboot_event
net: stmmac: platform: Fix MDIO init for platforms without PHY
llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
net: hisilicon: Fix a BUG trigered by wrong bytes_compl
net: dsa: ksz: use common define for tag len
s390/qeth: don't return -ENOTSUPP to userspace
s390/qeth: fix promiscuous mode after reset
s390/qeth: handle error due to unsupported transport mode
cxgb4: fix refcount init for TC-MQPRIO offload
tc-testing: initial tdc selftests for cls_u32
...

Linus Torvalds
2019-12-23 01:54:33 +0800

21 Dec, 2019

3 commits

28a3b8408 net/smc: unregister ib devices in reboot_event

In the reboot_event handler, unregister the ib devices and enable
the IB layer to release the devices before the reboot.

Fixes: a33a803cfe64 ("net/smc: guarantee removal of link groups in reboot")
Signed-off-by: Karsten Graul
Reviewed-by: Ursula Braun
Signed-off-by: David S. Miller

Karsten Graul
2019-12-21 13:31:19 +0800
af1c0e4e0 llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)

When a frame with NULL DSAP is received, llc_station_rcv is called.
In turn, llc_stat_ev_rx_null_dsap_xid_c is called to check if it is a NULL
XID frame. The return statement of llc_stat_ev_rx_null_dsap_xid_c returns 1
when the incoming frame is not a NULL XID frame and 0 otherwise. Hence, a
NULL XID response is returned unexpectedly, e.g. when the incoming frame is
a NULL TEST command.

To fix the error, simply remove the conditional operator.

A similar error in llc_stat_ev_rx_null_dsap_test_c is also fixed.
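
The shape of the bug can be sketched in isolation. The struct, enum and
function names below are hypothetical simplifications (the real code
inspects an LLC PDU header); only the trailing conditional operator
mirrors what the message describes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a received frame. */
struct toy_frame {
	bool is_cmd;
	uint8_t type;
	uint8_t dsap;
};

enum { TOY_XID = 1, TOY_TEST = 2 };

/* Buggy shape: the trailing "? 0 : 1" inverts the predicate, so the
 * function reports a match for every frame that is NOT a NULL XID. */
static int toy_is_null_xid_buggy(const struct toy_frame *f)
{
	return (f->is_cmd && f->type == TOY_XID && !f->dsap) ? 0 : 1;
}

/* Fixed shape: return the predicate itself. */
static int toy_is_null_xid_fixed(const struct toy_frame *f)
{
	return f->is_cmd && f->type == TOY_XID && !f->dsap;
}
```

With the buggy variant a NULL TEST command is reported as a NULL XID
match, which is the unexpected response described above.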

Signed-off-by: Chan Shu Tak, Alex
Signed-off-by: David S. Miller

Chan Shu Tak, Alex
2019-12-21 13:19:36 +0800
4249c507f net: dsa: ksz: use common define for tag len

Remove special taglen define KSZ8795_INGRESS_TAG_LEN
and use generic KSZ_INGRESS_TAG_LEN instead.

Signed-off-by: Michael Grzeschik
Reviewed-by: Andrew Lunn
Signed-off-by: David S. Miller

Michael Grzeschik
2019-12-21 13:06:49 +0800

20 Dec, 2019

3 commits

275c44aa1 net/sched: cls_u32: fix refcount leak in the error path of u32_change()

When users replace cls_u32 filters with new ones that have wrong parameters,
so that u32_change() fails to validate them, the kernel doesn't roll back
correctly and leaves semi-configured rules.

Fix this in u32_walk(), avoiding a call to the walker function on filters
that don't have a match rule connected. The side effect is, these "empty"
filters are not even dumped when present; but that shouldn't be a problem
as long as we are restoring the original behaviour, where semi-configured
filters were not even added in the error path of u32_change().

Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
Signed-off-by: Davide Caratti
Signed-off-by: David S. Miller

Davide Caratti
2019-12-20 09:53:05 +0800
0fd260056 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2019-12-19

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 269 insertions(+), 108 deletions(-).

The main changes are:

1) Fix lack of synchronization between xsk wakeup and destroying resources
used by xsk wakeup, from Maxim Mikityanskiy.

2) Fix pruning with tail call patching, untrack programs in case of verifier
error and fix a cgroup local storage tracking bug, from Daniel Borkmann.

3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to
egress which otherwise cause issues e.g. on fq qdisc, from Lorenz Bauer.

4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when
only cBPF is present, from Alexander Lobakin.
====================

Signed-off-by: David S. Miller

David S. Miller
2019-12-20 06:20:47 +0800
1148f9adb net, sysctl: Fix compiler warning when only cBPF is present

proc_dointvec_minmax_bpf_restricted() was first introduced
in commit 2e4a30983b0f ("bpf: restrict access to core bpf sysctls")
under CONFIG_HAVE_EBPF_JIT. This ifdef was then removed in
ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv
allocations"), because a new sysctl, bpf_jit_limit, made use of it.
Finally, this parameter became long instead of integer with
fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
and thus a new proc_dolongvec_minmax_bpf_restricted() was
added.

With this last change, proc_dointvec_minmax_bpf_restricted() is
once again used only under CONFIG_HAVE_EBPF_JIT, but the
corresponding ifdef was not brought back.

So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n
since v4.20 we have:

CC net/core/sysctl_net_core.o
net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function]
292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again.

Fixes: fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Signed-off-by: Alexander Lobakin
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20191218091821.7080-1-alobakin@dlink.ru

Alexander Lobakin
2019-12-20 00:17:51 +0800

19 Dec, 2019

2 commits

068706820 xsk: Add rcu_read_lock around the XSK wakeup

The XSK wakeup callback in drivers makes some sanity checks before
triggering NAPI. However, some configuration changes may occur during
this function that affect the result of those checks. For example, the
interface can go down, and all the resources will be destroyed after the
checks in the wakeup function, but before it attempts to use these
resources. Wrap this callback in rcu_read_lock so that the driver can
call synchronize_rcu before actually destroying the resources.

xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup
wrapped into the RCU lock. After this commit, xsk_poll starts using
xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide
whether ndo_xsk_wakeup should be called. It also fixes a bug introduced with the
need_wakeup feature: a non-zero-copy socket may be used with a driver
supporting zero-copy, and in this case ndo_xsk_wakeup should not be
called, so the xs->zc check is the correct one.

Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Maxim Mikityanskiy
Signed-off-by: Björn Töpel
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20191217162023.16011-2-maximmi@mellanox.com

Maxim Mikityanskiy
2019-12-19 23:20:48 +0800
b7ac89365 net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive()

The kernel may sleep while holding a spinlock.
The function call path (from bottom to top) in Linux 4.19 is:

net/nfc/nci/uart.c, 349:
nci_skb_alloc in nci_uart_default_recv_buf
net/nfc/nci/uart.c, 255:
(FUNC_PTR)nci_uart_default_recv_buf in nci_uart_tty_receive
net/nfc/nci/uart.c, 254:
spin_lock in nci_uart_tty_receive

nci_skb_alloc(GFP_KERNEL) can sleep at runtime.
(FUNC_PTR) means a function pointer is called.

To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC for
nci_skb_alloc().

This bug was found by STCheck, a static analysis tool written by myself.

Signed-off-by: Jia-Ju Bai
Signed-off-by: David S. Miller

Jia-Ju Bai
2019-12-19 03:57:33 +0800

18 Dec, 2019

4 commits

ddd9b5e3e net-sysfs: Call dev_hold always in rx_queue_add_kobject

dev_hold() must always be called in rx_queue_add_kobject().
Otherwise the usage count drops below 0 if
kobject_init_and_add() fails.
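
The ordering rule can be illustrated with a toy reference count. All
names below are hypothetical userspace stand-ins, and the assumption is
that the error path runs a release callback which always drops one
reference:

```c
/* Hypothetical device with a plain usage counter. */
struct toy_dev {
	int refs;
};

struct toy_queue {
	struct toy_dev *dev;
};

static void toy_hold(struct toy_dev *d) { d->refs++; }
static void toy_put(struct toy_dev *d)  { d->refs--; }

/* Release callback: unconditionally drops the queue's reference. */
static void toy_queue_release(struct toy_queue *q)
{
	toy_put(q->dev);
}

/* init_fails simulates a failing initialisation step. */
static int toy_queue_add(struct toy_queue *q, int init_fails)
{
	toy_hold(q->dev);		/* take the reference before anything can fail */
	if (init_fails) {
		toy_queue_release(q);	/* drops exactly what was taken above */
		return -1;
	}
	return 0;
}
```

If the hold were taken only after a successful init, the release on the
failure path would decrement a count that was never incremented.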

Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
Reported-by: syzbot
Cc: Tetsuo Handa
Cc: David Miller
Cc: Lukas Bulwahn
Signed-off-by: Jouni Hogander
Signed-off-by: David S. Miller

Jouni Hogander
2019-12-18 14:57:11 +0800
4e2ce6e55 net: dsa: make unexported dsa_link_touch() static

dsa_link_touch() is not exported or declared outside of the
file it is in, so make it static to avoid the following warning:

net/dsa/dsa2.c:127:17: warning: symbol 'dsa_link_touch' was not declared. Should it be static?

Signed-off-by: Ben Dooks (Codethink)
Signed-off-by: David S. Miller

Ben Dooks (Codethink)
2019-12-18 14:40:39 +0800
7c68fa2bd net: annotate lockless accesses to sk->sk_pacing_shift

sk->sk_pacing_shift can be read and written without lock
synchronization. This patch adds annotations to
document this fact and avoid future syzbot complaints.

This might also avoid unexpected false sharing
in sk_pacing_shift_update(), as the compiler
could remove the conditional check and always
write over sk->sk_pacing_shift :

	if (sk->sk_pacing_shift != val)
		sk->sk_pacing_shift = val;
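
A userspace approximation of an annotated version of that conditional
write is sketched below. The two macros are simplified stand-ins for the
kernel's READ_ONCE()/WRITE_ONCE() built on a volatile cast (GNU C
__typeof__), and toy_sock is a hypothetical structure:

```c
/* Simplified stand-ins for the kernel macros: force exactly one
 * access through a volatile-qualified lvalue. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical socket with a lockless field. */
struct toy_sock {
	unsigned char pacing_shift;
};

static void toy_pacing_shift_update(struct toy_sock *sk, unsigned char val)
{
	/* Skip the store when nothing changes, so the cache line is
	 * not dirtied needlessly. */
	if (READ_ONCE(sk->pacing_shift) == val)
		return;
	WRITE_ONCE(sk->pacing_shift, val);
}
```

The volatile accesses keep the compiler from dropping the check and
turning the update into an unconditional store.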

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2019-12-18 14:09:52 +0800
951c6db95 sctp: fix memleak on err handling of stream initialization

syzbot reported a memory leak when an allocation fails within
genradix_prealloc() for output streams. That's because
genradix_prealloc() leaves already-initialized members in place when the
failure happens, and the SCTP stack then aborts the current initialization
without cleaning up those members.

The fix here is to always call genradix_free() when genradix_prealloc()
fails, for output and also input streams, as it suffers from the same
issue.
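
The pattern is "free on failed prealloc". A toy sketch with malloc-based
chunks follows; toy_table and its helpers are hypothetical and are not
the genradix API, and fail_at only simulates an allocation failure:

```c
#include <stdlib.h>

/* Hypothetical chunked table; count is the number of chunks allocated. */
struct toy_table {
	void **chunks;
	size_t count;
};

static void toy_table_free(struct toy_table *t)
{
	for (size_t i = 0; i < t->count; i++)
		free(t->chunks[i]);
	free(t->chunks);
	t->chunks = NULL;
	t->count = 0;
}

/* Allocates n chunks (n > 0); fails at index fail_at (pass n for no failure).
 * On failure, chunks allocated so far are left in place. */
static int toy_table_prealloc(struct toy_table *t, size_t n, size_t fail_at)
{
	t->count = 0;
	t->chunks = calloc(n, sizeof(*t->chunks));
	if (!t->chunks)
		return -1;
	for (; t->count < n; t->count++) {
		void *c = (t->count == fail_at) ? NULL : malloc(64);

		if (!c)
			return -1;
		t->chunks[t->count] = c;
	}
	return 0;
}

static int toy_stream_init(struct toy_table *t, size_t n, size_t fail_at)
{
	int ret = toy_table_prealloc(t, n, fail_at);

	if (ret)
		toy_table_free(t);	/* always clean up a partial prealloc */
	return ret;
}
```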

Reported-by: syzbot+772d9e36c490b18d51d1@syzkaller.appspotmail.com
Fixes: 2075e50caf5e ("sctp: convert to genradix")
Signed-off-by: Marcelo Ricardo Leitner
Tested-by: Xin Long
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2019-12-18 13:58:37 +0800

17 Dec, 2019

3 commits

ad125c6c0 Merge tag 'mac80211-for-net-2019-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A handful of fixes:
* disable AQL on most drivers, addressing the iwlwifi issues
* fix double-free on network namespace changes
* fix TID field in frames injected through monitor interfaces
* fix ieee80211_calc_rx_airtime()
* fix NULL pointer dereference in rfkill (and remove BUG_ON)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2019-12-17 11:26:11 +0800
4aaf59614 vsock/virtio: add WARN_ON check on virtio_transport_get_ops()

virtio_transport_get_ops() and virtio_transport_send_pkt_info()
can only be used on connecting/connected sockets, since a socket
assigned to a transport is required.

This patch adds a WARN_ON() to virtio_transport_get_ops() to check
this requirement, and adds a comment and a returned error to
virtio_transport_send_pkt_info().

Signed-off-by: Stefano Garzarella
Signed-off-by: David S. Miller

Stefano Garzarella
2019-12-17 08:07:12 +0800
df18fa146 vsock/virtio: fix null-pointer dereference in virtio_transport_recv_listen()

With multi-transport support, listener sockets are not bound to any
transport. So, calling virtio_transport_reset(), when an error
occurs, on a listener socket produces the following null-pointer
dereference:

BUG: kernel NULL pointer dereference, address: 00000000000000e8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 20 Comm: kworker/0:1 Not tainted 5.5.0-rc1-ste-00003-gb4be21f316ac-dirty #56
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport]
RIP: 0010:virtio_transport_send_pkt_info+0x20/0x130 [vmw_vsock_virtio_transport_common]
Code: 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 10 44 8b 76 20 e8 c0 ba fe ff 8b 80 e8 00 00 00 e8 64 e3 7d c1 45 8b 45 00 41 8b 8c 24 d4 02
RSP: 0018:ffffc900000b7d08 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88807bf12728 RCX: 0000000000000000
RDX: ffff88807bf12700 RSI: ffffc900000b7d50 RDI: ffff888035c84000
RBP: ffffc900000b7d40 R08: ffff888035c84000 R09: ffffc900000b7d08
R10: ffff8880781de800 R11: 0000000000000018 R12: ffff888035c84000
R13: ffffc900000b7d50 R14: 0000000000000000 R15: ffff88807bf12724
FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000e8 CR3: 00000000790f4004 CR4: 0000000000160ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
virtio_transport_reset+0x59/0x70 [vmw_vsock_virtio_transport_common]
virtio_transport_recv_pkt+0x5bb/0xe50 [vmw_vsock_virtio_transport_common]
? detach_buf_split+0xf1/0x130
virtio_transport_rx_work+0xba/0x130 [vmw_vsock_virtio_transport]
process_one_work+0x1c0/0x300
worker_thread+0x45/0x3c0
kthread+0xfc/0x130
? current_work+0x40/0x40
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: sunrpc kvm_intel kvm vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common irqbypass vsock virtio_rng rng_core
CR2: 00000000000000e8
---[ end trace e75400e2ea2fa824 ]---

This happens because virtio_transport_reset() calls
virtio_transport_send_pkt_info() that can be used only on
connecting/connected sockets.

This patch fixes the issue, using virtio_transport_reset_no_sock()
instead of virtio_transport_reset() when we are handling a listener
socket.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Signed-off-by: Stefano Garzarella
Signed-off-by: David S. Miller

Stefano Garzarella
2019-12-17 08:07:12 +0800

16 Dec, 2019

2 commits

6fc232db9 rfkill: Fix incorrect check to avoid NULL pointer dereference

In rfkill_register, the struct rfkill pointer is first dereferenced
and then checked for NULL. This patch removes the BUG_ON and returns
an error to the caller in case rfkill is NULL.
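
A minimal sketch of the corrected ordering, using a hypothetical
toy_rfkill structure and illustrative error codes:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in; not the kernel's struct rfkill. */
struct toy_rfkill {
	int registered;
};

static int toy_register(struct toy_rfkill *r)
{
	if (!r)			/* validate the pointer first ... */
		return -EINVAL;
	if (r->registered)	/* ... and dereference it only afterwards */
		return -EALREADY;
	r->registered = 1;
	return 0;
}
```

Returning an error instead of asserting lets the caller handle a NULL
argument gracefully.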

Signed-off-by: Aditya Pakki
Link: https://lore.kernel.org/r/20191215153409.21696-1-pakki001@umn.edu
Signed-off-by: Johannes Berg

Aditya Pakki
2019-12-16 17:15:49 +0800
86434744f net/smc: add fallback check to connect()

FASTOPEN setsockopt() or sendmsg() may switch the SMC socket to fallback
mode. Once fallback mode is active, the native TCP socket functions are
called. Nevertheless there is a small race window when FASTOPEN
setsockopt/sendmsg runs in parallel with a connect() and switches the
socket into fallback mode before connect() takes the sock lock.
Make sure the SMC-specific connect setup is omitted in this case.

This way a syzbot-reported refcount problem is fixed, triggered by
different threads running non-blocking connect() and FASTOPEN_KEY
setsockopt.

Reported-by: syzbot+96d3f9ff6a86d37e44c8@syzkaller.appspotmail.com
Fixes: 6d6dd528d5af ("net/smc: fix refcount non-blocking connect() -part 2")
Signed-off-by: Ursula Braun
Signed-off-by: Karsten Graul
Signed-off-by: Jakub Kicinski

Ursula Braun
2019-12-16 03:10:30 +0800

14 Dec, 2019

7 commits

216808c6b tcp: refine rule to allow EPOLLOUT generation under mem pressure

At the time commit ce5ec440994b ("tcp: ensure epoll edge trigger
wakeup when write queue is empty") was added to the kernel,
we still had a single write queue, combining rtx and write queues.

Once we moved the rtx queue into a separate rb-tree, testing
if sk_write_queue is empty has been suboptimal.

Indeed, if we have packets in the rtx queue, we probably want
to delay the EPOLLOUT generation until incoming packets
free them, making room, but more importantly avoiding
flooding the application with EPOLLOUT events.

The solution is to use the tcp_rtx_and_write_queues_empty() helper.

Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: Eric Dumazet
Cc: Jason Baron
Cc: Neal Cardwell
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: Jakub Kicinski

Eric Dumazet
2019-12-14 13:58:40 +0800
ee2aabd3f tcp: refine tcp_write_queue_empty() implementation

Due to how tcp_sendmsg() is implemented, we can have an empty
skb at the tail of the write queue.

Most [1] tcp_write_queue_empty() callers want to know if there is
anything to send (payload and/or FIN).

Instead of checking if the sk_write_queue is empty, we need
to test if tp->write_seq == tp->snd_nxt.

[1] tcp_send_fin() was the only caller that expected to
see if an skb was in the write queue, I have changed the code
to reuse the tcp_write_queue_tail() result.
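
The difference between the two tests can be shown on a toy structure.
The type and field layout are hypothetical; only the write_seq ==
snd_nxt comparison comes from the message above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the sender state. */
struct toy_tcp {
	uint32_t write_seq;	/* sequence number after the last queued byte */
	uint32_t snd_nxt;	/* next sequence number to send */
	unsigned int queued_skbs;	/* buffers in the write queue, possibly empty ones */
};

/* Old-style test: is any buffer linked in the queue? */
static bool toy_queue_has_skb(const struct toy_tcp *tp)
{
	return tp->queued_skbs != 0;
}

/* Refined test: is there actually anything left to send? */
static bool toy_write_queue_empty(const struct toy_tcp *tp)
{
	return tp->write_seq == tp->snd_nxt;
}
```

With one empty buffer at the tail, the first test sees a non-empty queue
while the second correctly reports that nothing is pending.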

Signed-off-by: Eric Dumazet
Cc: Neal Cardwell
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: Jakub Kicinski

Eric Dumazet
2019-12-14 13:58:40 +0800
1f85e6267 tcp: do not send empty skb from tcp_write_xmit()

Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
write queue in error cases") in linux-4.14 stable triggered
various bugs. One of them has been fixed in commit ba2ddb43f270
("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
we still have crashes in some occasions.

Root-cause is that when tcp_sendmsg() has allocated a fresh
skb and could not append a fragment before being blocked
in sk_stream_wait_memory(), tcp_write_xmit() might be called
and decide to send this fresh and empty skb.

Sending an empty packet is not only silly, it might have caused
many issues we had in the past with tp->packets_out being
out of sync.

Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
Signed-off-by: Eric Dumazet
Cc: Christoph Paasch
Acked-by: Neal Cardwell
Cc: Jason Baron
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: Jakub Kicinski

Eric Dumazet
2019-12-14 13:58:40 +0800
8dbd76e79 tcp/dccp: fix possible race __inet_lookup_established()

Michal Kubecek and Firo Yang did a very nice analysis of crashes
happening in __inet_lookup_established().

Since a TCP socket can go from TCP_ESTABLISHED to TCP_LISTEN
(via a close()/socket()/listen() cycle) without an RCU grace period,
I should not have changed listeners linkage in their hash table.

They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt),
so that a lookup can detect that a socket in a hash list was moved to
another one.

Since we added code in commit d296ba60d8e2 ("soreuseport: Resolve
merge conflict for v4/v6 ordering fix"), we have to add
hlist_nulls_add_tail_rcu() helper.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet
Reported-by: Michal Kubecek
Reported-by: Firo Yang
Reviewed-by: Michal Kubecek
Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/
Signed-off-by: Jakub Kicinski

Eric Dumazet
2019-12-14 13:40:49 +0800
2beb6d290 ipv6/addrconf: only check invalid header values when NETLINK_F_STRICT_CHK is set

In commit 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for
doit handlers") we added strict checks for inet6_rtm_getaddr(). But we did
the invalid header values check before checking if NETLINK_F_STRICT_CHK
is set. This may break backwards compatibility if users already set
ifm->ifa_prefixlen, ifm->ifa_flags, ifm->ifa_scope in their netlink code.

I didn't move the nlmsg_len check because I think it is a valid check.

Reported-by: Jianlin Shi
Fixes: 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers")
Signed-off-by: Hangbin Liu
Reviewed-by: David Ahern
Signed-off-by: Jakub Kicinski

Hangbin Liu
2019-12-14 09:13:49 +0800
5133498f4 bpf: Clear skb->tstamp in bpf_redirect when necessary

Redirecting a packet from ingress to egress by using bpf_redirect
breaks if the egress interface has an fq qdisc installed. This is the same
problem as fixed in commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths").

Clear skb->tstamp when redirecting into the egress path.

Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Reviewed-by: Eric Dumazet
Link: https://lore.kernel.org/bpf/20191213180817.2510-1-lmb@cloudflare.com

Lorenz Bauer
2019-12-14 07:21:48 +0800
5bd831a46 Merge tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

- A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that
don't sever if the result is < 0.

This is mostly for linked timeouts, where if we ask for a pure
timeout we always get -ETIME. This makes links useless for that case,
hence allow a case where it works.

- Five minor optimizations to fix and improve cases that regressed
since v5.4.

- An SQTHREAD locking fix.

- A sendmsg/recvmsg iov assignment fix.

- Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and
subsequently ensuring that works for io_uring.

- Fix a case where for an invalid opcode we might return -EBADF instead
of -EINVAL, if the ->fd of that sqe was set to an invalid fd value.

* tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block:
io_uring: ensure we return -EINVAL on unknown opcode
io_uring: add sockets to list of files that support non-blocking issue
net: make socket read/write_iter() honor IOCB_NOWAIT
io_uring: only hash regular files for async work execution
io_uring: run next sqe inline if possible
io_uring: don't dynamically allocate poll data
io_uring: deferred send/recvmsg should assign iov
io_uring: sqthread should grab ctx->uring_lock for submissions
io-wq: briefly spin for new work after finishing work
io-wq: remove worker->wait waitqueue
io_uring: allow unbreakable links

Linus Torvalds
2019-12-14 06:24:54 +0800

13 Dec, 2019

4 commits

911bde0fe mac80211: Turn AQL into an NL80211_EXT_FEATURE

Instead of just having an airtime flag in debugfs, turn AQL into a proper
NL80211_EXT_FEATURE, so drivers can turn it on when they are ready, and so
we also expose the presence of the feature to userspace.

This also has the effect of flipping the default, so drivers have to opt in
to using AQL instead of getting it by default with TXQs. To keep
functionality the same as pre-patch, we set this feature for ath10k (which
is where it is needed the most).

While we're at it, split out the debugfs interface so AQL gets its own
per-station debugfs file instead of using the 'airtime' file.

[Johannes:]
This effectively disables AQL for iwlwifi, where it fixes a number of
issues:
* TSO in iwlwifi is causing underflows and associated warnings in AQL
* HE (802.11ax) rates aren't reported properly so at HE rates, AQL could
never have a valid estimate (it'd use 6 Mbps instead of up to 2400!)

Signed-off-by: Toke Høiland-Jørgensen
Link: https://lore.kernel.org/r/20191212111437.224294-1-toke@redhat.com
Fixes: 3ace10f5b5ad ("mac80211: Implement Airtime-based Queue Limit (AQL)")
Signed-off-by: Johannes Berg

Toke Høiland-Jørgensen
2019-12-13 17:34:04 +0800
e548f749b mac80211: airtime: Fix an off by one in ieee80211_calc_rx_airtime()

This code was copied from mt76 and inherited an off by one bug from
there. The > should be >= so that we don't read one element beyond
the end of the array.
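
The off-by-one is easy to reproduce on any fixed-size table. A sketch
with a hypothetical toy_rates array and lookup helper:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical rate table with valid indices 0..3. */
static const int toy_rates[] = { 10, 20, 55, 110 };

/* Returns the rate, or -1 for an out-of-range index. */
static int toy_rate_lookup(size_t idx)
{
	/* ">" would accept idx == ARRAY_SIZE(toy_rates), which is one
	 * element past the end; ">=" rejects it. */
	if (idx >= ARRAY_SIZE(toy_rates))
		return -1;
	return toy_rates[idx];
}
```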

Fixes: db3e1c40cf2f ("mac80211: Import airtime calculation code from mt76")
Reported-by: Toke Høiland-Jørgensen
Signed-off-by: Dan Carpenter
Acked-by: Toke Høiland-Jørgensen
Link: https://lore.kernel.org/r/20191126120910.ftr4t7me3by32aiz@kili.mountain
Signed-off-by: Johannes Berg

Dan Carpenter
2019-12-13 17:08:22 +0800
56cb31e18 cfg80211: fix double-free after changing network namespace

If wdev->wext.keys was initialized it didn't get reset to NULL on
unregister (and it doesn't get set in cfg80211_init_wdev either), but
wdev is reused if unregister was triggered through
cfg80211_switch_netns.

The next unregister (for whatever reason) will try to free
wdev->wext.keys again.
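
The usual remedy for this kind of double free is to reset the pointer
when it is freed. A userspace sketch with a hypothetical toy_wdev
structure, assuming the object is reused after unregistering:

```c
#include <stdlib.h>

/* Hypothetical device that owns an optional key buffer. */
struct toy_wdev {
	void *keys;
};

static void toy_unregister(struct toy_wdev *w)
{
	free(w->keys);
	w->keys = NULL;	/* a second unregister now calls free(NULL), a no-op */
}
```

Because free(NULL) is defined to do nothing, unregistering the same
object twice is harmless once the pointer is cleared.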

Signed-off-by: Stefan Bühler
Link: https://lore.kernel.org/r/20191126100543.782023-1-stefan.buehler@tik.uni-stuttgart.de
Signed-off-by: Johannes Berg

Stefan Bühler
2019-12-13 17:08:09 +0800
753ffad3d mac80211: fix TID field in monitor mode transmit

Fix overwriting of the qos_ctrl.tid field for encrypted frames injected on
a monitor interface. While qos_ctrl.tid is not encrypted, it's used as an
input into the encryption algorithm so it's protected, and thus cannot be
modified after encryption. For injected frames, the encryption may already
have been done in userspace, so we cannot change any fields.

Before passing the frame to the driver, the qos_ctrl.tid field is updated
from skb->priority. Prior to dbd50a851c50 skb->priority was updated in
ieee80211_select_queue_80211(), but this function is no longer always
called.

Update skb->priority in ieee80211_monitor_start_xmit() so that the value
is stored, and when later code 'modifies' the TID it really sets it to
the same value as before, preserving the encryption.

Fixes: dbd50a851c50 ("mac80211: only allocate one queue when using iTXQs")
Signed-off-by: Fredrik Olofsson
Link: https://lore.kernel.org/r/20191119133451.14711-1-fredrik.olofsson@anyfinetworks.com
[rewrite commit message based on our discussion]
Signed-off-by: Johannes Berg

Fredrik Olofsson
2019-12-13 17:06:39 +0800

11 Dec, 2019

5 commits

31e4ccc99 tipc: fix use-after-free in tipc_disc_rcv()

In the function 'tipc_disc_rcv()', 'msg_peer_net_hash()' is called
to read the header data field after the message skb has already been
freed, which might result in a garbage value...

This commit fixes it by defining a new local variable to store the data
first, just like the other header fields' handling.
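
The fix pattern, copying every needed field into a local before the
buffer is released, can be sketched as follows. toy_msg and toy_rcv are
hypothetical, and the XOR is only a placeholder for later use of the
values:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap-allocated message with two header fields. */
struct toy_msg {
	uint32_t peer_net_hash;
	uint32_t src;
};

/* Consumes (frees) msg and returns a value derived from its header. */
static uint32_t toy_rcv(struct toy_msg *msg)
{
	uint32_t src = msg->src;
	uint32_t hash = msg->peer_net_hash;	/* read before the free */

	free(msg);
	/* Only the local copies are used from here on. */
	return hash ^ src;
}
```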

Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns")
Acked-by: Jon Maloy
Signed-off-by: Tuong Lien
Signed-off-by: David S. Miller

Tuong Lien
2019-12-11 09:45:04 +0800
abc9b4e05 tipc: fix retrans failure due to wrong destination

When a user message is sent, TIPC checks whether the socket has hit
congestion at the link layer. If so, it sleeps until the congestion
clears. Because the socket is released while sleeping, this leaves a
gap for other users (e.g. other threads) to take over the socket. Also,
for connectionless sockets (e.g. SOCK_RDM), the user is free to send
messages to various destinations (e.g. via 'sendto()'), so the socket's
preformatted header has to be updated accordingly before the actual
payload message is built.

Unfortunately, the header update is done before the congestion check,
which opens a race: the destination of a message can be modified
in the middle, so the message is built with the wrong destination.
Consequently, when the message is sent to the link layer, it gets stuck
there forever because the peer node will simply reject it. After a
number of retransmission attempts, the link is eventually taken down
and the retransmission failure is reported.

This commit fixes the problem by rearranging the order of actions to
prevent the race condition from occurring, so the message building is
'atomic' and its header will not be modified by anyone.

Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
Acked-by: Jon Maloy
Signed-off-by: Tuong Lien
Signed-off-by: David S. Miller

Tuong Lien
2019-12-11 09:45:04 +0800
dca4a17d2 tipc: fix potential hanging after b/rcast changing

In commit c55c8edafa91 ("tipc: smooth change between replicast and
broadcast"), we allow instant switching between replicast and broadcast
by sending a dummy 'SYN' packet on the last used link to synchronize
packets on the links. The 'SYN' message is also subject to link
congestion, so if that happens, a 'SOCK_WAKEUP' will be scheduled to be
sent back to the socket...
However, in that commit, we simply use the same socket 'cong_link_cnt'
counter for both the 'SYN' & normal payload message sending. Therefore,
if both the replicast & broadcast links are congested, the counter will
not be updated correctly but overwritten by the latter congestion.
Later on, when the 'SOCK_WAKEUP' messages are processed, the counter is
reduced one by one and eventually overflows. Consequently, further
activities on the socket will wait for the false congestion signal
to disappear, which never happens.

Because sending the 'SYN' message is vital for the mechanism, it should
be done anyway. This commit fixes the issue by marking the message with
an error code e.g. 'TIPC_ERR_NO_PORT', so its sending should not face a
link congestion, there is no need to touch the socket 'cong_link_cnt'
either. In addition, in the event of any error (e.g. -ENOBUFS), we will
purge the entire payload message queue and make a return immediately.

Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast")
Acked-by: Jon Maloy
Signed-off-by: Tuong Lien
Signed-off-by: David S. Miller

Tuong Lien
2019-12-11 09:45:04 +0800
d5162f341 tipc: fix name table rbtree issues

The current rbtree for service ranges in the name table is built based
on the 'lower' & 'upper' range values, resulting in a flaw in the rbtree
search. Some issues have been observed when ranges overlap:

Case #1: unable to withdraw a name entry:
After some name services are bound, all of them are withdrawn by the
user, but one remains in the name table forever. This corrupts the table,
and that service becomes a dummy, i.e. it has no real port.
E.g.

                /
           {22, 22}
              /
             /
   --->  {10, 50}
           /  \
          /    \
    {10, 30}  {20, 60}

The node {10, 30} cannot be removed since the rbtree search stops at
the node's ancestor, i.e. {10, 50}, and never reaches the node being
looked for.

Case #2: failed to send data in some cases:
E.g. Two service ranges: {20, 60}, {10, 50} are bound. The rbtree for
this service will be one of the two cases below depending on the order
of the bindings:

{20, 60} {10, 50}
Signed-off-by: Tuong Lien
Signed-off-by: David S. Miller

Tuong Lien
2019-12-11 09:45:04 +0800
ebfcd8955 net: make socket read/write_iter() honor IOCB_NOWAIT

The socket read/write helpers only look at the file O_NONBLOCK, not
the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2
and io_uring that rely on not having the file itself marked nonblocking,
but rather the iocb itself.
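
The decision described above can be sketched in user space as follows. This is a minimal illustration, not the kernel helper: `sketch_msg_flags()` and `SKETCH_IOCB_NOWAIT` are hypothetical names, and the per-request flag value is a stand-in for the kernel-internal IOCB_NOWAIT. Only O_NONBLOCK and MSG_DONTWAIT are the real userspace constants.

```c
#include <fcntl.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the kernel-internal per-request flag. */
#define SKETCH_IOCB_NOWAIT 0x1

/*
 * Pick message flags for one read/write request: do not wait if either
 * the file is marked O_NONBLOCK or the request itself asks for no-wait.
 */
static int sketch_msg_flags(int file_flags, int iocb_flags)
{
	if ((file_flags & O_NONBLOCK) || (iocb_flags & SKETCH_IOCB_NOWAIT))
		return MSG_DONTWAIT;
	return 0;
}
```

With this shape, a request carrying the no-wait flag is treated as nonblocking even when the file itself is a blocking socket.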

Cc: netdev@vger.kernel.org
Acked-by: David Miller
Signed-off-by: Jens Axboe

Jens Axboe
2019-12-11 07:33:23 +0800

10 Dec, 2019

6 commits

b43d1f9f7 af_packet: set default value for tmo

There is a soft lockup when using TPACKET_V3:
...
NMI watchdog: BUG: soft lockup - CPU#2 stuck for 60010ms!
(__irq_svc) from [] (_raw_spin_unlock_irqrestore+0x44/0x54)
(_raw_spin_unlock_irqrestore) from [] (mod_timer+0x210/0x25c)
(mod_timer) from []
(prb_retire_rx_blk_timer_expired+0x68/0x11c)
(prb_retire_rx_blk_timer_expired) from []
(call_timer_fn+0x90/0x17c)
(call_timer_fn) from [] (run_timer_softirq+0x2d4/0x2fc)
(run_timer_softirq) from [] (__do_softirq+0x218/0x318)
(__do_softirq) from [] (irq_exit+0x88/0xac)
(irq_exit) from [] (msa_irq_exit+0x11c/0x1d4)
(msa_irq_exit) from [] (handle_IPI+0x650/0x7f4)
(handle_IPI) from [] (gic_handle_irq+0x108/0x118)
(gic_handle_irq) from [] (__irq_usr+0x44/0x5c)
...

If __ethtool_get_link_ksettings() fails in
prb_calc_retire_blk_tmo(), msec and tmo will be zero, so tov_in_jiffies
is zero and the expiry for retire_blk_timer becomes
mod_timer(&pkc->retire_blk_timer, jiffies + 0),
which drives softirq CPU usage to 100%.
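
One possible shape for such a fallback is sketched below. It is a user-space illustration under stated assumptions, not the code of this patch: the function name, its parameters and the 8 ms default are all hypothetical. The point is only that a failed link query or a zero result never yields a zero timeout.

```c
/* Assumed fallback, in milliseconds; not stated in the message above. */
#define SKETCH_DEFAULT_TMO_MS 8

/*
 * Compute a block retire timeout in milliseconds.
 * link_err:  non-zero if the link-speed query failed (hypothetical input)
 * mbits:     link speed in Mbit/s
 * blk_bytes: block size in bytes
 */
static unsigned int sketch_retire_tmo_ms(int link_err, unsigned int mbits,
					 unsigned int blk_bytes)
{
	unsigned int msec;

	if (link_err || mbits == 0)
		return SKETCH_DEFAULT_TMO_MS;

	/* Time to fill one block: bits / (Mbit/s) gives microseconds. */
	msec = (unsigned int)(((unsigned long long)blk_bytes * 8) / mbits / 1000);

	/* Never return zero, so the timer is not re-armed at 'jiffies + 0'. */
	return msec ? msec : SKETCH_DEFAULT_TMO_MS;
}
```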

Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Tested-by: Xiao Jiangfeng
Signed-off-by: Mao Wenan
Signed-off-by: David S. Miller

Mao Wenan
2019-12-10 06:30:19 +0800
7da538c1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Wait for rcu grace period after releasing netns in ctnetlink,
from Florian Westphal.

2) Incorrect command type in flowtable offload ndo invocation,
from wenxu.

3) Incorrect callback type in flowtable offload flow tuple
updates, also from wenxu.

4) Fix compile warning on flowtable offload infrastructure due to
possible reference to uninitialized variable, from Nathan Chancellor.

5) Do not inline nf_ct_resolve_clash(), this is called from slow
path / stress situations. From Florian Westphal.

6) Missing IPv6 flow selector description in flowtable offload.

7) Missing check for NETDEV_UNREGISTER in nf_tables offload
infrastructure, from wenxu.

8) Update NAT selftest to use randomized netns names, from
Florian Westphal.

9) Restore nfqueue bridge support, from Marco Oliverio.

10) Compilation warning in SCTP_CHUNKMAP_*() on xt_sctp header.
From Phil Sutter.

11) Fix bogus lookup/get match for non-anonymous rbtree sets.

12) Missing netlink validation for NFT_SET_ELEM_INTERVAL_END
elements.

13) Missing netlink validation for NFT_DATA_VALUE after
nft_data_init().

14) If rule specifies no actions, offload infrastructure returns
EOPNOTSUPP.

15) Module refcount leak in object updates.

16) Missing sanitization for ARP traffic from br_netfilter, from
Eric Dumazet.

17) Compilation breakage on big-endian due to incorrect memcpy()
size in the flowtable offload infrastructure.
====================

Signed-off-by: David S. Miller

David S. Miller
2019-12-10 06:03:33 +0800
7acd9378d netfilter: nf_flow_table_offload: Correct memcpy size for flow_overload_mangle()

In function 'memcpy',
inlined from 'flow_offload_mangle' at net/netfilter/nf_flow_table_offload.c:112:2,
inlined from 'flow_offload_port_dnat' at net/netfilter/nf_flow_table_offload.c:373:2,
inlined from 'nf_flow_rule_route_ipv4' at net/netfilter/nf_flow_table_offload.c:424:3:
./include/linux/string.h:376:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
376 | __read_overflow2();
| ^~~~~~~~~~~~~~~~~~

The original u8* was chosen in the hope of making this more adaptable,
but the consensus is to keep this as it is in tc pedit.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Laura Abbott
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2019-12-10 03:07:59 +0800
c593642c8 treewide: Use sizeof_field() macro

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().
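
For readers unfamiliar with the macro, the sketch below shows what a sizeof_field()-style macro does: it yields the size of a struct member without needing an instance of the struct. The macro body follows the usual null-pointer idiom, and `struct sketch_hdr` is a hypothetical type used only for illustration.

```c
#include <stdint.h>

/* Size of a struct member without needing an instance of the struct. */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical struct, used only for illustration. */
struct sketch_hdr {
	uint8_t  type;
	uint16_t len;
	uint8_t  payload[32];
};
```

Since sizeof does not evaluate its operand, the null pointer is never dereferenced at run time.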

This patch is generated using the following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook
Signed-off-by: Kees Cook
Acked-by: David Miller # for net

Pankaj Bharadiya
2019-12-10 02:36:44 +0800
f8fc57e8d net/x25: add new state X25_STATE_5

This is needed because, if the flag X25_ACCPT_APPRV_FLAG is not set on a
socket (manual call confirmation) and the channel is cleared by the
remote side before the manual call confirmation is sent, this situation
needs to be handled.

Signed-off-by: Martin Schiller
Signed-off-by: David S. Miller

Martin Schiller
2019-12-10 02:28:43 +0800
b6f3320b1 sctp: fully initialize v4 addr in some functions

Syzbot found a crash:

BUG: KMSAN: uninit-value in crc32_body lib/crc32.c:112 [inline]
BUG: KMSAN: uninit-value in crc32_le_generic lib/crc32.c:179 [inline]
BUG: KMSAN: uninit-value in __crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
Call Trace:
crc32_body lib/crc32.c:112 [inline]
crc32_le_generic lib/crc32.c:179 [inline]
__crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
chksum_update+0xb2/0x110 crypto/crc32c_generic.c:90
crypto_shash_update+0x4c5/0x530 crypto/shash.c:107
crc32c+0x150/0x220 lib/libcrc32c.c:47
sctp_csum_update+0x89/0xa0 include/net/sctp/checksum.h:36
__skb_checksum+0x1297/0x12a0 net/core/skbuff.c:2640
sctp_compute_cksum include/net/sctp/checksum.h:59 [inline]
sctp_packet_pack net/sctp/output.c:528 [inline]
sctp_packet_transmit+0x40fb/0x4250 net/sctp/output.c:597
sctp_outq_flush_transports net/sctp/outqueue.c:1146 [inline]
sctp_outq_flush+0x1823/0x5d80 net/sctp/outqueue.c:1194
sctp_outq_uncork+0xd0/0xf0 net/sctp/outqueue.c:757
sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1781 [inline]
sctp_side_effects net/sctp/sm_sideeffect.c:1184 [inline]
sctp_do_sm+0x8fe1/0x9720 net/sctp/sm_sideeffect.c:1155
sctp_primitive_REQUESTHEARTBEAT+0x175/0x1a0 net/sctp/primitive.c:185
sctp_apply_peer_addr_params+0x212/0x1d40 net/sctp/socket.c:2433
sctp_setsockopt_peer_addr_params net/sctp/socket.c:2686 [inline]
sctp_setsockopt+0x189bb/0x19090 net/sctp/socket.c:4672

The issue was caused by transport->ipaddr being set from an uninitialized
addr param, which was passed by:

sctp_transport_init net/sctp/transport.c:47 [inline]
sctp_transport_new+0x248/0xa00 net/sctp/transport.c:100
sctp_assoc_add_peer+0x5ba/0x2030 net/sctp/associola.c:611
sctp_process_param net/sctp/sm_make_chunk.c:2524 [inline]

where 'addr' is set by sctp_v4_from_addr_param(), and it doesn't initialize
the padding of addr->v4.

Later, when calling sctp_make_heartbeat(), hbinfo.daddr(=transport->ipaddr)
becomes part of the skb, and the issue occurs.

This patch fixes it by initializing the padding of addr->v4 in
sctp_v4_from_addr_param(), as well as in other functions that do a
similar thing. These functions shouldn't trust that the caller
initializes the memory, as Marcelo suggested.
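
The idea can be sketched in user space with a plain struct sockaddr_in, whose sin_zero member plays the role of the padding. This is an illustration under assumptions, not the SCTP code: `sketch_fill_v4()` is a hypothetical helper, and it sets every field itself rather than relying on the caller to have zeroed the destination.

```c
#include <string.h>
#include <stdint.h>
#include <netinet/in.h>

/*
 * Fill an IPv4 socket address from raw fields. 'port_be' and 'addr_be'
 * are already in network byte order. The destination may hold garbage on
 * entry, so every field is set here, including the trailing padding.
 */
static void sketch_fill_v4(struct sockaddr_in *dst, uint16_t port_be,
			   uint32_t addr_be)
{
	dst->sin_family = AF_INET;
	dst->sin_port = port_be;
	dst->sin_addr.s_addr = addr_be;
	/* Do not rely on the caller having zeroed the struct. */
	memset(dst->sin_zero, 0, sizeof(dst->sin_zero));
}
```

If such a struct is later copied byte for byte into an outgoing packet, as with the heartbeat info above, no stale bytes from the padding end up in it.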

Reported-by: syzbot+6dcbfea81cd3d4dd0b02@syzkaller.appspotmail.com
Signed-off-by: Xin Long
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Xin Long
2019-12-10 02:16:39 +0800