06 Sep, 2019
9 commits
-
[ Upstream commit e858faf556d4e14c750ba1e8852783c6f9520a0e ]
If an app is playing tricks to reuse a socket via tcp_disconnect(),
bytes_acked/received needs to be reset to 0. Otherwise tcp_info will
report the sum of the current and the old connection..Cc: Eric Dumazet
Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Fixes: bdd1f9edacb5 ("tcp: add tcpi_bytes_received to tcp_info")
Signed-off-by: Christoph Paasch
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
(cherry picked from commit 1b200acde418f4d6d87279d3f6f976ebf188f272) -
[ Upstream commit 8d650cdedaabb33e85e9b7c517c0c71fcecc1de9 ]
Neal reported incorrect use of ns_capable() from bpf hook.
bpf_setsockopt(...TCP_CONGESTION...)
-> tcp_set_congestion_control()
-> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)
-> ns_capable_common()
-> current_cred()
-> rcu_dereference_protected(current->cred, 1)Accessing 'current' in bpf context makes no sense, since packets
are processed from softirq context.As Neal stated : The capability check in tcp_set_congestion_control()
was written assuming a system call context, and then was reused from
a BPF call site.The fix is to add a new parameter to tcp_set_congestion_control(),
so that the ns_capable() call is only performed under the right
context.Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Eric Dumazet
Cc: Lawrence Brakmo
Reported-by: Neal Cardwell
Acked-by: Neal Cardwell
Acked-by: Lawrence Brakmo
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
(cherry picked from commit c60f57dfe995172c2f01e59266e3ffa3419c6cd9) -
[ Upstream commit b617158dc096709d8600c53b6052144d12b89fab ]
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.It also adds the some room due to the fact that tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15Note for < 4.15 backports :
tcp_rtx_queue_tail() will probably look like :static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
struct sk_buff *skb = tcp_send_head(sk);return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet
Reported-by: Andrew Prout
Tested-by: Andrew Prout
Tested-by: Jonathan Lemon
Tested-by: Michal Kubecek
Acked-by: Neal Cardwell
Acked-by: Yuchung Cheng
Acked-by: Christoph Paasch
Cc: Jonathan Looney
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
(cherry picked from commit 6323c238bb4374d1477348cfbd5854f2bebe9a21) -
commit b6653b3629e5b88202be3c9abc44713973f5c4b4 upstream.
tcp_fragment() might be called for skbs in the write queue.
Memory limits might have been exceeded because tcp_sendmsg() only
checks limits at full skb (64KB) boundaries.Therefore, we need to make sure tcp_fragment() wont punish applications
that might have setup very low SO_SNDBUF values.Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet
Reported-by: Christoph Paasch
Tested-by: Christoph Paasch
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman(cherry picked from commit dad3a9314ac95dedc007bc7dacacb396ea10e376)
-
commit 967c05aee439e6e5d7d805e195b3a20ef5c433d6 upstream.
If mtu probing is enabled tcp_mtu_probing() could very well end up
with a too small MSS.Use the new sysctl tcp_min_snd_mss to make sure MSS search
is performed in an acceptable range.CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet
Reported-by: Jonathan Lemon
Cc: Jonathan Looney
Acked-by: Neal Cardwell
Cc: Yuchung Cheng
Cc: Tyler Hicks
Cc: Bruce Curtis
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
(cherry picked from commit 59222807fcc99951dc769cd50e132e319d73d699) -
commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream.
Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.This forces the stack to send packets with a very high network/cpu
overhead.Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.In some cases, it can be useful to increase the minimal value
to a saner value.We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet
Suggested-by: Jonathan Looney
Acked-by: Neal Cardwell
Cc: Yuchung Cheng
Cc: Tyler Hicks
Cc: Bruce Curtis
Cc: Jonathan Lemon
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
(cherry picked from commit 7f9f8a37e563c67b24ccd57da1d541a95538e8d9) -
commit f070ef2ac66716357066b683fb0baf55f8191a2e upstream.
Jonathan Looney reported that a malicious peer can force a sender
to fragment its retransmit queue into tiny skbs, inflating memory
usage and/or overflow 32bit counters.TCP allows an application to queue up to sk_sndbuf bytes,
so we need to give some allowance for non malicious splitting
of retransmit queue.A new SNMP counter is added to monitor how many times TCP
did not allow to split an skb if the allowance was exceeded.Note that this counter might increase in the case applications
use SO_SNDBUF socket option to lower sk_sndbuf.CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the
socket is already using more than half the allowed spaceSigned-off-by: Eric Dumazet
Reported-by: Jonathan Looney
Acked-by: Neal Cardwell
Acked-by: Yuchung Cheng
Reviewed-by: Tyler Hicks
Cc: Bruce Curtis
Cc: Jonathan Lemon
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
(cherry picked from commit ec83921899a571ad70d582934ee9e3e07f478848) -
commit 3b4929f65b0d8249f19a50245cd88ed1a2f78cff upstream.
Jonathan Looney reported that TCP can trigger the following crash
in tcp_shifted_skb() :BUG_ON(tcp_skb_pcount(skb) < pcount);
This can happen if the remote peer has advertized the smallest
MSS that linux TCP accepts : 48An skb can hold 17 fragments, and each fragment can hold 32KB
on x86, or 64KB on PowerPC.This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs
can overflow.Note that tcp_sendmsg() builds skbs with less than 64KB
of payload, so this problem needs SACK to be enabled.
SACK blocks allow TCP to coalesce multiple skbs in the retransmit
queue, thus filling the 17 fragments to maximal capacity.CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet
Reported-by: Jonathan Looney
Acked-by: Neal Cardwell
Reviewed-by: Tyler Hicks
Cc: Yuchung Cheng
Cc: Bruce Curtis
Cc: Jonathan Lemon
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
(cherry picked from commit c09be31461ed140976c60a87364415454a2c3d42) -
[ Upstream commit 50ce163a72d817a99e8974222dcf2886d5deb1ae ]
For some reason, tcp_grow_window() correctly tests if enough room
is present before attempting to increase tp->rcv_ssthresh,
but does not prevent it to grow past tcp_space()This is causing hard to debug issues, like failing
the (__tcp_select_window(sk) >= tp->rcv_wnd) test
in __tcp_ack_snd_check(), causing ACK delays and possibly
slow flows.Depending on tcp_rmem[2], MTU, skb->len/skb->truesize ratio,
we can see the problem happening on "netperf -t TCP_RR -- -r 2000,2000"
after about 60 round trips, when the active side no longer sends
immediate acks.This bug predates git history.
Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Acked-by: Neal Cardwell
Acked-by: Wei Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
(cherry picked from commit 6728c6174a47b8a04ceec89aca9e1195dee7ff6b)
02 May, 2019
2 commits
-
Add security setting check for socket interface since
stack will check the return value.Signed-off-by: Fugang Duan
Signed-off-by: Marcel Holtmann
Signed-off-by: Arulpandiyan Vadivel
Signed-off-by: Shrikant Bobade
(cherry picked from commit a23098e20ce214d134cc3cef3ad51a6bba13b99a) -
The commit ae97fd867aa3 ("MLK-19091 cfg80211: make phy index match
after wiphy dev is released") manage wiphy_counter matching between
creating and freeing wiphy device. Then for one wifi instance, the index
of attached phy is not changed during loadable test. But it ignores
multiple wifi cards loadable test case, that introduces the phy index
confliction. So the patch revert the commit.Reviewed-by: Richard Zhu
Signed-off-by: Fugang Duan
Signed-off-by: Arulpandiyan Vadivel
Signed-off-by: Shrikant Bobade
(cherry picked from commit 4bfe854b650f1e8bc46624d5f3b559f0112f327a)
18 Apr, 2019
2 commits
-
During insmod/rmmod test, the phy index increases that cause troube
for test case. To make global variable wiphy_counter match between
creat and free wiphy device, it needs to decrease the atomic counter
when wiphy device is freed.Reviewed-by: Richard Zhu
Signed-off-by: Fugang Duan
Signed-off-by: Vipul Kumar -
[Patch] Pulling the following commits and some general changes
from custom v3.10 kernel for supporting qcacld2.0 on kernel v4.9.11.
1. cfg80211: Using new wiphy flag WIPHY_FLAG_DFS_OFFLOAD
When flag WIPHY_FLAG_DFS_OFFLOAD is defined, the driver would handle
all the DFS related operations. Therefore the kernel needs to ignore
the DFS state that it uses to block the userspace calls to the driver
through cfg80211 APIs. Also it should treat the userspace calls to
start radar detection as a no-op.Please note that changes in util.c is not picked up explicitly.
Kernel v4.9.11 uses wrapper cfg80211_get_chans_dfs_required which takes
care of this change.Change-Id: I9dd2076945581ca67e54dfc96dd3dbc526c6f0a2
IRs-Fixed: 2026862. New db.txt from git/sforshee/wireless-regdb.git
CONFIG_CFG80211_INTERNAL_REGDB is enabled in build. This causes
kernel warn messages as db.txt is empty. A new db.txt is added
from:
git://git.kernel.org/pub/scm/linux/kernel/git/sforshee/wireless-regdb.gitIRs-Fixed: 202686
3. Picked up the declaration and definition of the function
cfg80211_is_gratuitous_arp_unsolicited_naChange-Id: I1e4083a2327c121073226aa6b75bb6b5b97cec00
CRs-fixed: 1079453Signed-off-by: Nakul Kachhwaha
Signed-off-by: Fugang Duan
(Vipul: Fixed merge conflicts)
(TODO: checkpatch warnings)
Signed-off-by: Vipul Kumar
17 Apr, 2019
20 commits
-
commit 89259088c1b7fecb43e8e245dc931909132a4e03 upstream
syzbot was able to trigger the WARN in cttimeout_default_get() by
passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use
same timeout values.Furthermore, also fetch GRE timeouts. GRE is a bit more complicated,
as it still can be a module and its netns_proto_gre struct layout isn't
visible outside of the gre module. Can't move timeouts around, it
appears conntrack sysctl unregister assumes net_generic() returns
nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead.A followup nf-next patch could make gre tracker be built-in as well
if needed, its not that large.Last, make the WARN() mention the missing protocol value in case
anything else is missing.Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Zubin Mithra
Signed-off-by: Sasha Levin (Microsoft) -
commit 8866df9264a34e675b4ee8a151db819b87cce2d3 upstream
Otherwise, we hit a NULL pointer deference since handlers always assume
default timeout policy is passed.netlink: 24 bytes leftover after parsing attributes in process `syz-executor2'.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9575 Comm: syz-executor1 Not tainted 4.19.0+ #312
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:icmp_timeout_obj_to_nlattr+0x77/0x170 net/netfilter/nf_conntrack_proto_icmp.c:297Fixes: c779e849608a ("netfilter: conntrack: remove get_timeout() indirection")
Reported-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Zubin Mithra
Signed-off-by: Sasha Levin (Microsoft) -
[ Upstream commit 9a5a90d167b0e5fe3d47af16b68fd09ce64085cd ]
__netif_receive_skb_list_ptype() leaves skb->next poisoned before passing
it to pt_prev->func handler, what may produce (in certain cases, e.g. DSA
setup) crashes like:[ 88.606777] CPU 0 Unable to handle kernel paging request at virtual address 0000000e, epc == 80687078, ra == 8052cc7c
[ 88.618666] Oops[#1]:
[ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-dlink-00206-g4192a172-dirty #1473
[ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850
[ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000
[ 88.642526] $ 8 : 00000000 49600000 00000001 00000001
[ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41
[ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd
[ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800
[ 88.665793] $24 : 00000000 8011e658
[ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c
[ 88.677427] Hi : 00000003
[ 88.680622] Lo : 7b5b4220
[ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0
[ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188
[ 88.696734] Status: 10000404 IEp
[ 88.700422] Cause : 50000008 (ExcCode 02)
[ 88.704874] BadVA : 0000000e
[ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi))
[ 88.713005] Modules linked in:
[ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
[ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0 85cff080 80790000 8052cc7c
[ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850 87cabf00 807d0000
[ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a 00000000 8052d96c
[ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558 87cabf00 85f58c50
[ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001 85f58c00 864d7800
[ 88.771770] ...
[ 88.774483] Call Trace:
[ 88.777199] [] vlan_dev_hard_start_xmit+0x1c/0x1a0
[ 88.783504] [] dev_hard_start_xmit+0xac/0x188
[ 88.789326] [] __dev_queue_xmit+0x6e8/0x7d4
[ 88.794955] [] ip_finish_output2+0x238/0x4d0
[ 88.800677] [] ip_output+0xc8/0x140
[ 88.805526] [] ip_forward+0x364/0x560
[ 88.810567] [] ip_rcv+0x48/0xe4
[ 88.815030] [] __netif_receive_skb_one_core+0x44/0x58
[ 88.821635] [] dsa_switch_rcv+0x108/0x1ac
[ 88.827067] [] __netif_receive_skb_list_core+0x228/0x26c
[ 88.833951] [] netif_receive_skb_list+0x1d4/0x394
[ 88.840160] [] lunar_rx_poll+0x38c/0x828
[ 88.845496] [] net_rx_action+0x14c/0x3cc
[ 88.850835] [] __do_softirq+0x178/0x338
[ 88.856077] [] irq_exit+0xbc/0x100
[ 88.860846] [] plat_irq_dispatch+0xc0/0x144
[ 88.866477] [] handle_int+0x14c/0x158
[ 88.871516] [] r4k_wait+0x30/0x40
[ 88.876462] Code: afb10014 8c8200a0 00803025 94a20468 00000000 10620042 00a08025 9605046a
[ 88.887332]
[ 88.888982] ---[ end trace eb863d007da11cf1 ]---
[ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt
[ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---Fix this by pulling skb off the sublist and zeroing skb->next pointer
before calling ptype callback.Fixes: 88eb1944e18c ("net: core: propagate SKB lists through packet_type lookup")
Reviewed-by: Edward Cree
Signed-off-by: Alexander Lobakin
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 2a3cabae4536edbcb21d344e7aa8be7a584d2afb ]
erspan_v6 tunnels run __iptunnel_pull_header on received skbs to remove
erspan header. This can determine a possible use-after-free accessing
pkt_md pointer in ip6erspan_rcv since the packet will be 'uncloned'
running pskb_expand_head if it is a cloned gso skb (e.g if the packet has
been sent though a veth device). Fix it resetting pkt_md pointer after
__iptunnel_pull_headerFixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code")
Signed-off-by: Lorenzo Bianconi
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 492b67e28ee5f2a2522fb72e3d3bcb990e461514 ]
erspan tunnels run __iptunnel_pull_header on received skbs to remove
gre and erspan headers. This can determine a possible use-after-free
accessing pkt_md pointer in erspan_rcv since the packet will be 'uncloned'
running pskb_expand_head if it is a cloned gso skb (e.g if the packet has
been sent though a veth device). Fix it resetting pkt_md pointer after
__iptunnel_pull_headerFixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code")
Signed-off-by: Lorenzo Bianconi
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 8c83f2df9c6578ea4c5b940d8238ad8a41b87e9e ]
Configuration check to accept source route IP options should be made on
the incoming netdevice when the skb->dev is an l3mdev master. The route
lookup for the source route next hop also needs the incoming netdev.v2->v3:
- Simplify by passing the original netdevice down the stack (per David
Ahern).Signed-off-by: Stephen Suryaputra
Reviewed-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit b506bc975f60f06e13e74adb35e708a23dc4e87c ]
When tcp_sk_init() failed in inet_ctl_sock_create(),
'net->ipv4.tcp_congestion_control' will be left
uninitialized, but tcp_sk_exit() hasn't check for
that.This patch add checking on 'net->ipv4.tcp_congestion_control'
in tcp_sk_exit() to prevent NULL-ptr dereference.Fixes: 6670e1524477 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control")
Signed-off-by: Dust Li
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
loss episodes in the same way as conventional TCP".Currently, Linux DCTCP performs no cwnd reduction when losses
are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets
alpha to its maximal value if a RTO happens. This behavior
is sub-optimal for at least two reasons: i) it ignores losses
triggering fast retransmissions; and ii) it causes unnecessary large
cwnd reduction in the future if the loss was isolated as it resets
the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
denoting a total congestion). The second reason has an especially
noticeable effect when using DCTCP in high BDP environments, where
alpha normally stays at low values.This patch replace the clamping of alpha by setting ssthresh to
half of cwnd for both fast retransmissions and RTOs, at most once
per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
has been removed.The table below shows experimental results where we measured the
drop probability of a PIE AQM (not applying ECN marks) at a
bottleneck in the presence of a single TCP flow with either the
alpha-clamping option enabled or the cwnd halving proposed by this
patch. Results using reno or cubic are given for comparison.| Link | RTT | Drop
TCP CC | speed | base+AQM | probability
==================|=========|==========|============
CUBIC | 40Mbps | 7+20ms | 0.21%
RENO | | | 0.19%
DCTCP-CLAMP-ALPHA | | | 25.80%
DCTCP-HALVE-CWND | | | 0.22%
------------------|---------|----------|------------
CUBIC | 100Mbps | 7+20ms | 0.03%
RENO | | | 0.02%
DCTCP-CLAMP-ALPHA | | | 23.30%
DCTCP-HALVE-CWND | | | 0.04%
------------------|---------|----------|------------
CUBIC | 800Mbps | 1+1ms | 0.04%
RENO | | | 0.05%
DCTCP-CLAMP-ALPHA | | | 18.70%
DCTCP-HALVE-CWND | | | 0.06%We see that, without halving its cwnd for all source of losses,
DCTCP drives the AQM to large drop probabilities in order to keep
the queue length under control (i.e., it repeatedly faces RTOs).
Instead, if DCTCP reacts to all source of losses, it can then be
controlled by the AQM using similar drop levels than cubic or reno.Signed-off-by: Koen De Schepper
Signed-off-by: Olivier Tilmans
Cc: Bob Briscoe
Cc: Lawrence Brakmo
Cc: Florian Westphal
Cc: Daniel Borkmann
Cc: Yuchung Cheng
Cc: Neal Cardwell
Cc: Eric Dumazet
Cc: Andrew Shewmaker
Cc: Glenn Judd
Acked-by: Florian Westphal
Acked-by: Neal Cardwell
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 09279e615c81ce55e04835970601ae286e3facbe ]
Syzbot report a kernel-infoleak:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
Call Trace:
_copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
copy_to_user include/linux/uaccess.h:174 [inline]
sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline]
sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562
...
Uninit was stored to memory at:
sctp_transport_init net/sctp/transport.c:61 [inline]
sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115
sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637
sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline]
sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361
...
Bytes 8-15 of 16 are uninitializedIt was caused by that th _pad field (the 8-15 bytes) of a v4 addr (saved in
struct sockaddr_in) wasn't initialized, but directly copied to user memory
in sctp_getsockopt_peer_addrs().So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of
sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as
sctp_v6_addr_to_user() does.Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com
Signed-off-by: Xin Long
Tested-by: Alexander Potapenko
Acked-by: Neil Horman
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb ]
The flow action buffer can be resized if it's not big enough to contain
all the requested flow actions. However, this resize doesn't take into
account the new requested size, the buffer is only increased by a factor
of 2x. This might be not enough to contain the new data, causing a
buffer overflow, for example:[ 42.044472] =============================================================================
[ 42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
[ 42.046415] -----------------------------------------------------------------------------[ 42.047715] Disabling lock debugging due to kernel taint
[ 42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
[ 42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101
[ 42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb[ 42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc ........
[ 42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00 kkkkkkkk....l...
[ 42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6 l...........x...
[ 42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00 ...............
[ 42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 42.059061] Redzone 8bf2c4a5: 00 00 00 00 ....
[ 42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZFix by making sure the new buffer is properly resized to contain all the
requested data.BugLink: https://bugs.launchpad.net/bugs/1813244
Signed-off-by: Andrea Righi
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 0db6f8befc32c68bb13d7ffbb2e563c79e913e13 ]
It returned always NULL, thus it was never possible to get the filter.
Example:
$ ip link add foo type dummy
$ ip link add bar type dummy
$ tc qdisc add dev foo clsact
$ tc filter add dev foo protocol all pref 1 ingress handle 1234 \
matchall action mirred ingress mirror dev barBefore the patch:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
Error: Specified filter handle not found.
We have an error talking to the kernelAfter:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2
not_in_hw
action order 1: mirred (Ingress Mirror to device bar) pipe
index 1 ref 1 bind 1CC: Yotam Gigi
CC: Jiri Pirko
Fixes: fd62d9f5c575 ("net/sched: matchall: Fix configuration race")
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit fae2708174ae95d98d19f194e03d6e8f688ae195 ]
the control path of 'sample' action does not validate the value of 'rate'
provided by the user, but then it uses it as divisor in the traffic path.
Validate it in tcf_sample_init(), and return -EINVAL with a proper extack
message in case that value is zero, to fix a splat with the script below:# tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
# tc -s a s action sample
total acts 1action order 0: sample rate 1/0 group 1 pipe
index 2 ref 1 bind 1 installed 19 sec used 19 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
# ping 192.0.2.1 -I test0 -c1 -qdivide error: 0000 [#1] SMP PTI
CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
Call Trace:
tcf_action_exec+0x7c/0x1c0
tcf_classify+0x57/0x160
__dev_queue_xmit+0x3dc/0xd10
ip_finish_output2+0x257/0x6d0
ip_output+0x75/0x280
ip_send_skb+0x15/0x40
raw_sendmsg+0xae3/0x1410
sock_sendmsg+0x36/0x40
__sys_sendto+0x10e/0x140
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x60/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[...]
Kernel panic - not syncing: Fatal exception in interruptAdd a TDC selftest to document that 'rate' is now being validated.
Reported-by: Matteo Croce
Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
Signed-off-by: Davide Caratti
Acked-by: Yotam Gigi
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit cb66ddd156203daefb8d71158036b27b0e2caf63 ]
When it is to cleanup net namespace, rds_tcp_exit_net() will call
rds_tcp_kill_sock(), if t_sock is NULL, it will not call
rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in
net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect()
and reference 'net' which has already been freed.In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before
sock->ops->connect, but if connect() is failed, it will call
rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always
failed, rds_connect_worker() will try to reconnect all the time, so
rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the
connections.Therefore, the condition !tc->t_sock is not needed if it is going to do
cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always
NULL, and there is on other path to cancel cp_conn_w and free
connection. So this patch is to fix this.rds_tcp_kill_sock():
...
if (net != c_net || !tc->t_sock)
...
Acked-by: Santosh Shilimkar==================================================================
BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
net/ipv4/af_inet.c:340
Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11
Hardware name: linux,dummy-virt (DT)
Workqueue: krdsd rds_connect_worker
Call trace:
dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53
show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x120/0x188 lib/dump_stack.c:113
print_address_description+0x68/0x278 mm/kasan/report.c:253
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x21c/0x348 mm/kasan/report.c:409
__asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429
inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340
__sock_create+0x4f8/0x770 net/socket.c:1276
sock_create_kern+0x50/0x68 net/socket.c:1322
rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114
rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175
process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
kthread+0x2f0/0x378 kernel/kthread.c:255
ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117Allocated by task 687:
save_stack mm/kasan/kasan.c:448 [inline]
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553
kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490
slab_post_alloc_hook mm/slab.h:444 [inline]
slab_alloc_node mm/slub.c:2705 [inline]
slab_alloc mm/slub.c:2713 [inline]
kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718
kmem_cache_zalloc include/linux/slab.h:697 [inline]
net_alloc net/core/net_namespace.c:384 [inline]
copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424
create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206
ksys_unshare+0x340/0x628 kernel/fork.c:2577
__do_sys_unshare kernel/fork.c:2645 [inline]
__se_sys_unshare kernel/fork.c:2643 [inline]
__arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83
el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129
el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960Freed by task 264:
save_stack mm/kasan/kasan.c:448 [inline]
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521
kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528
slab_free_hook mm/slub.c:1370 [inline]
slab_free_freelist_hook mm/slub.c:1397 [inline]
slab_free mm/slub.c:2952 [inline]
kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968
net_free net/core/net_namespace.c:400 [inline]
net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407
net_drop_ns net/core/net_namespace.c:406 [inline]
cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569
process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
kthread+0x2f0/0x378 kernel/kthread.c:255
ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117The buggy address belongs to the object at ffff8003496a3f80
which belongs to the cache net_namespace of size 7872
The buggy address is located 1796 bytes inside of
7872-byte region [ffff8003496a3f80, ffff8003496a5e40)
The buggy address belongs to the page:
page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000
index:0x0 compound_mapcount: 0
flags: 0xffffe0000008100(slab|head)
raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000
raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detectedMemory state around the buggy address:
ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
Reported-by: Hulk Robot
Signed-off-by: Mao Wenan
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]
net_hash_mix() currently uses kernel address of a struct net,
and is used in many places that could be used to reveal this
address to a patient attacker, thus defeating KASLR, for
the typical case (initial net namespace, &init_net is
not dynamically allocated)I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.Also provide entropy regardless of CONFIG_NET_NS.
Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes")
Signed-off-by: Eric Dumazet
Reported-by: Amit Klein
Reported-by: Benny Pinkas
Cc: Pavel Emelyanov
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 0ab03f353d3613ea49d1f924faf98559003670a8 ]
Currently we may merge incorrectly a received GSO packet
or a packet with frag_list into a packet sitting in the
gro_hash list. skb_segment() may crash case because
the assumptions on the skb layout are not met.
The correct behaviour would be to flush the packet in the
gro_hash list and send the received GSO packet directly
afterwards. Commit d61d072e87c8e ("net-gro: avoid reorders")
sets NAPI_GRO_CB(skb)->flush in this case, but this is not
checked before merging. This patch makes sure to check this
flag and to not merge in that case.Fixes: d61d072e87c8e ("net-gro: avoid reorders")
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 3d8830266ffc28c16032b859e38a0252e014b631 ]
NULL or ZERO_SIZE_PTR will be returned for zero sized memory
request, and derefencing them will lead to a segfaultso it is unnecessory to call vzalloc for zero sized memory
request and not call functions which maybe derefence the
NULL allocated memorythis also fixes a possible memory leak if phy_ethtool_get_stats
returns error, memory should be freed before exitSigned-off-by: Li RongQing
Reviewed-by: Wang Li
Reviewed-by: Michal Kubecek
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 3c446e6f96997f2a95bf0037ef463802162d2323 ]
When kcm is loaded while many processes try to create a KCM socket, a
crash occurs:
BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
Oops: 0002 [#1] SMP KASAN PTI
CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
...
CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
Call Trace:
kcm_create+0x600/0xbf0 [kcm]
__sock_create+0x324/0x750 net/socket.c:1272
...This is due to race between sock_create and unfinished
register_pernet_device. kcm_create tries to do "net_generic(net,
kcm_net_id)". but kcm_net_id is not initialized yet.So switch the order of the two to close the race.
This can be reproduced with mutiple processes doing socket(PF_KCM, ...)
and one process doing module removal.Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reviewed-by: Michal Kubecek
Signed-off-by: Jiri Slaby
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit bb9bd814ebf04f579be466ba61fc922625508807 ]
ipip6 tunnels run iptunnel_pull_header on received skbs. This can
determine the following use-after-free accessing iph pointer since
the packet will be 'uncloned' running pskb_expand_head if it is a
cloned gso skb (e.g if the packet has been sent though a veth device)[ 706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
[ 706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
[ 706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
[ 706.771839] Call trace:
[ 706.801159] dump_backtrace+0x0/0x2f8
[ 706.845079] show_stack+0x24/0x30
[ 706.884833] dump_stack+0xe0/0x11c
[ 706.925629] print_address_description+0x68/0x260
[ 706.982070] kasan_report+0x178/0x340
[ 707.025995] __asan_report_load1_noabort+0x30/0x40
[ 707.083481] ipip6_rcv+0x1678/0x16e0 [sit]
[ 707.132623] tunnel64_rcv+0xd4/0x200 [tunnel4]
[ 707.185940] ip_local_deliver_finish+0x3b8/0x988
[ 707.241338] ip_local_deliver+0x144/0x470
[ 707.289436] ip_rcv_finish+0x43c/0x14b0
[ 707.335447] ip_rcv+0x628/0x1138
[ 707.374151] __netif_receive_skb_core+0x1670/0x2600
[ 707.432680] __netif_receive_skb+0x28/0x190
[ 707.482859] process_backlog+0x1d0/0x610
[ 707.529913] net_rx_action+0x37c/0xf68
[ 707.574882] __do_softirq+0x288/0x1018
[ 707.619852] run_ksoftirqd+0x70/0xa8
[ 707.662734] smpboot_thread_fn+0x3a4/0x9e8
[ 707.711875] kthread+0x2c8/0x350
[ 707.750583] ret_from_fork+0x10/0x18[ 707.811302] Allocated by task 16982:
[ 707.854182] kasan_kmalloc.part.1+0x40/0x108
[ 707.905405] kasan_kmalloc+0xb4/0xc8
[ 707.948291] kasan_slab_alloc+0x14/0x20
[ 707.994309] __kmalloc_node_track_caller+0x158/0x5e0
[ 708.053902] __kmalloc_reserve.isra.8+0x54/0xe0
[ 708.108280] __alloc_skb+0xd8/0x400
[ 708.150139] sk_stream_alloc_skb+0xa4/0x638
[ 708.200346] tcp_sendmsg_locked+0x818/0x2b90
[ 708.251581] tcp_sendmsg+0x40/0x60
[ 708.292376] inet_sendmsg+0xf0/0x520
[ 708.335259] sock_sendmsg+0xac/0xf8
[ 708.377096] sock_write_iter+0x1c0/0x2c0
[ 708.424154] new_sync_write+0x358/0x4a8
[ 708.470162] __vfs_write+0xc4/0xf8
[ 708.510950] vfs_write+0x12c/0x3d0
[ 708.551739] ksys_write+0xcc/0x178
[ 708.592533] __arm64_sys_write+0x70/0xa0
[ 708.639593] el0_svc_handler+0x13c/0x298
[ 708.686646] el0_svc+0x8/0xc[ 708.739019] Freed by task 17:
[ 708.774597] __kasan_slab_free+0x114/0x228
[ 708.823736] kasan_slab_free+0x10/0x18
[ 708.868703] kfree+0x100/0x3d8
[ 708.905320] skb_free_head+0x7c/0x98
[ 708.948204] skb_release_data+0x320/0x490
[ 708.996301] pskb_expand_head+0x60c/0x970
[ 709.044399] __iptunnel_pull_header+0x3b8/0x5d0
[ 709.098770] ipip6_rcv+0x41c/0x16e0 [sit]
[ 709.146873] tunnel64_rcv+0xd4/0x200 [tunnel4]
[ 709.200195] ip_local_deliver_finish+0x3b8/0x988
[ 709.255596] ip_local_deliver+0x144/0x470
[ 709.303692] ip_rcv_finish+0x43c/0x14b0
[ 709.349705] ip_rcv+0x628/0x1138
[ 709.388413] __netif_receive_skb_core+0x1670/0x2600
[ 709.446943] __netif_receive_skb+0x28/0x190
[ 709.497120] process_backlog+0x1d0/0x610
[ 709.544169] net_rx_action+0x37c/0xf68
[ 709.589131] __do_softirq+0x288/0x1018[ 709.651938] The buggy address belongs to the object at ffffe01b6bd85580
which belongs to the cache kmalloc-1024 of size 1024
[ 709.804356] The buggy address is located 117 bytes inside of
1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
[ 709.946340] The buggy address belongs to the page:
[ 710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
[ 710.099914] flags: 0xfffff8000000100(slab)
[ 710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
[ 710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
[ 710.334966] page dumped because: kasan: bad access detectedFix it resetting iph pointer after iptunnel_pull_header
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap")
Tested-by: Jianlin Shi
Signed-off-by: Lorenzo Bianconi
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit ef0efcd3bd3fd0589732b67fb586ffd3c8705806 ]
At the beginning of ip6_fragment func, the prevhdr pointer is
obtained in the ip6_find_1stfragopt func.
However, all the pointers pointing into skb header may change
when calling skb_checksum_help func with
skb->ip_summed = CHECKSUM_PARTIAL condition.
The prevhdr pointe will be dangling if it is not reloaded after
calling __skb_linearize func in skb_checksum_help func.Here, I add a variable, nexthdr_offset, to evaluate the offset,
which does not changes even after calling __skb_linearize func.Fixes: 405c92f7a541 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment")
Signed-off-by: Junwei Hu
Reported-by: Wenhao Zhang
Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com
Reviewed-by: Zhiqiang Liu
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit b2e54b09a3d29c4db883b920274ca8dca4d9f04d ]
The device type for ip6 tunnels is set to
ARPHRD_TUNNEL6. However, the ip4ip6_err function
is expecting the device type of the tunnel to be
ARPHRD_TUNNEL. Since the device types do not
match, the function exits and the ICMP error
packet is not sent to the originating host. Note
that the device type for IPv4 tunnels is set to
ARPHRD_TUNNEL.Fix is to expect a tunnel device type of
ARPHRD_TUNNEL6 instead. Now the tunnel device
type matches and the ICMP error packet is sent
to the originating host.Signed-off-by: Sheena Mira-ato
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
06 Apr, 2019
4 commits
-
[ Upstream commit 8e2f311a68494a6677c1724bdcb10bada21af37c ]
Following command:
iptables -D FORWARD -m physdev ...
causes connectivity loss in some setups.Reason is that iptables userspace will probe kernel for the module revision
of the physdev patch, and physdev has an artificial dependency on
br_netfilter (xt_physdev use makes no sense unless a br_netfilter module
is loaded).This causes the "phydev" module to be loaded, which in turn enables the
"call-iptables" infrastructure.bridged packets might then get dropped by the iptables ruleset.
The better fix would be to change the "call-iptables" defaults to 0 and
enforce explicit setting to 1, but that breaks backwards compatibility.This does the next best thing: add a request_module call to checkentry.
This was a stray '-D ... -m physdev' won't activate br_netfilter
anymore.Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin -
[ Upstream commit 13f5251fd17088170c18844534682d9cab5ff5aa ]
For bridge(br_flood) or broadcast/multicast packets, they could clone
skb with unconfirmed conntrack which break the rule that unconfirmed
skb->_nfct is never shared. With nfqueue running on my system, the race
can be easily reproduced with following warning calltrace:[13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: P W 4.4.60 #7744
[13257.707568] Hardware name: Qualcomm (Flattened Device Tree)
[13257.714700] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[13257.720253] [] (show_stack) from [] (dump_stack+0x94/0xa8)
[13257.728240] [] (dump_stack) from [] (warn_slowpath_common+0x94/0xb0)
[13257.735268] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
[13257.743519] [] (warn_slowpath_null) from [] (__nf_conntrack_confirm+0xa8/0x618)
[13257.752284] [] (__nf_conntrack_confirm) from [] (ipv4_confirm+0xb8/0xfc)
[13257.761049] [] (ipv4_confirm) from [] (nf_iterate+0x48/0xa8)
[13257.769725] [] (nf_iterate) from [] (nf_hook_slow+0x30/0xb0)
[13257.777108] [] (nf_hook_slow) from [] (br_nf_post_routing+0x274/0x31c)
[13257.784486] [] (br_nf_post_routing) from [] (nf_iterate+0x48/0xa8)
[13257.792556] [] (nf_iterate) from [] (nf_hook_slow+0x30/0xb0)
[13257.800458] [] (nf_hook_slow) from [] (br_forward_finish+0x94/0xa4)
[13257.808010] [] (br_forward_finish) from [] (br_nf_forward_finish+0x150/0x1ac)
[13257.815736] [] (br_nf_forward_finish) from [] (nf_reinject+0x108/0x170)
[13257.824762] [] (nf_reinject) from [] (nfqnl_recv_verdict+0x3d8/0x420)
[13257.832924] [] (nfqnl_recv_verdict) from [] (nfnetlink_rcv_msg+0x158/0x248)
[13257.841256] [] (nfnetlink_rcv_msg) from [] (netlink_rcv_skb+0x54/0xb0)
[13257.849762] [] (netlink_rcv_skb) from [] (netlink_unicast+0x148/0x23c)
[13257.858093] [] (netlink_unicast) from [] (netlink_sendmsg+0x2ec/0x368)
[13257.866348] [] (netlink_sendmsg) from [] (sock_sendmsg+0x34/0x44)
[13257.874590] [] (sock_sendmsg) from [] (___sys_sendmsg+0x1ec/0x200)
[13257.882489] [] (___sys_sendmsg) from [] (__sys_sendmsg+0x3c/0x64)
[13257.890300] [] (__sys_sendmsg) from [] (ret_fast_syscall+0x0/0x34)The original code just triggered the warning but do nothing. It will
caused the shared conntrack moves to the dying list and the packet be
droppped (nf_ct_resolve_clash returns NF_DROP for dying conntrack).- Reproduce steps:
+----------------------------+
| br0(bridge) |
| |
+-+---------+---------+------+
| eth0| | eth1| | eth2|
| | | | | |
+--+--+ +--+--+ +---+-+
| | |
| | |
+--+-+ +-+--+ +--+-+
| PC1| | PC2| | PC3|
+----+ +----+ +----+iptables -A FORWARD -m mark --mark 0x1000000/0x1000000 -j NFQUEUE --queue-num 100 --queue-bypass
ps: Our nfq userspace program will set mark on packets whose connection
has already been processed.PC1 sends broadcast packets simulated by hping3:
hping3 --rand-source --udp 192.168.1.255 -i u100
- Broadcast racing flow chart is as follow:
br_handle_frame
BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish)
// skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage
br_handle_frame_finish
// check if this packet is broadcast
br_flood_forward
br_flood
list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port
maybe_deliver
deliver_clone
skb = skb_clone(skb)
__br_forward
BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...)
// queue in our nfq and received by our userspace program
// goto __nf_conntrack_confirm with process context on CPU 1
br_pass_frame_up
BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...)
// goto __nf_conntrack_confirm with softirq context on CPU 0Because conntrack confirm can happen at both INPUT and POSTROUTING
stage. So with NFQUEUE running, skb->_nfct with the same unconfirmed
conntrack could race on different core.This patch fixes a repeating kernel splat, now it is only displayed
once.Signed-off-by: Chieh-Min Wang
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin -
[ Upstream commit be0502a3f2e94211a8809a09ecbc3a017189b8fb ]
TCP resets cause instant transition from established to closed state
provided the reset is in-window. Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.Main problem for conntrack is that its a middlebox, i.e. whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).Therefore we can't simply flag RSTs with non-exact match as invalid.
This updates RST processing as follows:
1. If the connection is in a state other than ESTABLISHED, nothing is
changed, RST is subject to normal in-window check.2. If the RSTs sequence number either matches exactly RCV.NXT,
connection state moves to CLOSE.3. The same applies if the RST sequence number aligns with a previous
packet in the same direction.In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.If the peer sends a challenge ack, connection timeout will be reset.
If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.Packetdrill test case:
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 00.100 < S 0:0(0) win 32792
0.100 > S. 0:0(0) ack 1 win 64240
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R 2:2(0) ack 1001 win 260// 2nd reset, but too far ahead sequence. Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260// ACK
0.300 < . 1001:1001(0) ack 1001 win 257// more data could be exchanged here, connection
// is still established// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002// Close the connection without reading outstanding data
0.700 close(4) = 0// so one more reset. Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.With patch, this generates following conntrack events:
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]Cc: Jozsef Kadlecsik
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin -
[ Upstream commit a9f5e78c403d2d62ade4f4c85040efc85f4049b8 ]
Check the result of dereferencing base_chain->stats, instead of result
of this_cpu_ptr with NULL.base_chain->stats maybe be changed to NULL when a chain is updated and a
new NULL counter can be attached.And we do not need to check returning of this_cpu_ptr since
base_chain->stats is from percpu allocator if it is non-NULL,
this_cpu_ptr returns a valid value.And fix two sparse error by replacing rcu_access_pointer and
rcu_dereference with READ_ONCE under rcu_read_lock.Thanks for Eric's help to finish this patch.
Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
Signed-off-by: Eric Dumazet
Signed-off-by: Zhang Yu
Signed-off-by: Li RongQing
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin
03 Apr, 2019
3 commits
-
[ Upstream commit 064c5d6881e897077639e04973de26440ee205e6 ]
A new mirred action is created by the tcf_mirred_init function. This
contains a list head struct which is inserted into a global list on
successful creation of a new action. However, after a creation, it is
still possible to error out and call the tcf_idr_release function. This,
in turn, calls the act_mirr cleanup function via __tcf_idr_release and
__tcf_action_put. This cleanup function tries to delete the list entry
which is as yet uninitialised, leading to a NULL pointer exception.Fix this by initialising the list entry on creation of a new action.
Bug report:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000840c73067 P4D 8000000840c73067 PUD 858dcc067 PMD 0
Oops: 0002 [#1] SMP PTI
CPU: 32 PID: 5636 Comm: handler194 Tainted: G OE 5.0.0+ #186
Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015
RIP: 0010:tcf_mirred_release+0x42/0xa7 [act_mirred]
Code: f0 90 39 c0 e8 52 04 57 c8 48 c7 c7 b8 80 39 c0 e8 94 fa d4 c7 48 8b 93 d0 00 00 00 48 8b 83 d8 00 00 00 48 c7 c7 f0 90 39 c0 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 d0 00
RSP: 0018:ffffac4aa059f688 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9dcd1b214d00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9dcd1fa165f8 RDI: ffffffffc03990f0
RBP: ffff9dccf9c7af80 R08: 0000000000000a3b R09: 0000000000000000
R10: ffff9dccfa11f420 R11: 0000000000000000 R12: 0000000000000001
R13: ffff9dcd16b433c0 R14: ffff9dcd1b214d80 R15: 0000000000000000
FS: 00007f441bfff700(0000) GS:ffff9dcd1fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000839e64004 CR4: 00000000001606e0
Call Trace:
tcf_action_cleanup+0x59/0xca
__tcf_action_put+0x54/0x6b
__tcf_idr_release.cold.33+0x9/0x12
tcf_mirred_init.cold.20+0x22e/0x3b0 [act_mirred]
tcf_action_init_1+0x3d0/0x4c0
tcf_action_init+0x9c/0x130
tcf_exts_validate+0xab/0xc0
fl_change+0x1ca/0x982 [cls_flower]
tc_new_tfilter+0x647/0x8d0
? load_balance+0x14b/0x9e0
rtnetlink_rcv_msg+0xe3/0x370
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1d4/0x2b0
? rtnl_calcit.isra.31+0xf0/0xf0
netlink_rcv_skb+0x49/0x110
netlink_unicast+0x16f/0x210
netlink_sendmsg+0x1df/0x390
sock_sendmsg+0x36/0x40
___sys_sendmsg+0x27b/0x2c0
? futex_wake+0x80/0x140
? do_futex+0x2b9/0xac0
? ep_scan_ready_list.constprop.22+0x1f2/0x210
? ep_poll+0x7a/0x430
__sys_sendmsg+0x47/0x80
do_syscall_64+0x55/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
Signed-off-by: John Hurley
Reviewed-by: Jakub Kicinski
Acked-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit b5f9bd15b88563b55a99ed588416881367a0ce5f ]
ila_xlat_nl_cmd_flush uses rhashtable walkers allocated from the
stack but it never frees them. This corrupts the walker list of
the hash table.This patch fixes it.
Reported-by: syzbot+dae72a112334aa65a159@syzkaller.appspotmail.com
Fixes: b6e71bdebb12 ("ila: Flush netlink command to clear xlat...")
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 33872d79f5d1cbedaaab79669cc38f16097a9450 ]
When cancelling a subscription, we have to clear the cancel bit in the
request before iterating over any established subscriptions with memcmp.
Otherwise no subscription will ever be found, and it will not be
possible to explicitly unsubscribe individual subscriptions.Fixes: 8985ecc7c1e0 ("tipc: simplify endianness handling in topology subscriber")
Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman