08 Mar, 2020
1 commit
-
Merge Linux stable release v5.4.24 into imx_5.4.y
* tag 'v5.4.24': (3306 commits)
Linux 5.4.24
blktrace: Protect q->blk_trace with RCU
kvm: nVMX: VMWRITE checks unsupported field before read-only field
...Signed-off-by: Jason Liu
Conflicts:
arch/arm/boot/dts/imx6sll-evk.dts
arch/arm/boot/dts/imx7ulp.dtsi
arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
drivers/clk/imx/clk-composite-8m.c
drivers/gpio/gpio-mxc.c
drivers/irqchip/Kconfig
drivers/mmc/host/sdhci-of-esdhc.c
drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
drivers/net/can/flexcan.c
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
drivers/net/ethernet/mscc/ocelot.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
drivers/net/phy/realtek.c
drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
drivers/perf/fsl_imx8_ddr_perf.c
drivers/tee/optee/shm_pool.c
drivers/usb/cdns3/gadget.c
kernel/sched/cpufreq.c
net/core/xdp.c
sound/soc/fsl/fsl_esai.c
sound/soc/fsl/fsl_sai.c
sound/soc/sof/core.c
sound/soc/sof/imx/Kconfig
sound/soc/sof/loader.c
05 Mar, 2020
22 commits
-
commit cf3e204a1ca5442190018a317d9ec181b4639bd6 upstream.
info->key.tp_src and tp_dst are __be16, when using nla_put_be16()
to dump them, htons() is not needed, so remove it in this patch.Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Xin Long
Reviewed-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman -
commit 369537c97024dca99303a8d4d6ab38b4f54d3909 upstream.
Just SMCR requires a CLC Peer ID, but not SMCD. The field should be
zero for SMCD.Fixes: c758dfddc1b5 ("net/smc: add SMC-D support in CLC messages")
Signed-off-by: Ursula Braun
Signed-off-by: Karsten Graul
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
commit 3a20773beeeeadec41477a5ba872175b778ff752 upstream.
Since nl_groups is a u32 we can't bind more groups via ->bind
(netlink_bind) call, but netlink has supported more groups via
setsockopt() for a long time and thus nlk->ngroups could be over 32.
Recently I added support for per-vlan notifications and increased the
groups to 33 for NETLINK_ROUTE which exposed an old bug in the
netlink_bind() code causing out-of-bounds access on archs where unsigned
long is 32 bits via test_bit() on a local variable. Fix this by capping the
maximum groups in netlink_bind() to BITS_PER_TYPE(u32), effectively
capping them at 32 which is the minimum of allocated groups and the
maximum groups which can be bound via netlink_bind().CC: Christophe Leroy
CC: Richard Guy Briggs
Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.")
Reported-by: Erhard F.
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
commit 0daa63ed4c6c4302790ce67b7a90c0997ceb7514 upstream.
The below-mentioned commit changed the code to unlock *inside*
the function, but previously the unlock was *outside*. It failed
to remove the outer unlock, however, leading to double unlock.Fix this.
Fixes: 33483a6b88e4 ("mac80211: fix missing unlock on error in ieee80211_mark_sta_auth()")
Signed-off-by: Andrei Otcheretianski
Link: https://lore.kernel.org/r/20200221104719.cce4741cf6eb.I671567b185c8a4c2409377e483fd149ce590f56d@changeid
[rewrite commit message to better explain what happened]
Signed-off-by: Johannes Berg
Signed-off-by: Greg Kroah-Hartman -
commit 9951ebfcdf2b97dbb28a5d930458424341e61aa2 upstream.
If nl80211_parse_he_obss_pd() fails, we leak the previously
allocated ACL memory. Free it in this case.Fixes: 796e90f42b7e ("cfg80211: add support for parsing OBBS_PD attributes")
Signed-off-by: Johannes Berg
Link: https://lore.kernel.org/r/20200221104142.835aba4cdd14.I1923b55ba9989c57e13978f91f40bfdc45e60cbd@changeid
Signed-off-by: Johannes Berg
Signed-off-by: Greg Kroah-Hartman -
commit c4a3922d2d20c710f827d3a115ee338e8d0467df upstream.
It is unnecessary to hold hashlimit_mutex for htable_destroy()
as it is already removed from the global hashtable and its
refcount is already zero.Also, switch hinfo->use to refcount_t so that we don't have
to hold the mutex until it reaches zero in htable_put().Reported-and-tested-by: syzbot+adf6c6c2be1c3a718121@syzkaller.appspotmail.com
Acked-by: Florian Westphal
Signed-off-by: Cong Wang
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman -
commit 8af1c6fbd9239877998c7f5a591cb2c88d41fb66 upstream.
When the forceadd option is enabled, the hash:* types should find and replace
the first entry in the bucket with the new one if there are no reuseable
(deleted or timed out) entries. However, the position index was just not set
to zero and remained the invalid -1 if there were no reuseable entries.Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik
Signed-off-by: Greg Kroah-Hartman -
commit 67f562e3e147750a02b2a91d21a163fc44a1d13e upstream.
SMC does not work together with FASTOPEN. If sendmsg() is called with
flag MSG_FASTOPEN in SMC_INIT state, the SMC-socket switches to
fallback mode. To handle the previous ioctl FIOASYNC call correctly
in this case, it is necessary to transfer the socket wait queue
fasync_list to the internal TCP socket.Reported-by: syzbot+4b1fe8105f8044a26162@syzkaller.appspotmail.com
Fixes: ee9dfbef02d18 ("net/smc: handle sockopts forcing fallback")
Signed-off-by: Ursula Braun
Signed-off-by: Karsten Graul
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream.
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 33181ea7f5a62a17fbe55f0f73428ecb5e686be8 ]
Before this patch, STA's would set new width of 160/80+80 MHz based on AP capability only.
This is wrong because STA may not support > 80MHz BW.
Fix is to verify STA has 160/80+80 MHz capability before increasing its width to > 80MHz.The "support_80_80" and "support_160" setting is based on:
"Table 9-272 — Setting of the Supported Channel Width Set subfield and Extended NSS BW
Support subfield at a STA transmitting the VHT Capabilities Information field"
From "Draft P802.11REVmd_D3.0.pdf"Signed-off-by: Aviad Brikman
Signed-off-by: Shay Bar
Link: https://lore.kernel.org/r/20200210130728.23674-1-shay.bar@celeno.com
Signed-off-by: Johannes Berg
Signed-off-by: Sasha Levin -
[ Upstream commit ea75080110a4c1fa011b0a73cb8f42227143ee3e ]
The nl80211_policy is missing for NL80211_ATTR_STATUS_CODE attribute.
As a result, for strictly validated commands, it's assumed to not be
supported.Signed-off-by: Sergey Matyukevich
Link: https://lore.kernel.org/r/20200213131608.10541-2-sergey.matyukevich.os@quantenna.com
Signed-off-by: Johannes Berg
Signed-off-by: Sasha Levin -
[ Upstream commit bfb7bac3a8f47100ebe7961bd14e924c96e21ca7 ]
When preparing ethtool drvinfo, check if wiphy driver is defined
before dereferencing it. Driver may not exist, e.g. if wiphy is
attached to a virtual platform device.Signed-off-by: Sergey Matyukevich
Link: https://lore.kernel.org/r/20200203105644.28875-1-sergey.matyukevich.os@quantenna.com
Signed-off-by: Johannes Berg
Signed-off-by: Sasha Levin -
[ Upstream commit a04564c99bb4a92f805a58e56b2d22cc4978f152 ]
We only use the parsing CRC for checking if a beacon changed,
and elements with an ID > 63 cannot be represented in the
filter. Thus, like we did before with WMM and Cisco vendor
elements, just statically add these forgotten items to the
CRC:
- WLAN_EID_VHT_OPERATION
- WLAN_EID_OPMODE_NOTIFI guess that in most cases when VHT/HE operation change, the HT
operation also changed, and so the change was picked up, but we
did notice that pure operating mode notification changes were
ignored.Signed-off-by: Johannes Berg
Signed-off-by: Luca Coelho
Link: https://lore.kernel.org/r/20200131111300.891737-22-luca@coelho.fi
[restrict to VHT for the mac80211 branch]
Signed-off-by: Johannes Berg
Signed-off-by: Sasha Levin -
[ Upstream commit afecdb376bd81d7e16578f0cfe82a1aec7ae18f3 ]
When splitting an RTA_MULTIPATH request into multiple routes and adding the
second and later components, we must not simply remove NLM_F_REPLACE but
instead replace it by NLM_F_CREATE. Otherwise, it may look like the netlink
message was malformed.For example,
ip route add 2001:db8::1/128 dev dummy0
ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 \
nexthop via fe80::30:2 dev dummy0
results in the following warnings:
[ 1035.057019] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE
[ 1035.057517] IPv6: NLM_F_CREATE should be set when creating new routeThis patch makes the nlmsg sequence look equivalent for __ip6_ins_rt() to
what it would get if the multipath route had been added in multiple netlink
operations:
ip route add 2001:db8::1/128 dev dummy0
ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0
ip route append 2001:db8::1/128 nexthop via fe80::30:2 dev dummy0Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Benjamin Poirier
Reviewed-by: Michal Kubecek
Reviewed-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit e404b8c7cfb31654c9024d497cec58a501501692 ]
After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no
longer possible to replace an ECMP-able route by a non ECMP-able route.
For example,
ip route add 2001:db8::1/128 via fe80::1 dev dummy0
ip route replace 2001:db8::1/128 dev dummy0
does not work as expected.Tweak the replacement logic so that point 3 in the log of the above commit
becomes:
3. If the new route is not ECMP-able, and no matching non-ECMP-able route
exists, replace matching ECMP-able route (if any) or add the new route.We can now summarize the entire replace semantics to:
When doing a replace, prefer replacing a matching route of the same
"ECMP-able-ness" as the replace argument. If there is no such candidate,
fallback to the first route found.Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Benjamin Poirier
Reviewed-by: Michal Kubecek
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 7151affeef8d527f50b4b68a871fd28bd660023f ]
netdev_next_lower_dev_rcu() will be used to implement a function,
which is to walk all lower interfaces.
There are already functions that they walk their lower interface.
(netdev_walk_all_lower_dev_rcu, netdev_walk_all_lower_dev()).
But, there would be cases that couldn't be covered by given
netdev_walk_all_lower_dev_{rcu}() function.
So, some modules would want to implement own function,
which is to walk all lower interfaces.In the next patch, netdev_next_lower_dev_rcu() will be used.
In addition, this patch removes two unused prototypes in netdevice.h.Signed-off-by: Taehee Yoo
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 245709ec8be89af46ea7ef0444c9c80913999d99 ]
When T2 timer is to be stopped, the asoc should also be deleted,
otherwise, there will be no chance to call sctp_association_free
and the asoc could last in memory forever.However, in sctp_sf_shutdown_sent_abort(), after adding the cmd
SCTP_CMD_TIMER_STOP for T2 timer, it may return error due to the
format error from __sctp_sf_do_9_1_abort() and miss adding
SCTP_CMD_ASSOC_FAILED where the asoc will be deleted.This patch is to fix it by moving the format error check out of
__sctp_sf_do_9_1_abort(), and do it before adding the cmd
SCTP_CMD_TIMER_STOP for T2 timer.Thanks Hangbin for reporting this issue by the fuzz testing.
v1->v2:
- improve the comment in the code as Marcelo's suggestion.Fixes: 96ca468b86b0 ("sctp: check invalid value of length parameter in error cause")
Reported-by: Hangbin Liu
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: Xin Long
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 303d0403b8c25e994e4a6e45389e173cf8706fb5 ]
As of the below commit, udp sockets bound to a specific address can
coexist with one bound to the any addr for the same port.The commit also phased out the use of socket hashing based only on
port (hslot), in favor of always hashing on {addr, port} (hslot2).The change broke the following behavior with disconnect (AF_UNSPEC):
server binds to 0.0.0.0:1337
server connects to 127.0.0.1:80
server disconnects
client connects to 127.0.0.1:1337
client sends "hello"
server reads "hello" // times out, packet did not find skOn connect the server acquires a specific source addr suitable for
routing to its destination. On disconnect it reverts to the any addr.The connect call triggers a rehash to a different hslot2. On
disconnect, add the same to return to the original hslot2.Skip this step if the socket is going to be unhashed completely.
Fixes: 4cdeeee9252a ("net: udp: prefer listeners bound to an address")
Reported-by: Pavel Roskin
Signed-off-by: Willem de Bruijn
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 379349e9bc3b42b8b2f8f7a03f64a97623fff323 ]
This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe
Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6
("net: dev: introduce support for sch BYPASS for lockless qdisc").
I can't find any working solution other than a plain revert.This will introduce some minor performance regressions for
pfifo_fast qdisc. I plan to address them in net-next with more
indirect call wrapper boilerplate for qdiscs.Reported-by: Ahmad Fatoum
Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 06f5201c6392f998a49ca9c9173e2930c8eb51d8 ]
Current code doesn't check if tcp sequence number is starting from (/after)
1st record's start sequnce number. It only checks if seq number is before
1st record's end sequnce number. This problem will always be a possibility
in re-transmit case. If a record which belongs to a requested seq number is
already deleted, tls_get_record will start looking into list and as per the
check it will look if seq number is before the end seq of 1st record, which
will always be true and will return 1st record always, it should in fact
return NULL.
As part of the fix, start looking each record only if the sequence number
lies in the list else return NULL.
There is one more check added, driver look for the start marker record to
handle tcp packets which are before the tls offload start sequence number,
hence return 1st record if the record is tls start marker and seq number is
before the 1st record's starting sequence number.Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Rohit Maheshwari
Reviewed-by: Jakub Kicinski
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 8a9093c79863b58cc2f9874d7ae788f0d622a596 ]
tc flower rules that are based on src or dst port blocking are sometimes
ineffective due to uninitialized stack data. __skb_flow_dissect() extracts
ports from the skb for tc flower to match against. However, the port
dissection is not done when when the FLOW_DIS_IS_FRAGMENT bit is set in
key_control->flags. All callers of __skb_flow_dissect(), zero-out the
key_control field except for fl_classify() as used by the flower
classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to
__skb_flow_dissect(), since key_control is allocated on the stack
and may not be initialized.Since key_basic and key_control are present for all flow keys, let's
make sure they are initialized.Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments")
Co-developed-by: Eric Dumazet
Signed-off-by: Eric Dumazet
Acked-by: Cong Wang
Signed-off-by: Jason Baron
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 540e585a79e9d643ede077b73bcc7aa2d7b4d919 ]
In 709772e6e06564ed94ba740de70185ac3d792773, RT_TABLE_COMPAT was added to
allow legacy software to deal with routing table numbers >= 256, but the
same change to FIB rule queries was overlooked.Signed-off-by: Jethro Beekman
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
29 Feb, 2020
2 commits
-
commit 963485d436ccc2810177a7b08af22336ec2af67b upstream.
rxrpc_rcu_destroy_call(), which is called as an RCU callback to clean up a
put call, calls rxrpc_put_connection() which, deep in its bowels, takes a
number of spinlocks in a non-BH-safe way, including rxrpc_conn_id_lock and
local->client_conns_lock. RCU callbacks, however, are normally called from
softirq context, which can cause lockdep to notice the locking
inconsistency.To get lockdep to detect this, it's necessary to have the connection
cleaned up on the put at the end of the last of its calls, though normally
the clean up is deferred. This can be induced, however, by starting a call
on an AF_RXRPC socket and then closing the socket without reading the
reply.Fix this by having rxrpc_rcu_destroy_call() punt the destruction to a
workqueue if in softirq-mode and defer the destruction to process context.Note that another way to fix this could be to add a bunch of bh-disable
annotations to the spinlocks concerned - and there might be more than just
those two - but that means spending more time with BHs disabled.Note also that some of these places were covered by bh-disable spinlocks
belonging to the rxrpc_transport object, but these got removed without the
_bh annotation being retained on the next lock in.Fixes: 999b69f89241 ("rxrpc: Kill the client connection bundle concept")
Reported-by: syzbot+d82f3ac8d87e7ccbb2c9@syzkaller.appspotmail.com
Reported-by: syzbot+3f1fd6b8cbf8702d134e@syzkaller.appspotmail.com
Signed-off-by: David Howells
cc: Hillf Danton
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
commit 8d0015a7ab76b8b1e89a3e5f5710a6e5103f2dd5 upstream.
The user-specified hashtable size is unbound, this could
easily lead to an OOM or a hung task as we hold the global
mutex while allocating and initializing the new hashtable.Add a max value to cap both cfg->size and cfg->max, as
suggested by Florian.Reported-and-tested-by: syzbot+adf6c6c2be1c3a718121@syzkaller.appspotmail.com
Signed-off-by: Cong Wang
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman
26 Feb, 2020
1 commit
-
The DSA drivers that implement .phylink_mac_link_state should normally
register an interrupt for the PCS, from which they should call
phylink_mac_change(). However not all switches implement this, and those
who don't should set this flag in dsa_switch in the .setup callback, so
that PHYLINK will poll for a few ms until the in-band AN link timer
expires and the PCS state settles.Signed-off-by: Vladimir Oltean
Conflicts:
include/net/dsa.htrivially with upstream commit 05f294a85235 ("net: dsa: allocate ports
on touch") which was merged in v5.4-rc3.(cherry picked from commit 222d888331f409755fc25b1933e5dee1a976b9c1)
24 Feb, 2020
9 commits
-
[ Upstream commit 1d82163714c16ebe09c7a8c9cd3cef7abcc16208 ]
When we unhash the cache entry, we need to handle any pending upcalls
by calling cache_fresh_unlocked().Signed-off-by: Trond Myklebust
Signed-off-by: J. Bruce Fields
Signed-off-by: Sasha Levin -
[ Upstream commit 0a29275b6300f39f78a87f2038bbfe5bdbaeca47 ]
A negative value should be returned if map->map_type is invalid
although that is impossible now, but if we run into such situation
in future, then xdpbuff could be leaked.Daniel Borkmann suggested:
-EBADRQC should be returned to stay consistent with generic XDP
for the tracepoint output and not to be confused with -EOPNOTSUPP
from other locations like dev_map_enqueue() when ndo_xdp_xmit is
missing and such.Suggested-by: Daniel Borkmann
Signed-off-by: Li RongQing
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/1578618277-18085-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Sasha Levin -
[ Upstream commit 0705f95c332081036d85f26691e9d3cd7d901c31 ]
ERSPAN_VERSION is an attribute parsed in kernel side, nla_policy
type should be added for it, like other attributes.Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Xin Long
Reviewed-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Sasha Levin -
[ Upstream commit 0b2dc83906cf1e694e48003eae5df8fa63f76fd9 ]
We need to have a synchronize_rcu before free'ing the sockhash because any
outstanding psock references will have a pointer to the map and when they
use it, this could trigger a use after free.This is a sister fix for sockhash, following commit 2bb90e5cc90e ("bpf:
sockmap, synchronize_rcu before free'ing map") which addressed sockmap,
which comes from a manual audit.Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Jakub Sitnicki
Signed-off-by: Daniel Borkmann
Acked-by: John Fastabend
Link: https://lore.kernel.org/bpf/20200206111652.694507-3-jakub@cloudflare.com
Signed-off-by: Sasha Levin -
[ Upstream commit e2debf0852c4d66ba1a8bde12869b196094c70a7 ]
unlike other classifiers that can be offloaded (i.e. users can set flags
like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of
netlink attribute 'TCA_FLOWER_FLAGS' provided by user: add a proper entry
to fl_policy.Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
Signed-off-by: Davide Caratti
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 1afa3cc90f8fb745c777884d79eaa1001d6927a6 ]
unlike other classifiers that can be offloaded (i.e. users can set flags
like 'skip_hw' and 'skip_sw'), 'cls_matchall' doesn't validate the size
of netlink attribute 'TCA_MATCHALL_FLAGS' provided by user: add a proper
entry to mall_policy.Fixes: b87f7936a932 ("net/sched: Add match-all classifier hw offloading.")
Signed-off-by: Davide Caratti
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 04fb91243a853dbde216d829c79d9632e52aa8d9 ]
Passing tag size to skb_cow_head will make sure
there is enough headroom for the tag data.
This change does not introduce any overhead in case there
is already available headroom for tag.Signed-off-by: Per Forlin
Reviewed-by: Florian Fainelli
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 457fed775c97ac2c0cd1672aaf2ff2c8a6235e87 ]
As nlmsg_put() does not clear the memory that is reserved,
it this the caller responsability to make sure all of this
memory will be written, in order to not reveal prior content.While we are at it, we can provide the socket cookie even
if clsock is not set.syzbot reported :
BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
BUG: KMSAN: uninit-value in __swab32p include/uapi/linux/swab.h:179 [inline]
BUG: KMSAN: uninit-value in __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
BUG: KMSAN: uninit-value in get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
BUG: KMSAN: uninit-value in bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252
CPU: 1 PID: 5262 Comm: syz-executor.5 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x220 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
__arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
__fswab32 include/uapi/linux/swab.h:59 [inline]
__swab32p include/uapi/linux/swab.h:179 [inline]
__be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
kmsan_kmalloc_large+0x73/0xc0 mm/kmsan/kmsan_hooks.c:128
kmalloc_large_node_hook mm/slub.c:1406 [inline]
kmalloc_large_node+0x282/0x2c0 mm/slub.c:3841
__kmalloc_node_track_caller+0x44b/0x1200 mm/slub.c:4368
__kmalloc_reserve net/core/skbuff.c:141 [inline]
__alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209
alloc_skb include/linux/skbuff.h:1049 [inline]
netlink_dump+0x44b/0x1ab0 net/netlink/af_netlink.c:2224
__netlink_dump_start+0xbb2/0xcf0 net/netlink/af_netlink.c:2352
netlink_dump_start include/linux/netlink.h:233 [inline]
smc_diag_handler_dump+0x2ba/0x300 net/smc/smc_diag.c:242
sock_diag_rcv_msg+0x211/0x610 net/core/sock_diag.c:256
netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477
sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg net/socket.c:659 [inline]
kernel_sendmsg+0x433/0x440 net/socket.c:679
sock_no_sendpage+0x235/0x300 net/core/sock.c:2740
kernel_sendpage net/socket.c:3776 [inline]
sock_sendpage+0x1e1/0x2c0 net/socket.c:937
pipe_to_sendpage+0x38c/0x4c0 fs/splice.c:458
splice_from_pipe_feed fs/splice.c:512 [inline]
__splice_from_pipe+0x539/0xed0 fs/splice.c:636
splice_from_pipe fs/splice.c:671 [inline]
generic_splice_sendpage+0x1d5/0x2d0 fs/splice.c:844
do_splice_from fs/splice.c:863 [inline]
do_splice fs/splice.c:1170 [inline]
__do_sys_splice fs/splice.c:1447 [inline]
__se_sys_splice+0x2380/0x3350 fs/splice.c:1427
__x64_sys_splice+0x6e/0x90 fs/splice.c:1427
do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x44/0xa9Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets")
Signed-off-by: Eric Dumazet
Cc: Ursula Braun
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit ad1e03b2b3d4430baaa109b77bc308dc73050de3 ]
The current generic XDP handler skips execution of XDP programs entirely if
an SKB is marked as cloned. This leads to some surprising behaviour, as
packets can end up being cloned in various ways, which will make an XDP
program not see all the traffic on an interface.This was discovered by a simple test case where an XDP program that always
returns XDP_DROP is installed on a veth device. When combining this with
the Scapy packet sniffer (which uses an AF_PACKET) socket on the sending
side, SKBs reliably end up in the cloned state, causing them to be passed
through to the receiving interface instead of being dropped. A minimal
reproducer script for this is included below.This patch fixed the issue by simply triggering the existing linearisation
code for cloned SKBs instead of skipping the XDP program execution. This
behaviour is in line with the behaviour of the native XDP implementation
for the veth driver, which will reallocate and copy the SKB data if the SKB
is marked as shared.Reproducer Python script (requires BCC and Scapy):
from scapy.all import TCP, IP, Ether, sendp, sniff, AsyncSniffer, Raw, UDP
from bcc import BPF
import time, sys, subprocess, shlexSKB_MODE = (1 << 1)
DRV_MODE = (1 << 2)
PYTHON=sys.executabledef client():
time.sleep(2)
# Sniffing on the sender causes skb_cloned() to be set
s = AsyncSniffer()
s.start()for p in range(10):
sendp(Ether(dst="aa:aa:aa:aa:aa:aa", src="cc:cc:cc:cc:cc:cc")/IP()/UDP()/Raw("Test"),
verbose=False)
time.sleep(0.1)s.stop()
return 0def server(mode):
prog = BPF(text="int dummy_drop(struct xdp_md *ctx) {return XDP_DROP;}")
func = prog.load_func("dummy_drop", BPF.XDP)
prog.attach_xdp("a_to_b", func, mode)time.sleep(1)
s = sniff(iface="a_to_b", count=10, timeout=15)
if len(s):
print(f"Got {len(s)} packets - should have gotten 0")
return 1
else:
print("Got no packets - as expected")
return 0if len(sys.argv) < 2:
print(f"Usage: {sys.argv[0]} ")
sys.exit(1)if sys.argv[1] == "client":
sys.exit(client())
elif sys.argv[1] == "server":
mode = SKB_MODE if sys.argv[2] == 'skb' else DRV_MODE
sys.exit(server(mode))
else:
try:
mode = sys.argv[1]
if mode not in ('skb', 'drv'):
print(f"Usage: {sys.argv[0]} ")
sys.exit(1)
print(f"Running in {mode} mode")for cmd in [
'ip netns add netns_a',
'ip netns add netns_b',
'ip -n netns_a link add a_to_b type veth peer name b_to_a netns netns_b',
# Disable ipv6 to make sure there's no address autoconf traffic
'ip netns exec netns_a sysctl -qw net.ipv6.conf.a_to_b.disable_ipv6=1',
'ip netns exec netns_b sysctl -qw net.ipv6.conf.b_to_a.disable_ipv6=1',
'ip -n netns_a link set dev a_to_b address aa:aa:aa:aa:aa:aa',
'ip -n netns_b link set dev b_to_a address cc:cc:cc:cc:cc:cc',
'ip -n netns_a link set dev a_to_b up',
'ip -n netns_b link set dev b_to_a up']:
subprocess.check_call(shlex.split(cmd))server = subprocess.Popen(shlex.split(f"ip netns exec netns_a {PYTHON} {sys.argv[0]} server {mode}"))
client = subprocess.Popen(shlex.split(f"ip netns exec netns_b {PYTHON} {sys.argv[0]} client"))client.wait()
server.wait()
sys.exit(server.returncode)finally:
subprocess.run(shlex.split("ip netns delete netns_a"))
subprocess.run(shlex.split("ip netns delete netns_b"))Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Reported-by: Stepan Horacek
Suggested-by: Paolo Abeni
Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
20 Feb, 2020
2 commits
-
commit 2bf973ff9b9aeceb8acda629ae65341820d4b35b upstream.
Previously I intended to ignore quiet mode in probe response, however
I ended up ignoring it instead for action frames. As a matter of fact,
this path isn't invoked for probe responses to start with. Just revert
this patch.Signed-off-by: Sara Sharon
Fixes: 7976b1e9e3bf ("mac80211: ignore quiet mode in probe")
Signed-off-by: Luca Coelho
Link: https://lore.kernel.org/r/20200131111300.891737-15-luca@coelho.fi
Signed-off-by: Johannes Berg
Signed-off-by: Greg Kroah-Hartman -
commit ca1c671302825182629d3c1a60363cee6f5455bb upstream.
The @nents value that was passed to ib_dma_map_sg() has to be passed
to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
concatenate sg entries, it will return a different nents value than
it was passed.The bug was exposed by recent changes to the AMD IOMMU driver, which
enabled sg entry concatenation.Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
new memory registration API") and reviewing other kernel ULPs, it's
not clear that the frwr_map() logic was ever correct for this case.Reported-by: Andre Tomt
Suggested-by: Robin Murphy
Signed-off-by: Chuck Lever
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman
15 Feb, 2020
3 commits
-
commit 85b8ac01a421791d66c3a458a7f83cfd173fe3fa upstream.
It's currently possible to insert sockets in unexpected states into
a sockmap, due to a TOCTTOU when updating the map from a syscall.
sock_map_update_elem checks that sk->sk_state == TCP_ESTABLISHED,
locks the socket and then calls sock_map_update_common. At this
point, the socket may have transitioned into another state, and
the earlier assumptions don't hold anymore. Crucially, it's
conceivable (though very unlikely) that a socket has become unhashed.
This breaks the sockmap's assumption that it will get a callback
via sk->sk_prot->unhash.Fix this by checking the (fixed) sk_type and sk_protocol without the
lock, followed by a locked check of sk_state.Unfortunately it's not possible to push the check down into
sock_(map|hash)_update_common, since BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
run before the socket has transitioned from TCP_SYN_RECV into
TCP_ESTABLISHED.Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Lorenz Bauer
Signed-off-by: Daniel Borkmann
Reviewed-by: Jakub Sitnicki
Link: https://lore.kernel.org/bpf/20200207103713.28175-1-lmb@cloudflare.com
Signed-off-by: Greg Kroah-Hartman -
commit 88d6f130e5632bbf419a2e184ec7adcbe241260b upstream.
It was reported that the max_t, ilog2, and roundup_pow_of_two macros have
exponential effects on the number of states in the sparse checker.This patch breaks them up by calculating the "nbuckets" first so that the
"bucket_log" only needs to take ilog2().In addition, Linus mentioned:
Patch looks good, but I'd like to point out that it's not just sparse.
You can see it with a simple
make net/core/bpf_sk_storage.i
grep 'smap->bucket_log = ' net/core/bpf_sk_storage.i | wcand see the end result:
1 365071 2686974
That's one line (the assignment line) that is 2,686,974 characters in
length.Now, sparse does happen to react particularly badly to that (I didn't
look to why, but I suspect it's just that evaluating all the types
that don't actually ever end up getting used ends up being much more
expensive than it should be), but I bet it's not good for gcc either.Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Reported-by: Randy Dunlap
Reported-by: Luc Van Oostenryck
Suggested-by: Linus Torvalds
Signed-off-by: Martin KaFai Lau
Signed-off-by: Daniel Borkmann
Reviewed-by: Luc Van Oostenryck
Link: https://lore.kernel.org/bpf/20200207081810.3918919-1-kafai@fb.com
Signed-off-by: Greg Kroah-Hartman -
commit 0b2dc83906cf1e694e48003eae5df8fa63f76fd9 upstream.
We need to have a synchronize_rcu before free'ing the sockhash because any
outstanding psock references will have a pointer to the map and when they
use it, this could trigger a use after free.This is a sister fix for sockhash, following commit 2bb90e5cc90e ("bpf:
sockmap, synchronize_rcu before free'ing map") which addressed sockmap,
which comes from a manual audit.Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Jakub Sitnicki
Signed-off-by: Daniel Borkmann
Acked-by: John Fastabend
Link: https://lore.kernel.org/bpf/20200206111652.694507-3-jakub@cloudflare.com
Signed-off-by: Greg Kroah-Hartman