Eric Lee / smarc-fsl-linux-kernel

27 Feb, 2020

7 commits

2eb51c75d net: core: devlink.c: Use built-in RCU list checking ... Browse Code »

list_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.

The devlink->lock is held when devlink_dpipe_table_find()
is called in non RCU read side section. Therefore, pass struct devlink
to devlink_dpipe_table_find() for lockdep checking.

Signed-off-by: Madhuparna Bhowmik
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller

Madhuparna Bhowmik
2020-02-27 08:59:18 +0800
98c5f7d44 net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/sec ... Browse Code »

We are still experiencing some packet loss with the existing advanced
congestion buffering (ACB) settings with the IMP port configured for
2Gb/sec, so revert to conservative link speeds that do not produce
packet loss until this is resolved.

Fixes: 8f1880cbe8d0 ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec")
Fixes: de34d7084edd ("net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port")
Signed-off-by: Florian Fainelli
Reviewed-by: Vivien Didelot
Signed-off-by: David S. Miller

Florian Fainelli
2020-02-27 08:38:23 +0800
574b238f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes:

1) Perform garbage collection from workqueue to fix rcu detected
stall in ipset hash set types, from Jozsef Kadlecsik.

2) Fix the forceadd evaluation path, also from Jozsef.

3) Fix nft_set_pipapo selftest, from Stefano Brivio.

4) Crash when add-flush-add element in pipapo set, also from Stefano.
Add test to cover this crash.

5) Remove sysctl entry under mutex in hashlimit, from Cong Wang.
====================

Signed-off-by: David S. Miller

David S. Miller
2020-02-27 08:30:17 +0800
9a005c389 bnxt_en: add newline to netdev_*() format strings ... Browse Code »

Add missing newlines to netdev_* format strings so the lines
aren't buffered by the printk subsystem.

Nitpicked-by: Jakub Kicinski
Signed-off-by: Jonathan Lemon
Acked-by: Michael Chan
Signed-off-by: David S. Miller

Jonathan Lemon
2020-02-27 07:52:33 +0800
99b79c390 netfilter: xt_hashlimit: unregister proc file before releasing mutex ... Browse Code »

Before releasing the global mutex, we only unlink the hashtable
from the hash list, its proc file is still not unregistered at
this point. So syzbot could trigger a race condition where a
parallel htable_create() could register the same file immediately
after the mutex is released.

Move htable_remove_proc_entry() back to mutex protection to
fix this. And, fold htable_destroy() into htable_put() to make
the code slightly easier to understand.

Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com
Fixes: c4a3922d2d20 ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()")
Signed-off-by: Cong Wang
Signed-off-by: Pablo Neira Ayuso

Cong Wang
2020-02-27 06:25:07 +0800
e34f1753e ethtool: limit bitset size ... Browse Code »

Syzbot reported that ethnl_compact_sanity_checks() can be tricked into
reading past the end of ETHTOOL_A_BITSET_VALUE and ETHTOOL_A_BITSET_MASK
attributes and even the message by passing a value between (u32)(-31)
and (u32)(-1) as ETHTOOL_A_BITSET_SIZE.

The problem is that DIV_ROUND_UP(attr_nbits, 32) is 0 for such values so
that zero length ETHTOOL_A_BITSET_VALUE will pass the length check but
ethnl_bitmap32_not_zero() check would try to access up to 512 MB of
attribute "payload".

Prevent this overflow byt limiting the bitset size. Technically, compact
bitset format would allow bitset sizes up to almost 2^18 (so that the
nest size does not exceed U16_MAX) but bitsets used by ethtool are much
shorter. S16_MAX, the largest value which can be directly used as an
upper limit in policy, should be a reasonable compromise.

Fixes: 10b518d4e6dd ("ethtool: netlink bitset handling")
Reported-by: syzbot+7fd4ed5b4234ab1fdccd@syzkaller.appspotmail.com
Reported-by: syzbot+709b7a64d57978247e44@syzkaller.appspotmail.com
Reported-by: syzbot+983cb8fb2d17a7af549d@syzkaller.appspotmail.com
Signed-off-by: Michal Kubecek
Signed-off-by: David S. Miller

Michal Kubecek
2020-02-27 03:27:31 +0800
6e11d1578 net: Fix Tx hash bound checking ... Browse Code »

Fixes the lower and upper bounds when there are multiple TCs and
traffic is on the the same TC on the same device.

The lower bound is represented by 'qoffset' and the upper limit for
hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
mapping when there are multiple TCs, as the queue indices for upper TCs
will be offset by 'qoffset'.

v2: Fixed commit description based on comments.

Fixes: 1b837d489e06 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
Fixes: eadec877ce9c ("net: Add support for subordinate traffic classes to netdev_pick_tx")
Signed-off-by: Amritha Nambiar
Reviewed-by: Alexander Duyck
Reviewed-by: Sridhar Samudrala
Signed-off-by: David S. Miller

Amritha Nambiar
2020-02-27 03:14:10 +0800

26 Feb, 2020

4 commits

0954df70f selftests: nft_concat_range: Add test for reported add/flush/add issue ... Browse Code »

Add a specific test for the crash reported by Phil Sutter and addressed
in the previous patch. The test cases that, in my intention, should
have covered these cases, that is, the ones from the 'concurrency'
section, don't run these sequences tightly enough and spectacularly
failed to catch this.

While at it, define a convenient way to add these kind of tests, by
adding a "reported issues" test section.

It's more convenient, for this particular test, to execute the set
setup in its own function. However, future test cases like this one
might need to call setup functions, and will typically need no tools
other than nft, so allow for this in check_tools().

The original form of the reproducer used here was provided by Phil.

Reported-by: Phil Sutter
Signed-off-by: Stefano Brivio
Signed-off-by: Pablo Neira Ayuso

Stefano Brivio
2020-02-26 21:33:09 +0800
212d58c10 nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() ... Browse Code »

Phil reports that adding elements, flushing and re-adding them
right away:

nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }'
nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }'
nft flush set t s
nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }'

triggers, almost reliably, a crash like this one:

[ 71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI
[ 71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192
[ 71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
[ 71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
[ 71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
[ 71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
[ 71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
[ 71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
[ 71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
[ 71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
[ 71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
[ 71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
[ 71.334596] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[ 71.335780] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
[ 71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 71.339718] Call Trace:
[ 71.340093] nft_pipapo_destroy+0x7a/0x170 [nf_tables_set]
[ 71.340973] nft_set_destroy+0x20/0x50 [nf_tables]
[ 71.341879] nf_tables_trans_destroy_work+0x246/0x260 [nf_tables]
[ 71.342916] process_one_work+0x1d5/0x3c0
[ 71.343601] worker_thread+0x4a/0x3c0
[ 71.344229] kthread+0xfb/0x130
[ 71.344780] ? process_one_work+0x3c0/0x3c0
[ 71.345477] ? kthread_park+0x90/0x90
[ 71.346129] ret_from_fork+0x35/0x40
[ 71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink]
[ 71.348153] ---[ end trace 2eaa8149ca759bcc ]---
[ 71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
[ 71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
[ 71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
[ 71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
[ 71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
[ 71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
[ 71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
[ 71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
[ 71.350025] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[ 71.350026] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
[ 71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 71.350030] Kernel panic - not syncing: Fatal exception
[ 71.350412] Kernel Offset: disabled
[ 71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]---

which is caused by dangling elements that have been deactivated, but
never removed.

On a flush operation, nft_pipapo_walk() walks through all the elements
in the mapping table, which are then deactivated by nft_flush_set(),
one by one, and added to the commit list for removal. Element data is
then freed.

On transaction commit, nft_pipapo_remove() is called, and failed to
remove these elements, leading to the stale references in the mapping.
The first symptom of this, revealed by KASan, is a one-byte
use-after-free in subsequent calls to nft_pipapo_walk(), which is
usually not enough to trigger a panic. When stale elements are used
more heavily, though, such as double-free via nft_pipapo_destroy()
as in Phil's case, the problem becomes more noticeable.

The issue comes from that fact that, on a flush operation,
nft_pipapo_remove() won't get the actual key data via elem->key,
elements to be deleted upon commit won't be found by the lookup via
pipapo_get(), and removal will be skipped. Key data should be fetched
via nft_set_ext_key(), instead.

Reported-by: Phil Sutter
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Stefano Brivio
Signed-off-by: Pablo Neira Ayuso

Stefano Brivio
2020-02-26 21:33:09 +0800
9ea4894ba Merge branch 'master' of git://blackhole.kfki.hu/nf ... Browse Code »

Jozsef Kadlecsik says:

====================
ipset patches for nf

The first one is larger than usual, but the issue could not be solved simpler.
Also, it's a resend of the patch I submitted a few days ago, with a one line
fix on top of that: the size of the comment extensions was not taken into
account at reporting the full size of the set.

- Fix "INFO: rcu detected stall in hash_xxx" reports of syzbot
by introducing region locking and using workqueue instead of timer based
gc of timed out entries in hash types of sets in ipset.
- Fix the forceadd evaluation path - the bug was also uncovered by the syzbot.
====================

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-02-26 20:55:15 +0800
a8e41f603 icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=n ... Browse Code »

The icmpv6_send function has long had a static inline implementation
with an empty body for CONFIG_IPV6=n, so that code calling it doesn't
need to be ifdef'd. The new icmpv6_ndo_send function, which is intended
for drivers as a drop-in replacement with an identical function
signature, should follow the same pattern. Without this patch, drivers
that used to work with CONFIG_IPV6=n now result in a linker error.

Cc: Chen Zhou
Reported-by: Hulk Robot
Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Jason A. Donenfeld
Signed-off-by: David S. Miller

Jason A. Donenfeld
2020-02-26 03:01:39 +0800

25 Feb, 2020

8 commits

d08205565 selftests: nft_concat_range: Move option for 'list ruleset' before command ... Browse Code »

Before nftables commit fb9cea50e8b3 ("main: enforce options before
commands"), 'nft list ruleset -a' happened to work, but it's wrong
and won't work anymore. Replace it by 'nft -a list ruleset'.

Reported-by: Chen Yi
Fixes: 611973c1e06f ("selftests: netfilter: Introduce tests for sets with range concatenation")
Signed-off-by: Stefano Brivio
Signed-off-by: Pablo Neira Ayuso

Stefano Brivio
2020-02-25 20:01:07 +0800
3614d05b5 Merge tag 'mac80211-for-net-2020-02-24' of git://git.kernel.org/pub/scm/linux/ke… ... Browse Code »

…rnel/git/jberg/mac80211

Johannes Berg

====================
A few fixes:
* remove a double mutex-unlock
* fix a leak in an error path
* NULL pointer check
* include if_vlan.h where needed
* avoid RCU list traversal when not under RCU
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2020-02-25 07:43:38 +0800
823d81b0f net: bridge: fix stale eth hdr pointer in br_dev_xmit ... Browse Code »

In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but
if the packet has the vlan header inside (e.g. bridge with disabled
tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag()
to extract the vid before filtering which in turn calls pskb_may_pull()
and we may end up with a stale eth pointer. Moreover the cached eth header
pointer will generally be wrong after that operation. Remove the eth header
caching and just use eth_hdr() directly, the compiler does the right thing
and calculates it only once so we don't lose anything.

Fixes: 057658cb33fb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2020-02-25 03:11:19 +0800
e4686c2d0 Merge branch 'net-ll_temac-Bugfixes' ... Browse Code »

Esben Haabendal says:

====================
net: ll_temac: Bugfixes

Fix a number of bugs which have been present since the first commit.

The bugs fixed in patch 1,2 and 4 have all been observed in real systems, and
was relatively easy to reproduce given an appropriate stress setup.

Changes since v1:

- Changed error handling of of dma_map_single() in temac_start_xmit() to drop
packet instead of returning NETDEV_TX_BUSY.
====================

Signed-off-by: David S. Miller

David S. Miller
2020-02-25 02:58:57 +0800
1d63b8d66 net: ll_temac: Handle DMA halt condition caused by buffer underrun ... Browse Code »

The SDMA engine used by TEMAC halts operation when it has finished
processing of the last buffer descriptor in the buffer ring.
Unfortunately, no interrupt event is generated when this happens,
so we need to setup another mechanism to make sure DMA operation is
restarted when enough buffers have been added to the ring.

Fixes: 92744989533c ("net: add Xilinx ll_temac device driver")
Signed-off-by: Esben Haabendal
Signed-off-by: David S. Miller

Esben Haabendal
2020-02-25 02:58:48 +0800
770d9c679 net: ll_temac: Fix RX buffer descriptor handling on GFP_ATOMIC pressure ... Browse Code »

Failures caused by GFP_ATOMIC memory pressure have been observed, and
due to the missing error handling, results in kernel crash such as

[1876998.350133] kernel BUG at mm/slub.c:3952!
[1876998.350141] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[1876998.350147] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-scnxt #1
[1876998.350150] Hardware name: N/A N/A/COMe-bIP2, BIOS CCR2R920 03/01/2017
[1876998.350160] RIP: 0010:kfree+0x1ca/0x220
[1876998.350164] Code: 85 db 74 49 48 8b 95 68 01 00 00 48 31 c2 48 89 10 e9 d7 fe ff ff 49 8b 04 24 a9 00 00 01 00 75 0b 49 8b 44 24 08 a8 01 75 02 0b 49 8b 04 24 31 f6 a9 00 00 01 00 74 06 41 0f b6 74 24
5b
[1876998.350172] RSP: 0018:ffffc900000f0df0 EFLAGS: 00010246
[1876998.350177] RAX: ffffea00027f0708 RBX: ffff888008d78000 RCX: 0000000000391372
[1876998.350181] RDX: 0000000000000000 RSI: ffffe8ffffd01400 RDI: ffff888008d78000
[1876998.350185] RBP: ffff8881185a5d00 R08: ffffc90000087dd8 R09: 000000000000280a
[1876998.350189] R10: 0000000000000002 R11: 0000000000000000 R12: ffffea0000235e00
[1876998.350193] R13: ffff8881185438a0 R14: 0000000000000000 R15: ffff888118543870
[1876998.350198] FS: 0000000000000000(0000) GS:ffff88811f300000(0000) knlGS:0000000000000000
[1876998.350203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
s#1 Part1
[1876998.350206] CR2: 00007f8dac7b09f0 CR3: 000000011e20a006 CR4: 00000000001606e0
[1876998.350210] Call Trace:
[1876998.350215]
[1876998.350224] ? __netif_receive_skb_core+0x70a/0x920
[1876998.350229] kfree_skb+0x32/0xb0
[1876998.350234] __netif_receive_skb_core+0x70a/0x920
[1876998.350240] __netif_receive_skb_one_core+0x36/0x80
[1876998.350245] process_backlog+0x8b/0x150
[1876998.350250] net_rx_action+0xf7/0x340
[1876998.350255] __do_softirq+0x10f/0x353
[1876998.350262] irq_exit+0xb2/0xc0
[1876998.350265] do_IRQ+0x77/0xd0
[1876998.350271] common_interrupt+0xf/0xf
[1876998.350274]

In order to handle such failures more graceful, this change splits the
receive loop into one for consuming the received buffers, and one for
allocating new buffers.

When GFP_ATOMIC allocations fail, the receive will continue with the
buffers that is still there, and with the expectation that the allocations
will succeed in a later call to receive.

Fixes: 92744989533c ("net: add Xilinx ll_temac device driver")
Signed-off-by: Esben Haabendal
Signed-off-by: David S. Miller

Esben Haabendal
2020-02-25 02:58:48 +0800
d07c849cd net: ll_temac: Add more error handling of dma_map_single() calls ... Browse Code »

This adds error handling to the remaining dma_map_single() calls, so that
behavior is well defined if/when we run out of DMA memory.

Fixes: 92744989533c ("net: add Xilinx ll_temac device driver")
Signed-off-by: Esben Haabendal
Signed-off-by: David S. Miller

Esben Haabendal
2020-02-25 02:58:48 +0800
84823ff80 net: ll_temac: Fix race condition causing TX hang ... Browse Code »

It is possible that the interrupt handler fires and frees up space in
the TX ring in between checking for sufficient TX ring space and
stopping the TX queue in temac_start_xmit. If this happens, the
queue wake from the interrupt handler will occur before the queue is
stopped, causing a lost wakeup and the adapter's transmit hanging.

To avoid this, after stopping the queue, check again whether there is
sufficient space in the TX ring. If so, wake up the queue again.

This is a port of the similar fix in axienet driver,
commit 7de44285c1f6 ("net: axienet: Fix race condition causing TX hang").

Fixes: 23ecc4bde21f ("net: ll_temac: fix checksum offload logic")
Signed-off-by: Esben Haabendal
Signed-off-by: David S. Miller

Esben Haabendal
2020-02-25 02:58:48 +0800

24 Feb, 2020

9 commits

253216ffb mac80211: rx: avoid RCU list traversal under mutex ... Browse Code »

local->sta_mtx is held in __ieee80211_check_fast_rx_iface().
No need to use list_for_each_entry_rcu() as it also requires
a cond argument to avoid false lockdep warnings when not used in
RCU read-side section (with CONFIG_PROVE_RCU_LIST).
Therefore use list_for_each_entry();

Signed-off-by: Madhuparna Bhowmik
Link: https://lore.kernel.org/r/20200223143302.15390-1-madhuparnabhowmik10@gmail.com
Signed-off-by: Johannes Berg

Madhuparna Bhowmik
2020-02-24 17:42:38 +0800
e3ae39edb nl80211: explicitly include if_vlan.h ... Browse Code »

We use that here, and do seem to get it through some recursive
include, but better include it explicitly.

Signed-off-by: Johannes Berg
Link: https://lore.kernel.org/r/20200224093814.1b9c258fec67.I45ac150d4e11c72eb263abec9f1f0c7add9bef2b@changeid
Signed-off-by: Johannes Berg

Johannes Berg
2020-02-24 17:41:13 +0800
6132c1d90 net: core: devlink.c: Hold devlink->lock from the beginning of devlink_dpipe_table_register() ... Browse Code »

devlink_dpipe_table_find() should be called under either
rcu_read_lock() or devlink->lock. devlink_dpipe_table_register()
calls devlink_dpipe_table_find() without holding the lock
and acquires it later. Therefore hold the devlink->lock
from the beginning of devlink_dpipe_table_register().

Suggested-by: Jiri Pirko
Signed-off-by: Madhuparna Bhowmik
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller

Madhuparna Bhowmik
2020-02-24 13:17:37 +0800
503ba7c69 net: phy: Avoid multiple suspends ... Browse Code »

It is currently possible for a PHY device to be suspended as part of a
network device driver's suspend call while it is still being attached to
that net_device, either via phy_suspend() or implicitly via phy_stop().

Later on, when the MDIO bus controller get suspended, we would attempt
to suspend again the PHY because it is still attached to a network
device.

This is both a waste of time and creates an opportunity for improper
clock/power management bugs to creep in.

Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY")
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller

Florian Fainelli
2020-02-24 12:57:50 +0800
44343418d net: ks8851-ml: Fix IRQ handling and locking ... Browse Code »

The KS8851 requires that packet RX and TX are mutually exclusive.
Currently, the driver hopes to achieve this by disabling interrupt
from the card by writing the card registers and by disabling the
interrupt on the interrupt controller. This however is racy on SMP.

Replace this approach by expanding the spinlock used around the
ks_start_xmit() TX path to ks_irq() RX path to assure true mutual
exclusion and remove the interrupt enabling/disabling, which is
now not needed anymore. Furthermore, disable interrupts also in
ks_net_stop(), which was missing before.

Note that a massive improvement here would be to re-use the KS8851
driver approach, which is to move the TX path into a worker thread,
interrupt handling to threaded interrupt, and synchronize everything
with mutexes, but that would be a much bigger rework, for a separate
patch.

Signed-off-by: Marek Vasut
Cc: David S. Miller
Cc: Lukas Wunner
Cc: Petr Stetiar
Cc: YueHaibing
Signed-off-by: David S. Miller

Marek Vasut
2020-02-24 12:53:42 +0800
52df1e564 docs: networking: phy: Rephrase paragraph for clarity ... Browse Code »

Let's make it a little easier to read.

Signed-off-by: Jonathan Neuschäfer
Signed-off-by: David S. Miller

Jonathan Neuschäfer
2020-02-24 12:42:47 +0800
dad8cea7a tcp: fix TFO SYNACK undo to avoid double-timestamp-undo ... Browse Code »

In a rare corner case the new logic for undo of SYNACK RTO could
result in triggering the warning in tcp_fastretrans_alert() that says:
WARN_ON(tp->retrans_out != 0);

The warning looked like:

WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270

The sequence that tickles this bug is:
- Fast Open server receives TFO SYN with data, sends SYNACK
- (client receives SYNACK and sends ACK, but ACK is lost)
- server app sends some data packets
- (N of the first data packets are lost)
- server receives client ACK that has a TS ECR matching first SYNACK,
and also SACKs suggesting the first N data packets were lost
- server performs TS undo of SYNACK RTO, then immediately
enters recovery
- buggy behavior then performed a *second* undo that caused
the connection to be in CA_Open with retrans_out != 0

Basically, the incoming ACK packet with SACK blocks causes us to first
undo the cwnd reduction from the SYNACK RTO, but then immediately
enters fast recovery, which then makes us eligible for undo again. And
then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo
using a "mash-up" of state from two different loss recovery phases: it
uses the timestamp info from the ACK of the original SYNACK, and the
undo_marker from the fast recovery.

This fix refines the logic to only invoke the tcp_try_undo_loss()
inside tcp_rcv_synrecv_state_fastopen() if the connection is still in
CA_Loss. If peer SACKs triggered fast recovery, then
tcp_rcv_synrecv_state_fastopen() can't safely undo.

Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit")
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Neal Cardwell
2020-02-24 09:23:35 +0800
f6f13c125 hv_netvsc: Fix unwanted wakeup in netvsc_attach() ... Browse Code »

When netvsc_attach() is called by operations like changing MTU, etc.,
an extra wakeup may happen while netvsc_attach() calling
rndis_filter_device_add() which sends rndis messages when queue is
stopped in netvsc_detach(). The completion message will wake up queue 0.

We can reproduce the issue by changing MTU etc., then the wake_queue
counter from "ethtool -S" will increase beyond stop_queue counter:
stop_queue: 0
wake_queue: 1
The issue causes queue wake up, and counter increment, no other ill
effects in current code. So we didn't see any network problem for now.

To fix this, initialize tx_disable to true, and set it to false when
the NIC is ready to be attached or registered.

Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
Signed-off-by: Haiyang Zhang
Signed-off-by: David S. Miller

Haiyang Zhang
2020-02-24 08:32:37 +0800
eae7172f8 net: usb: qmi_wwan: restore mtu min/max values after raw_ip switch ... Browse Code »

usbnet creates network interfaces with min_mtu = 0 and
max_mtu = ETH_MAX_MTU.

These values are not modified by qmi_wwan when the network interface
is created initially, allowing, for example, to set mtu greater than 1500.

When a raw_ip switch is done (raw_ip set to 'Y', then set to 'N') the mtu
values for the network interface are set through ether_setup, with
min_mtu = ETH_MIN_MTU and max_mtu = ETH_DATA_LEN, not allowing anymore to
set mtu greater than 1500 (error: mtu greater than device maximum).

The patch restores the original min/max mtu values set by usbnet after a
raw_ip switch.

Signed-off-by: Daniele Palmas
Acked-by: Bjørn Mork
Signed-off-by: David S. Miller

Daniele Palmas
2020-02-24 08:13:50 +0800

23 Feb, 2020

3 commits

39f3b41aa net: genetlink: return the error code when attribute parsing fails. ... Browse Code »

Currently if attribute parsing fails and the genl family
does not support parallel operation, the error code returned
by __nlmsg_parse() is discarded by genl_family_rcv_msg_attrs_parse().

Be sure to report the error for all genl families.

Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function")
Fixes: ab5b526da048 ("net: genetlink: always allocate separate attrs for dumpit ops")
Signed-off-by: Paolo Abeni
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller

Paolo Abeni
2020-02-23 13:58:33 +0800
3e72dfdf8 ipv4: ensure rcu_read_lock() in cipso_v4_error() ... Browse Code »

Similarly to commit c543cb4a5f07 ("ipv4: ensure rcu_read_lock() in
ipv4_link_failure()"), __ip_options_compile() must be called under rcu
protection.

Fixes: 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error")
Suggested-by: Guillaume Nault
Signed-off-by: Matteo Croce
Acked-by: Paul Moore
Signed-off-by: David S. Miller

Matteo Croce
2020-02-23 13:45:55 +0800
42d84c849 vhost: Check docket sk_family instead of call getname ... Browse Code »

Doing so, we save one call to get data we already have in the struct.

Also, since there is no guarantee that getname use sockaddr_ll
parameter beyond its size, we add a little bit of security here.
It should do not do beyond MAX_ADDR_LEN, but syzbot found that
ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25,
versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro).

Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com
Signed-off-by: Eugenio Pérez
Acked-by: Michael S. Tsirkin
Signed-off-by: David S. Miller

Eugenio Pérez
2020-02-23 13:41:42 +0800

22 Feb, 2020

9 commits

8af1c6fbd netfilter: ipset: Fix forceadd evaluation path ... Browse Code »

When the forceadd option is enabled, the hash:* types should find and replace
the first entry in the bucket with the new one if there are no reuseable
(deleted or timed out) entries. However, the position index was just not set
to zero and remained the invalid -1 if there were no reuseable entries.

Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik

Jozsef Kadlecsik
2020-02-22 19:13:45 +0800
f66ee0410 netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports ... Browse Code »

In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.

There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:

- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.

Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.

Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik

Jozsef Kadlecsik
2020-02-22 19:00:06 +0800
0c0ddd6ae Merge tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog ... Browse Code »

Pull watchdog fixes from Wim Van Sebroeck:

- mtk_wdt needs RESET_CONTROLLER to build

- da9062 driver fixes:
- fix power management ops
- do not ping the hw during stop()
- add dependency on I2C

* tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: da9062: Add dependency on I2C
watchdog: da9062: fix power management ops
watchdog: da9062: do not ping the hw during stop()
watchdog: fix mtk_wdt.c RESET_CONTROLLER build error

Linus Torvalds
2020-02-22 05:02:49 +0800
bb65619e9 Merge tag 'char-misc-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc ... Browse Code »

Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.6-rc3.

Also included in here are some updates for some documentation files
that I seem to be maintaining these days.

The driver fixes are:
- small fixes for the habanalabs driver
- fsi driver bugfix

All of these have been in linux-next for a while with no reported
issues"

* tag 'char-misc-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Documentation/process: Swap out the ambassador for Canonical
habanalabs: patched cb equals user cb in device memset
habanalabs: do not halt CoreSight during hard reset
habanalabs: halt the engines before hard-reset
MAINTAINERS: remove unnecessary ':' characters
fsi: aspeed: add unspecified HAS_IOMEM dependency
COPYING: state that all contributions really are covered by this file
Documentation/process: Change Microsoft contact for embargoed hardware issues
embargoed-hardware-issues: drop Amazon contact as the email address now bounces
Documentation/process: Add Arm contact for embargoed HW issues

Linus Torvalds
2020-02-22 04:57:05 +0800
e5553ac71 Merge tag 'staging-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging ... Browse Code »

Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.6-rc3, along with the
removal of an unused/unneeded driver as well.

The android vsoc driver is not needed anymore by anyone, so it was
removed.

The other driver fixes are:
- ashmem bugfixes
- greybus audio driver bugfix
- wireless driver bugfixes and tiny cleanups to error paths

All of these have been in linux-next for a while now with no reported
issues"

* tag 'staging-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8723bs: Remove unneeded goto statements
staging: rtl8188eu: Remove some unneeded goto statements
staging: rtl8723bs: Fix potential overuse of kernel memory
staging: rtl8188eu: Fix potential overuse of kernel memory
staging: rtl8723bs: Fix potential security hole
staging: rtl8188eu: Fix potential security hole
staging: greybus: use after free in gb_audio_manager_remove_all()
staging: android: Delete the 'vsoc' driver
staging: rtl8723bs: fix copy of overlapping memory
staging: android: ashmem: Disallow ashmem memory from being remapped
staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.

Linus Torvalds
2020-02-22 04:53:53 +0800
ef11f1b76 Merge tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty ... Browse Code »

Pull tty/serial driver fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for 5.6-rc3
that resolve a bunch of reported issues.

They are:
- vt selection and ioctl fixes
- serdev bugfix
- atmel serial driver fixes
- qcom serial driver fixes
- other minor serial driver fixes

All of these have been in linux-next for a while with no reported
issues"

* tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt: selection, close sel_buffer race
vt: selection, handle pending signals in paste_selection
serial: cpm_uart: call cpm_muram_init before registering console
tty: serial: qcom_geni_serial: Fix RX cancel command failure
serial: 8250: Check UPF_IRQ_SHARED in advance
tty: serial: imx: setup the correct sg entry for tx dma
vt: vt_ioctl: fix race in VT_RESIZEX
vt: fix scrollback flushing on background consoles
tty: serial: tegra: Handle RX transfer in PIO mode if DMA wasn't started
tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode
serdev: ttyport: restore client ops on deregistration
serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE

Linus Torvalds
2020-02-22 04:48:29 +0800
cee853e82 Merge tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb ... Browse Code »

Pull USB/Thunderbolt fixes from Greg KH:
"Here are a number of small USB driver fixes for 5.6-rc3.

Included in here are:
- MAINTAINER file updates
- USB gadget driver fixes
- usb core quirk additions and fixes for regressions
- xhci driver fixes
- usb serial driver id additions and fixes
- thunderbolt bugfix

Thunderbolt patches come in through here now that USB4 is really
thunderbolt.

All of these have been in linux-next for a while with no reported
issues"

* tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits)
USB: misc: iowarrior: add support for the 100 device
thunderbolt: Prevent crash if non-active NVMem file is read
usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format
USB: misc: iowarrior: add support for the 28 and 28L devices
USB: misc: iowarrior: add support for 2 OEMed devices
USB: Fix novation SourceControl XL after suspend
xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2
Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables"
MAINTAINERS: Sort entries in database for THUNDERBOLT
usb: dwc3: debug: fix string position formatting mixup with ret and len
usb: gadget: serial: fix Tx stall after buffer overflow
usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags
usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows
usb: dwc2: Fix in ISOC request length checking
usb: gadget: composite: Support more than 500mA MaxPower
usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus
usb: gadget: u_audio: Fix high-speed max packet size
usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields
USB: core: clean up endpoint-descriptor parsing
USB: quirks: blacklist duplicate ep on Sound Devices USBPre2
...

Linus Torvalds
2020-02-22 04:44:53 +0800
88f8bbfa9 Merge tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drm ... Browse Code »

Pull drm fixes from Dave Airlie:
"Varied fixes for rc3.

i915 is the largest, they are seeing some ACPI problems with their CI
which hopefully get solved soon [1].

msm has a bunch of fixes for new hw added in the merge, a bunch of
amdgpu fixes, and nouveau adds support for some new firmwares for
turing tu11x GPUs that were just released into linux-firmware by
nvidia, they operate the same as the ones we already have for tu10x so
should be fine to hook up.

Otherwise it's just misc fixes for panfrost and sun4i.

core:
- Allow only one rotation argument, and allow zero rotation in video
cmdline.

i915:
- Workaround missing Display Stream Compression (DSC) state readout
by forcing modeset when its enabled at probe
- Fix EHL port clock voltage level requirements
- Fix queuing retire workers on the virtual engine
- Fix use of partially initialized waiters
- Stop using drm_pci_alloc/drm_pci/free
- Fix rewind of RING_TAIL by forcing a context reload
- Fix locking on resetting ring->head
- Propagate our bug filing URL change to stable kernels

panfrost:
- Small compiler warning fix for panfrost.
- Fix when using performance counters in panfrost when using per fd
address space.

sun4xi:
- Fix dt binding

nouveau:
- tu11x modesetting fix
- ACR/GR firmware support for tu11x (fw is public now)

msm:
- fix UBWC on GPU and display side for sc7180
- fix DSI suspend/resume issue encountered on sc7180
- fix some breakage on so called "linux-android" devices
(fallout from sc7180/a618 support, not seen earlier due to
bootloader/firmware differences)
- couple other misc fixes

amdgpu:
- HDCP fixes
- xclk fix for raven
- GFXOFF fixes"

[1] The Intel suspend testing should now be fixed by commit 63fb9623427f
("ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()")

* tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drm: (39 commits)
drm/amdgpu/display: clean up hdcp workqueue handling
drm/amdgpu: add is_raven_kicker judgement for raven1
drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex
drm/i915/execlists: Always force a context reload when rewinding RING_TAIL
drm/i915: Wean off drm_pci_alloc/drm_pci_free
drm/i915/gt: Protect defer_request() from new waiters
drm/i915/gt: Prevent queuing retire workers on the virtual engine
drm/i915/dsc: force full modeset whenever DSC is enabled at probe
drm/i915/ehl: Update port clock voltage level requirements
drm/i915: Update drm/i915 bug filing URL
MAINTAINERS: Update drm/i915 bug filing URL
drm/i915: Initialise basic fence before acquiring seqno
drm/i915/gem: Require per-engine reset support for non-persistent contexts
drm/nouveau/kms/gv100-: Re-set LUT after clearing for modesets
drm/nouveau/gr/tu11x: initial support
drm/nouveau/acr/tu11x: initial support
drm/amdgpu/gfx10: disable gfxoff when reading rlc clock
drm/amdgpu/gfx9: disable gfxoff when reading rlc clock
drm/amdgpu/soc15: fix xclk for raven
drm/amd/powerplay: always refetch the enabled features status on dpm enablement
...

Linus Torvalds
2020-02-22 04:18:02 +0800
3dc55dba6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net ... Browse Code »

Pull networking fixes from David Miller:

1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from
Cong Wang.

2) Fix deadlock in xsk by publishing global consumer pointers when NAPI
is finished, from Magnus Karlsson.

3) Set table field properly to RT_TABLE_COMPAT when necessary, from
Jethro Beekman.

4) NLA_STRING attributes are not necessary NULL terminated, deal wiht
that in IFLA_ALT_IFNAME. From Eric Dumazet.

5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov.

6) Handle mtu==0 devices properly in wireguard, from Jason A.
Donenfeld.

7) Fix several lockdep warnings in bonding, from Taehee Yoo.

8) Fix cls_flower port blocking, from Jason Baron.

9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen.

10) Fix RDMA race in qede driver, from Michal Kalderon.

11) Fix several false lockdep warnings by adding conditions to
list_for_each_entry_rcu(), from Madhuparna Bhowmik.

12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen.

13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song.

14) Hey, variables declared in switch statement before any case
statements are not initialized. I learn something every day. Get
rids of this stuff in several parts of the networking, from Kees
Cook.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs.
bnxt_en: Improve device shutdown method.
net: netlink: cap max groups which will be considered in netlink_bind()
net: thunderx: workaround BGX TX Underflow issue
ionic: fix fw_status read
net: disable BRIDGE_NETFILTER by default
net: macb: Properly handle phylink on at91rm9200
s390/qeth: fix off-by-one in RX copybreak check
s390/qeth: don't warn for napi with 0 budget
s390/qeth: vnicc Fix EOPNOTSUPP precedence
openvswitch: Distribute switch variables for initialization
net: ip6_gre: Distribute switch variables for initialization
net: core: Distribute switch variables for initialization
udp: rehash on disconnect
net/tls: Fix to avoid gettig invalid tls record
bpf: Fix a potential deadlock with bpf_map_do_batch
bpf: Do not grab the bucket spinlock by default on htab batch ops
ice: Wait for VF to be reset/ready before configuration
ice: Don't tell the OS that link is going down
ice: Don't reject odd values of usecs set by user
...

Linus Torvalds
2020-02-22 03:59:51 +0800