Eric Lee / smarc-fsl-linux-kernel

20 Nov, 2020

1 commit

a24d22b22 crypto: sha - split sha.h into sha1.h and sha2.h ... Browse Code »

Currently contains declarations for both SHA-1 and SHA-2,
and contains declarations for SHA-3.

This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure. So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.

Therefore, split into two headers and
, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1. It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers
Acked-by: Ard Biesheuvel
Acked-by: Jason A. Donenfeld
Signed-off-by: Herbert Xu

Eric Biggers
2020-11-20 11:45:33 +0800

26 Oct, 2020

1 commit

23224e450 mm: remove kzfree() compatibility definition ... Browse Code »

Commit 453431a54934 ("mm, treewide: rename kzfree() to
kfree_sensitive()") renamed kzfree() to kfree_sensitive(),
but it left a compatibility definition of kzfree() to avoid
being too disruptive.

Since then a few more instances of kzfree() have slipped in.

Just get rid of them and remove the compatibility definition
once and for all.

Signed-off-by: Eric Biggers
Signed-off-by: Linus Torvalds

Eric Biggers
2020-10-26 02:39:02 +0800

25 Oct, 2020

1 commit

3744741ad random32: add noise from network and scheduling activity ... Browse Code »

With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.

This patch restores the spirit of commit f227e3ec3b5c ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's statee using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.

The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same (even very slightly better) than before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.

Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin
Cc: Amit Klein
Cc: Eric Dumazet
Cc: "Jason A. Donenfeld"
Cc: Andy Lutomirski
Cc: Kees Cook
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: tytso@mit.edu
Cc: Florian Westphal
Cc: Marc Plumb
Tested-by: Sedat Dilek
Signed-off-by: Willy Tarreau

Willy Tarreau
2020-10-25 02:21:57 +0800

24 Oct, 2020

1 commit

3cb12d27f Merge tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net ... Browse Code »

Pull networking fixes from Jakub Kicinski:
"Cross-tree/merge window issues:

- rtl8150: don't incorrectly assign random MAC addresses; fix late in
the 5.9 cycle started depending on a return code from a function
which changed with the 5.10 PR from the usb subsystem

Current release regressions:

- Revert "virtio-net: ethtool configurable RXCSUM", it was causing
crashes at probe when control vq was not negotiated/available

Previous release regressions:

- ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
bus, only first device would be probed correctly

- nexthop: Fix performance regression in nexthop deletion by
effectively switching from recently added synchronize_rcu() to
synchronize_rcu_expedited()

- netsec: ignore 'phy-mode' device property on ACPI systems; the
property is not populated correctly by the firmware, but firmware
configures the PHY so just keep boot settings

Previous releases - always broken:

- tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
bulk transfers getting "stuck"

- icmp: randomize the global rate limiter to prevent attackers from
getting useful signal

- r8169: fix operation under forced interrupt threading, make the
driver always use hard irqs, even on RT, given the handler is light
and only wants to schedule napi (and do so through a _irqoff()
variant, preferably)

- bpf: Enforce pointer id generation for all may-be-null register
type to avoid pointers erroneously getting marked as null-checked

- tipc: re-configure queue limit for broadcast link

- net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
tunnels

- fix various issues in chelsio inline tls driver

Misc:

- bpf: improve just-added bpf_redirect_neigh() helper api to support
supplying nexthop by the caller - in case BPF program has already
done a lookup we can avoid doing another one

- remove unnecessary break statements

- make MCTCP not select IPV6, but rather depend on it"

* tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
tcp: fix to update snd_wl1 in bulk receiver fast path
net: Properly typecast int values to set sk_max_pacing_rate
netfilter: nf_fwd_netdev: clear timestamp in forwarding path
ibmvnic: save changed mac address to adapter->mac_addr
selftests: mptcp: depends on built-in IPv6
Revert "virtio-net: ethtool configurable RXCSUM"
rtnetlink: fix data overflow in rtnl_calcit()
net: ethernet: mtk-star-emac: select REGMAP_MMIO
net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
mptcp: depends on IPV6 but not as a module
sfc: move initialisation of efx->filter_sem to efx_init_struct()
mpls: load mpls_gso after mpls_iptunnel
net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
net: dsa: bcm_sf2: make const array static, makes object smaller
mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
...

Linus Torvalds
2020-10-24 03:05:49 +0800

23 Oct, 2020

6 commits

18ded910b tcp: fix to update snd_wl1 in bulk receiver fast path ... Browse Code »

In the header prediction fast path for a bulk data receiver, if no
data is newly acknowledged then we do not call tcp_ack() and do not
call tcp_ack_update_window(). This means that a bulk receiver that
receives large amounts of data can have the incoming sequence numbers
wrap, so that the check in tcp_may_update_window fails:
after(ack_seq, tp->snd_wl1)

If the incoming receive windows are zero in this state, and then the
connection that was a bulk data receiver later wants to send data,
that connection can find itself persistently rejecting the window
updates in incoming ACKs. This means the connection can persistently
fail to discover that the receive window has opened, which in turn
means that the connection is unable to send anything, and the
connection's sending process can get permanently "stuck".

The fix is to update snd_wl1 in the header prediction fast path for a
bulk data receiver, so that it keeps up and does not see wrapping
problems.

This fix is based on a very nice and thorough analysis and diagnosis
by Apollon Oikonomopoulos (see link below).

This is a stable candidate but there is no Fixes tag here since the
bug predates current git history. Just for fun: looks like the bug
dates back to when header prediction was added in Linux v2.1.8 in Nov
1996. In that version tcp_rcv_established() was added, and the code
only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer:
receiver" code path it does not call tcp_ack(). This fix seems to
apply cleanly at least as far back as v3.2.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Neal Cardwell
Reported-by: Apollon Oikonomopoulos
Tested-by: Apollon Oikonomopoulos
Link: https://www.spinics.net/lists/netdev/msg692430.html
Acked-by: Soheil Hassas Yeganeh
Acked-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Link: https://lore.kernel.org/r/20201022143331.1887495-1-ncardwell.kernel@gmail.com
Signed-off-by: Jakub Kicinski

Neal Cardwell
2020-10-23 03:26:57 +0800
700465fd3 net: Properly typecast int values to set sk_max_pacing_rate ... Browse Code »

In setsockopt(SO_MAX_PACING_RATE) on 64bit systems, sk_max_pacing_rate,
after extended from 'u32' to 'unsigned long', takes unintentionally
hiked value whenever assigned from an 'int' value with MSB=1, due to
binary sign extension in promoting s32 to u64, e.g. 0x80000000 becomes
0xFFFFFFFF80000000.

Thus inflated sk_max_pacing_rate causes subsequent getsockopt to return
~0U unexpectedly. It may also result in increased pacing rate.

Fix by explicitly casting the 'int' value to 'unsigned int' before
assigning it to sk_max_pacing_rate, for zero extension to happen.

Fixes: 76a9ebe811fb ("net: extend sk_pacing_rate to unsigned long")
Signed-off-by: Ji Li
Signed-off-by: Ke Li
Reviewed-by: Eric Dumazet
Link: https://lore.kernel.org/r/20201022064146.79873-1-keli@akamai.com
Signed-off-by: Jakub Kicinski

Ke Li
2020-10-23 03:18:25 +0800
594850ca4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Update debugging in IPVS tcp protocol handler to make it easier
to understand, from longguang.yue

2) Update TCP tracker to deal with keepalive packet after
re-registration, from Franceso Ruggeri.

3) Missing IP6SKB_FRAGMENTED from netfilter fragment reassembly,
from Georg Kohmann.

4) Fix bogus packet drop in ebtables nat extensions, from
Thimothee Cocault.

5) Fix typo in flowtable documentation.

6) Reset skb timestamp in nft_fwd_netdev.
====================

Signed-off-by: Jakub Kicinski

Jakub Kicinski
2020-10-23 03:00:37 +0800
d2775984d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf ... Browse Code »

Daniel Borkmann says:

====================
pull-request: bpf 2020-10-22

1) Fix enforcing NULL check in verifier for new helper return types of
RET_PTR_TO_{BTF_ID,MEM_OR_BTF_ID}_OR_NULL, from Martin KaFai Lau.

2) Fix bpf_redirect_neigh() helper API before it becomes frozen by adding
nexthop information as argument, from Toke Høiland-Jørgensen.

3) Guard & fix compilation of bpf_tail_call_static() when __bpf__ arch is
not defined by compiler or clang too old, from Daniel Borkmann.

4) Remove misplaced break after return in attach_type_to_prog_type(), from
Tom Rix.
====================

Signed-off-by: Jakub Kicinski

Jakub Kicinski
2020-10-23 00:51:41 +0800
24717cfbb Merge tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"The one new feature this time, from Anna Schumaker, is READ_PLUS,
which has the same arguments as READ but allows the server to return
an array of data and hole extents.

Otherwise it's a lot of cleanup and bugfixes"

* tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits)
NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()
sunrpc: raise kernel RPC channel buffer size
svcrdma: fix bounce buffers for unaligned offsets and multiple pages
nfsd: remove unneeded break
net/sunrpc: Fix return value for sysctl sunrpc.transports
NFSD: Encode a full READ_PLUS reply
NFSD: Return both a hole and a data segment
NFSD: Add READ_PLUS hole segment encoding
NFSD: Add READ_PLUS data support
NFSD: Hoist status code encoding into XDR encoder functions
NFSD: Map nfserr_wrongsec outside of nfsd_dispatch
NFSD: Remove the RETURN_STATUS() macro
NFSD: Call NFSv2 encoders on error returns
NFSD: Fix .pc_release method for NFSv2
NFSD: Remove vestigial typedefs
NFSD: Refactor nfsd_dispatch() error paths
NFSD: Clean up nfsd_dispatch() variables
NFSD: Clean up stale comments in nfsd_dispatch()
NFSD: Clean up switch statement in nfsd_dispatch()
...

Linus Torvalds
2020-10-23 00:44:27 +0800
334d431f6 Merge tag '9p-for-5.10-rc1' of git://github.com/martinetd/linux ... Browse Code »

Pull 9p updates from Dominique Martinet:
"A couple of small fixes (loff_t overflow on 32bit, syzbot
uninitialized variable warning) and code cleanup (xen)"

* tag '9p-for-5.10-rc1' of git://github.com/martinetd/linux:
net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid
9p/xen: Fix format argument warning
9P: Cast to loff_t before multiplying

Linus Torvalds
2020-10-23 00:33:20 +0800

22 Oct, 2020

4 commits

c77761c8a netfilter: nf_fwd_netdev: clear timestamp in forwarding path ... Browse Code »

Similar to 7980d2eabde8 ("ipvs: clear skb->tstamp in forwarding path").
fq qdisc requires tstamp to be cleared in forwarding path.

Fixes: 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths")
Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-10-22 20:49:36 +0800
ebfe3c518 rtnetlink: fix data overflow in rtnl_calcit() ... Browse Code »

"ip addr show" command execute error when we have a physical
network card with a large number of VFs

The return value of if_nlmsg_size() in rtnl_calcit() will exceed
range of u16 data type when any network cards has a larger number of
VFs. rtnl_vfinfo_size() will significant increase needed dump size when
the value of num_vfs is larger.

Eventually we get a wrong value of min_ifinfo_dump_size because of overflow
which decides the memory size needed by netlink dump and netlink_dump()
will return -EMSGSIZE because of not enough memory was allocated.

So fix it by promoting min_dump_alloc data type to u32 to
avoid whole netlink message size overflow and it's also align
with the data type of struct netlink_callback{}.min_dump_alloc
which is assigned by return value of rtnl_calcit()

Signed-off-by: Di Zhu
Link: https://lore.kernel.org/r/20201021020053.1401-1-zhudi21@huawei.com
Signed-off-by: Jakub Kicinski

Di Zhu
2020-10-22 09:24:08 +0800
ba452c9e9 bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop ... Browse Code »

Based on the discussion in [0], update the bpf_redirect_neigh() helper to
accept an optional parameter specifying the nexthop information. This makes
it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
incurring a duplicate FIB lookup - since the FIB lookup helper will return
the nexthop information even if no neighbour is present, this can simply
be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
BPF_FIB_LKUP_RET_NO_NEIGH. Thus fix & extend it before helper API is frozen.

[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: Daniel Borkmann
Reviewed-by: David Ahern
Link: https://lore.kernel.org/bpf/160322915615.32199.1187570224032024535.stgit@toke.dk

Toke Høiland-Jørgensen
2020-10-22 07:28:54 +0800
ed7cfefe4 Merge tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client ... Browse Code »

Pull ceph updates from Ilya Dryomov:

- a patch that removes crush_workspace_mutex (myself). CRUSH
computations are no longer serialized and can run in parallel.

- a couple new filesystem client metrics for "ceph fs top" command
(Xiubo Li)

- a fix for a very old messenger bug that affected the filesystem,
marked for stable (myself)

- assorted fixups and cleanups throughout the codebase from Jeff and
others.

* tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client: (27 commits)
libceph: clear con->out_msg on Policy::stateful_server faults
libceph: format ceph_entity_addr nonces as unsigned
libceph: fix ENTITY_NAME format suggestion
libceph: move a dout in queue_con_delay()
ceph: comment cleanups and clarifications
ceph: break up send_cap_msg
ceph: drop separate mdsc argument from __send_cap
ceph: promote to unsigned long long before shifting
ceph: don't SetPageError on readpage errors
ceph: mark ceph_fmt_xattr() as printf-like for better type checking
ceph: fold ceph_update_writeable_page into ceph_write_begin
ceph: fold ceph_sync_writepages into writepage_nounlock
ceph: fold ceph_sync_readpages into ceph_readpage
ceph: don't call ceph_update_writeable_page from page_mkwrite
ceph: break out writeback of incompatible snap context to separate function
ceph: add a note explaining session reject error string
libceph: switch to the new "osd blocklist add" command
libceph, rbd, ceph: "blacklist" -> "blocklist"
ceph: have ceph_writepages_start call pagevec_lookup_range_tag
ceph: use kill_anon_super helper
...

Linus Torvalds
2020-10-22 01:34:10 +0800

21 Oct, 2020

13 commits

0ed37ac58 mptcp: depends on IPV6 but not as a module ... Browse Code »

Like TCP, MPTCP cannot be compiled as a module. Obviously, MPTCP IPv6'
support also depends on CONFIG_IPV6. But not all functions from IPv6
code are exported.

To simplify the code and reduce modifications outside MPTCP, it was
decided from the beginning to support MPTCP with IPv6 only if
CONFIG_IPV6 was built inlined. That's also why CONFIG_MPTCP_IPV6 was
created. More modifications are needed to support CONFIG_IPV6=m.

Even if it was not explicit, until recently, we were forcing CONFIG_IPV6
to be built-in because we had "select IPV6" in Kconfig. Now that we have
"depends on IPV6", we have to explicitly set "IPV6=y" to force
CONFIG_IPV6 not to be built as a module.

In other words, we can now only have CONFIG_MPTCP_IPV6=y if
CONFIG_IPV6=y.

Note that the new dependency might hide the fact IPv6 is not supported
in MPTCP even if we have CONFIG_IPV6=m. But selecting IPV6 like we did
before was forcing it to be built-in while it was maybe not what the
user wants.

Reported-by: Geert Uytterhoeven
Fixes: 010b430d5df5 ("mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it")
Signed-off-by: Matthieu Baerts
Link: https://lore.kernel.org/r/20201021105154.628257-1-matthieu.baerts@tessares.net
Signed-off-by: Jakub Kicinski

Matthieu Baerts
2020-10-21 23:05:40 +0800
b7c24497b mpls: load mpls_gso after mpls_iptunnel ... Browse Code »

mpls_iptunnel is used only for mpls encapsuation, and if encaplusated
packet is larger than MTU we need mpls_gso for segmentation.

Signed-off-by: Alexander Ovechkin
Acked-by: Dmitry Yakunin
Reviewed-by: David Ahern
Link: https://lore.kernel.org/r/20201020114333.26866-1-ovov@yandex-team.ru
Signed-off-by: Jakub Kicinski

Alexander Ovechkin
2020-10-21 12:16:45 +0800
a7a12b5a0 net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels ... Browse Code »

the following command

# tc action add action tunnel_key \
> set src_ip 2001:db8::1 dst_ip 2001:db8::2 id 10 erspan_opts 1:6789:0:0

generates the following splat:

BUG: KASAN: slab-out-of-bounds in tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
Write of size 4 at addr ffff88813f5f1cc8 by task tc/873

CPU: 2 PID: 873 Comm: tc Not tainted 5.9.0+ #282
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
dump_stack+0x99/0xcb
print_address_description.constprop.7+0x1e/0x230
kasan_report.cold.13+0x37/0x7c
tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
tunnel_key_init+0x160c/0x1f40 [act_tunnel_key]
tcf_action_init_1+0x5b5/0x850
tcf_action_init+0x15d/0x370
tcf_action_add+0xd9/0x2f0
tc_ctl_action+0x29b/0x3a0
rtnetlink_rcv_msg+0x341/0x8d0
netlink_rcv_skb+0x120/0x380
netlink_unicast+0x439/0x630
netlink_sendmsg+0x719/0xbf0
sock_sendmsg+0xe2/0x110
____sys_sendmsg+0x5ba/0x890
___sys_sendmsg+0xe9/0x160
__sys_sendmsg+0xd3/0x170
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f872a96b338
Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
RSP: 002b:00007ffffe367518 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000005f8f5aed RCX: 00007f872a96b338
RDX: 0000000000000000 RSI: 00007ffffe367580 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000001c
R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000686760 R14: 0000000000000601 R15: 0000000000000000

Allocated by task 873:
kasan_save_stack+0x19/0x40
__kasan_kmalloc.constprop.7+0xc1/0xd0
__kmalloc+0x151/0x310
metadata_dst_alloc+0x20/0x40
tunnel_key_init+0xfff/0x1f40 [act_tunnel_key]
tcf_action_init_1+0x5b5/0x850
tcf_action_init+0x15d/0x370
tcf_action_add+0xd9/0x2f0
tc_ctl_action+0x29b/0x3a0
rtnetlink_rcv_msg+0x341/0x8d0
netlink_rcv_skb+0x120/0x380
netlink_unicast+0x439/0x630
netlink_sendmsg+0x719/0xbf0
sock_sendmsg+0xe2/0x110
____sys_sendmsg+0x5ba/0x890
___sys_sendmsg+0xe9/0x160
__sys_sendmsg+0xd3/0x170
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff88813f5f1c00
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 200 bytes inside of
256-byte region [ffff88813f5f1c00, ffff88813f5f1d00)
The buggy address belongs to the page:
page:0000000011b48a19 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f5f0
head:0000000011b48a19 order:1 compound_mapcount:0
flags: 0x17ffffc0010200(slab|head)
raw: 0017ffffc0010200 0000000000000000 0000000d00000001 ffff888107c43400
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
^
ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory for
the tunnel metadata, but then it expects additional bytes to store tunnel
specific metadata with tunnel_key_copy_opts().

Fix the arguments of __ipv6_tun_set_dst(), so that 'md_size' contains the
size previously computed by tunnel_key_get_opts_len(), like it's done for
IPv4 tunnels.

Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
Reported-by: Shuang Li
Signed-off-by: Davide Caratti
Acked-by: Cong Wang
Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski

Davide Caratti
2020-10-21 12:10:41 +0800
b13076216 net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action() ... Browse Code »

We need to jump to the "err_out_locked" label when
tcf_gate_get_entries() fails. Otherwise, tc_setup_flow_action() exits
with ->tcfa_lock still held.

Fixes: d29bdd69ecdd ("net: schedule: add action gate offloading")
Signed-off-by: Guillaume Nault
Acked-by: Cong Wang
Link: https://lore.kernel.org/r/12f60e385584c52c22863701c0185e40ab08a7a7.1603207948.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski

Guillaume Nault
2020-10-21 12:00:52 +0800
010b430d5 mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it ... Browse Code »

MPTCP_IPV6 selects IPV6, thus enabling an optional feature the user may
not want to enable. Fix this by making MPTCP_IPV6 depend on IPV6, like
is done for all other IPv6 features.

Fixes: f870fa0b5768842c ("mptcp: Add MPTCP socket stubs")
Signed-off-by: Geert Uytterhoeven
Reviewed-by: Matthieu Baerts
Link: https://lore.kernel.org/r/20201020073839.29226-1-geert@linux-m68k.org
Signed-off-by: Jakub Kicinski

Geert Uytterhoeven
2020-10-21 11:54:08 +0800
280e3ebda nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME attribute in nfc_genl_fw_download() ... Browse Code »

Check that the NFC_ATTR_FIRMWARE_NAME attributes are provided by
the netlink client prior to accessing them.This prevents potential
unhandled NULL pointer dereference exceptions which can be triggered
by malicious user-mode programs, if they omit one or both of these
attributes.

Similar to commit a0323b979f81 ("nfc: Ensure presence of required attributes in the activate_target handler").

Fixes: 9674da8759df ("NFC: Add firmware upload netlink command")
Signed-off-by: Defang Bo
Link: https://lore.kernel.org/r/1603107538-4744-1-git-send-email-bodefang@126.com
Signed-off-by: Jakub Kicinski

Defang Bo
2020-10-21 08:06:22 +0800
b142083b5 mptcp: MPTCP_KUNIT_TESTS should depend on MPTCP instead of selecting it ... Browse Code »

MPTCP_KUNIT_TESTS selects MPTCP, thus enabling an optional feature the
user may not want to enable. Fix this by making the test depend on
MPTCP instead.

Fixes: a00a582203dbc43e ("mptcp: move crypto test to KUNIT")
Signed-off-by: Geert Uytterhoeven
Reviewed-by: Matthieu Baerts
Link: https://lore.kernel.org/r/20201019113240.11516-1-geert@linux-m68k.org
Signed-off-by: Jakub Kicinski

Geert Uytterhoeven
2020-10-21 07:50:06 +0800
65b8c8a62 mptcp: move mptcp_options_received's port initialization ... Browse Code »

Move mptcp_options_received's port initialization from
mptcp_parse_option to mptcp_get_options, put it together with
the other fields initializations of mptcp_options_received.

Signed-off-by: Geliang Tang
Reviewed-by: Matthieu Baerts
Signed-off-by: Jakub Kicinski

Geliang Tang
2020-10-21 07:32:36 +0800
fe2d9b1a0 mptcp: initialize mptcp_options_received's ahmac ... Browse Code »

Initialize mptcp_options_received's ahmac to zero, otherwise it
will be a random number when receiving ADD_ADDR suboption with echo-flag=1.

Fixes: 3df523ab582c5 ("mptcp: Add ADD_ADDR handling")
Signed-off-by: Geliang Tang
Reviewed-by: Matthieu Baerts
Signed-off-by: Jakub Kicinski

Geliang Tang
2020-10-21 07:32:14 +0800
47b5d2a10 net/sched: act_ct: Fix adding udp port mangle operation ... Browse Code »

Need to use the udp header type and not tcp.

Fixes: 9c26ba9b1f45 ("net/sched: act_ct: Instantiate flow table entry actions")
Signed-off-by: Roi Dayan
Reviewed-by: Paul Blakey
Link: https://lore.kernel.org/r/20201019090244.3015186-1-roid@nvidia.com
Signed-off-by: Jakub Kicinski

Roi Dayan
2020-10-21 07:15:51 +0800
59f0e7eb2 Merge tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs ... Browse Code »

Pull NFS client updates from Anna Schumaker:
"Stable Fixes:
- Wait for stateid updates after CLOSE/OPEN_DOWNGRADE # v5.4+
- Fix nfs_path in case of a rename retry
- Support EXCHID4_FLAG_SUPP_FENCE_OPS v4.2 EXCHANGE_ID flag

New features and improvements:
- Replace dprintk() calls with tracepoints
- Make cache consistency bitmap dynamic
- Added support for the NFS v4.2 READ_PLUS operation
- Improvements to net namespace uniquifier

Other bugfixes and cleanups:
- Remove redundant clnt pointer
- Don't update timeout values on connection resets
- Remove redundant tracepoints
- Various cleanups to comments
- Fix oops when trying to use copy_file_range with v4.0 source server
- Improvements to flexfiles mirrors
- Add missing 'local_lock=posix' mount option"

* tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (55 commits)
NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag
NFSv4: Fix up RCU annotations for struct nfs_netns_client
NFS: Only reference user namespace from nfs4idmap struct instead of cred
nfs: add missing "posix" local_lock constant table definition
NFSv4: Use the net namespace uniquifier if it is set
NFSv4: Clean up initialisation of uniquified client id strings
NFS: Decode a full READ_PLUS reply
SUNRPC: Add an xdr_align_data() function
NFS: Add READ_PLUS hole segment decoding
SUNRPC: Add the ability to expand holes in data pages
SUNRPC: Split out _shift_data_right_tail()
SUNRPC: Split out xdr_realign_pages() from xdr_align_pages()
NFS: Add READ_PLUS data segment support
NFS: Use xdr_page_pos() in NFSv4 decode_getacl()
SUNRPC: Implement a xdr_page_pos() function
SUNRPC: Split out a function for setting current page
NFS: fix nfs_path in case of a rename retry
fs: nfs: return per memcg count for xattr shrinkers
NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE
nfs: remove incorrect fallthrough label
...

Linus Torvalds
2020-10-21 04:26:30 +0800
d48c81247 SUNRPC: fix copying of multiple pages in gss_read_proxy_verf() ... Browse Code »

When the passed token is longer than 4032 bytes, the remaining part
of the token must be copied from the rqstp->rq_arg.pages. But the
copy must make sure it happens in a consecutive way.

With the existing code, the first memcpy copies 'length' bytes from
argv->iobase, but since the header is in front, this never fills the
whole first page of in_token->pages.

The mecpy in the loop copies the following bytes, but starts writing at
the next page of in_token->pages. This leaves the last bytes of page 0
unwritten.

Symptoms were that users with many groups were not able to access NFS
exports, when using Active Directory as the KDC.

Signed-off-by: Martijn de Gouw
Fixes: 5866efa8cbfb "SUNRPC: Fix svcauth_gss_proxy_init()"
Signed-off-by: J. Bruce Fields

Martijn de Gouw
2020-10-21 01:21:30 +0800
27a1e8a0f sunrpc: raise kernel RPC channel buffer size ... Browse Code »

Its possible that using AUTH_SYS and mountd manage-gids option a
user may hit the 8k RPC channel buffer limit. This have been observed
on field, causing unanswered RPCs on clients after mountd fails to
write on channel :

rpc.mountd[11231]: auth_unix_gid: error writing reply

Userland nfs-utils uses a buffer size of 32k (RPC_CHAN_BUF_SIZE), so
lets match those two.

Signed-off-by: Roberto Bergantinos Corpas
Signed-off-by: J. Bruce Fields

Roberto Bergantinos Corpas
2020-10-21 01:21:30 +0800

20 Oct, 2020

7 commits

31cc578ae netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create ... Browse Code »

This patch fixes the issue due to:

BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2
net/netfilter/nf_tables_offload.c:40
Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244

The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds.

This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue.

Add nft_expr_more() and use it to fix this problem.

Signed-off-by: Saeed Mirzamohammadi
Signed-off-by: Pablo Neira Ayuso

Saeed Mirzamohammadi
2020-10-20 19:54:54 +0800
63137bc58 netfilter: ebtables: Fixes dropping of small packets in bridge nat ... Browse Code »

Fixes an error causing small packets to get dropped. skb_ensure_writable
expects the second parameter to be a length in the ethernet payload.=20
If we want to write the ethernet header (src, dst), we should pass 0.
Otherwise, packets with small payloads (< ETH_ALEN) will get dropped.

Fixes: c1a831167901 ("netfilter: bridge: convert skb_make_writable to skb_ensure_writable")
Signed-off-by: Timothée COCAULT
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Timothée COCAULT
2020-10-20 19:54:53 +0800
68f9f9c2c netfilter: Drop fragmented ndisc packets assembled in netfilter ... Browse Code »

Fragmented ndisc packets assembled in netfilter not dropped as specified
in RFC 6980, section 5. This behaviour breaks TAHI IPv6 Core Conformance
Tests v6LC.2.1.22/23, V6LC.2.2.26/27 and V6LC.2.3.18.

Setting IP6SKB_FRAGMENTED flag during reassembly.

References: commit b800c3b966bc ("ipv6: drop fragmented ndisc packets by default (RFC 6980)")
Signed-off-by: Georg Kohmann
Signed-off-by: Pablo Neira Ayuso

Georg Kohmann
2020-10-20 19:54:53 +0800
4f25434bc netfilter: conntrack: connection timeout after re-register ... Browse Code »

If the first packet conntrack sees after a re-register is an outgoing
keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to
SND.NXT-1.
When the peer correctly acknowledges SND.NXT, tcp_in_window fails
check III (Upper bound for valid (s)ack: sack _nfct = 0 and in later conntrack iptables rules not matching.
In cases where iptables are dropping packets that do not match
conntrack rules this can result in idle tcp connections to time out.

v2: adjust td_end when getting the reply rather than when sending out
the keepalive packet.

Fixes: f94e63801ab2 ("netfilter: conntrack: reset tcp maxwin on re-register")
Signed-off-by: Francesco Ruggeri
Signed-off-by: Pablo Neira Ayuso

Francesco Ruggeri
2020-10-20 19:54:53 +0800
79dce09ab ipvs: adjust the debug info in function set_tcp_state ... Browse Code »

Outputting client,virtual,dst addresses info when tcp state changes,
which makes the connection debug more clear

Signed-off-by: longguang.yue
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

longguang.yue
2020-10-20 19:54:46 +0800
df6afe2f7 nexthop: Fix performance regression in nexthop deletion ... Browse Code »

While insertion of 16k nexthops all using the same netdev ('dummy10')
takes less than a second, deletion takes about 130 seconds:

# time -p ip -b nexthop.batch
real 0.29
user 0.01
sys 0.15

# time -p ip link set dev dummy10 down
real 131.03
user 0.06
sys 0.52

This is because of repeated calls to synchronize_rcu() whenever a
nexthop is removed from a nexthop group:

# /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
...
b'finish_task_switch'
b'schedule'
b'schedule_timeout'
b'wait_for_completion'
b'__wait_rcu_gp'
b'synchronize_rcu.part.0'
b'synchronize_rcu'
b'__remove_nexthop'
b'remove_nexthop'
b'nexthop_flush_dev'
b'nh_netdev_event'
b'raw_notifier_call_chain'
b'call_netdevice_notifiers_info'
b'__dev_notify_flags'
b'dev_change_flags'
b'do_setlink'
b'__rtnl_newlink'
b'rtnl_newlink'
b'rtnetlink_rcv_msg'
b'netlink_rcv_skb'
b'rtnetlink_rcv'
b'netlink_unicast'
b'netlink_sendmsg'
b'____sys_sendmsg'
b'___sys_sendmsg'
b'__sys_sendmsg'
b'__x64_sys_sendmsg'
b'do_syscall_64'
b'entry_SYSCALL_64_after_hwframe'
- ip (277)
126554955

Since nexthops are always deleted under RTNL, synchronize_net() can be
used instead. It will call synchronize_rcu_expedited() which only blocks
for several microseconds as opposed to multiple milliseconds like
synchronize_rcu().

With this patch deletion of 16k nexthops takes less than a second:

# time -p ip link set dev dummy10 down
real 0.12
user 0.00
sys 0.04

Tested with fib_nexthops.sh which includes torture tests that prompted
the initial change:

# ./fib_nexthops.sh
...
Tests passed: 134
Tests failed: 0

Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
Signed-off-by: Ido Schimmel
Reviewed-by: Jesse Brandeburg
Reviewed-by: David Ahern
Acked-by: Nikolay Aleksandrov
Link: https://lore.kernel.org/r/20201016172914.643282-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski

Ido Schimmel
2020-10-20 11:07:15 +0800
bc7e343db net: dsa: tag_ksz: KSZ8795 and KSZ9477 also use tail tags ... Browse Code »

The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and
KSZ9893 switches also use tail tags.

Fixes: 7a6ffe764be3 ("net: dsa: point out the tail taggers")
Signed-off-by: Christian Eggers
Reviewed-by: Florian Fainelli
Link: https://lore.kernel.org/r/20201016171603.10587-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski

Christian Eggers
2020-10-20 08:32:50 +0800

19 Oct, 2020

2 commits

0e8b8d6a2 net: core: use list_del_init() instead of list_del() in netdev_run_todo() ... Browse Code »

dev->unlink_list is reused unless dev is deleted.
So, list_del() should not be used.
Due to using list_del(), dev->unlink_list can't be reused so that
dev->nested_level update logic doesn't work.
In order to fix this bug, list_del_init() should be used instead
of list_del().

Test commands:
ip link add bond0 type bond
ip link add bond1 type bond
ip link set bond0 master bond1
ip link set bond0 nomaster
ip link set bond1 master bond0
ip link set bond1 nomaster

Splat looks like:
[ 255.750458][ T1030] ============================================
[ 255.751967][ T1030] WARNING: possible recursive locking detected
[ 255.753435][ T1030] 5.9.0-rc8+ #772 Not tainted
[ 255.754553][ T1030] --------------------------------------------
[ 255.756047][ T1030] ip/1030 is trying to acquire lock:
[ 255.757304][ T1030] ffff88811782a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync_multiple+0xc2/0x150
[ 255.760056][ T1030]
[ 255.760056][ T1030] but task is already holding lock:
[ 255.761862][ T1030] ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding]
[ 255.764581][ T1030]
[ 255.764581][ T1030] other info that might help us debug this:
[ 255.766645][ T1030] Possible unsafe locking scenario:
[ 255.766645][ T1030]
[ 255.768566][ T1030] CPU0
[ 255.769415][ T1030] ----
[ 255.770259][ T1030] lock(&dev_addr_list_lock_key/1);
[ 255.771629][ T1030] lock(&dev_addr_list_lock_key/1);
[ 255.772994][ T1030]
[ 255.772994][ T1030] *** DEADLOCK ***
[ 255.772994][ T1030]
[ 255.775091][ T1030] May be due to missing lock nesting notation
[ 255.775091][ T1030]
[ 255.777182][ T1030] 2 locks held by ip/1030:
[ 255.778299][ T1030] #0: ffffffffb1f63250 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x2e4/0x8b0
[ 255.780600][ T1030] #1: ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding]
[ 255.783411][ T1030]
[ 255.783411][ T1030] stack backtrace:
[ 255.784874][ T1030] CPU: 7 PID: 1030 Comm: ip Not tainted 5.9.0-rc8+ #772
[ 255.786595][ T1030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 255.789030][ T1030] Call Trace:
[ 255.789850][ T1030] dump_stack+0x99/0xd0
[ 255.790882][ T1030] __lock_acquire.cold.71+0x166/0x3cc
[ 255.792285][ T1030] ? register_lock_class+0x1a30/0x1a30
[ 255.793619][ T1030] ? rcu_read_lock_sched_held+0x91/0xc0
[ 255.794963][ T1030] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 255.796246][ T1030] lock_acquire+0x1b8/0x850
[ 255.797332][ T1030] ? dev_mc_sync_multiple+0xc2/0x150
[ 255.798624][ T1030] ? bond_enslave+0x3d4d/0x43e0 [bonding]
[ 255.800039][ T1030] ? check_flags+0x50/0x50
[ 255.801143][ T1030] ? lock_contended+0xd80/0xd80
[ 255.802341][ T1030] _raw_spin_lock_nested+0x2e/0x70
[ 255.803592][ T1030] ? dev_mc_sync_multiple+0xc2/0x150
[ 255.804897][ T1030] dev_mc_sync_multiple+0xc2/0x150
[ 255.806168][ T1030] bond_enslave+0x3d58/0x43e0 [bonding]
[ 255.807542][ T1030] ? __lock_acquire+0xe53/0x51b0
[ 255.808824][ T1030] ? bond_update_slave_arr+0xdc0/0xdc0 [bonding]
[ 255.810451][ T1030] ? check_chain_key+0x236/0x5e0
[ 255.811742][ T1030] ? mutex_is_locked+0x13/0x50
[ 255.812910][ T1030] ? rtnl_is_locked+0x11/0x20
[ 255.814061][ T1030] ? netdev_master_upper_dev_get+0xf/0x120
[ 255.815553][ T1030] do_setlink+0x94c/0x3040
[ ... ]

Reported-by: syzbot+4a0f7bc34e3997a6c7df@syzkaller.appspotmail.com
Fixes: 1fc70edb7d7b ("net: core: add nested_level variable in net_device")
Signed-off-by: Taehee Yoo
Link: https://lore.kernel.org/r/20201015162606.9377-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski

Taehee Yoo
2020-10-19 05:50:25 +0800
f981fc3d5 net: openvswitch: fix to make sure flow_lookup() is not preempted ... Browse Code »

The flow_lookup() function uses per CPU variables, which must be called
with BH disabled. However, this is fine in the general NAPI use case
where the local BH is disabled. But, it's also called from the netlink
context. The below patch makes sure that even in the netlink path, the
BH is disabled.

In addition, u64_stats_update_begin() requires a lock to ensure one writer
which is not ensured here. Making it per-CPU and disabling NAPI (softirq)
ensures that there is always only one writer.

Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage")
Reported-by: Juri Lelli
Signed-off-by: Eelco Chaudron
Link: https://lore.kernel.org/r/160295903253.7789.826736662555102345.stgit@ebuild
Signed-off-by: Jakub Kicinski

Eelco Chaudron
2020-10-19 03:29:36 +0800

17 Oct, 2020

4 commits

b38e7819c icmp: randomize the global rate limiter ... Browse Code »

Keyu Man reported that the ICMP rate limiter could be used
by attackers to get useful signal. Details will be provided
in an upcoming academic publication.

Our solution is to add some noise, so that the attackers
no longer can get help from the predictable token bucket limiter.

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Signed-off-by: Eric Dumazet
Reported-by: Keyu Man
Signed-off-by: Jakub Kicinski

Eric Dumazet
2020-10-17 07:47:09 +0800
ec78e3185 tipc: fix incorrect setting window for bcast link ... Browse Code »

In commit 16ad3f4022bb
("tipc: introduce variable window congestion control"), we applied
the algorithm to select window size from minimum window to the
configured maximum window for unicast link, and, besides we chose
to keep the window size for broadcast link unchanged and equal (i.e
fix window 50)

However, when setting maximum window variable via command, the window
variable was re-initialized to unexpect value (i.e 32).

We fix this by updating the fix window for broadcast as we stated.

Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control")
Acked-by: Jon Maloy
Signed-off-by: Hoang Huu Le
Signed-off-by: Jakub Kicinski

Hoang Huu Le
2020-10-17 05:09:12 +0800
75cee397a tipc: re-configure queue limit for broadcast link ... Browse Code »

The queue limit of the broadcast link is being calculated base on initial
MTU. However, when MTU value changed (e.g manual changing MTU on NIC
device, MTU negotiation etc.,) we do not re-calculate queue limit.
This gives throughput does not reflect with the change.

So fix it by calling the function to re-calculate queue limit of the
broadcast link.

Acked-by: Jon Maloy
Signed-off-by: Hoang Huu Le
Signed-off-by: Jakub Kicinski

Hoang Huu Le
2020-10-17 05:09:12 +0800
c327a310e svcrdma: fix bounce buffers for unaligned offsets and multiple pages ... Browse Code »

This was discovered using O_DIRECT at the client side, with small
unaligned file offsets or IOs that span multiple file pages.

Fixes: e248aa7be86 ("svcrdma: Remove max_sge check at connect time")
Signed-off-by: Dan Aloni
Signed-off-by: J. Bruce Fields

Dan Aloni
2020-10-17 03:15:04 +0800