18 Oct, 2022

4 commits

  • A consumer of the kernel crypto API, after allocating
    the transformation (tfm), sets the following based on
    the type of key it is using:
    - the flag 'is_hbk'
    - the structure 'struct hw_bound_key_info hbk_info'

    This information influences the core processing logic
    for the encapsulated algorithm. The consumer sets it
    after allocating the tfm and before calling
    crypto_xxx_setkey().
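
    For illustration only, a minimal consumer sketch follows. It assumes the
    downstream patch adds the 'is_hbk' and 'hbk_info' members to struct
    crypto_tfm; the algorithm name, key sizes and function name are
    placeholders.

        #include <crypto/skcipher.h>
        #include <linux/err.h>
        #include <linux/types.h>

        static int example_use_hw_bound_key(const u8 *key_blob,
                                            unsigned int blob_len,
                                            unsigned int plain_key_sz)
        {
                struct crypto_skcipher *tfm;
                int err;

                tfm = crypto_alloc_skcipher("cbc(aes)", 0, 0);
                if (IS_ERR(tfm))
                        return PTR_ERR(tfm);

                /* Mark the key as hardware-bound before setkey(). */
                crypto_skcipher_tfm(tfm)->is_hbk = true;                   /* assumed field */
                crypto_skcipher_tfm(tfm)->hbk_info.key_sz = plain_key_sz;  /* assumed field */

                err = crypto_skcipher_setkey(tfm, key_blob, blob_len);
                if (err)
                        crypto_free_skcipher(tfm);
                return err;
        }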

    Signed-off-by: Pankaj Gupta
    Reviewed-by: Gaurav Jain
    Reviewed-by: Kshitiz Varshney

    Pankaj Gupta
     
  • A hardware-bound key buffer carries additional information
    that is accessed through this new structure.

    Its members are:
    - flags: hardware-specific information flags.
    - key_sz: size of the plain key.
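
    A sketch of what such a structure might look like, derived from the
    description above rather than from the patch itself:

        /* Extra information carried with a hardware-bound key buffer
         * (illustrative layout; the downstream patch may differ). */
        struct hw_bound_key_info {
                unsigned int flags;     /* hardware-specific information */
                unsigned int key_sz;    /* size of the plain key, in bytes */
        };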

    Signed-off-by: Pankaj Gupta
    Reviewed-by: Gaurav Jain
    Reviewed-by: Kshitiz Varshney

    Pankaj Gupta
     
  • - added ele_trng_init to register the hardware random number
      generator driver.
    - added ele_get_random to perform the random number generation
      operation; the ele hwrng driver uses this to read random numbers
      from the Sentinel.
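
    For context, a hwrng registration of this kind would look roughly like
    the sketch below; the ele_trng_init() signature, the device name string
    and the assumption that ele_get_random() matches the hwrng ->read()
    prototype are illustrative, not taken from the patch.

        #include <linux/device.h>
        #include <linux/errno.h>
        #include <linux/hw_random.h>
        #include <linux/slab.h>

        /* Assumed to match the hwrng ->read() prototype. */
        int ele_get_random(struct hwrng *rng, void *data, size_t max, bool wait);

        static int ele_trng_init(struct device *dev)
        {
                struct hwrng *rng;

                rng = devm_kzalloc(dev, sizeof(*rng), GFP_KERNEL);
                if (!rng)
                        return -ENOMEM;

                rng->name = "ele-trng";         /* assumed name */
                rng->read = ele_get_random;

                return devm_hwrng_register(dev, rng);
        }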

    Signed-off-by: Gaurav Jain

    Gaurav Jain
     
  • - added ele_get_trng_state to read the TRNG state; the ele hwrng
      driver uses this to check whether the TRNG entropy is valid and
      ready to be read.
    - added ele_start_rng to start initialization of the Sentinel RNG;
      the ele hwrng driver uses this to start the Sentinel RNG when the
      TRNG state is not valid.

    Signed-off-by: Gaurav Jain

    Gaurav Jain
     

17 Oct, 2022

5 commits

  • This will provide a way for SMMU drivers to retrieve StreamIDs
    associated with IORT RMR nodes and use that to set bypass settings
    for those IDs.

    Tested-by: Steven Price
    Tested-by: Laurentiu Tudor
    Tested-by: Hanjun Guo
    Reviewed-by: Hanjun Guo
    Signed-off-by: Shameer Kolothum
    Acked-by: Robin Murphy
    Link: https://lore.kernel.org/r/20220615101044.1972-6-shameerali.kolothum.thodi@huawei.com
    Signed-off-by: Joerg Roedel

    Shameer Kolothum
     
  • Parse through the IORT RMR nodes and populate the reserve region list
    corresponding to a given IOMMU and device (optional). Also, go through
    the ID mappings of the RMR node and retrieve all the SIDs associated
    with it.

    Reviewed-by: Lorenzo Pieralisi
    Tested-by: Steven Price
    Tested-by: Laurentiu Tudor
    Tested-by: Hanjun Guo
    Reviewed-by: Hanjun Guo
    Signed-off-by: Shameer Kolothum
    Acked-by: Robin Murphy
    Link: https://lore.kernel.org/r/20220615101044.1972-5-shameerali.kolothum.thodi@huawei.com
    Signed-off-by: Joerg Roedel

    Shameer Kolothum
     
  • Currently IORT provides a helper to retrieve HW MSI reserve regions.
    Change this to a generic helper to retrieve any IORT related reserve
    regions. This will be useful when we add support for RMR nodes in
    subsequent patches.

    [Lorenzo: For ACPI IORT]

    Reviewed-by: Lorenzo Pieralisi
    Reviewed-by: Christoph Hellwig
    Tested-by: Steven Price
    Tested-by: Laurentiu Tudor
    Tested-by: Hanjun Guo
    Reviewed-by: Hanjun Guo
    Signed-off-by: Shameer Kolothum
    Acked-by: Robin Murphy
    Link: https://lore.kernel.org/r/20220615101044.1972-4-shameerali.kolothum.thodi@huawei.com
    Signed-off-by: Joerg Roedel

    Shameer Kolothum
     
  • At present iort_iommu_msi_get_resv_regions() returns the number of
    MSI reserved regions on success and there are no users for this.
    The reserved region list will get populated anyway for platforms
    that require the HW MSI region reservation. Hence, change the
    function to return void instead.

    Reviewed-by: Christoph Hellwig
    Tested-by: Steven Price
    Tested-by: Laurentiu Tudor
    Reviewed-by: Hanjun Guo
    Signed-off-by: Shameer Kolothum
    Acked-by: Robin Murphy
    Link: https://lore.kernel.org/r/20220615101044.1972-3-shameerali.kolothum.thodi@huawei.com
    Signed-off-by: Joerg Roedel

    Shameer Kolothum
     
  • A callback is introduced to struct iommu_resv_region to free memory
    allocations associated with the reserved region. This will be useful
    when we introduce support for IORT RMR based reserved regions.
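
    For reference, the shape of the change is roughly the following
    (surrounding members as in the existing struct iommu_resv_region in
    <linux/iommu.h>; treat it as a sketch):

        struct iommu_resv_region {
                struct list_head        list;
                phys_addr_t             start;
                size_t                  length;
                int                     prot;
                enum iommu_resv_type    type;
                /* New: release any memory backing this region. */
                void (*free)(struct device *dev, struct iommu_resv_region *region);
        };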

    Reviewed-by: Christoph Hellwig
    Tested-by: Steven Price
    Tested-by: Laurentiu Tudor
    Tested-by: Hanjun Guo
    Signed-off-by: Shameer Kolothum
    Acked-by: Robin Murphy
    Link: https://lore.kernel.org/r/20220615101044.1972-2-shameerali.kolothum.thodi@huawei.com
    Signed-off-by: Joerg Roedel

    Shameer Kolothum
     

30 Sep, 2022

1 commit

  • This is the 5.15.71 stable release

    * tag 'v5.15.71': (144 commits)
    Linux 5.15.71
    ext4: use locality group preallocation for small closed files
    ext4: avoid unnecessary spreading of allocations among groups
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    drivers/net/phy/aquantia_main.c
    drivers/tty/serial/fsl_lpuart.c

    Jason Liu
     

28 Sep, 2022

2 commits

  • commit e77cab77f2cb3a1ca2ba8df4af45bb35617ac16d upstream.

    A very common pattern in the drivers is to advance xmit tail
    index and do bookkeeping of Tx'ed characters. Create
    uart_xmit_advance() to handle it.
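
    A sketch of what such a helper looks like, following the pattern it
    factors out (circular-buffer tail advance plus Tx accounting); treat it
    as illustrative rather than the exact definition:

        static inline void uart_xmit_advance(struct uart_port *up, unsigned int chars)
        {
                struct circ_buf *xmit = &up->state->xmit;

                /* Advance the xmit tail and account for the Tx'ed characters. */
                xmit->tail = (xmit->tail + chars) & (UART_XMIT_SIZE - 1);
                up->icount.tx += chars;
        }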

    Reviewed-by: Andy Shevchenko
    Cc: stable
    Signed-off-by: Ilpo Järvinen
    Link: https://lore.kernel.org/r/20220901143934.8850-2-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Ilpo Järvinen
     
  • commit d7f06bdd6ee87fbefa05af5f57361d85e7715b11 upstream.

    As PAGE_SIZE is unsigned long, -1 > PAGE_SIZE when NR_CPUS <= 3.

    Cc: "Rafael J. Wysocki"
    Cc: Yury Norov
    Cc: stable@vger.kernel.org
    Cc: feng xiangjun
    Reported-by: feng xiangjun
    Signed-off-by: Phil Auld
    Signed-off-by: Yury Norov
    Link: https://lore.kernel.org/r/20220906203542.1796629-1-pauld@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Phil Auld
     

27 Sep, 2022

1 commit

  • This is the 5.15.70 stable release

    * tag 'v5.15.70': (2444 commits)
    Linux 5.15.70
    ALSA: hda/sigmatel: Fix unused variable warning for beep power change
    cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    arch/arm/boot/dts/imx6ul.dtsi
    arch/arm/mm/mmu.c
    arch/arm64/boot/dts/freescale/imx8mp-evk.dts
    drivers/gpu/drm/imx/dcss/dcss-kms.c
    drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.c
    drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.h
    drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
    drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
    drivers/soc/fsl/Kconfig
    drivers/soc/imx/gpcv2.c
    drivers/usb/dwc3/host.c
    net/dsa/slave.c
    sound/soc/fsl/imx-card.c

    Jason Liu
     

23 Sep, 2022

2 commits

  • commit 683412ccf61294d727ead4a73d97397396e69a6b upstream.

    Flush the CPU caches when memory is reclaimed from an SEV guest (where
    reclaim also includes it being unmapped from KVM's memslots). Due to lack
    of coherency for SEV encrypted memory, failure to flush results in silent
    data corruption if userspace is malicious/broken and doesn't ensure SEV
    guest memory is properly pinned and unpinned.

    Cache coherency is not enforced across the VM boundary in SEV (AMD APM
    vol.2 Section 15.34.7). Confidential cachelines generated by confidential
    VM guests have to be explicitly flushed on the host side. If a memory page
    containing dirty confidential cachelines is released by the VM and
    reallocated to another user, the cachelines may corrupt the new user at a
    later time.

    KVM takes a shortcut by assuming all confidential memory remains pinned
    until the end of the VM lifetime. Therefore, KVM does not flush caches at
    mmu_notifier invalidation events. Because of this incorrect assumption and
    the lack of cache flushing, malicious userspace can crash the host kernel
    by creating a malicious VM and continuously allocating/releasing unpinned
    confidential memory pages while the VM is running.

    Add cache flush operations to the mmu_notifier operations to ensure that
    any physical memory leaving the guest VM gets flushed. In particular, hook
    the mmu_notifier_invalidate_range_start and mmu_notifier_release events
    and flush caches accordingly. The hooks run after releasing the mmu lock
    to avoid contention with other vCPUs.
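
    Conceptually, the reclaim hook boils down to the following simplified
    sketch (not the upstream diff; sev_guest() and wbinvd_on_all_cpus() are
    existing helpers):

        /* Called from the mmu_notifier invalidate/release paths, after the
         * mmu lock has been dropped. */
        void kvm_arch_guest_memory_reclaimed(struct kvm *kvm)
        {
                /* SEV guest memory is not cache-coherent with the host. */
                if (sev_guest(kvm))
                        wbinvd_on_all_cpus();
        }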

    Cc: stable@vger.kernel.org
    Suggested-by: Sean Christopherson
    Reported-by: Mingwei Zhang
    Signed-off-by: Mingwei Zhang
    Message-Id:
    Signed-off-by: Paolo Bonzini
    [OP: adjusted KVM_X86_OP_OPTIONAL() -> KVM_X86_OP_NULL, applied
    kvm_arch_guest_memory_reclaimed() call in kvm_set_memslot()]
    Signed-off-by: Ovidiu Panait
    Signed-off-by: Greg Kroah-Hartman

    Mingwei Zhang
     
  • commit 40bfe7a86d84cf08ac6a8fe2f0c8bf7a43edd110 upstream.

    Since the stub version of of_dma_configure_id() was added in commit
    a081bd4af4ce ("of/device: Add input id to of_dma_configure()"), it has
    not matched the signature of the full function, leading to build failure
    reports when code using this function is built on !OF configurations.
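
    The fix amounts to making the !CONFIG_OF stub take the same arguments
    as the real function, roughly along these lines (sketch):

        #ifdef CONFIG_OF
        int of_dma_configure_id(struct device *dev, struct device_node *np,
                                bool force_dma, const u32 *id);
        #else
        static inline int of_dma_configure_id(struct device *dev,
                                              struct device_node *np,
                                              bool force_dma, const u32 *id)
        {
                return 0;
        }
        #endif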

    Fixes: a081bd4af4ce ("of/device: Add input id to of_dma_configure()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Thierry Reding
    Reviewed-by: Frank Rowand
    Acked-by: Lorenzo Pieralisi
    Link: https://lore.kernel.org/r/20220824153256.1437483-1-thierry.reding@gmail.com
    Signed-off-by: Rob Herring
    Signed-off-by: Greg Kroah-Hartman

    Thierry Reding
     

20 Sep, 2022

4 commits

  • [ Upstream commit 0c5f6c0d8201a809a6585b07b6263e9db2c874a3 ]

    The translation table copying code for kdump kernels is currently based
    on the extended root/context entry formats of ECS mode defined in older
    VT-d v2.5, and doesn't handle the scalable mode formats. This causes
    the kexec capture kernel to fail to boot with DMAR faults if the IOMMU
    was enabled in scalable mode by the previous kernel.

    The ECS mode has been deprecated by the VT-d spec since v3.0, and the
    Intel IOMMU driver doesn't support it as there's no real hardware
    implementation. Hence, convert the ECS checking in the table copying
    code to scalable mode.

    The existing copying code consumes a bit in the context entry as a mark
    of a copied entry. It needs to work for the old format as well as for
    the extended context entries. As it's hard to find such a common bit
    for both legacy and scalable mode context entries, replace it with a
    per-IOMMU bitmap.

    Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support")
    Cc: stable@vger.kernel.org
    Reported-by: Jerry Snitselaar
    Tested-by: Wen Jin
    Signed-off-by: Lu Baolu
    Link: https://lore.kernel.org/r/20220817011035.3250131-1-baolu.lu@linux.intel.com
    Signed-off-by: Joerg Roedel
    Signed-off-by: Sasha Levin

    Lu Baolu
     
  • [ Upstream commit e87f4152e542610d0b4c6c8548964a68a59d2040 ]

    Force-inline two stack helpers to fix the following objtool warnings:

    vmlinux.o: warning: objtool: in_task_stack()+0xc: call to task_stack_page() leaves .noinstr.text section
    vmlinux.o: warning: objtool: in_entry_stack()+0x10: call to cpu_entry_stack() leaves .noinstr.text section
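
    The change is essentially to force-inline the two helpers named in the
    warnings so they cannot become out-of-line calls from noinstr code,
    e.g. (sketch):

        /* Was 'static inline'; __always_inline keeps it inside the
         * .noinstr.text callers such as in_task_stack(). */
        static __always_inline void *task_stack_page(const struct task_struct *task)
        {
                return task->stack;
        }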

    Signed-off-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lore.kernel.org/r/20220324183607.31717-2-bp@alien8.de
    Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
    Signed-off-by: Sasha Levin

    Borislav Petkov
     
  • [ Upstream commit 8b023accc8df70e72f7704d29fead7ca914d6837 ]

    While looking into a bug related to the compiler's handling of addresses
    of labels, I noticed some uses of _THIS_IP_ seemed unused in lockdep.
    Drive by cleanup.

    -Wunused-parameter:
    kernel/locking/lockdep.c:1383:22: warning: unused parameter 'ip'
    kernel/locking/lockdep.c:4246:48: warning: unused parameter 'ip'
    kernel/locking/lockdep.c:4844:19: warning: unused parameter 'ip'

    Signed-off-by: Nick Desaulniers
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Link: https://lore.kernel.org/r/20220314221909.2027027-1-ndesaulniers@google.com
    Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
    Signed-off-by: Sasha Levin

    Nick Desaulniers
     
  • commit 0ebeebcf59601bcfa0284f4bb7abdec051eb856d upstream.

    Fixes the following WARN_ON
    WARNING: CPU: 2 PID: 18678 at fs/nfs/inode.c:123 nfs_clear_inode+0x3b/0x50 [nfs]
    ...
    Call Trace:
    nfs4_evict_inode+0x57/0x70 [nfsv4]
    evict+0xd1/0x180
    dispose_list+0x48/0x60
    evict_inodes+0x156/0x190
    generic_shutdown_super+0x37/0x110
    nfs_kill_super+0x1d/0x40 [nfs]
    deactivate_locked_super+0x36/0xa0

    Signed-off-by: Dave Wysochanski
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Dave Wysochanski
     

15 Sep, 2022

8 commits

  • [ Upstream commit 3261400639463a853ba2b3be8bd009c2a8089775 ]

    We got a recent syzbot report [1] showing a possible misuse
    of pfmemalloc page status in TCP zerocopy paths.

    Indeed, for pages coming from user space or other layers,
    using page_is_pfmemalloc() is moot, and possibly could give
    false positives.

    There have been attempts to make page_is_pfmemalloc() more robust,
    but not using it in the first place in this context is probably better,
    and also saves cpu cycles.

    Note to stable teams :

    You need to backport 84ce071e38a6 ("net: introduce
    __skb_fill_page_desc_noacc") as a prereq.

    Race is more probable after commit c07aea3ef4d4
    ("mm: add a signature in struct page") because page_is_pfmemalloc()
    is now using low order bit from page->lru.next, which can change
    more often than page->index.

    Low order bit should never be set for lru.next (when used as an anchor
    in LRU list), so KCSAN report is mostly a false positive.

    Backporting to older kernel versions seems not necessary.

    [1]
    BUG: KCSAN: data-race in lru_add_fn / tcp_build_frag

    write to 0xffffea0004a1d2c8 of 8 bytes by task 18600 on cpu 0:
    __list_add include/linux/list.h:73 [inline]
    list_add include/linux/list.h:88 [inline]
    lruvec_add_folio include/linux/mm_inline.h:105 [inline]
    lru_add_fn+0x440/0x520 mm/swap.c:228
    folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246
    folio_batch_add_and_move mm/swap.c:263 [inline]
    folio_add_lru+0xf1/0x140 mm/swap.c:490
    filemap_add_folio+0xf8/0x150 mm/filemap.c:948
    __filemap_get_folio+0x510/0x6d0 mm/filemap.c:1981
    pagecache_get_page+0x26/0x190 mm/folio-compat.c:104
    grab_cache_page_write_begin+0x2a/0x30 mm/folio-compat.c:116
    ext4_da_write_begin+0x2dd/0x5f0 fs/ext4/inode.c:2988
    generic_perform_write+0x1d4/0x3f0 mm/filemap.c:3738
    ext4_buffered_write_iter+0x235/0x3e0 fs/ext4/file.c:270
    ext4_file_write_iter+0x2e3/0x1210
    call_write_iter include/linux/fs.h:2187 [inline]
    new_sync_write fs/read_write.c:491 [inline]
    vfs_write+0x468/0x760 fs/read_write.c:578
    ksys_write+0xe8/0x1a0 fs/read_write.c:631
    __do_sys_write fs/read_write.c:643 [inline]
    __se_sys_write fs/read_write.c:640 [inline]
    __x64_sys_write+0x3e/0x50 fs/read_write.c:640
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    read to 0xffffea0004a1d2c8 of 8 bytes by task 18611 on cpu 1:
    page_is_pfmemalloc include/linux/mm.h:1740 [inline]
    __skb_fill_page_desc include/linux/skbuff.h:2422 [inline]
    skb_fill_page_desc include/linux/skbuff.h:2443 [inline]
    tcp_build_frag+0x613/0xb20 net/ipv4/tcp.c:1018
    do_tcp_sendpages+0x3e8/0xaf0 net/ipv4/tcp.c:1075
    tcp_sendpage_locked net/ipv4/tcp.c:1140 [inline]
    tcp_sendpage+0x89/0xb0 net/ipv4/tcp.c:1150
    inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
    kernel_sendpage+0x184/0x300 net/socket.c:3561
    sock_sendpage+0x5a/0x70 net/socket.c:1054
    pipe_to_sendpage+0x128/0x160 fs/splice.c:361
    splice_from_pipe_feed fs/splice.c:415 [inline]
    __splice_from_pipe+0x222/0x4d0 fs/splice.c:559
    splice_from_pipe fs/splice.c:594 [inline]
    generic_splice_sendpage+0x89/0xc0 fs/splice.c:743
    do_splice_from fs/splice.c:764 [inline]
    direct_splice_actor+0x80/0xa0 fs/splice.c:931
    splice_direct_to_actor+0x305/0x620 fs/splice.c:886
    do_splice_direct+0xfb/0x180 fs/splice.c:974
    do_sendfile+0x3bf/0x910 fs/read_write.c:1249
    __do_sys_sendfile64 fs/read_write.c:1317 [inline]
    __se_sys_sendfile64 fs/read_write.c:1303 [inline]
    __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1303
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    value changed: 0x0000000000000000 -> 0xffffea0004a1d288

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 18611 Comm: syz-executor.4 Not tainted 6.0.0-rc2-syzkaller-00248-ge022620b5d05-dirty #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022

    Fixes: c07aea3ef4d4 ("mm: add a signature in struct page")
    Reported-by: syzbot
    Signed-off-by: Eric Dumazet
    Cc: Shakeel Butt
    Reviewed-by: Shakeel Butt
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Eric Dumazet
     
  • [ Upstream commit 84ce071e38a6e25ea3ea91188e5482ac1f17b3af ]

    Managed pages contain pinned userspace pages and are controlled by upper
    layers; there is no need to track skb->pfmemalloc for them. Introduce a
    helper for filling frags that ignores page tracking; it'll be needed
    later.
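
    A sketch of what such a helper looks like: it fills the frag exactly
    like __skb_fill_page_desc() but skips the skb->pfmemalloc propagation
    (body illustrative):

        static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo,
                                                      int i, struct page *page,
                                                      int off, int size)
        {
                skb_frag_t *frag = &shinfo->frags[i];

                /* Fill the fragment without touching skb->pfmemalloc. */
                frag->bv_page = page;
                frag->bv_offset = off;
                skb_frag_size_set(frag, size);
        }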

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • [ Upstream commit ac56a0b48da86fd1b4389632fb7c4c8a5d86eefa ]

    Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing
    it to siphon off UDP packets early in the handling of received UDP packets
    thereby avoiding the packet going through the UDP receive queue, it doesn't
    get ICMP packets through the UDP ->sk_error_report() callback. In fact, it
    doesn't appear that there's any usable option for getting hold of ICMP
    packets.

    Fix this by adding a new UDP encap hook to distribute error messages for
    UDP tunnels. If the hook is set, then the tunnel driver will be able to
    see ICMP packets. The hook provides the offset into the packet of the UDP
    header of the original packet that caused the notification.

    An alternative would be to call the ->error_handler() hook - but that
    requires that the skbuff be cloned (as ip_icmp_error() or ipv6_icmp_error()
    do, though that isn't really necessary or desirable in rxrpc's case as we
    want to parse them there and then, not queue them).
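
    A rough sketch of how a tunnel driver might wire this up, based on the
    description above; the encap_err_rcv field name and the callback
    signature (being handed the UDP header offset) are assumptions here,
    not verbatim API:

        #include <net/udp_tunnel.h>

        /* Assumed hook shape: told where the original packet's UDP header
         * sits, so the error can be parsed there and then. */
        static void example_encap_err_rcv(struct sock *sk, struct sk_buff *skb,
                                          unsigned int udp_offset)
        {
                /* Parse the ICMP/ICMPv6 notification in place; nothing is queued. */
        }

        static void example_setup_tunnel(struct net *net, struct socket *sock)
        {
                struct udp_tunnel_sock_cfg cfg = {
                        .encap_type    = 1,
                        .encap_err_rcv = example_encap_err_rcv, /* assumed field */
                };

                setup_udp_tunnel_sock(net, sock, &cfg);
        }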

    Changes
    =======
    ver #3)
    - Fixed an uninitialised variable.

    ver #2)
    - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals.

    Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin

    David Howells
     
  • [ Upstream commit 67f4b5dc49913abcdb5cc736e73674e2f352f81d ]

    Currently, when the writeback code detects a server reboot, it redirties
    any pages that were not committed to disk, and it sets the flag
    NFS_CONTEXT_RESEND_WRITES in the nfs_open_context of the file descriptor
    that dirtied the file. While this allows the file descriptor in question
    to redrive its own writes, it violates the fsync() requirement that we
    should be synchronising all writes to disk.
    While the problem is infrequent, we do see corner cases where an untimely
    server reboot causes the fsync() call to abandon its attempt to sync data
    to disk, causing data corruption issues due to missed error conditions or
    similar.

    In order to tighten up the client's ability to deal with this situation
    without introducing livelocks, add a counter that records the number of
    times pages are redirtied due to a server reboot-like condition, and use
    that in fsync() to redrive the sync to disk.

    Fixes: 2197e9b06c22 ("NFS: Fix up fsync() when the server rebooted")
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin

    Trond Myklebust
     
  • [ Upstream commit e591b298d7ecb851e200f65946e3d53fe78a3c4f ]

    Save some space in the nfs_inode by setting up an anonymous union with
    the fields that are peculiar to a specific type of filesystem object.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin

    Trond Myklebust
     
  • [ Upstream commit ff81dfb5d721fff87bd516c558847f6effb70031 ]

    If a user is doing 'ls -l', we have a heuristic in GETATTR that tells
    the readdir code to try to use READDIRPLUS in order to refresh the inode
    attributes. In certain circumstances, we also try to invalidate the
    remaining directory entries in order to ensure this refresh.

    If there are multiple readers of the directory, we probably should avoid
    invalidating the page cache, since the heuristic breaks down in that
    situation anyway.

    Signed-off-by: Trond Myklebust
    Tested-by: Benjamin Coddington
    Reviewed-by: Benjamin Coddington
    Signed-off-by: Sasha Levin

    Trond Myklebust
     
  • commit dec9b2f1e0455a151a7293c367da22ab973f713e upstream.

    There is a very common pattern of using
    debugfs_remove(debugfs_lookup(..)), which leaks the dentry that was
    looked up. Instead of having to open-code the correct pattern of
    calling dput() on the dentry, create debugfs_lookup_and_remove() to
    handle this pattern automatically and properly, without any memory
    leaks.
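
    In essence the new helper is the open-coded pattern done correctly,
    roughly (sketch):

        void debugfs_lookup_and_remove(const char *name, struct dentry *parent)
        {
                struct dentry *dentry;

                dentry = debugfs_lookup(name, parent);
                if (!dentry)
                        return;

                debugfs_remove(dentry);
                dput(dentry);   /* drop the reference debugfs_lookup() took */
        }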

    Cc: stable
    Reported-by: Kuyo Chang
    Tested-by: Kuyo Chang
    Link: https://lore.kernel.org/r/YxIaQ8cSinDR881k@kroah.com
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • commit 2f79cdfe58c13949bbbb65ba5926abfe9561d0ec upstream.

    Commit d4252071b97d ("add barriers to buffer_uptodate and
    set_buffer_uptodate") added proper memory barriers to the buffer head
    BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date
    will be guaranteed to actually see initialized state.

    However, that commit didn't _just_ add the memory barrier, it also ended
    up dropping the "was it already set" logic that the BUFFER_FNS() macro
    had.

    That's conceptually the right thing for a generic "this is a memory
    barrier" operation, but in the case of the buffer contents, we really
    only care about the memory barrier for the _first_ time we set the bit,
    in that the only memory ordering protection we need is to avoid anybody
    seeing uninitialized memory contents.

    Any other access ordering wouldn't be about the BH_Uptodate bit anyway,
    and would require some other proper lock (typically BH_Lock or the folio
    lock). A reader that races with somebody invalidating the buffer head
    isn't an issue wrt the memory ordering, it's a serialization issue.

    Now, you'd think that the buffer head operations don't matter in this
    day and age (and I certainly thought so), but apparently some loads
    still end up being heavy users of buffer heads. In particular, the
    kernel test robot reported that not having this bit access optimization
    in place caused a noticeable direct IO performance regression on ext4:

    fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression

    although you presumably need a fast disk and a lot of cores to actually
    notice.
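
    The restored fast path looks roughly like this (a sketch of the idea;
    the real definition lives in the BUFFER_FNS machinery in
    <linux/buffer_head.h>):

        static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
        {
                /*
                 * If somebody else already set this uptodate, they will
                 * have done the memory barrier, and a reader will thus
                 * see *some* valid buffer state.
                 */
                if (test_bit(BH_Uptodate, &bh->b_state))
                        return;

                /* Order the buffer contents before the first uptodate set. */
                smp_mb__before_atomic();
                set_bit(BH_Uptodate, &bh->b_state);
        }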

    Link: https://lore.kernel.org/all/Yw8L7HTZ%2FdE2%2Fo9C@xsang-OptiPlex-9020/
    Reported-by: kernel test robot
    Tested-by: Fengwei Yin
    Cc: Mikulas Patocka
    Cc: Matthew Wilcox (Oracle)
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     

08 Sep, 2022

3 commits

  • commit 9c6d778800b921bde3bff3cff5003d1650f942d1 upstream.

    Automatic kernel fuzzing revealed a recursive locking violation in
    usb-storage:

    ============================================
    WARNING: possible recursive locking detected
    5.18.0 #3 Not tainted
    --------------------------------------------
    kworker/1:3/1205 is trying to acquire lock:
    ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at:
    usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230

    but task is already holding lock:
    ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at:
    usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230

    ...

    stack backtrace:
    CPU: 1 PID: 1205 Comm: kworker/1:3 Not tainted 5.18.0 #3
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
    1.13.0-1ubuntu1.1 04/01/2014
    Workqueue: usb_hub_wq hub_event
    Call Trace:

    __dump_stack lib/dump_stack.c:88 [inline]
    dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
    print_deadlock_bug kernel/locking/lockdep.c:2988 [inline]
    check_deadlock kernel/locking/lockdep.c:3031 [inline]
    validate_chain kernel/locking/lockdep.c:3816 [inline]
    __lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053
    lock_acquire kernel/locking/lockdep.c:5665 [inline]
    lock_acquire+0x1ab/0x520 kernel/locking/lockdep.c:5630
    __mutex_lock_common kernel/locking/mutex.c:603 [inline]
    __mutex_lock+0x14f/0x1610 kernel/locking/mutex.c:747
    usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230
    usb_reset_device+0x37d/0x9a0 drivers/usb/core/hub.c:6109
    r871xu_dev_remove+0x21a/0x270 drivers/staging/rtl8712/usb_intf.c:622
    usb_unbind_interface+0x1bd/0x890 drivers/usb/core/driver.c:458
    device_remove drivers/base/dd.c:545 [inline]
    device_remove+0x11f/0x170 drivers/base/dd.c:537
    __device_release_driver drivers/base/dd.c:1222 [inline]
    device_release_driver_internal+0x1a7/0x2f0 drivers/base/dd.c:1248
    usb_driver_release_interface+0x102/0x180 drivers/usb/core/driver.c:627
    usb_forced_unbind_intf+0x4d/0xa0 drivers/usb/core/driver.c:1118
    usb_reset_device+0x39b/0x9a0 drivers/usb/core/hub.c:6114

    This turned out not to be an error in usb-storage but rather a nested
    device reset attempt. That is, as the rtl8712 driver was being
    unbound from a composite device in preparation for an unrelated USB
    reset (that driver does not have pre_reset or post_reset callbacks),
    its ->remove routine called usb_reset_device() -- thus nesting one
    reset call within another.

    Performing a reset as part of disconnect processing is a questionable
    practice at best. However, the bug report points out that the USB
    core does not have any protection against nested resets. Adding a
    reset_in_progress flag and testing it will prevent such errors in the
    future.
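
    Illustrative only (not the actual hub.c change): the guard amounts to
    refusing to start a second reset of a device whose reset is already
    underway.

        struct example_usb_device {
                bool reset_in_progress;
        };

        static int example_reset_device(struct example_usb_device *udev)
        {
                if (udev->reset_in_progress)
                        return 0;       /* nested reset attempt: do nothing */

                udev->reset_in_progress = true;
                /* ... unbind drivers, reset the port, rebind drivers ... */
                udev->reset_in_progress = false;
                return 0;
        }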

    Link: https://lore.kernel.org/all/CAB7eexKUpvX-JNiLzhXBDWgfg2T9e9_0Tw4HQ6keN==voRbP0g@mail.gmail.com/
    Cc: stable@vger.kernel.org
    Reported-and-tested-by: Rondreis
    Signed-off-by: Alan Stern
    Link: https://lore.kernel.org/r/YwkflDxvg0KWqyZK@rowland.harvard.edu
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     
  • commit c1e5c2f0cb8a22ec2e14af92afc7006491bebabb upstream.

    Fix incorrect pin assignment values when connecting to a monitor with
    Type-C receptacle instead of a plug.

    According to the specification, a UFP_D receptacle's pin assignment
    should come from the UFP_D pin assignments field (bits 23:16), while
    a UFP_D plug's assignments are described in the DFP_D pin assignments
    field (bits 15:8) during Mode Discovery.

    For example, the LG 27 UL850-W is a monitor with a Type-C receptacle.
    The monitor responds to the MODE DISCOVERY command with the following
    DisplayPort Capability flag:

    dp->alt->vdo=0x140045

    The existing logic only takes care of the UFP_D plug case,
    and would take bits 15:8 for this 0x140045 case.

    This results in a non-existent pin assignment of 0x0 in
    dp_altmode_configure.

    To fix this problem, a new set of macros is introduced
    to take plug/receptacle differences into consideration.
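
    Conceptually, the new macros pick which VDO field to use depending on
    whether the UFP_D partner is a plug or a receptacle, roughly as below
    (macro names here are illustrative):

        /* DisplayPort capability VDO fields, per the layout described above. */
        #define EXAMPLE_DP_CAP_DFP_D_PIN_ASSIGN(vdo)    (((vdo) >> 8) & 0xff)   /* bits 15:8  */
        #define EXAMPLE_DP_CAP_UFP_D_PIN_ASSIGN(vdo)    (((vdo) >> 16) & 0xff)  /* bits 23:16 */

        /* Pin assignments offered by a UFP_D partner: receptacle vs. plug. */
        #define EXAMPLE_DP_CAP_PIN_ASSIGN_UFP_D(vdo, is_receptacle)             \
                ((is_receptacle) ? EXAMPLE_DP_CAP_UFP_D_PIN_ASSIGN(vdo)         \
                                 : EXAMPLE_DP_CAP_DFP_D_PIN_ASSIGN(vdo))

    For the 0x140045 VDO above, the receptacle path yields 0x14 instead of
    the non-existent 0x0.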

    Fixes: 0e3bb7d6894d ("usb: typec: Add driver for DisplayPort alternate mode")
    Cc: stable@vger.kernel.org
    Co-developed-by: Pablo Sun
    Co-developed-by: Macpaul Lin
    Reviewed-by: Guillaume Ranquet
    Reviewed-by: Heikki Krogerus
    Signed-off-by: Pablo Sun
    Signed-off-by: Macpaul Lin
    Link: https://lore.kernel.org/r/20220804034803.19486-1-macpaul.lin@mediatek.com
    Signed-off-by: Greg Kroah-Hartman

    Pablo Sun
     
  • [ Upstream commit 0a90ed8d0cfa29735a221eba14d9cb6c735d35b6 ]

    On Intel hardware the SLP_TYPx bitfield occupies bits 10-12 as per ACPI
    specification (see Table 4.13 "PM1 Control Registers Fixed Hardware
    Feature Control Bits" for the details).

    Fix the mask and other related definitions accordingly.

    Fixes: 93e5eadd1f6e ("x86/platform: New Intel Atom SOC power management controller driver")
    Signed-off-by: Andy Shevchenko
    Link: https://lore.kernel.org/r/20220801113734.36131-1-andriy.shevchenko@linux.intel.com
    Reviewed-by: Hans de Goede
    Signed-off-by: Hans de Goede
    Signed-off-by: Sasha Levin

    Andy Shevchenko
     

06 Sep, 2022

1 commit

  • When a frame is dequeued by the CEETM, the effective packet size is a
    function of the actual packet size, the configured Overhead Accounting
    Length (OAL), and the configured Minimum Packet Size (MPS). This
    effective packet size is used to update the CEETM shaper credits. The
    MPS value should be applied to the actual packet size before the OAL
    value is applied. However, the current behaviour applies the OAL value
    first and the MPS afterwards, which results in an incorrect value being
    subtracted from the CEETM credit for that particular Logical Network
    Interface (LNI) shaper.

    To avoid an erroneous shaper credit count, do not configure both OAL and
    MPS on an LNI shaper.
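
    The ordering difference can be seen in a small illustrative model
    (names assumed, not driver code):

        static unsigned int effective_len_intended(unsigned int pkt,
                                                   unsigned int mps,
                                                   unsigned int oal)
        {
                return (pkt > mps ? pkt : mps) + oal;   /* MPS floor, then OAL */
        }

        static unsigned int effective_len_observed(unsigned int pkt,
                                                   unsigned int mps,
                                                   unsigned int oal)
        {
                unsigned int with_oal = pkt + oal;      /* OAL applied first */

                return with_oal > mps ? with_oal : mps; /* MPS floor applied last */
        }

    For example, with pkt = 40, mps = 64 and oal = 24, the intended result
    is 64 + 24 = 88 while the observed one is 64, so the shaper is debited
    too little for small frames.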

    Signed-off-by: Camelia Groza

    Camelia Groza
     

05 Sep, 2022

3 commits

  • commit 2555283eb40df89945557273121e9393ef9b542b upstream.

    anon_vma->degree tracks the combined number of child anon_vmas and VMAs
    that use the anon_vma as their ->anon_vma.

    anon_vma_clone() then assumes that for any anon_vma attached to
    src->anon_vma_chain other than src->anon_vma, it is impossible for it to
    be a leaf node of the VMA tree, meaning that for such VMAs ->degree is
    elevated by 1 because of a child anon_vma, meaning that if ->degree
    equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.

    This assumption is wrong because the ->degree optimization leads to leaf
    nodes being abandoned on anon_vma_clone() - an existing anon_vma is
    reused and no new parent-child relationship is created. So it is
    possible to reuse an anon_vma for one VMA while it is still tied to
    another VMA.

    This is an issue because is_mergeable_anon_vma() and its callers assume
    that if two VMAs have the same ->anon_vma, the list of anon_vmas
    attached to the VMAs is guaranteed to be the same. When this assumption
    is violated, vma_merge() can merge pages into a VMA that is not attached
    to the corresponding anon_vma, leading to dangling page->mapping
    pointers that will be dereferenced during rmap walks.

    Fix it by separately tracking the number of child anon_vmas and the
    number of VMAs using the anon_vma as their ->anon_vma.
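
    Schematically, the fix splits the single counter in two (a sketch;
    member names per the upstream fix, surrounding fields abridged):

        struct anon_vma {
                struct anon_vma *root;
                struct rw_semaphore rwsem;
                atomic_t refcount;

                /* Previously one 'degree' counter covered both of these. */
                unsigned long num_children;     /* child anon_vmas */
                unsigned long num_active_vmas;  /* VMAs whose ->anon_vma is this one */

                struct anon_vma *parent;
                struct rb_root_cached rb_root;
        };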

    Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy")
    Cc: stable@kernel.org
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Signed-off-by: Jann Horn
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit fd1894224407c484f652ad456e1ce423e89bb3eb upstream.

    Syzbot found an issue [1]: fq_codel_drop() tries to drop a flow without
    any skbs, that is, flow->head is null.
    The root cause, as [2] explains, is that bpf_prog_test_run_skb() runs a
    bpf prog which redirects empty skbs.
    So we should determine whether the length of a packet modified by a bpf
    prog, or by others like bpf_prog_test, is valid before forwarding it
    directly.

    LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
    LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html

    Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com
    Signed-off-by: Zhengchao Shao
    Reviewed-by: Stanislav Fomichev
    Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Greg Kroah-Hartman

    Zhengchao Shao
     
  • commit 2a0133723f9ebeb751cfce19f74ec07e108bef1f upstream.

    Syzkaller reports refcount bug as follows:
    ------------[ cut here ]------------
    refcount_t: saturated; leaking memory.
    WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
    Modules linked in:
    CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0

    __refcount_add_not_zero include/linux/refcount.h:163 [inline]
    __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
    refcount_inc_not_zero include/linux/refcount.h:245 [inline]
    sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
    tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
    tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
    tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
    tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
    tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
    sk_backlog_rcv include/net/sock.h:1061 [inline]
    __release_sock+0x134/0x3b0 net/core/sock.c:2849
    release_sock+0x54/0x1b0 net/core/sock.c:3404
    inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
    __sys_shutdown_sock net/socket.c:2331 [inline]
    __sys_shutdown_sock net/socket.c:2325 [inline]
    __sys_shutdown+0xf1/0x1b0 net/socket.c:2343
    __do_sys_shutdown net/socket.c:2351 [inline]
    __se_sys_shutdown net/socket.c:2349 [inline]
    __x64_sys_shutdown+0x50/0x70 net/socket.c:2349
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x46/0xb0

    During the SMC fallback process in the connect syscall, the kernel
    replaces TCP with SMC. In order to forward wakeups to the smc socket
    waitqueue after fallback, the kernel sets clcsk->sk_user_data to the
    original smc socket in smc_fback_replace_callbacks().

    Later, in the shutdown syscall, the kernel calls sk_psock_get(), which
    treats clcsk->sk_user_data as a psock, triggering the refcnt warning.

    So the root cause is that both smc and psock use the sk_user_data
    field, and they can easily mismatch it.

    This patch solves it by using another bit (defined as
    SK_USER_DATA_PSOCK) in PTRMASK to mark whether sk_user_data points to
    a psock object or not.
    This patch depends on the PTRMASK introduced in commit f1ff5ce2cd5e
    ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").

    Since there will possibly be more flags in the sk_user_data field,
    this patch also refactors the sk_user_data flags code to be more
    generic, to improve its maintainability.
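
    Schematically, the tag bits kept in the low bits of sk_user_data end up
    looking like this (a sketch; the exact values live in sock.h):

        #define SK_USER_DATA_NOCOPY     1UL     /* do not copy on clone */
        #define SK_USER_DATA_BPF        2UL     /* managed/owned by BPF */
        #define SK_USER_DATA_PSOCK      4UL     /* pointer is a psock */
        #define SK_USER_DATA_PTRMASK    ~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF | \
                                          SK_USER_DATA_PSOCK)

    A reader such as sk_psock() then checks SK_USER_DATA_PSOCK before
    treating the pointer as a psock; if the bit is clear, the field belongs
    to someone else (e.g. SMC) and is left alone.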

    Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com
    Suggested-by: Jakub Kicinski
    Acked-by: Wen Gu
    Signed-off-by: Hawkins Jiawei
    Reviewed-by: Jakub Sitnicki
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Hawkins Jiawei
     

31 Aug, 2022

3 commits

  • commit dbb16df6443c59e8a1ef21c2272fcf387d600ddf upstream.

    This reverts commit 96e51ccf1af33e82f429a0d6baebba29c6448d0f.

    Recently we started running the kernel with the rstat infrastructure on
    production traffic and began to see negative memcg stat values.
    In particular, the 'sock' stat is the one which we observed having a
    negative value.

    $ grep "sock " /mnt/memory/job/memory.stat
    sock 253952
    total_sock 18446744073708724224

    Re-run after couple of seconds

    $ grep "sock " /mnt/memory/job/memory.stat
    sock 253952
    total_sock 53248

    For now we are only seeing this issue on large machines (256 CPUs) and
    only with the 'sock' stat. I think the networking stack increases the
    stat on one cpu and decreases it on another cpu much more often. So this
    negative sock value is due to the rstat flusher flushing the stats on
    the CPU that has seen the decrement of sock but missing the CPU that
    has the increments. A typical race condition.

    For an easy stable backport, a revert is the simplest solution. For a
    long-term solution, I am thinking of two directions. The first is to
    reduce the race window by optimizing the rstat flusher. The second is,
    if the reader sees a negative stat value, to force a flush and restart
    the stat collection; basically a retry, but limited.

    Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
    Fixes: 96e51ccf1af33e8 ("memcg: cleanup racy sum avoidance code")
    Signed-off-by: Shakeel Butt
    Cc: "Michal Koutný"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Muchun Song
    Cc: David Hildenbrand
    Cc: Yosry Ahmed
    Cc: Greg Thelen
    Cc: [5.15]
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Shakeel Butt
     
  • [ Upstream commit a5612ca10d1aa05624ebe72633e0c8c792970833 ]

    While reading sysctl_devconf_inherit_init_net, it can be changed
    concurrently. Thus, we need to add READ_ONCE() to its readers.
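
    The change is mechanical: readers of the sysctl move from a plain load
    to READ_ONCE(), e.g. (illustrative; the threshold check is made up):

        extern int sysctl_devconf_inherit_init_net;

        static bool example_inherit_devconf(void)
        {
                /* The value can change concurrently via sysctl, so annotate
                 * the read; the writer side uses WRITE_ONCE(). */
                return READ_ONCE(sysctl_devconf_inherit_init_net) > 0;
        }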

    Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit af67508ea6cbf0e4ea27f8120056fa2efce127dd ]

    While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
    concurrently. Thus, we need to add READ_ONCE() to its readers.

    Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima