Eric Lee / smarc-fsl-linux-kernel

09 Jan, 2021

16 commits

2cded5a3c perf: Break deadlock involving exec_update_mutex

[ Upstream commit 78af4dc949daaa37b3fcd5f348f373085b4e858f ]

Syzbot reported a lock inversion involving perf. The sore point being
perf holding exec_update_mutex() for a very long time, specifically
across a whole bunch of filesystem ops in pmu::event_init() (uprobes)
and anon_inode_getfile().

This then inverts against procfs code trying to take
exec_update_mutex.

Move the permission checks later, such that we need to hold the mutex
over less code.

Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Sasha Levin

peterz@infradead.org
2021-01-09 20:46:24 +0800
36cf9ae54 fuse: fix bad inode

[ Upstream commit 5d069dbe8aaf2a197142558b6fb2978189ba3454 ]

Jan Kara's analysis of the syzbot report (edited):

The reproducer opens a directory on FUSE filesystem, it then attaches
dnotify mark to the open directory. After that a fuse_do_getattr() call
finds that attributes returned by the server are inconsistent, and calls
make_bad_inode() which, among other things does:

inode->i_mode = S_IFREG;

This then confuses dnotify which doesn't tear down its structures
properly and eventually crashes.

Avoid calling make_bad_inode() on a live inode: switch to a private flag on
the fuse inode. Also add the test to ops which the bad_inode_ops would
have caught.

This bug goes back to the initial merge of fuse in 2.6.14...

Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi
Tested-by: Jan Kara
Cc:
Signed-off-by: Sasha Levin

Miklos Szeredi
2021-01-09 20:46:24 +0800
e522a788e RDMA/siw,rxe: Make emulated devices virtual in the device tree

[ Upstream commit a9d2e9ae953f0ddd0327479c81a085adaa76d903 ]

This moves siw and rxe to be virtual devices in the device tree:

lrwxrwxrwx 1 root root 0 Nov 6 13:55 /sys/class/infiniband/rxe0 -> ../../devices/virtual/infiniband/rxe0/

Previously they were trying to parent themselves to the physical device of
their attached netdev, which doesn't make a lot of sense.

My hope is this will solve some weird syzkaller hits related to sysfs as
it could be possible that the parent of a netdev is another netdev, eg
under bonding or some other syzkaller found netdev configuration.

Nesting an ib_device under anything but a physical device is going to cause
inconsistencies in sysfs during destructions.

Link: https://lore.kernel.org/r/0-v1-dcbfc68c4b4a+d6-virtual_dev_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin

Jason Gunthorpe
2021-01-09 20:46:24 +0800
404fa0937 RDMA/core: remove use of dma_virt_ops

[ Upstream commit 5a7a9e038b032137ae9c45d5429f18a2ffdf7d42 ]

Use the ib_dma_* helpers to skip the DMA translation instead. This
removes the last user of dma_virt_ops and keeps the weird layering
violation inside the RDMA core instead of burdening the DMA mapping
subsystems with it. This also means the software RDMA drivers now don't
have to mess with DMA parameters that are not relevant to them at all, and
that in the future we can use PCI P2P transfers even for software RDMA, as
there is no first fake layer of DMA mapping that the P2P DMA support.

Link: https://lore.kernel.org/r/20201106181941.1878556-8-hch@lst.de
Signed-off-by: Christoph Hellwig
Tested-by: Mike Marciniszyn
Signed-off-by: Jason Gunthorpe
Signed-off-by: Sasha Levin

Christoph Hellwig
2021-01-09 20:46:24 +0800
2a54ad306 scsi: ufs: Re-enable WriteBooster after device reset

[ Upstream commit bd14bf0e4a084514aa62d24d2109e0f09a93822f ]

UFS 3.1 specification mentions that the WriteBooster flags listed below
will be set to their default values, i.e. disabled, after power cycle or
any type of reset event. Thus we need to reset the flag variables kept in
struct hba to align with the device status and ensure that
WriteBooster-related functions are configured properly after device reset.

Without this fix, WriteBooster will not be enabled successfully by
ufshcd_wb_ctrl() after device reset because hba->wb_enabled remains true.

Flags required to be reset to default values:

- fWriteBoosterEn: hba->wb_enabled

- fWriteBoosterBufferFlushEn: hba->wb_buf_flush_enabled

- fWriteBoosterBufferFlushDuringHibernate: No variable mapped
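
The stale-flag problem described above can be sketched in plain C. This is only an illustration: `hba_sketch`, `wb_ctrl_sketch()` and `device_reset_sketch()` are hypothetical stand-ins, and the early return on a matching cached flag is an assumption inferred from the commit message, not a copy of the driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state kept in struct hba. */
struct hba_sketch {
    bool wb_enabled;            /* cached copy of fWriteBoosterEn */
    bool wb_buf_flush_enabled;  /* cached copy of fWriteBoosterBufferFlushEn */
    int device_flag_writes;     /* counts simulated flag writes to the device */
};

/* Assumed behaviour: skip the device access when the cached flag already matches. */
static int wb_ctrl_sketch(struct hba_sketch *hba, bool enable)
{
    assert(hba);
    if (hba->wb_enabled == enable)
        return 0;               /* cached state says there is nothing to do */
    hba->device_flag_writes++;  /* stand-in for writing the flag on the device */
    hba->wb_enabled = enable;
    return 0;
}

/* After a reset the device flags are back at their defaults, so clear the cache too. */
static void device_reset_sketch(struct hba_sketch *hba)
{
    assert(hba);
    hba->wb_enabled = false;
    hba->wb_buf_flush_enabled = false;
}
```

Without the call to `device_reset_sketch()`, a later `wb_ctrl_sketch(hba, true)` returns early and the device is never told to turn WriteBooster back on.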

Link: https://lore.kernel.org/r/20201208135635.15326-2-stanley.chu@mediatek.com
Fixes: 3d17b9b5ab11 ("scsi: ufs: Add write booster feature support")
Reviewed-by: Bean Huo
Signed-off-by: Stanley Chu
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin

Stanley Chu
2021-01-09 20:46:23 +0800
acbf7db67 scsi: ufs: Allow an error return value from ->device_reset()

[ Upstream commit 151f1b664ffbb847c7fbbce5a5b8580f1b9b1d98 ]

It is simpler for drivers to provide a ->device_reset() callback
irrespective of whether the GPIO, or firmware interface necessary to do the
reset, is discovered during probe.

Change ->device_reset() to return an error code. Drivers that provide the
callback, but do not do the reset operation should return -EOPNOTSUPP.
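
A minimal userspace sketch of this calling convention follows. The struct and function names are hypothetical, and treating a missing callback the same as -EOPNOTSUPP is an assumption made for the illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical variant-ops table with a reset callback that can fail. */
struct reset_ops_sketch {
    int (*device_reset)(void *priv);
};

/* A driver that registers the callback but has no GPIO or firmware hook. */
static int reset_not_wired(void *priv)
{
    (void)priv;
    return -EOPNOTSUPP;
}

/* A driver that can actually reset the device; counts resets for the demo. */
static int reset_wired(void *priv)
{
    *(int *)priv += 1;
    return 0;
}

/* Caller side: a missing callback is reported like "not supported" (assumption). */
static int do_device_reset_sketch(const struct reset_ops_sketch *ops, void *priv)
{
    assert(ops);
    if (!ops->device_reset)
        return -EOPNOTSUPP;
    return ops->device_reset(priv);
}
```

The caller can then distinguish "no reset was performed" (-EOPNOTSUPP) from a reset that was attempted and either succeeded or failed.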

Link: https://lore.kernel.org/r/20201103141403.2142-3-adrian.hunter@intel.com
Reviewed-by: Asutosh Das
Reviewed-by: Stanley Chu
Reviewed-by: Bean Huo
Reviewed-by: Can Guo
Signed-off-by: Adrian Hunter
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin

Adrian Hunter
2021-01-09 20:46:23 +0800
8cba90399 drm/i915/tgl: Fix Combo PHY DPLL fractional divider for 38.4MHz ref clock

commit 0e2497e334de42dbaaee8e325241b5b5b34ede7e upstream.

Apply Display WA #22010492432 for combo PHY PLLs too. This should fix a
problem where the PLL output frequency is slightly off with the current
PLL fractional divider value.

I haven't seen an actual case where this causes a problem, but let's
follow the spec. It's also needed on some EHL platforms, but for that we
also need a way to distinguish the affected EHL SKUs, so I leave that
for a follow-up.

v2:
- Apply the WA at one place when calculating the PLL dividers from the
frequency and the frequency from the dividers for all the combo PLL
use cases (DP, HDMI, TBT). (Ville)

Cc: Ville Syrjälä
Reviewed-by: Ville Syrjälä
Signed-off-by: Imre Deak
Link: https://patchwork.freedesktop.org/patch/msgid/20201003001846.1271151-6-imre.deak@intel.com
Signed-off-by: Greg Kroah-Hartman

Imre Deak
2021-01-09 20:46:23 +0800
adee1c512 ALSA: hda/hdmi: Fix incorrect mutex unlock in silent_stream_disable()

commit 3d5c5fdcee0f9a94deb0472e594706018b00aa31 upstream.

The silent_stream_disable() function introduced by the commit
b1a5039759cb ("ALSA: hda/hdmi: fix silent stream for first playback to
DP") takes the per_pin->lock mutex, but it unlocks the wrong one,
spec->pcm_lock, which causes a deadlock. This patch corrects it.

Fixes: b1a5039759cb ("ALSA: hda/hdmi: fix silent stream for first playback to DP")
Reported-by: Jan Alexander Steffens (heftig)
Cc:
Acked-by: Kai Vehmanen
Link: https://lore.kernel.org/r/20210101083852.12094-1-tiwai@suse.de
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman

Takashi Iwai
2021-01-09 20:46:23 +0800
e235fd076 ALSA: hda/realtek - Modify Dell platform name

commit c1e8952395c1f44a6304c71401519d19ed2ac56a upstream.

The Dell platform with SSID 0x0a58 changed its platform name.
Use the generic name instead to avoid confusion.

Fixes: 150927c3674d ("ALSA: hda/realtek - Supported Dell fixed type headset")
Signed-off-by: Kailang Yang
Cc:
Link: https://lore.kernel.org/r/efe7c196158241aa817229df7835d645@realtek.com
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman

Kailang Yang
2021-01-09 20:46:23 +0800
ce9163cf7 Bluetooth: Fix attempting to set RPA timeout when unsupported

commit a31489d2a368d2f9225ed6a6f595c63bc7d10de8 upstream.

During controller initialization, an LE Set RPA Timeout command is sent
to the controller if supported. However, the value checked to determine
if the command is supported is incorrect. Page 1921 of the Bluetooth
Core Spec v5.2 shows that bit 2 of octet 35 of the Supported_Commands
field corresponds to the LE Set RPA Timeout command, but currently
bit 6 of octet 35 is checked. This patch checks the correct value
instead.
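
The difference between the two checks can be sketched in standalone C. The helper names are hypothetical; only the octet and bit numbers come from the commit message, and the 64-octet mask length is the size of the HCI Supported_Commands field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the Supported_Commands bit mask reported by the controller. */
#define SUPPORTED_COMMANDS_LEN 64

/* Return true if bit `bit` of octet `octet` is set in the mask. */
static bool hci_cmd_bit_set(const uint8_t *commands, size_t octet, unsigned bit)
{
    assert(octet < SUPPORTED_COMMANDS_LEN && bit < 8);
    return (commands[octet] >> bit) & 1u;
}

/* Check described by the commit: octet 35, bit 2 (mask 0x04). */
static bool le_set_rpa_timeout_supported(const uint8_t *commands)
{
    return hci_cmd_bit_set(commands, 35, 2);
}

/* Check before the fix, per the commit message: octet 35, bit 6 (mask 0x40). */
static bool le_set_rpa_timeout_supported_old(const uint8_t *commands)
{
    return hci_cmd_bit_set(commands, 35, 6);
}
```

A controller that sets bit 6 but not bit 2 of octet 35 passes the old check and then rejects the command, which matches the failure shown in the btmon trace below.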

This issue led to the error seen in the following btmon output during
initialization of an adapter (rtl8761b) and prevented initialization
from completing.

< HCI Command: LE Set Resolvable Private Address Timeout (0x08|0x002e) plen 2
Timeout: 900 seconds
> HCI Event: Command Complete (0x0e) plen 4
LE Set Resolvable Private Address Timeout (0x08|0x002e) ncmd 2
Status: Unsupported Remote Feature / Unsupported LMP Feature (0x1a)
= Close Index: 00:E0:4C:6B:E5:03

The error did not appear when running with this patch.

Signed-off-by: Edward Vear
Signed-off-by: Marcel Holtmann
Signed-off-by: Johan Hedberg
Cc: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman

Edward Vear
2021-01-09 20:46:23 +0800
3e0735089 kdev_t: always inline major/minor helper functions

commit aa8c7db494d0a83ecae583aa193f1134ef25d506 upstream.

Silly GCC doesn't always inline these trivial functions.

Fixes the following warning:

arch/x86/kernel/sys_ia32.o: warning: objtool: cp_stat64()+0xd8: call to new_encode_dev() with UACCESS enabled
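
For readers unfamiliar with forced inlining, here is a small userspace sketch of the idea. It assumes GCC or Clang; `ALWAYS_INLINE`, `dev_major()`, `dev_minor()` and `encode_dev_sketch()` are illustrative names, and the bit layout (20 minor bits, with the major number and high minor bits interleaved on encode) is written from memory of the kernel helpers rather than copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)

/* Compiler attribute that forces inlining even when the optimizer would not. */
#define ALWAYS_INLINE inline __attribute__((__always_inline__))

static ALWAYS_INLINE uint32_t dev_major(uint32_t dev)
{
    return dev >> MINORBITS;
}

static ALWAYS_INLINE uint32_t dev_minor(uint32_t dev)
{
    return dev & MINORMASK;
}

/* Pack major/minor into the wider user-visible encoding. */
static ALWAYS_INLINE uint32_t encode_dev_sketch(uint32_t dev)
{
    uint32_t major = dev_major(dev);
    uint32_t minor = dev_minor(dev);

    return (minor & 0xff) | (major << 8) | ((minor & ~0xffU) << 12);
}

/* Usage example: major 8, minor 1 encodes to 0x801. */
static void encode_dev_demo(void)
{
    assert(encode_dev_sketch((8U << MINORBITS) | 1U) == 0x801U);
}
```

Because the helpers are always inlined, no out-of-line call is emitted at the call site, which is the property objtool checks for in code that runs with user access enabled.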

Link: https://lkml.kernel.org/r/984353b44a4484d86ba9f73884b7306232e25e30.1608737428.git.jpoimboe@redhat.com
Signed-off-by: Josh Poimboeuf
Reported-by: Randy Dunlap
Acked-by: Randy Dunlap [build-tested]
Cc: Peter Zijlstra
Cc: Greg Kroah-Hartman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Josh Poimboeuf
2021-01-09 20:46:23 +0800
fd3ec3b25 dt-bindings: rtc: add reset-source property

commit 320d159e2d63a97a40f24cd6dfda5a57eec65b91 upstream.

Some RTCs, e.g. the pcf2127, can be used as a hardware watchdog. But
if the reset pin is not actually wired up, the driver exposes a
watchdog device that doesn't actually work.

Provide a standard binding that can be used to indicate that a given
RTC can perform a reset of the machine, similar to wakeup-source.

Suggested-by: Alexandre Belloni
Signed-off-by: Rasmus Villemoes
Reviewed-by: Rob Herring
Signed-off-by: Alexandre Belloni
Link: https://lore.kernel.org/r/20201218101054.25416-2-rasmus.villemoes@prevas.dk
Signed-off-by: Greg Kroah-Hartman

Rasmus Villemoes
2021-01-09 20:46:22 +0800
757cd94ac rtc: pcf2127: only use watchdog when explicitly available

commit 71ac13457d9d1007effde65b54818106b2c2b525 upstream.

Most boards using the pcf2127 chip (in my bubble) don't make use of the
watchdog functionality and the respective output is not connected. The
effect on such a board is that there is a watchdog device provided that
doesn't work.

So only register the watchdog if the device tree has a "reset-source"
property.

Signed-off-by: Uwe Kleine-König
[RV: s/has-watchdog/reset-source/]
Signed-off-by: Rasmus Villemoes
Signed-off-by: Alexandre Belloni
Link: https://lore.kernel.org/r/20201218101054.25416-3-rasmus.villemoes@prevas.dk
Signed-off-by: Greg Kroah-Hartman

Uwe Kleine-König
2021-01-09 20:46:22 +0800
acb821425 rtc: pcf2127: move watchdog initialisation to a separate function

commit 5d78533a0c53af9659227c803df944ba27cd56e0 upstream.

The obvious advantages are:

- The linker can drop the watchdog functions if CONFIG_WATCHDOG is off.
- All watchdog stuff grouped together with only a single function call
left in generic code.
- Watchdog register is only read when it is actually used.
- Less #ifdefery

Signed-off-by: Uwe Kleine-König
Signed-off-by: Alexandre Belloni
Link: https://lore.kernel.org/r/20200924105256.18162-2-u.kleine-koenig@pengutronix.de
Cc: Rasmus Villemoes
Signed-off-by: Greg Kroah-Hartman

Uwe Kleine-König
2021-01-09 20:46:22 +0800
b00195241 Revert "mtd: spinand: Fix OOB read"

This reverts stable commit baad618d078c857f99cc286ea249e9629159901f.

This commit is adding lines to spinand_write_to_cache_op, whereas the upstream
commit 868cbe2a6dcee451bd8f87cbbb2a73cf463b57e5 that this was supposed to
backport was touching spinand_read_from_cache_op.
It causes a crash on writing OOB data by attempting to write to read-only
kernel memory.

Cc: Miquel Raynal
Signed-off-by: Felix Fietkau
Signed-off-by: Greg Kroah-Hartman

Felix Fietkau
2021-01-09 20:46:22 +0800
261f4d03a Revert "drm/amd/display: Fix memory leaks in S3 resume"

This reverts commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362.

This leads to blank screens on some boards after replugging a
display. Revert until we understand the root cause and can
fix both the leak and the blank screen after replug.

Cc: Stylon Wang
Cc: Harry Wentland
Cc: Nicholas Kazlauskas
Cc: Andre Tomt
Cc: Oleksandr Natalenko
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

Alex Deucher
2021-01-09 20:46:22 +0800

06 Jan, 2021

24 commits

f5247949c Linux 5.10.5

Tested-by: Jon Hunter
Tested-by: Linux Kernel Functional Testing
Tested-by: Jeffrin Jose T
Tested-by: Shuah Khan
Tested-by: Guenter Roeck
Link: https://lore.kernel.org/r/20210104155708.800470590@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2021-01-06 21:56:56 +0800
12d377b93 device-dax: Fix range release

[ Upstream commit 6268d7da4d192af339f4d688942b9ccb45a65e04 ]

There are multiple locations that open-code the release of the last
range in a device-dax instance. Consolidate this into a new
dev_dax_trim_range() helper.

This also addresses a kmemleak report:

# cat /sys/kernel/debug/kmemleak
[..]
unreferenced object 0xffff976bd46f6240 (size 64):
comm "ndctl", pid 23556, jiffies 4299514316 (age 5406.733s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 20 c3 37 00 00 00 .......... .7...
ff ff ff 7f 38 00 00 00 00 00 00 00 00 00 00 00 ....8...........
backtrace:
[] __kmalloc_track_caller+0x136/0x379
[] krealloc+0x67/0x92
[] __alloc_dev_dax_range+0x73/0x25c
[] devm_create_dev_dax+0x27d/0x416
[] __dax_pmem_probe+0x1c9/0x1000 [dax_pmem_core]
[] dax_pmem_probe+0x10/0x1f [dax_pmem]
[] nvdimm_bus_probe+0x9d/0x340 [libnvdimm]
[] really_probe+0x230/0x48d
[] driver_probe_device+0x122/0x13b
[] device_driver_attach+0x5b/0x60
[] bind_store+0xb7/0xc3
[] drv_attr_store+0x27/0x31
[] sysfs_kf_write+0x4a/0x57
[] kernfs_fop_write+0x150/0x1e5
[] __vfs_write+0x1b/0x34
[] vfs_write+0xd8/0x1d1

Reported-by: Jane Chu
Cc: Zhen Lei
Link: https://lore.kernel.org/r/160834570161.1791850.14911670304441510419.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams
Signed-off-by: Sasha Levin

Dan Williams
2021-01-06 21:56:56 +0800
aceb8ae8e ext4: avoid s_mb_prefetch to be zero in individual scenarios

[ Upstream commit 82ef1370b0c1757ab4ce29f34c52b4e93839b0aa ]

Commit cfd732377221 ("ext4: add prefetching for block allocation
bitmaps") introduced block bitmap prefetch, and expects to read block
bitmaps of flex_bg through an IO. However, it seems to ignore the
value range of s_log_groups_per_flex. In the scenario where the value
of s_log_groups_per_flex is greater than 27, s_mb_prefetch or
s_mb_prefetch_limit will overflow, causing a divide-by-zero exception.

In addition, the logic of calculating nr is also flawed, because the
size of flexbg is fixed during a single mount, but s_mb_prefetch can
be modified, which causes nr to fail to meet the value condition of
[1, flexbg_size].

To solve this problem, we need to set the upper limit of
s_mb_prefetch. Since we expect to load block bitmaps of a flex_bg
through an IO, we can consider determining a reasonable upper limit
among the IO limit parameters. After consideration, we chose
BLK_MAX_SEGMENT_SIZE. This is a good choice to solve the divide-by-zero
problem and avoid performance degradation.
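
The wraparound can be reproduced with ordinary 32-bit arithmetic. The sketch below is an assumption-laden illustration: the function names are hypothetical, the multiplier of 8 is assumed rather than taken from this commit, and `max_groups` stands in for whatever upper limit is derived from BLK_MAX_SEGMENT_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed multiplier for the illustration (prefetch I/Os in flight). */
#define PREFETCH_IOS 8u

/* Unbounded variant: the result is truncated to 32 bits and can wrap to 0. */
static uint32_t prefetch_unclamped(uint32_t log_groups_per_flex)
{
    /* Use a 64-bit intermediate so the shift itself is well defined. */
    uint64_t groups = (log_groups_per_flex < 64) ? (1ULL << log_groups_per_flex) : 0;

    return (uint32_t)(groups * PREFETCH_IOS);
}

/* Bounded variant: cap the group count before multiplying. */
static uint32_t prefetch_clamped(uint32_t log_groups_per_flex, uint32_t max_groups)
{
    uint64_t groups;

    assert(max_groups > 0 && max_groups <= UINT32_MAX / PREFETCH_IOS);
    groups = (log_groups_per_flex < 32) ? (1ULL << log_groups_per_flex) : max_groups;
    if (groups > max_groups)
        groups = max_groups;
    return (uint32_t)(groups * PREFETCH_IOS);
}
```

With these assumptions, `prefetch_unclamped(29)` is 0, so any later division or modulo by that value traps, while the clamped variant stays nonzero for every input.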

[ Some minor code simplifications to make the changes easy to follow -- TYT ]

Reported-by: Tosk Robot
Signed-off-by: Chunguang Xu
Reviewed-by: Samuel Liao
Reviewed-by: Andreas Dilger
Link: https://lore.kernel.org/r/1607051143-24508-1-git-send-email-brookxu@tencent.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

Chunguang Xu
2021-01-06 21:56:56 +0800
aff18aa80 dm verity: skip verity work if I/O error when system is shutting down

[ Upstream commit 252bd1256396cebc6fc3526127fdb0b317601318 ]

If emergency system shutdown is called, like by thermal shutdown,
a dm device could be alive when the block device couldn't process
I/O requests anymore. In this state, the handling of I/O errors
by new dm I/O requests or by those already in-flight can lead to
a verity corruption state, which is a misjudgment.

So, skip verity work in response to I/O error when system is shutting
down.

Signed-off-by: Hyeongseok Kim
Reviewed-by: Sami Tolvanen
Signed-off-by: Mike Snitzer
Signed-off-by: Sasha Levin

Hyeongseok Kim
2021-01-06 21:56:56 +0800
610d2fa0e ALSA: pcm: Clear the full allocated memory at hw_params

[ Upstream commit 618de0f4ef11acd8cf26902e65493d46cc20cc89 ]

The PCM hw_params core function tries to clear the PCM buffer before
actually using it, to avoid leaking information from previous usages
or from the usage before a new allocation. It performs the
memset() with runtime->dma_bytes, but this might still leave some
remaining bytes untouched; namely, the PCM buffer size is aligned in
page size for mmap, hence runtime->dma_bytes doesn't necessarily cover
all PCM buffer pages, and the remaining bytes are exposed via mmap.

This patch changes the memory clearance to cover all the buffer pages
if the stream is supposed to be mmap-ready (that guarantees that the
buffer size is aligned in page size).
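
The size calculation can be sketched as follows. This is a simplified illustration, assuming a 4096-byte page; `page_align()` and `clear_len()` are hypothetical helpers, not the functions touched by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the illustration */

/* Round n up to the next multiple of the page size. */
static size_t page_align(size_t n)
{
    assert(n <= SIZE_MAX - SKETCH_PAGE_SIZE);
    return (n + SKETCH_PAGE_SIZE - 1) & ~((size_t)SKETCH_PAGE_SIZE - 1);
}

/*
 * Number of bytes to clear: whole pages when the buffer may be mapped
 * into userspace, otherwise only the bytes the stream actually uses.
 */
static size_t clear_len(size_t dma_bytes, int mmap_ready)
{
    return mmap_ready ? page_align(dma_bytes) : dma_bytes;
}
```

For a 5000-byte buffer, clearing only `dma_bytes` would leave 3192 bytes of the second page untouched, and those bytes would be visible through mmap.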

Reviewed-by: Lars-Peter Clausen
Link: https://lore.kernel.org/r/20201218145625.2045-3-tiwai@suse.de
Signed-off-by: Takashi Iwai
Signed-off-by: Sasha Levin

Takashi Iwai
2021-01-06 21:56:56 +0800
c7b04d27c io_uring: remove racy overflow list fast checks

[ Upstream commit 9cd2be519d05ee78876d55e8e902b7125f78b74f ]

list_empty_careful() is not racy only if some conditions are met, i.e.
no re-adds after del_init. io_cqring_overflow_flush() does list_move(),
so it's actually racy.

Remove those checks, we have ->cq_check_overflow for the fast path.

Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Pavel Begunkov
2021-01-06 21:56:55 +0800
13f9eec22 s390: always clear kernel stack backchain before calling functions

[ Upstream commit 9365965db0c7ca7fc81eee27c21d8522d7102c32 ]

Clear the kernel stack backchain before potentially calling the
lockdep trace_hardirqs_off/on functions. Without this, walking the
kernel backchain, e.g. during a panic, might stop too early.

Signed-off-by: Heiko Carstens
Signed-off-by: Sasha Levin

Heiko Carstens
2021-01-06 21:56:55 +0800
330c1ee7d tick/sched: Remove bogus boot "safety" check

[ Upstream commit ba8ea8e7dd6e1662e34e730eadfc52aa6816f9dd ]

can_stop_idle_tick() checks whether the do_timer() duty has been taken over
by a CPU on boot. That's silly because the boot CPU always takes over with
the initial clockevent device.

But even if no CPU would have installed a clockevent and taken over the
duty then the question whether the tick on the current CPU can be stopped
or not is moot. In that case the current CPU would have no clockevent
either, so there would be nothing to keep ticking.

Remove it.

Signed-off-by: Thomas Gleixner
Acked-by: Frederic Weisbecker
Link: https://lore.kernel.org/r/20201206212002.725238293@linutronix.de
Signed-off-by: Sasha Levin

Thomas Gleixner
2021-01-06 21:56:55 +0800
9b22bc0f1 drm/amd/display: updated wm table for Renoir

[ Upstream commit 410066d24cfc1071be25e402510367aca9db5cb6 ]

[Why]
For certain timings, Renoir may underflow due to sr exit
latency being too slow.

[How]
Updated wm table for renoir.

Signed-off-by: Jake Wang
Reviewed-by: Yongqiang Sun
Acked-by: Qingqing Zhuo
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin

Jake Wang
2021-01-06 21:56:55 +0800
86be0f2a0 ceph: fix inode refcount leak when ceph_fill_inode on non-I_NEW inode fails

[ Upstream commit 68cbb8056a4c24c6a38ad2b79e0a9764b235e8fa ]

Signed-off-by: Jeff Layton
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Jeff Layton
2021-01-06 21:56:55 +0800
8bcfa178f NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow

[ Upstream commit 503b934a752f7e789a5f33217520e0a79f3096ac ]

Expanding the READ_PLUS extents can cause the read buffer to overflow.
If it does, then don't error, but just exit early.

Signed-off-by: Trond Myklebust
Signed-off-by: Sasha Levin

Trond Myklebust
2021-01-06 21:56:55 +0800
ef3b9ad96 um: ubd: Submit all data segments atomically

[ Upstream commit fc6b6a872dcd48c6f39c7975836d75113db67d37 ]

Internally, UBD treats each physical IO segment as a separate command to
be submitted in the execution pipe. If the pipe returns a transient
error after a few segments have already been written, UBD will tell the
block layer to requeue the request, but there is no way to reclaim the
segments already submitted. When a new attempt to dispatch the request
is done, those segments already submitted will get duplicated, causing
the WARN_ON below in the best case, and potentially data corruption.

On my system, running a UML instance with 2GB of RAM and a 50M UBD disk,
I can reproduce the WARN_ON by simply running mkfs.vfat against the
disk on a freshly booted system.

There are a few ways to work around this, like reducing the pressure on
the pipe by reducing the queue depth, which almost eliminates the
occurrence of the problem, increasing the pipe buffer size on the host
system, or by limiting the request to one physical segment, which causes
the block layer to submit way more requests to resolve a single
operation.

Instead, this patch modifies the format of a UBD command, such that all
segments are sent through a single element in the communication pipe,
turning the command submission atomic from the point of view of the
block layer. The new format has a variable size, depending on the
number of elements, and looks like this:

+------------+-----------+-----------+------------
| cmd_header | segment 0 | segment 1 | segment ...
+------------+-----------+-----------+------------

With this format, we push a pointer to cmd_header in the submission
pipe.
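
One way to picture the layout is a header followed by a flexible array of segments, allocated in one piece. The sketch below is illustrative only: `cmd_sketch`, `seg_sketch` and their fields are hypothetical and do not mirror the driver's actual structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-segment descriptor. */
struct seg_sketch {
    void *buffer;
    size_t length;
};

/* Hypothetical command: one header, then a variable number of segments. */
struct cmd_sketch {
    int op;
    size_t nseg;
    struct seg_sketch segs[];   /* flexible array member follows the header */
};

/* Total allocation size for a command with nseg segments. */
static size_t cmd_size(size_t nseg)
{
    assert(nseg <= (SIZE_MAX - sizeof(struct cmd_sketch)) / sizeof(struct seg_sketch));
    return sizeof(struct cmd_sketch) + nseg * sizeof(struct seg_sketch);
}

/* Allocate header and segments in a single zeroed block; NULL on failure. */
static struct cmd_sketch *cmd_alloc(int op, size_t nseg)
{
    struct cmd_sketch *cmd = calloc(1, cmd_size(nseg));

    if (!cmd)
        return NULL;
    cmd->op = op;
    cmd->nseg = nseg;
    return cmd;
}
```

Since the whole request lives in one allocation, a single pointer written to the pipe is enough to hand over every segment, so a short write can no longer leave a request half submitted.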

This has the advantage of reducing the memory footprint of executing a
single request, since it allows us to merge some fields in the header.
It is possible to reduce even further each segment memory footprint, by
merging bitmap_words and cow_offset, for instance, but this is not the
focus of this patch and is left as future work. One issue with the
patch is that for a big number of segments, we now perform one big
memory allocation instead of multiple small ones, but I wasn't able to
trigger any real issues or -ENOMEM because of this change, that wouldn't
be reproduced otherwise.

This was tested using fio with the verify-crc32 option, and by running
an ext4 filesystem over this UBD device.

The original WARN_ON was:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346
Stack:
6084eed0 6063dc77 00000009 6084ef60
00000000 604b8d9f 6084eee0 6063dcbc
6084ef40 6006ab8d e013d780 1c00000000
Call Trace:
[] ? printk+0x0/0x94
[] show_stack+0x13b/0x155
[] ? dump_stack_print_info+0xdf/0xe8
[] ? refcount_warn_saturate+0x13f/0x141
[] dump_stack+0x2a/0x2c
[] __warn+0x107/0x134
[] ? wake_up_process+0x17/0x19
[] ? blk_queue_max_discard_sectors+0x0/0xd
[] warn_slowpath_fmt+0xd1/0xdf
[] ? warn_slowpath_fmt+0x0/0xdf
[] ? raw_read_seqcount_begin.constprop.0+0x0/0x15
[] ? os_nsecs+0x1d/0x2b
[] refcount_warn_saturate+0x13f/0x141
[] refcount_sub_and_test.constprop.0+0x2f/0x37
[] blk_mq_free_request+0xf1/0x10d
[] __blk_mq_end_request+0x10c/0x114
[] ubd_intr+0xb5/0x169
[] __handle_irq_event_percpu+0x6b/0x17e
[] handle_irq_event_percpu+0x26/0x69
[] handle_irq_event+0x26/0x34
[] ? handle_irq_event+0x0/0x34
[] ? unmask_irq+0x0/0x37
[] handle_edge_irq+0xbc/0xd6
[] generic_handle_irq+0x21/0x29
[] do_IRQ+0x39/0x54
[...]
---[ end trace c6e7444e55386c0f ]---

Cc: Christopher Obbard
Reported-by: Martyn Welch
Signed-off-by: Gabriel Krisman Bertazi
Tested-by: Christopher Obbard
Acked-by: Anton Ivanov
Signed-off-by: Richard Weinberger
Signed-off-by: Sasha Levin

Gabriel Krisman Bertazi
2021-01-06 21:56:55 +0800
a8b49c4bd um: random: Register random as hwrng-core device

[ Upstream commit 72d3e093afae79611fa38f8f2cfab9a888fe66f2 ]

The UML random driver creates a dummy device under the guest,
/dev/hw_random. When this file is read from the guest, the driver
reads from the host machine's /dev/random, in turn reading from
the host kernel's entropy pool. This entropy pool could have been
filled by a hardware random number generator or just the host
kernel's internal software entropy generator.

Currently the driver does not fill the guest's kernel entropy pool;
this requires a userspace tool running inside the guest (like
rng-tools) to read from the dummy device provided by this driver,
which then would fill the guest's internal entropy pool.

This all seems quite pointless when we are already reading from an
entropy pool, so this patch aims to register the device as a hwrng
device using the hwrng-core framework. This not only improves and
cleans up the driver, but also fills the guest's entropy pool
without having to resort to using extra userspace tools in the guest.

This is typically a nuisance when booting a guest: the random pool
takes a long time (~200s) to build up enough entropy since the dummy
hwrng is not used to fill the guest's pool.

This port was originally attempted by Alexander Neville "dark" (in CC,
discussion in Link), but the conversation there stalled since the
handling of -EAGAIN errors was removed and they were no longer handled
by the driver. This patch attempts to use the existing method of error
handling but utilises the new hwrng core.

The issue can be noticed when booting a UML guest:

[ 2.560000] random: fast init done
[ 214.000000] random: crng init done

With the patch applied, filling the pool becomes a lot quicker:

[ 2.560000] random: fast init done
[ 12.000000] random: crng init done

Cc: Alexander Neville
Link: https://lore.kernel.org/lkml/20190828204609.02a7ff70@TheDarkness/
Link: https://lore.kernel.org/lkml/20190829135001.6a5ff940@TheDarkness.local/
Cc: Sjoerd Simons
Signed-off-by: Christopher Obbard
Acked-by: Anton Ivanov
Signed-off-by: Richard Weinberger
Signed-off-by: Sasha Levin

Christopher Obbard
2021-01-06 21:56:55 +0800
0aa2eecf8 watchdog: rti-wdt: fix reference leak in rti_wdt_probe

[ Upstream commit 8711071e9700b67045fe5518161d63f7a03e3c9e ]

pm_runtime_get_sync() will increment the PM usage counter even if it
fails. Forgetting to call pm_runtime_put_noidle() will result
in a reference leak in rti_wdt_probe(), so we should fix it.
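
The pattern can be sketched with a mock usage counter. Everything here is hypothetical: `mock_get_sync()` and `mock_put_noidle()` only imitate the counting behaviour described above and are not the runtime PM API.

```c
#include <assert.h>

/* Hypothetical device with a usage counter and a switch to force failure. */
struct dev_sketch {
    int usage_count;
    int fail_resume;
};

/* Mock of the described behaviour: the counter goes up even on error. */
static int mock_get_sync(struct dev_sketch *d)
{
    assert(d->usage_count >= 0);
    d->usage_count++;
    return d->fail_resume ? -1 : 0;
}

/* Mock of dropping a reference without triggering an idle check. */
static void mock_put_noidle(struct dev_sketch *d)
{
    d->usage_count--;
}

/* Probe path with the fix: balance the counter on the error path. */
static int probe_sketch(struct dev_sketch *d)
{
    int ret = mock_get_sync(d);

    if (ret < 0) {
        mock_put_noidle(d);     /* drop the reference taken by the failed get */
        return ret;
    }
    return 0;
}
```

If the error path returned without the put, `usage_count` would stay at 1 after a failed probe, which is the leak the commit describes.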

Signed-off-by: Zhang Qilong
Reviewed-by: Guenter Roeck
Link: https://lore.kernel.org/r/20201030154909.100023-1-zhangqilong3@huawei.com
Signed-off-by: Guenter Roeck
Signed-off-by: Wim Van Sebroeck
Signed-off-by: Sasha Levin

Zhang Qilong
2021-01-06 21:56:54 +0800
eae1fb3bc fs/namespace.c: WARN if mnt_count has become negative

[ Upstream commit edf7ddbf1c5eb98b720b063b73e20e8a4a1ce673 ]

Missing calls to mntget() (or equivalently, too many calls to mntput())
are hard to detect because mntput() delays freeing mounts using
task_work_add(), then again using call_rcu(). As a result, mnt_count
can often be decremented to -1 without getting a KASAN use-after-free
report. Such cases are still bugs though, and they point to real
use-after-frees being possible.

For an example of this, see the bug fixed by commit 1b0b9cc8d379
("vfs: fsmount: add missing mntget()"), discussed at
https://lkml.kernel.org/linux-fsdevel/20190605135401.GB30925@xxxxxxxxxxxxxxxxxxxxxxxxx/T/#u.
This bug *should* have been trivial to find. But actually, it wasn't
found until syzkaller happened to use fchdir() to manipulate the
reference count just right for the bug to be noticeable.

Address this by making mntput_no_expire() issue a WARN if mnt_count has
become negative.

Suggested-by: Miklos Szeredi
Signed-off-by: Eric Biggers
Signed-off-by: Al Viro
Signed-off-by: Sasha Levin

Eric Biggers
2021-01-06 21:56:54 +0800
b1e155ccc powerpc/64: irq replay remove decrementer overflow check

[ Upstream commit 59d512e4374b2d8a6ad341475dc94c4a4bdec7d3 ]

This is a way to catch some cases of decrementer overflow, when the
decrementer has underflowed an odd number of times, while MSR[EE] was
disabled.

With a typical small decrementer, a timer that fires when MSR[EE] is
disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
8.6 seconds after the timer expires. In any case, the decrementer
interrupt would be taken at 8.6 seconds and the timer would be found at
that point.

So this check is for catching extreme latency events, and it prevents
those latencies from being a further few seconds long. It's not obvious
this is a good tradeoff. This is already a watchdog magnitude event and
that situation is not improved significantly with this check. For
large decrementers, it's useless.

Therefore remove this check, which avoids a mftb when enabling hard
disabled interrupts (e.g., when enabling after coming from hardware
interrupt handlers). Perhaps more importantly, it also removes the
clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
which simplifies the code.

Signed-off-by: Nicholas Piggin
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
Signed-off-by: Sasha Levin

Nicholas Piggin
2021-01-06 21:56:54 +0800
8b5b2b768 module: delay kobject uevent until after module init call

[ Upstream commit 38dc717e97153e46375ee21797aa54777e5498f3 ]

Apparently there has been a longstanding race between udev/systemd and
the module loader. Currently, the module loader sends a uevent right
after sysfs initialization, but before the module calls its init
function. However, some udev rules expect that the module has
initialized already upon receiving the uevent.

This race has been triggered recently (see link in references) in some
systemd mount unit files. For instance, the configfs module creates the
/sys/kernel/config mount point in its init function, however the module
loader issues the uevent before this happens. sys-kernel-config.mount
expects to be able to mount /sys/kernel/config upon receipt of the
module loading uevent, but if the configfs module has not called its
init function yet, then this directory will not exist and the mount unit
fails. A similar situation exists for sys-fs-fuse-connections.mount, as
the fuse sysfs mount point is created during the fuse module's init
function. If udev is faster than module initialization then the mount
unit would fail in a similar fashion.

To fix this race, delay the module KOBJ_ADD uevent until after the
module has finished calling its init routine.

References: https://github.com/systemd/systemd/issues/17586
Reviewed-by: Greg Kroah-Hartman
Tested-By: Nicolas Morey-Chaisemartin
Signed-off-by: Jessica Yu
Signed-off-by: Sasha Levin

Jessica Yu
2021-01-06 21:56:54 +0800
db6129f6a f2fs: fix race of pending_pages in decompression

[ Upstream commit 6422a71ef40e4751d59b8c9412e7e2dafe085878 ]

I found out f2fs_free_dic() is invoked at the wrong time, but
f2fs_verify_bio() still needed the dic info and it triggered the
below kernel panic. It has been caused by the race condition of
pending_pages value between decompression and verity logic, when
the same compression cluster had been split in different bios.
By split bios, f2fs_verify_bio() ended up with decreasing
pending_pages value before it is reset to nr_cpages by
f2fs_decompress_pages() and caused the kernel panic.

[ 4416.564763] Unable to handle kernel NULL pointer dereference
at virtual address 0000000000000000
...
[ 4416.896016] Workqueue: fsverity_read_queue f2fs_verity_work
[ 4416.908515] pc : fsverity_verify_page+0x20/0x78
[ 4416.913721] lr : f2fs_verify_bio+0x11c/0x29c
[ 4416.913722] sp : ffffffc019533cd0
[ 4416.913723] x29: ffffffc019533cd0 x28: 0000000000000402
[ 4416.913724] x27: 0000000000000001 x26: 0000000000000100
[ 4416.913726] x25: 0000000000000001 x24: 0000000000000004
[ 4416.913727] x23: 0000000000001000 x22: 0000000000000000
[ 4416.913728] x21: 0000000000000000 x20: ffffffff2076f9c0
[ 4416.913729] x19: ffffffff2076f9c0 x18: ffffff8a32380c30
[ 4416.913731] x17: ffffffc01f966d97 x16: 0000000000000298
[ 4416.913732] x15: 0000000000000000 x14: 0000000000000000
[ 4416.913733] x13: f074faec89ffffff x12: 0000000000000000
[ 4416.913734] x11: 0000000000001000 x10: 0000000000001000
[ 4416.929176] x9 : ffffffff20d1f5c7 x8 : 0000000000000000
[ 4416.929178] x7 : 626d7464ff286b6b x6 : ffffffc019533ade
[ 4416.929179] x5 : 000000008049000e x4 : ffffffff2793e9e0
[ 4416.929180] x3 : 000000008049000e x2 : ffffff89ecfa74d0
[ 4416.929181] x1 : 0000000000000c40 x0 : ffffffff2076f9c0
[ 4416.929184] Call trace:
[ 4416.929187] fsverity_verify_page+0x20/0x78
[ 4416.929189] f2fs_verify_bio+0x11c/0x29c
[ 4416.929192] f2fs_verity_work+0x58/0x84
[ 4417.050667] process_one_work+0x270/0x47c
[ 4417.055354] worker_thread+0x27c/0x4d8
[ 4417.059784] kthread+0x13c/0x320
[ 4417.063693] ret_from_fork+0x10/0x18

Chao pointed out that this can happen through the race condition below.

Thread A                  f2fs_post_read_wq                        fsverity_wq
- f2fs_read_multi_pages()
  - f2fs_alloc_dic
    - dic->pending_pages = 2
    - submit_bio()
    - submit_bio()
                          - f2fs_post_read_work() handle first bio
                            - f2fs_decompress_work()
                              - __read_end_io()
                                - f2fs_decompress_pages()
                                  - dic->pending_pages--
                            - enqueue f2fs_verity_work()
                                                                   - f2fs_verity_work() handle first bio
                                                                     - f2fs_verify_bio()
                                                                       - dic->pending_pages--
                          - f2fs_post_read_work() handle second bio
                            - f2fs_decompress_work()
                            - enqueue f2fs_verity_work()
                                                                       - f2fs_verify_pages()
                                                                       - f2fs_free_dic()

                                                                   - f2fs_verity_work() handle second bio
                                                                     - f2fs_verify_bio()
                                                                       - use-after-free on dic
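
The interleaving above can be replayed deterministically in user space.
The following is only a sketch under stated assumptions: struct fake_dic
and every function in it are hypothetical stand-ins, not f2fs code, and
the per-stage counters illustrate one possible way to keep the two stages
from sharing a countdown; they are not claimed to be what the upstream
patch does.

```c
#include <assert.h>

/* Hypothetical stand-in for the decompression context (not the f2fs struct). */
struct fake_dic {
	int pending_pages;   /* shared by both stages (the racy design)   */
	int decomp_pending;  /* assumption: counter owned by decompression */
	int verity_pending;  /* assumption: counter owned by verification  */
};

/* Decrement a counter; return 1 if the caller now believes it was last. */
static int dec_and_test(int *counter)
{
	assert(*counter > 0);
	return --(*counter) == 0;
}

/*
 * Replay the trace with one shared counter and two bios.
 * Returns 1 if the verity stage frees the context while the
 * second bio is still outstanding.
 */
static int replay_shared_counter(void)
{
	struct fake_dic d = { .pending_pages = 2 };

	dec_and_test(&d.pending_pages);        /* decompress, first bio */
	return dec_and_test(&d.pending_pages); /* verify, first bio     */
}

/* Same trace, but each stage counts down its own counter. */
static int replay_split_counters(void)
{
	struct fake_dic d = { .decomp_pending = 2, .verity_pending = 2 };

	dec_and_test(&d.decomp_pending);        /* decompress, first bio */
	return dec_and_test(&d.verity_pending); /* verify, first bio     */
}
```

With the shared counter the first verification already sees zero and
would free the context; with separate counters it does not.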

Signed-off-by: Daeho Jeong
Signed-off-by: Jaegeuk Kim
Signed-off-by: Sasha Levin

Daeho Jeong
2021-01-06 21:56:54 +0800
ee3f8aefd f2fs: avoid race condition for shrinker count

[ Upstream commit a95ba66ac1457b76fe472c8e092ab1006271f16c ]

Light reported that the shrinker sometimes sees nat_cnt < dirty_nat_cnt,
resulting in wrong do_shrinker work. Avoid returning an insane overflowed
value by adding a single tracking value.
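
As a rough illustration of why nat_cnt < dirty_nat_cnt is harmful, the
sketch below subtracts two unsigned counters that were sampled at
slightly different moments. The struct and function names are
hypothetical and chosen for this example only; they are not the f2fs
fields.

```c
/* Hypothetical counters; the names are assumptions for this sketch. */
struct fake_nat_counters {
	unsigned int total;       /* all cached entries               */
	unsigned int dirty;       /* entries that cannot be reclaimed */
	unsigned int reclaimable; /* single tracking value            */
};

/* Derived count: wraps around if 'dirty' momentarily exceeds 'total'. */
static unsigned long count_by_subtraction(const struct fake_nat_counters *c)
{
	return c->total - c->dirty;
}

/* Tracked count: read one value, so there is nothing to wrap. */
static unsigned long count_tracked(const struct fake_nat_counters *c)
{
	return c->reclaimable;
}
```

If a reader observes total = 5 and dirty = 6, the subtraction yields the
largest unsigned int value instead of a small count, while the single
tracked value stays sane.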

Reported-by: Light Hsieh
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
Signed-off-by: Sasha Levin

Jaegeuk Kim
2021-01-06 21:56:54 +0800
3c0f0f5f5 NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode

[ Upstream commit b6d49ecd1081740b6e632366428b960461f8158b ]

When returning the layout in nfs4_evict_inode(), we need to ensure that
the layout is actually done being freed before we can proceed to free the
inode itself.

Signed-off-by: Trond Myklebust
Signed-off-by: Sasha Levin

Trond Myklebust
2021-01-06 21:56:54 +0800
06ac2ca09 i3c master: fix missing destroy_workqueue() on error in i3c_master_register

[ Upstream commit 59165d16c699182b86b5c65181013f1fd88feb62 ]

Add the missing destroy_workqueue() before return from
i3c_master_register in the error handling case.
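
The general shape of such a fix is the usual goto-based unwind. The
sketch below is a user-space illustration only: fake_wq,
fake_alloc_workqueue(), fake_destroy_workqueue() and
fake_master_register() are hypothetical names, not the i3c API, and the
failing step is simulated by a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue; live_wqs counts leaks. */
struct fake_wq { int unused; };
static int live_wqs;

static struct fake_wq *fake_alloc_workqueue(void)
{
	struct fake_wq *wq = malloc(sizeof(*wq));

	if (wq)
		live_wqs++;
	return wq;
}

static void fake_destroy_workqueue(struct fake_wq *wq)
{
	if (wq) {
		free(wq);
		live_wqs--;
	}
}

/* Register sketch: any failure after allocation must destroy the queue. */
static int fake_master_register(int fail_later_step, struct fake_wq **out)
{
	struct fake_wq *wq;
	int ret;

	assert(out != NULL);
	wq = fake_alloc_workqueue();
	if (!wq)
		return -1;

	ret = fail_later_step ? -1 : 0; /* simulated later setup step */
	if (ret)
		goto err_destroy_wq;

	*out = wq;
	return 0;

err_destroy_wq:
	fake_destroy_workqueue(wq); /* the call that was missing */
	return ret;
}
```

After a failed call, live_wqs is back to zero, which is the property the
error path is meant to guarantee.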

Signed-off-by: Qinglang Miao
Signed-off-by: Boris Brezillon
Link: https://lore.kernel.org/linux-i3c/20201028091543.136167-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin

Qinglang Miao
2021-01-06 21:56:53 +0800
498d90690 powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()

[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]

I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.

Signed-off-by: Qinglang Miao
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin

Qinglang Miao
2021-01-06 21:56:53 +0800
acc3c8cc2 rtc: pl031: fix resource leak in pl031_probe

[ Upstream commit 1eab0fea2514b269e384c117f5b5772b882761f0 ]

When devm_rtc_allocate_device() fails in pl031_probe(), the memory
regions should be released along with the device.

Reported-by: Hulk Robot
Signed-off-by: Zheng Liang
Signed-off-by: Alexandre Belloni
Acked-by: Linus Walleij
Link: https://lore.kernel.org/r/20201112093139.32566-1-zhengliang6@huawei.com
Signed-off-by: Sasha Levin

Zheng Liang
2021-01-06 21:56:53 +0800
26058c397 quota: Don't overflow quota file offsets

[ Upstream commit 10f04d40a9fa29785206c619f80d8beedb778837 ]

The on-disk quota format supports quota files with up to 2^32 blocks. Be
careful when computing quota file offsets in the quota files from block
numbers, as they can overflow 32-bit types. Since quota files larger than
4GB would require ~26 million quota users, this is mostly a theoretical
concern for now, but it is better to be careful: fuzzers would find the
problem sooner or later anyway...
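
A minimal sketch of the overflow, assuming 1 KiB quota blocks (a shift of
10 bits); the macro and function names are hypothetical and not taken
from the quota code. Shifting a 32-bit block number in 32-bit arithmetic
wraps once the offset reaches 4 GiB, while widening before the shift does
not.

```c
#include <stdint.h>

/* Assumption for this sketch: 1 KiB blocks, i.e. a shift of 10 bits. */
#define FAKE_QT_BLKSIZE_BITS 10

/* Offset computed in 32 bits: wraps for block numbers >= 2^22. */
static uint32_t blk_to_off_narrow(uint32_t blk)
{
	return blk << FAKE_QT_BLKSIZE_BITS;
}

/* Offset computed after widening to 64 bits: no wraparound. */
static uint64_t blk_to_off_wide(uint32_t blk)
{
	return (uint64_t)blk << FAKE_QT_BLKSIZE_BITS;
}
```

For block number 2^22 the narrow version returns 0, whereas the wide
version returns 4294967296 (4 GiB).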

Reviewed-by: Andreas Dilger
Signed-off-by: Jan Kara
Signed-off-by: Sasha Levin

Jan Kara
2021-01-06 21:56:53 +0800