Eric Lee / smarc-fsl-linux-kernel

17 Jan, 2021

2 commits

6f7a362e1 block/rnbd-clt: avoid module unload race with close confirmation ... Browse Code »

commit 3a21777c6ee99749bac10727b3c17e5bcfebe5c1 upstream.

We had kernel panic, it is caused by unload module and last
close confirmation.

call trace:
[1196029.743127] free_sess+0x15/0x50 [rtrs_client]
[1196029.743128] rtrs_clt_close+0x4c/0x70 [rtrs_client]
[1196029.743129] ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client]
[1196029.743130] close_rtrs+0x25/0x50 [rnbd_client]
[1196029.743131] rnbd_client_exit+0x93/0xb99 [rnbd_client]
[1196029.743132] __x64_sys_delete_module+0x190/0x260

And in the crashdump confirmation kworker is also running.
PID: 6943 TASK: ffff9e2ac8098000 CPU: 4 COMMAND: "kworker/4:2"
#0 [ffffb206cf337c30] __schedule at ffffffff9f93f891
#1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98
#2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938
#3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7
#4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e
#5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client]
#6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client]
#7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client]
#8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client]
#9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client]

The problem is both code path try to close same session, which lead to
panic.

To fix it, just skip the sess if the refcount already drop to 0.

Fixes: f7a7a5c228d4 ("block/rnbd: client: main functionality")
Signed-off-by: Jack Wang
Reviewed-by: Gioh Kim
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Jack Wang
2021-01-17 21:17:05 +0800
432071f6a block: rsxx: select CONFIG_CRC32 ... Browse Code »

commit 36a106a4c1c100d55ba3d32a21ef748cfcd4fa99 upstream.

Without crc32, the driver fails to link:

arm-linux-gnueabi-ld: drivers/block/rsxx/config.o: in function `rsxx_load_config':
config.c:(.text+0x124): undefined reference to `crc32_le'

Fixes: 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver")
Signed-off-by: Arnd Bergmann
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Arnd Bergmann
2021-01-17 21:17:03 +0800

30 Dec, 2020

8 commits

c13edadf1 null_blk: Fail zone append to conventional zones ... Browse Code »

commit 2e896d89510f23927ec393bee1e0570db3d5a6c6 upstream.

Conventional zones do not have a write pointer and so cannot accept zone
append writes. Make sure to fail any zone append write command issued to
a conventional zone.

Reported-by: Naohiro Aota
Fixes: e0489ed5daeb ("null_blk: Support REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Damien Le Moal
2020-12-30 18:54:29 +0800
92ee9b9fa null_blk: Fix zone size initialization ... Browse Code »

commit 0ebcdd702f49aeb0ad2e2d894f8c124a0acc6e23 upstream.

For a null_blk device with zoned mode enabled is currently initialized
with a number of zones equal to the device capacity divided by the zone
size, without considering if the device capacity is a multiple of the
zone size. If the zone size is not a divisor of the capacity, the zones
end up not covering the entire capacity, potentially resulting is out
of bounds accesses to the zone array.

Fix this by adding one last smaller zone with a size equal to the
remainder of the disk capacity divided by the zone size if the capacity
is not a multiple of the zone size. For such smaller last zone, the zone
capacity is also checked so that it does not exceed the smaller zone
size.

Reported-by: Naohiro Aota
Fixes: ca4b2a011948 ("null_blk: add zone support")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Damien Le Moal
2020-12-30 18:54:29 +0800
9ae6d2f4c xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path() ... Browse Code »

commit 2e85d32b1c865bec703ce0c962221a5e955c52c2 upstream.

Some code does not directly make 'xenbus_watch' object and call
'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This
commit adds support of 'will_handle' callback in the
'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'.

This is part of XSA-349

Cc: stable@vger.kernel.org
Signed-off-by: SeongJae Park
Reported-by: Michael Kurth
Reported-by: Pawel Wieczorkiewicz
Reviewed-by: Juergen Gross
Signed-off-by: Juergen Gross
Signed-off-by: Greg Kroah-Hartman

SeongJae Park
2020-12-30 18:54:26 +0800
aadd67750 xen-blkback: set ring->xenblkd to NULL after kthread_stop() ... Browse Code »

commit 1c728719a4da6e654afb9cc047164755072ed7c9 upstream.

When xen_blkif_disconnect() is called, the kernel thread behind the
block interface is stopped by calling kthread_stop(ring->xenblkd).
The ring->xenblkd thread pointer being non-NULL determines if the
thread has been already stopped.
Normally, the thread's function xen_blkif_schedule() sets the
ring->xenblkd to NULL, when the thread's main loop ends.

However, when the thread has not been started yet (i.e.
wake_up_process() has not been called on it), the xen_blkif_schedule()
function would not be called yet.

In such case the kthread_stop() call returns -EINTR and the
ring->xenblkd remains dangling.
When this happens, any consecutive call to xen_blkif_disconnect (for
example in frontend_changed() callback) leads to a kernel crash in
kthread_stop() (e.g. NULL pointer dereference in exit_creds()).

This is XSA-350.

Cc: # 4.12
Fixes: a24fa22ce22a ("xen/blkback: don't use xen_blkif_get() in xen-blkback kthread")
Reported-by: Olivier Benjamin
Reported-by: Pawel Wieczorkiewicz
Signed-off-by: Pawel Wieczorkiewicz
Reviewed-by: Julien Grall
Reviewed-by: Juergen Gross
Signed-off-by: Juergen Gross
Signed-off-by: Greg Kroah-Hartman

Pawel Wieczorkiewicz
2020-12-30 18:54:26 +0800
1b75aea3e block/rnbd-clt: Fix possible memleak ... Browse Code »

[ Upstream commit 46067844efdb8275ade705923120fc5391543b53 ]

In error case, we do not free the memory for blk_symlink_name.

Do it by free the memory in error case, and set to NULL
afterwards.

Also fix the condition in rnbd_clt_remove_dev_symlink.

Fixes: 64e8a6ece1a5 ("block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name")
Signed-off-by: Jack Wang
Reviewed-by: Md Haris Iqbal
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Jack Wang
2020-12-30 18:53:57 +0800
996ce53a2 block/rnbd-clt: Get rid of warning regarding size argument in strlcpy ... Browse Code »

[ Upstream commit e7508d48565060af5d89f10cb83c9359c8ae1310 ]

The kernel test robot triggerred the following warning,

>> drivers/block/rnbd/rnbd-clt.c:1397:42: warning: size argument in
'strlcpy' call appears to be size of the source; expected the size of the
destination [-Wstrlcpy-strlcat-size]
strlcpy(dev->pathname, pathname, strlen(pathname) + 1);
~~~~~~~^~~~~~~~~~~~~

To get rid of the above warning, use a kstrdup as Bart suggested.

Fixes: 64e8a6ece1a5 ("block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name")
Reported-by: kernel test robot
Signed-off-by: Md Haris Iqbal
Signed-off-by: Jack Wang
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Md Haris Iqbal
2020-12-30 18:53:57 +0800
aa4f552ae block/rnbd: fix a null pointer dereference on dev->blk_symlink_name ... Browse Code »

[ Upstream commit 733c15bd3a944b8eeaacdddf061759b6a83dd3f4 ]

Currently in the case where dev->blk_symlink_name fails to be allocates
the error return path attempts to set an end-of-string character to
the unallocated dev->blk_symlink_name causing a null pointer dereference
error. Fix this by returning with an explicity ENOMEM error (which also
is missing in the original code as was not initialized).

Fixes: 1eb54f8f5dd8 ("block/rnbd: client: sysfs interface functions")
Signed-off-by: Colin Ian King
Addresses-Coverity: ("Dereference after null check")
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Colin Ian King
2020-12-30 18:53:40 +0800
a7d4dd109 block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name ... Browse Code »

[ Upstream commit 64e8a6ece1a5b1fa21316918053d068baeac84af ]

For every rnbd_clt_dev, we alloc the pathname and blk_symlink_name
statically to NAME_MAX which is 255 bytes. In most of the cases we only
need less than 10 bytes, so 500 bytes per block device are wasted.

This commit dynamically allocates memory buffer for pathname and
blk_symlink_name.

Signed-off-by: Md Haris Iqbal
Signed-off-by: Jack Wang
Reviewed-by: Lutz Pogrell
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Md Haris Iqbal
2020-12-30 18:53:40 +0800

09 Dec, 2020

1 commit

ca33479cc xen: add helpers for caching grant mapping pages ... Browse Code »

Instead of having similar helpers in multiple backend drivers use
common helpers for caching pages allocated via gnttab_alloc_pages().

Make use of those helpers in blkback and scsiback.

Cc: # 5.9
Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Signed-off-by: Juergen Gross

Juergen Gross
2020-12-09 17:31:37 +0800

13 Nov, 2020

1 commit

c01a21b77 loop: Fix occasional uevent drop ... Browse Code »

Commit 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify")
causes an occasional drop of loop device uevent, which are no longer
triggered in loop_set_size() but in a different part of code.

Bug is reproducible with LTP test uevent01 [1]:

i=0; while true; do
i=$((i+1)); echo "== $i =="
lsmod |grep -q loop && rmmod -f loop
./uevent01 || break
done

Put back triggering through code called in loop_set_size().

Fix required to add yet another parameter to
set_capacity_revalidate_and_notify().

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/uevents/uevent01.c

[hch: rebased on a different change to the prototype of
set_capacity_revalidate_and_notify]

Cc: stable@vger.kernel.org # v5.9
Fixes: 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify")
Reported-by:
Signed-off-by: Petr Vorel
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Petr Vorel
2020-11-13 04:59:04 +0800

10 Nov, 2020

1 commit

2bd645b2d nbd: fix a block_device refcount leak in nbd_release ... Browse Code »

bdget_disk needs to be paired with bdput to not leak a reference
on the block device inode.

Fixes: 08ba91ee6e2c ("nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.")
Signed-off-by: Christoph Hellwig
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-11-10 23:01:28 +0800

07 Nov, 2020

1 commit

e1777d099 null_blk: Fix scheduling in atomic with zoned mode ... Browse Code »

Commit aa1c09cb65e2 ("null_blk: Fix locking in zoned mode") changed
zone locking to using the potentially sleeping wait_on_bit_io()
function. This is acceptable when memory backing is enabled as the
device queue is in that case marked as blocking, but this triggers a
scheduling while in atomic context with memory backing disabled.

Fix this by relying solely on the device zone spinlock for zone
information protection without temporarily releasing this lock around
null_process_cmd() execution in null_zone_write(). This is OK to do
since when memory backing is disabled, command processing does not
block and the memory backing lock nullb->lock is unused. This solution
avoids the overhead of having to mark a zoned null_blk device queue as
blocking when memory backing is unused.

This patch also adds comments to the zone locking code to explain the
unusual locking scheme.

Fixes: aa1c09cb65e2 ("null_blk: Fix locking in zoned mode")
Reported-by: kernel test robot
Signed-off-by: Damien Le Moal
Reviewed-by: Christoph Hellwig
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe

Damien Le Moal
2020-11-07 00:36:42 +0800

29 Oct, 2020

4 commits

7cb6e22ba xsysace: use platform_get_resource() and platform_get_irq_optional() ... Browse Code »

Use platform_get_resource() to fetch the memory resource and
platform_get_irq_optional() to get optional IRQ instead of
open-coded variants.

IRQ is not supposed to be changed at runtime, so there is
no functional change in ace_fsm_yieldirq().

On the other hand we now take first resources instead of last ones
to proceed. I can't imagine how broken should be firmware to have
a garbage in the first resource slots. But if it the case, it needs
to be documented.

Signed-off-by: Andy Shevchenko
Acked-by: Michal Simek
Signed-off-by: Jens Axboe

Andy Shevchenko
2020-10-29 22:22:33 +0800
aa1c09cb6 null_blk: Fix locking in zoned mode ... Browse Code »

When the zoned mode is enabled in null_blk, Serializing read, write
and zone management operations for each zone is necessary to protect
device level information for managing zone resources (zone open and
closed counters) as well as each zone condition and write pointer
position. Commit 35bc10b2eafb ("null_blk: synchronization fix for
zoned device") introduced a spinlock to implement this serialization.
However, when memory backing is also enabled, GFP_NOIO memory
allocations are executed under the spinlock, resulting in might_sleep()
warnings. Furthermore, the zone_lock spinlock is locked/unlocked using
spin_lock_irq/spin_unlock_irq, similarly to the memory backing code with
the nullb->lock spinlock. This nested use of irq locks wrecks the irq
enabled/disabled state.

Fix all this by introducing a bitmap for per-zone lock, with locking
implemented using wait_on_bit_lock_io() and clear_and_wake_up_bit().
This locking mechanism allows keeping a zone locked while executing
null_process_cmd(), serializing all operations to the zone while
allowing to sleep during memory backing allocation with GFP_NOIO.
Device level zone resource management information is protected using
a spinlock which is not held while executing null_process_cmd();

Fixes: 35bc10b2eafb ("null_blk: synchronization fix for zoned device")
Signed-off-by: Damien Le Moal
Signed-off-by: Jens Axboe

Damien Le Moal
2020-10-29 21:27:30 +0800
f9c910428 null_blk: Fix zone reset all tracing ... Browse Code »

In the cae of the REQ_OP_ZONE_RESET_ALL operation, the command sector is
ignored and the operation is applied to all sequential zones. For these
commands, tracing the effect of the command using the command sector to
determine the target zone is thus incorrect.

Fix null_zone_mgmt() zone condition tracing in the case of
REQ_OP_ZONE_RESET_ALL to apply tracing to all sequential zones that are
not already empty.

Fixes: 766c3297d7e1 ("null_blk: add trace in null_blk_zoned.c")
Signed-off-by: Damien Le Moal
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe

Damien Le Moal
2020-10-29 21:27:30 +0800
b40813ddc nbd: don't update block size after device is started ... Browse Code »

Mounted NBD device can be resized, one use case is rbd-nbd.

Fix the issue by setting up default block size, then not touch it
in nbd_size_update() any more. This kind of usage is aligned with loop
which has same use case too.

Cc: stable@vger.kernel.org
Fixes: c8a83a6b54d0 ("nbd: Use set_blocksize() to set device blocksize")
Reported-by: lining
Signed-off-by: Ming Lei
Cc: Josef Bacik
Cc: Jan Kara
Tested-by: lining
Signed-off-by: Jens Axboe

Ming Lei
2020-10-29 21:25:43 +0800

28 Oct, 2020

1 commit

35bc10b2e null_blk: synchronization fix for zoned device ... Browse Code »

Parallel write,read,zone-mgmt operations accessing/altering zone state
and write-pointer may get into race. Avoid the situation by using a new
spinlock for zoned device.
Concurrent zone-appends (on a zone) returning same write-pointer issue
is also avoided using this lock.

Cc: stable@vger.kernel.org
Fixes: e0489ed5daeb ("null_blk: Support REQ_OP_ZONE_APPEND")
Signed-off-by: Kanchan Joshi
Reviewed-by: Damien Le Moal
Signed-off-by: Jens Axboe

Kanchan Joshi
2020-10-28 10:40:42 +0800

26 Oct, 2020

1 commit

bd6aabc7c Merge tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip ... Browse Code »

Pull more xen updates from Juergen Gross:

- a series for the Xen pv block drivers adding module parameters for
better control of resource usge

- a cleanup series for the Xen event driver

* tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
Documentation: add xen.fifo_events kernel parameter description
xen/events: unmask a fifo event channel only if it was masked
xen/events: only register debug interrupt for 2-level events
xen/events: make struct irq_info private to events_base.c
xen: remove no longer used functions
xen-blkfront: Apply changed parameter name to the document
xen-blkfront: add a parameter for disabling of persistent grants
xen-blkback: add a parameter for disabling of persistent grants

Linus Torvalds
2020-10-26 01:55:35 +0800

25 Oct, 2020

1 commit

d76913908 Merge tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:

- NVMe pull request from Christoph
- rdma error handling fixes (Chao Leng)
- fc error handling and reconnect fixes (James Smart)
- fix the qid displace when tracing ioctl command (Keith Busch)
- don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni)
- fix MTDT for passthru (Logan Gunthorpe)
- blacklist Write Same on more devices (Kai-Heng Feng)
- fix an uninitialized work struct (zhenwei pi)"

- lightnvm out-of-bounds fix (Colin)

- SG allocation leak fix (Doug)

- rnbd fixes (Gioh, Guoqing, Jack)

- zone error translation fixes (Keith)

- kerneldoc markup fix (Mauro)

- zram lockdep fix (Peter)

- Kill unused io_context members (Yufen)

- NUMA memory allocation cleanup (Xianting)

- NBD config wakeup fix (Xiubo)

* tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits)
block: blk-mq: fix a kernel-doc markup
nvme-fc: shorten reconnect delay if possible for FC
nvme-fc: wait for queues to freeze before calling update_hr_hw_queues
nvme-fc: fix error loop in create_hw_io_queues
nvme-fc: fix io timeout to abort I/O
null_blk: use zone status for max active/open
nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
nvmet: cleanup nvmet_passthru_map_sg()
nvmet: limit passthru MTDS by BIO_MAX_PAGES
nvmet: fix uninitialized work for zero kato
nvme-pci: disable Write Zeroes on Sandisk Skyhawk
nvme: use queuedata for nvme_req_qid
nvme-rdma: fix crash due to incorrect cqe
nvme-rdma: fix crash when connect rejected
block: remove unused members for io_context
blk-mq: remove the calling of local_memory_node()
zram: Fix __zram_bvec_{read,write}() locking order
skd_main: remove unused including
sgl_alloc_order: fix memory leak
lightnvm: fix out-of-bounds write to array devices->info[]
...

Linus Torvalds
2020-10-25 03:46:42 +0800

23 Oct, 2020

1 commit

fd78874b7 null_blk: use zone status for max active/open ... Browse Code »

The block layer provides special status codes when requests go beyond
the zone resource limits. Use these codes instead of the generic IOERR
for requests that exceed the max active or open limits the null_blk
device was configured with so that applications know how these special
conditions should be handled.

Signed-off-by: Keith Busch
Reviewed-by: Niklas Cassel
Cc: Damien Le Moal
Cc: Niklas Cassel
Signed-off-by: Jens Axboe

Keith Busch
2020-10-23 00:38:33 +0800

22 Oct, 2020

1 commit

ed7cfefe4 Merge tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client ... Browse Code »

Pull ceph updates from Ilya Dryomov:

- a patch that removes crush_workspace_mutex (myself). CRUSH
computations are no longer serialized and can run in parallel.

- a couple new filesystem client metrics for "ceph fs top" command
(Xiubo Li)

- a fix for a very old messenger bug that affected the filesystem,
marked for stable (myself)

- assorted fixups and cleanups throughout the codebase from Jeff and
others.

* tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client: (27 commits)
libceph: clear con->out_msg on Policy::stateful_server faults
libceph: format ceph_entity_addr nonces as unsigned
libceph: fix ENTITY_NAME format suggestion
libceph: move a dout in queue_con_delay()
ceph: comment cleanups and clarifications
ceph: break up send_cap_msg
ceph: drop separate mdsc argument from __send_cap
ceph: promote to unsigned long long before shifting
ceph: don't SetPageError on readpage errors
ceph: mark ceph_fmt_xattr() as printf-like for better type checking
ceph: fold ceph_update_writeable_page into ceph_write_begin
ceph: fold ceph_sync_writepages into writepage_nounlock
ceph: fold ceph_sync_readpages into ceph_readpage
ceph: don't call ceph_update_writeable_page from page_mkwrite
ceph: break out writeback of incompatible snap context to separate function
ceph: add a note explaining session reject error string
libceph: switch to the new "osd blocklist add" command
libceph, rbd, ceph: "blacklist" -> "blocklist"
ceph: have ceph_writepages_start call pagevec_lookup_range_tag
ceph: use kill_anon_super helper
...

Linus Torvalds
2020-10-22 01:34:10 +0800

21 Oct, 2020

3 commits

74a852479 xen-blkfront: add a parameter for disabling of persistent grants ... Browse Code »

Persistent grants feature provides high scalability. On some small
systems, however, it could incur data copy overheads[1] and thus it is
required to be disabled. It can be disabled from blkback side using a
module parameter, 'feature_persistent'. But, it is impossible from
blkfront side. For the reason, this commit adds a blkfront module
parameter for disabling of the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Signed-off-by: SeongJae Park
Reviewed-by: Juergen Gross
Acked-by: Roger Pau Monné
Link: https://lore.kernel.org/r/20200923061841.20531-3-sjpark@amazon.com
Signed-off-by: Boris Ostrovsky

SeongJae Park
2020-10-21 20:15:15 +0800
aac8a70db xen-blkback: add a parameter for disabling of persistent grants ... Browse Code »

Persistent grants feature provides high scalability. On some small
systems, however, it could incur data copy overheads[1] and thus it is
required to be disabled. But, there is no option to disable it. For
the reason, this commit adds a module parameter for disabling of the
feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Signed-off-by: Anthony Liguori
Signed-off-by: SeongJae Park
Reviewed-by: Juergen Gross
Acked-by: Roger Pau Monné
Link: https://lore.kernel.org/r/20200923061841.20531-2-sjpark@amazon.com
Signed-off-by: Boris Ostrovsky

SeongJae Park
2020-10-21 20:15:13 +0800
4a5bb973f Merge tag 'for-linus-5.10b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip ... Browse Code »

Pull more xen updates from Juergen Gross:

- A single patch to fix the Xen security issue XSA-331 (malicious
guests can DoS dom0 by triggering NULL-pointer dereferences or access
to stale data).

- A larger series to fix the Xen security issue XSA-332 (malicious
guests can DoS dom0 by sending events at high frequency leading to
dom0's vcpus being busy in IRQ handling for elongated times).

* tag 'for-linus-5.10b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: block rogue events for some time
xen/events: defer eoi in case of excessive number of events
xen/events: use a common cpu hotplug hook for event channels
xen/events: switch user event channels to lateeoi model
xen/pciback: use lateeoi irq binding
xen/pvcallsback: use lateeoi irq binding
xen/scsiback: use lateeoi irq binding
xen/netback: use lateeoi irq binding
xen/blkback: use lateeoi irq binding
xen/events: add a new "late EOI" evtchn framework
xen/events: fix race in evtchn_fifo_unmask()
xen/events: add a proper barrier to 2-level uevent unmasking
xen/events: avoid removing an event channel while handling it

Linus Torvalds
2020-10-21 00:24:01 +0800

20 Oct, 2020

1 commit

01263a1fa xen/blkback: use lateeoi irq binding ... Browse Code »

In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving blkfront use the lateeoi
irq binding for blkback and unmask the event channel only after
processing all pending requests.

As the thread processing requests is used to do purging work in regular
intervals an EOI may be sent only after having received an event. If
there was no pending I/O request flag the EOI as spurious.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall
Signed-off-by: Juergen Gross
Reviewed-by: Jan Beulich
Reviewed-by: Wei Liu

Juergen Gross
2020-10-20 16:22:01 +0800

19 Oct, 2020

1 commit

0669d2b26 zram: Fix __zram_bvec_{read,write}() locking order ... Browse Code »

Mikhail reported a lockdep spat detailing how __zram_bvec_read() and
__zram_bvec_write() use zstrm->lock and zspage->lock in opposite order.

Reported-by: Mikhail Gavrilov
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Mikhail Gavrilov
Acked-by: Minchan Kim
Acked-by: Sebastian Andrzej Siewior
Signed-off-by: Jens Axboe

Peter Zijlstra
2020-10-19 23:32:28 +0800

17 Oct, 2020

3 commits

db0732727 skd_main: remove unused including <linux/version.h> ... Browse Code »

Remove including that don't need it.

Signed-off-by: Tian Tao
Signed-off-by: Jens Axboe

Tian Tao
2020-10-17 22:11:14 +0800
c4cf498dc Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge more updates from Andrew Morton:
"155 patches.

Subsystems affected by this patch series: mm (dax, debug, thp,
readahead, page-poison, util, memory-hotplug, zram, cleanups), misc,
core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch,
binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan,
romfs, and fault-injection"

* emailed patches from Andrew Morton : (155 commits)
lib, uaccess: add failure injection to usercopy functions
lib, include/linux: add usercopy failure capability
ROMFS: support inode blocks calculation
ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang
sched.h: drop in_ubsan field when UBSAN is in trap mode
scripts/gdb/tasks: add headers and improve spacing format
scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
kernel/relay.c: drop unneeded initialization
panic: dump registers on panic_on_warn
rapidio: fix the missed put_device() for rio_mport_add_riodev
rapidio: fix error handling path
nilfs2: fix some kernel-doc warnings for nilfs2
autofs: harden ioctl table
ramfs: fix nommu mmap with gaps in the page cache
mm: remove the now-unnecessary mmget_still_valid() hack
mm/gup: take mmap_lock in get_dump_page()
binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
coredump: rework elf/elf_fdpic vma_dump_size() into common helper
coredump: refactor page range dumping into common helper
coredump: let dump_emit() bail out on short writes
...

Linus Torvalds
2020-10-17 02:31:55 +0800
4e79603bb zram: failing to decompress is WARN_ON worthy ... Browse Code »

If we fail to decompress in zram it's a pretty serious problem. We were
entrusted to be able to decompress the old data but we failed. Either
we've got some crazy bug in the compression code or we've got memory
corruption.

At the moment, when this happens the log looks like this:

ERR kernel: [ 1833.099861] zram: Decompression failed! err=-22, page=336112
ERR kernel: [ 1833.099881] zram: Decompression failed! err=-22, page=336112
ALERT kernel: [ 1833.099886] Read-error on swap-device (253:0:2688896)

It is true that we have an "ALERT" level log in there, but (at least to
me) it feels like even this isn't enough to impart the seriousness of this
error. Let's convert to a WARN_ON. Note that WARN_ON is automatically
"unlikely" so we can simply replace the old annotation with the new one.

Signed-off-by: Douglas Anderson
Signed-off-by: Andrew Morton
Acked-by: Minchan Kim
Cc: Sergey Senozhatsky
Cc: Sonny Rao
Cc: Jens Axboe
Link: https://lkml.kernel.org/r/20200917174059.1.If09c882545dbe432268f7a67a4d4cfcb6caace4f@changeid
Signed-off-by: Linus Torvalds

Douglas Anderson
2020-10-17 02:11:18 +0800

16 Oct, 2020

1 commit

9ff9b0d39 Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next ... Browse Code »

Pull networking updates from Jakub Kicinski:

- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.

Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).

- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.

- Allow more than 255 IPv4 multicast interfaces.

- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.

- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.

- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.

- Allow more calls to same peer in RxRPC.

- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.

- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.

- Add TC actions for implementing MPLS L2 VPNs.

- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.

- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.

- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.

- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.

- Support sleepable BPF programs, initially targeting LSM and tracing.

- Add bpf_d_path() helper for returning full path for given 'struct
path'.

- Make bpf_tail_call compatible with bpf-to-bpf calls.

- Allow BPF programs to call map_update_elem on sockmaps.

- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).

- Support listing and getting information about bpf_links via the bpf
syscall.

- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).

- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.

- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).

- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.

- Add XDP support for Intel's igb driver.

- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.

- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.

- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.

- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.

- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.

- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.

- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.

- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.

- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.

- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...

Linus Torvalds
2020-10-16 09:42:13 +0800

15 Oct, 2020

1 commit

87aac3a80 nbd: make the config put is called before the notifying the waiter ... Browse Code »

There has one race case for ceph's rbd-nbd tool. When do mapping
it may fail with EBUSY from ioctl(nbd, NBD_DO_IT), but actually
the nbd device has already unmaped.

It dues to if just after the wake_up(), the recv_work() is scheduled
out and defers calling the nbd_config_put(), though the map process
has exited the "nbd->recv_task" is not cleared.

Signed-off-by: Xiubo Li
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe

Xiubo Li
2020-10-15 02:30:37 +0800

14 Oct, 2020

5 commits

47be77c2f block/rnbd-clt: send_msg_close if any error occurs after send_msg_open ... Browse Code »

After send_msg_open is done, send_msg_close should be done
if any error occurs and it is necessary to recover
what has been done.

Signed-off-by: Gioh Kim
Signed-off-by: Jack Wang
Signed-off-by: Jens Axboe

Gioh Kim
2020-10-14 05:05:05 +0800
050b654b2 block/rnbd-clt: do not cap max_hw_sectors & max_segments with remote device ... Browse Code »

The max_hw_secotrs is only limited by the transport, not remote device,
block layer on server side will split to the device limit if it's too
big.

The max_segments, similar, and rtrs server will submit single buffer, so
no need to cap.

Signed-off-by: Jack Wang
Signed-off-by: Jens Axboe

Jack Wang
2020-10-14 05:05:05 +0800
46a99e0cf block/rnbd-clt: remove nr argument from send_usr_msg ... Browse Code »

The argument is not needed since all callers pass 1 for it.

Signed-off-by: Guoqing Jiang
Signed-off-by: Jack Wang
Signed-off-by: Jens Axboe

Guoqing Jiang
2020-10-14 05:05:05 +0800
7cd4ecd91 Merge tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block driver updates from Jens Axboe:
"Here are the driver updates for 5.10.

A few SCSI updates in here too, in coordination with Martin as they
depend on core block changes for the shared tag bitmap.

This contains:

- NVMe pull requests via Christoph:
- fix keep alive timer modification (Amit Engel)
- order the PCI ID list more sensibly (Andy Shevchenko)
- cleanup the open by controller helper (Chaitanya Kulkarni)
- use an xarray for the CSE log lookup (Chaitanya Kulkarni)
- support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
- fix nvme_ns_report_zones (Christoph Hellwig)
- add a sanity check to nvmet-fc (James Smart)
- fix interrupt allocation when too many polled queues are
specified (Jeffle Xu)
- small nvmet-tcp optimization (Mark Wunderlich)
- fix a controller refcount leak on init failure (Chaitanya
Kulkarni)
- misc cleanups (Chaitanya Kulkarni)
- major refactoring of the scanning code (Christoph Hellwig)

- MD updates via Song:
- Bug fixes in bitmap code, from Zhao Heming
- Fix a work queue check, from Guoqing Jiang
- Fix raid5 oops with reshape, from Song Liu
- Clean up unused code, from Jason Yan
- Discard improvements, from Xiao Ni
- raid5/6 page offset support, from Yufen Yu

- Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap,
Hannes)

- null_blk open/active zone limit support (Niklas)

- Set of bcache updates (Coly, Dongsheng, Qinglang)"

* tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits)
md/raid5: fix oops during stripe resizing
md/bitmap: fix memory leak of temporary bitmap
md: fix the checking of wrong work queue
md/bitmap: md_bitmap_get_counter returns wrong blocks
md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
md/raid0: remove unused function is_io_in_chunk_boundary()
nvme-core: remove extra condition for vwc
nvme-core: remove extra variable
nvme: remove nvme_identify_ns_list
nvme: refactor nvme_validate_ns
nvme: move nvme_validate_ns
nvme: query namespace identifiers before adding the namespace
nvme: revalidate zone bitmaps in nvme_update_ns_info
nvme: remove nvme_update_formats
nvme: update the known admin effects
nvme: set the queue limits in nvme_update_ns_info
nvme: remove the 0 lba_shift check in nvme_update_ns_info
nvme: clean up the check for too large logic block sizes
nvme: freeze the queue over ->lba_shift updates
nvme: factor out a nvme_configure_metadata helper
...

Linus Torvalds
2020-10-14 04:04:41 +0800
3ad11d7ac Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block updates from Jens Axboe:

- Series of merge handling cleanups (Baolin, Christoph)

- Series of blk-throttle fixes and cleanups (Baolin)

- Series cleaning up BDI, seperating the block device from the
backing_dev_info (Christoph)

- Removal of bdget() as a generic API (Christoph)

- Removal of blkdev_get() as a generic API (Christoph)

- Cleanup of is-partition checks (Christoph)

- Series reworking disk revalidation (Christoph)

- Series cleaning up bio flags (Christoph)

- bio crypt fixes (Eric)

- IO stats inflight tweak (Gabriel)

- blk-mq tags fixes (Hannes)

- Buffer invalidation fixes (Jan)

- Allow soft limits for zone append (Johannes)

- Shared tag set improvements (John, Kashyap)

- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

- DM no-wait support (Mike, Konstantin)

- Request allocation improvements (Ming)

- Allow md/dm/bcache to use IO stat helpers (Song)

- Series improving blk-iocost (Tejun)

- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
block: fix uapi blkzoned.h comments
blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
blk-mq: get rid of the dead flush handle code path
block: get rid of unnecessary local variable
block: fix comment and add lockdep assert
blk-mq: use helper function to test hw stopped
block: use helper function to test queue register
block: remove redundant mq check
block: invoke blk_mq_exit_sched no matter whether have .exit_sched
percpu_ref: don't refer to ref->data if it isn't allocated
block: ratelimit handle_bad_sector() message
blk-throttle: Re-use the throtl_set_slice_end()
blk-throttle: Open code __throtl_de/enqueue_tg()
blk-throttle: Move service tree validation out of the throtl_rb_first()
blk-throttle: Move the list operation after list validation
blk-throttle: Fix IO hang for a corner case
blk-throttle: Avoid tracking latency if low limit is invalid
blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
block: Remove redundant 'return' statement
...

Linus Torvalds
2020-10-14 03:12:44 +0800

12 Oct, 2020

1 commit

0b98acd61 libceph, rbd, ceph: "blacklist" -> "blocklist" ... Browse Code »

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2020-10-12 21:29:26 +0800

06 Oct, 2020

1 commit

8b0308fe3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net ... Browse Code »

Rejecting non-native endian BTF overlapped with the addition
of support for it.

The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.

Signed-off-by: David S. Miller

David S. Miller
2020-10-06 09:40:01 +0800