Eric Lee / smarc-fsl-linux-kernel

14 Jun, 2020

1 commit

a7f7f6248 treewide: replace '---help---' in Kconfig files with 'help'

Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following command:

$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
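
As a rough illustration of what that sed expression does to a single line, here is a minimal userspace sketch in C. The helper name normalize_help_line() is hypothetical and is not part of the kernel tree or of the commit; it only mirrors the substitution s/^[[:space:]]*---help---/\thelp/ for one input line.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring s/^[[:space:]]*---help---/\thelp/ for one
 * line. Leading whitespace followed by '---help---' becomes one tab plus
 * 'help'; any text after the keyword is kept. Lines that do not match are
 * copied unchanged. Returns 1 if the line was rewritten, 0 otherwise.
 */
int normalize_help_line(const char *in, char *out, size_t outsz)
{
	static const char old_kw[] = "---help---";
	const char *p = in;

	/* Skip the leading run of whitespace (spaces, tabs, ...). */
	while (*p != '\0' && isspace((unsigned char)*p))
		p++;

	if (strncmp(p, old_kw, sizeof(old_kw) - 1) != 0) {
		/* No match: the line passes through untouched. */
		snprintf(out, outsz, "%s", in);
		return 0;
	}

	/* Match: emit exactly one tab, 'help', then the rest of the line. */
	snprintf(out, outsz, "\thelp%s", p + sizeof(old_kw) - 1);
	return 1;
}
```

All seven indentation styles listed above (a through g) take the rewrite path and end up as 1 tab + 'help', while lines such as '\tbool "foo"' are left alone.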

Signed-off-by: Masahiro Yamada

Masahiro Yamada
2020-06-14 00:57:21 +0800

06 Jun, 2020

24 commits

b25c6644b Merge tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- The largest change for this cycle is the DM zoned target's metadata
version 2 feature that adds support for pairing regular block devices
with a zoned device to ease the performance impact associated with
finite random zones of zoned device.

The changes came in three batches: the first prepared for and then
added the ability to pair a single regular block device, the second
was a batch of fixes to improve zoned's reclaim heuristic, and the
third removed the limitation of only adding a single additional
regular block device to allow many devices.

Testing has shown linear scaling as more devices are added.

- Add new emulated block size (ebs) target that emulates a smaller
logical_block_size than a block device supports

The primary use-case is to emulate "512e" devices that have 512 byte
logical_block_size and 4KB physical_block_size. This is useful to
some legacy applications that otherwise wouldn't be able to be used
on 4K devices because they depend on issuing IO in 512 byte
granularity.

- Add discard interfaces to DM bufio. First consumer of the interface
is the dm-ebs target that makes heavy use of dm-bufio.

- Fix DM crypt's block queue_limits stacking to not truncate
logical_block_size.

- Add Documentation for DM integrity's status line.

- Switch DMDEBUG from a compile time config option to instead use
dynamic debug via pr_debug.

- Fix DM multipath target's heuristic for how it manages
"queue_if_no_path" state internally.

DM multipath now avoids disabling "queue_if_no_path" unless it is
actually needed (e.g. in response to configure timeout or explicit
"fail_if_no_path" message).

This fixes reports of spurious -EIO being reported back to userspace
application during fault tolerance testing with an NVMe backend.
Added various dynamic DMDEBUG messages to assist with debugging
queue_if_no_path in the future.

- Add a new DM multipath "Historical Service Time" Path Selector.

- Fix DM multipath's dm_blk_ioctl() to switch paths on IO error.

- Improve DM writecache target performance by using explicit cache
flushing for target's single-threaded usecase and a small cleanup to
remove unnecessary test in persistent_memory_claim.

- Other small cleanups in DM core, dm-persistent-data, and DM
integrity.
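
The emulated block size (ebs) item in the list above can be pictured with a small userspace sketch in C. This is not the dm-ebs implementation; every name below (the SK_ constants, sk_ebs_map(), sk_ebs_is_partial()) is hypothetical, and the sizes are simply the 512 byte and 4KB values mentioned in the "512e" use-case. It shows how a 512-byte sector index maps onto a 4KB block, and when a request does not cover whole 4KB blocks.

```c
#include <stdint.h>

/* Illustrative sizes for a "512e" setup; all names here are hypothetical. */
#define SK_EMULATED_LBS   512u   /* logical block size shown to the application */
#define SK_UNDERLYING_LBS 4096u  /* smallest I/O the backing device accepts */
#define SK_PER_BLOCK      (SK_UNDERLYING_LBS / SK_EMULATED_LBS)  /* 8 sectors */

struct sk_ebs_pos {
	uint64_t block;   /* index of the 4096-byte block on the backing device */
	uint32_t offset;  /* byte offset of the emulated sector inside that block */
};

/* Map an emulated 512-byte sector index to a position on the 4K device. */
struct sk_ebs_pos sk_ebs_map(uint64_t sector)
{
	struct sk_ebs_pos pos;

	pos.block = sector / SK_PER_BLOCK;
	pos.offset = (uint32_t)(sector % SK_PER_BLOCK) * SK_EMULATED_LBS;
	return pos;
}

/*
 * Return 1 if the request does not start and end on 4K boundaries, which is
 * the case where the underlying block must be read, patched and written back.
 */
int sk_ebs_is_partial(uint64_t sector, uint32_t nr_sectors)
{
	return (sector % SK_PER_BLOCK) != 0 || (nr_sectors % SK_PER_BLOCK) != 0;
}
```

For example, emulated sector 9 lands in block 1 at byte offset 512, and a one-sector write there is a partial update of that block.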

* tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
dm crypt: avoid truncating the logical block size
dm mpath: add DM device name to Failing/Reinstating path log messages
dm mpath: enhance queue_if_no_path debugging
dm mpath: restrict queue_if_no_path state machine
dm mpath: simplify __must_push_back
dm zoned: check superblock location
dm zoned: prefer full zones for reclaim
dm zoned: select reclaim zone based on device index
dm zoned: allocate zone by device index
dm zoned: support arbitrary number of devices
dm zoned: move random and sequential zones into struct dmz_dev
dm zoned: per-device reclaim
dm zoned: add metadata pointer to struct dmz_dev
dm zoned: add device pointer to struct dm_zone
dm zoned: allocate temporary superblock for tertiary devices
dm zoned: convert to xarray
dm zoned: add a 'reserved' zone flag
dm zoned: improve logging messages for reclaim
dm zoned: avoid unnecessary device recalulation for secondary superblock
dm zoned: add debugging message for reading superblocks
...

Linus Torvalds
2020-06-06 06:45:03 +0800
64611a15c dm crypt: avoid truncating the logical block size

queue_limits::logical_block_size got changed from unsigned short to
unsigned int, but it was forgotten to update crypt_io_hints() to use the
new type. Fix it.
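
The kind of truncation described here can be reproduced in userspace with a minimal C sketch. The function names are hypothetical and this is not the code of crypt_io_hints(); it only contrasts taking a maximum through unsigned short with taking it through unsigned int, and it assumes a 16-bit unsigned short, as on common Linux targets.

```c
/*
 * Hypothetical sketch: pick the larger of the lower device's logical block
 * size and the crypt sector size, but do the comparison in unsigned short.
 * With a 16-bit unsigned short, 65536 wraps to 0 before the comparison.
 */
unsigned int sk_stack_lbs_truncating(unsigned int lower_lbs,
				     unsigned int sector_size)
{
	unsigned short a = (unsigned short)lower_lbs;
	unsigned short b = (unsigned short)sector_size;

	return a > b ? a : b;
}

/* Same comparison done in unsigned int: no value is lost. */
unsigned int sk_stack_lbs_fixed(unsigned int lower_lbs,
				unsigned int sector_size)
{
	return lower_lbs > sector_size ? lower_lbs : sector_size;
}
```

With a lower logical block size of 65536 and a sector size of 512, the first variant yields 512 and the second yields 65536.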

Fixes: ad6bf88a6c19 ("block: fix an integer overflow in logical block size")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers
Reviewed-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Eric Biggers
2020-06-06 02:59:59 +0800
04867370e dm mpath: add DM device name to Failing/Reinstating path log messages

When there are many DM multipath devices it really helps to have
additional context for which DM device a failed or reinstated path is
part of.

Signed-off-by: Mike Snitzer

Mike Snitzer
2020-06-06 02:59:58 +0800
4c3f48380 dm mpath: enhance queue_if_no_path debugging

Add more DMDEBUG that shows arguments passed and caller, and another
that shows state of related flags at end of queue_if_no_path().

Also add queue_if_no_path DMDEBUG to multipath_resume().

Signed-off-by: Mike Snitzer

Mike Snitzer
2020-06-06 02:59:57 +0800
553ec94cb dm mpath: restrict queue_if_no_path state machine

Do not allow saving disabled queue_if_no_path if already saved as
enabled; implies multiple suspends (which shouldn't ever happen). Log
if this unlikely scenario is ever triggered.

Also, only write MPATHF_SAVED_QUEUE_IF_NO_PATH during presuspend or if
"fail_if_no_path" message. MPATHF_SAVED_QUEUE_IF_NO_PATH is no longer
always modified, e.g.: even if queue_if_no_path()'s save_old_value
argument wasn't set. This just implies a bit tighter control over
the management of MPATHF_SAVED_QUEUE_IF_NO_PATH. Side-effect is
multipath_resume() doesn't reset MPATHF_QUEUE_IF_NO_PATH unless
MPATHF_SAVED_QUEUE_IF_NO_PATH was set (during presuspend); and at that
time the MPATHF_SAVED_QUEUE_IF_NO_PATH bit gets cleared. So
MPATHF_SAVED_QUEUE_IF_NO_PATH's use is much more narrow in scope.

Last, but not least, do _not_ disable queue_if_no_path during noflush
suspend. There is no need/benefit to saving off queue_if_no_path via
MPATHF_SAVED_QUEUE_IF_NO_PATH and clearing MPATHF_QUEUE_IF_NO_PATH for
noflush suspend -- by avoiding this needless queue_if_no_path flag
churn there is less potential for MPATHF_QUEUE_IF_NO_PATH to get lost.
Which avoids potential for IOs to be errored back up to userspace
during DM multipath's handling of path failures.
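
The save/restore rules described so far can be summarized in a small userspace sketch in C. It is written only from the description above, not from drivers/md/dm-mpath.c: the SK_ flag values and both function names are hypothetical, and the real code works on bit numbers in m->flags under a lock.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two MPATHF_* bits; values are illustrative. */
#define SK_QUEUE_IF_NO_PATH       (1u << 0)
#define SK_SAVED_QUEUE_IF_NO_PATH (1u << 1)

/* Presuspend: save the current setting and disable queueing. */
unsigned int sk_presuspend(unsigned int flags, bool noflush)
{
	/* noflush suspend: leave both bits alone, no flag churn. */
	if (noflush)
		return flags;

	/* Already saved as enabled and now disabled: do not overwrite the save. */
	if (!(flags & SK_QUEUE_IF_NO_PATH) && (flags & SK_SAVED_QUEUE_IF_NO_PATH))
		return flags;

	if (flags & SK_QUEUE_IF_NO_PATH)
		flags |= SK_SAVED_QUEUE_IF_NO_PATH;
	else
		flags &= ~SK_SAVED_QUEUE_IF_NO_PATH;

	flags &= ~SK_QUEUE_IF_NO_PATH;
	return flags;
}

/* Resume: restore queueing only if it was saved, then clear the saved bit. */
unsigned int sk_resume(unsigned int flags)
{
	if (flags & SK_SAVED_QUEUE_IF_NO_PATH) {
		flags |= SK_QUEUE_IF_NO_PATH;
		flags &= ~SK_SAVED_QUEUE_IF_NO_PATH;
	}
	return flags;
}
```

In this model a second presuspend without an intervening resume keeps the saved bit intact, and a noflush suspend never touches either bit.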

That said, this last change papers over a reported issue concerning
request-based dm-multipath's interaction with blk-mq, relative to
suspend and resume: multipath_endio is being called _before_
multipath_resume. This should never happen if DM suspend's
blk_mq_quiesce_queue() + dm_wait_for_completion() is genuinely waiting
for all inflight blk-mq requests to complete. Similarly:
drivers/md/dm.c:__dm_resume() clearly calls dm_table_resume_targets()
_before_ dm_start_queue()'s blk_mq_unquiesce_queue() is called. If
the queue isn't even restarted until after multipath_resume(); the BIG
question that still needs answering is: how can multipath_end_io beat
multipath_resume in a race!?

Signed-off-by: Mike Snitzer

Mike Snitzer
2020-06-06 02:59:56 +0800
a862e4e21 dm mpath: simplify __must_push_back

Remove micro-optimization that infers device is between presuspend and
resume (was done purely to avoid call to dm_noflush_suspending, which
isn't expensive anyway).

Remove flags argument since they are no longer checked.

And remove must_push_back_bio() since it was simply a call to
__must_push_back().

Signed-off-by: Mike Snitzer

Mike Snitzer
2020-06-06 02:59:55 +0800
27d49ac1d dm zoned: check superblock location

When specifying several devices the superblock location must be
checked to ensure the devices are specified in the correct order.

Signed-off-by: Hannes Reinecke
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:54 +0800
2094045fe dm zoned: prefer full zones for reclaim

Prefer full zones when selecting the next zone for reclaim.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:54 +0800
69875d443 dm zoned: select reclaim zone based on device index

Per-device reclaim should select zones on that device only.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:53 +0800
22c1ef66c dm zoned: allocate zone by device index

When allocating a zone, pass in an indicator on which device the zone
should be allocated; this increases performance for a multi-device
setup because reclaim will now allocate zones on the device for which
reclaim is running.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:52 +0800
4dba12881 dm zoned: support arbitrary number of devices

Remove the hard-coded limit of two devices and support an unlimited
number of additional zoned devices.

Signed-off-by: Hannes Reinecke
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:51 +0800
bd82fdabf dm zoned: move random and sequential zones into struct dmz_dev

Random and sequential zones should be part of the respective
device structure to make arbitration between devices possible.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:50 +0800
f97809aec dm zoned: per-device reclaim

Instead of having one reclaim workqueue for the entire set we should
be allocating a reclaim workqueue per device; doing so will reduce
contention and should boost performance for a multi-device setup.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:50 +0800
18979819b dm zoned: add metadata pointer to struct dmz_dev

Add a metadata pointer within struct dmz_dev and use it as argument
for blkdev_report_zones() instead of the metadata itself.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:49 +0800
8f22272af dm zoned: add device pointer to struct dm_zone

Add a pointer, to the containing device, within struct dm_zone and
kill dmz_zone_to_dev().

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:48 +0800
5d2c74f3d dm zoned: allocate temporary superblock for tertiary devices

Checking the tertiary superblock just consists of validating UUIDs,
crcs, and the generation number; it doesn't have contents which would
be required during the actual operation.

So allocate a temporary superblock when checking tertiary devices to
avoid having to store it together with the 'real' superblocks.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:47 +0800
a92fbc446 dm zoned: convert to xarray

The zones array is getting really large, and large arrays tend to
wreak havoc with the CPU caches. So convert it to xarray to become
more cache friendly.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Colin Ian King # fix leak in dmz_insert
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:46 +0800
aec67b4ff dm zoned: add a 'reserved' zone flag

Instead of counting the number of reserved zones in dmz_free_zone(),
mark the zone as 'reserved' during allocation and simplify
dmz_free_zone().

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:46 +0800
c3ff479dd dm zoned: improve logging messages for reclaim

Instead of just reporting the errno, add some more verbose debugging
messages in the reclaim path.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:45 +0800
1565929b8 dm zoned: avoid unnecessary device recalulation for secondary superblock

The secondary superblock must reside on the same device as the primary
superblock, so there is no need to re-calculate the device.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:44 +0800
35d0c96e4 dm zoned: add debugging message for reading superblocks

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-06-06 02:59:43 +0800
334b4fc17 dm ebs: use dm_bufio_forget_buffers

Use dm_bufio_forget_buffers instead of a block-by-block loop that
calls dm_bufio_forget. dm_bufio_forget_buffers is faster than the loop
because it searches for used buffers using rb-tree.

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2020-06-06 02:59:42 +0800
33a180623 dm bufio: introduce forget_buffer_locked

Introduce a function forget_buffer_locked that forgets a range of
buffers. It is more efficient than calling forget_buffer in a loop.

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2020-06-06 02:59:41 +0800
88f878e58 dm bufio: clean up rbtree block ordering

dm-bufio uses unnatural ordering in the rb-tree - blocks with smaller
numbers were put to the right node and blocks with bigger numbers were
put to the left node.

Reverse that logic so that it's natural.
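
To make "natural" ordering concrete, here is a minimal userspace sketch in C of a plain, unbalanced binary search tree keyed by block number. It is not the kernel rb-tree and not dm-bufio code; the struct and function names are hypothetical. Smaller block numbers go to the left child and bigger ones to the right, so an in-order walk visits blocks in ascending order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type; dm-bufio embeds a struct rb_node instead. */
struct sk_node {
	uint64_t block;
	struct sk_node *left, *right;
};

/* Insert with natural ordering: smaller keys left, bigger keys right. */
void sk_tree_insert(struct sk_node **root, struct sk_node *n)
{
	struct sk_node **link = root;

	n->left = n->right = NULL;
	while (*link != NULL) {
		if (n->block < (*link)->block)
			link = &(*link)->left;
		else
			link = &(*link)->right;
	}
	*link = n;
}

/* Lookup follows the same comparison direction as insertion. */
struct sk_node *sk_tree_find(struct sk_node *root, uint64_t block)
{
	while (root != NULL) {
		if (block == root->block)
			return root;
		root = block < root->block ? root->left : root->right;
	}
	return NULL;
}

/* In-order walk: writes keys to out[] starting at pos, returns the new pos. */
size_t sk_tree_inorder(const struct sk_node *root, uint64_t *out, size_t pos)
{
	if (root == NULL)
		return pos;
	pos = sk_tree_inorder(root->left, out, pos);
	out[pos++] = root->block;
	return sk_tree_inorder(root->right, out, pos);
}
```

With this ordering, walking a contiguous range of block numbers is a left-to-right traversal, which is what range operations over the tree rely on.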

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2020-06-06 02:59:41 +0800

05 Jun, 2020

1 commit

a1c979f33 dm bufio: delete unused and inefficient dm_bufio_discard_buffers

There is no user for this interface. If in future it is needed it can
be reimplemented to walk the rbtree of buffers instead of doing
block-by-block lookups.

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2020-06-05 08:57:38 +0800

03 Jun, 2020

4 commits

bce159d73 Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
"On top of the core changes, here are the block driver changes for this
merge window:

- NVMe changes:
  - NVMe over Fibre Channel protocol updates, which also reach
    over to drivers/scsi/lpfc (James Smart)
  - namespace revalidation support on the target (Anthony
    Iliopoulos)
  - gcc zero length array fix (Arnd Bergmann)
  - nvmet cleanups (Chaitanya Kulkarni)
  - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
  - use a SRQ per completion vector (Max Gurtovoy)
  - fix handling of runtime changes to the queue count (Weiping
    Zhang)
  - t10 protection information support for nvme-rdma and
    nvmet-rdma (Israel Rukshin and Max Gurtovoy)
  - target side AEN improvements (Chaitanya Kulkarni)
  - various fixes and minor improvements all over, including the
    nvme part of the lpfc driver"

- Floppy code cleanup series (Willy, Denis)

- Floppy contention fix (Jiri)

- Loop CONFIGURE support (Martijn)

- bcache fixes/improvements (Coly, Joe, Colin)

- q->queuedata cleanups (Christoph)

- Get rid of ioctl_by_bdev (Christoph, Stefan)

- md/raid5 allocation fixes (Coly)

- zero length array fixes (Gustavo)

- swim3 task state fix (Xu)"

* tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
bcache: configure the asynchronous registertion to be experimental
bcache: asynchronous devices registration
bcache: fix refcount underflow in bcache_device_free()
bcache: Convert pr_ uses to a more typical style
bcache: remove redundant variables i and n
lpfc: Fix return value in __lpfc_nvme_ls_abort
lpfc: fix axchg pointer reference after free and double frees
lpfc: Fix pointer checks and comments in LS receive refactoring
nvme: set dma alignment to qword
nvmet: cleanups the loop in nvmet_async_events_process
nvmet: fix memory leak when removing namespaces and controllers concurrently
nvmet-rdma: add metadata/T10-PI support
nvmet: add metadata support for block devices
nvmet: add metadata/T10-PI support
nvme: add Metadata Capabilities enumerations
nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
nvmet: rename nvmet_rw_len to nvmet_rw_data_len
nvmet: add metadata characteristics for a namespace
nvme-rdma: add metadata/T10-PI support
nvme-rdma: introduce nvme_rdma_sgl structure
...

Linus Torvalds
2020-06-03 06:37:03 +0800
750a02ab8 Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
"Core block changes that have been queued up for this release:

- Remove dead blk-throttle and blk-wbt code (Guoqing)

- Include pid in blktrace note traces (Jan)

- Don't spew I/O errors on wouldblock termination (me)

- Zone append addition (Johannes, Keith, Damien)

- IO accounting improvements (Konstantin, Christoph)

- blk-mq hardware map update improvements (Ming)

- Scheduler dispatch improvement (Salman)

- Inline block encryption support (Satya)

- Request map fixes and improvements (Weiping)

- blk-iocost tweaks (Tejun)

- Fix for timeout failing with error injection (Keith)

- Queue re-run fixes (Douglas)

- CPU hotplug improvements (Christoph)

- Queue entry/exit improvements (Christoph)

- Move DMA drain handling to the few drivers that use it (Christoph)

- Partition handling cleanups (Christoph)"

* tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
block: mark bio_wouldblock_error() bio with BIO_QUIET
blk-wbt: rename __wbt_update_limits to wbt_update_limits
blk-wbt: remove wbt_update_limits
blk-throttle: remove tg_drain_bios
blk-throttle: remove blk_throtl_drain
null_blk: force complete for timeout request
blk-mq: drain I/O when all CPUs in a hctx are offline
blk-mq: add blk_mq_all_tag_iter
blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
blk-mq: use BLK_MQ_NO_TAG in more places
blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
blk-mq: move more request initialization to blk_mq_rq_ctx_init
blk-mq: simplify the blk_mq_get_request calling convention
blk-mq: remove the bio argument to ->prepare_request
nvme: force complete cancelled requests
blk-mq: blk-mq: provide forced completion method
block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
block: blk-crypto-fallback: remove redundant initialization of variable err
block: reduce part_stat_lock() scope
block: use __this_cpu_add() instead of access by smp_processor_id()
...

Linus Torvalds
2020-06-03 06:29:19 +0800
88dca4ca5 mm: remove the pgprot argument to __vmalloc

The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Reviewed-by: Michael Kelley [hyperv]
Acked-by: Gao Xiang [erofs]
Acked-by: Peter Zijlstra (Intel)
Acked-by: Wei Liu
Cc: Christian Borntraeger
Cc: Christophe Leroy
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: Haiyang Zhang
Cc: Johannes Weiner
Cc: "K. Y. Srinivasan"
Cc: Laura Abbott
Cc: Mark Rutland
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Robin Murphy
Cc: Sakari Ailus
Cc: Stephen Hemminger
Cc: Sumit Semwal
Cc: Benjamin Herrenschmidt
Cc: Catalin Marinas
Cc: Heiko Carstens
Cc: Paul Mackerras
Cc: Vasily Gorbik
Cc: Will Deacon
Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de
Signed-off-by: Linus Torvalds

Christoph Hellwig
2020-06-03 01:59:11 +0800
db2c1d86c md: remove __clear_page_buffers and use attach/detach_page_private

After the introduction of attach/detach_page_private in pagemap.h, we can
remove the duplicated code and call the new functions.

Signed-off-by: Guoqing Jiang
Signed-off-by: Andrew Morton
Acked-by: Song Liu
Link: http://lkml.kernel.org/r/20200517214718.468-3-guoqing.jiang@cloud.ionos.com
Signed-off-by: Linus Torvalds

Guoqing Jiang
2020-06-03 01:59:07 +0800

30 May, 2020

1 commit

bf0beec06 blk-mq: drain I/O when all CPUs in a hctx are offline

Most blk-mq drivers depend on managed IRQ's auto-affinity to set up
queue mapping. Thomas mentioned the following point[1]:

"That was the constraint of managed interrupts from the very beginning:

The driver/subsystem has to quiesce the interrupt line and the associated
queue _before_ it gets shutdown in CPU unplug and not fiddle with it
until it's restarted by the core when the CPU is plugged in again."

However, the current blk-mq implementation doesn't quiesce the hw queue
before the last CPU in the hctx is shut down. Even worse, CPUHP_BLK_MQ_DEAD
is a cpuhp state handled after the CPU is down, so there isn't any chance to
quiesce the hctx before shutting down the CPU.

Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
where the last CPU goes away, and wait for completion of in-flight
requests. This guarantees that there is no inflight I/O before shutting
down the managed IRQ.
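
The "last CPU goes away" condition can be sketched in userspace C with plain bitmasks. This is only an illustration of the idea, not the blk-mq code: the function names are hypothetical, CPU sets are modeled as 64-bit masks instead of struct cpumask, and the stacking parameter stands in for the BLK_MQ_F_STACKING case the commit message describes for dm-rq and loop.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: is 'cpu' the only CPU that is both mapped to this
 * hardware queue and still online? 'online_cpus' is the online set before
 * 'cpu' is taken down, so it still contains 'cpu' itself.
 */
bool sk_last_online_cpu(uint64_t hctx_cpus, uint64_t online_cpus,
			unsigned int cpu)
{
	uint64_t live = hctx_cpus & online_cpus;

	return cpu < 64 && live == (UINT64_C(1) << cpu);
}

/*
 * Hypothetical decision: wait for in-flight requests only when the last
 * CPU of the queue goes away and the driver is not a stacking driver.
 */
bool sk_must_drain(uint64_t hctx_cpus, uint64_t online_cpus,
		   unsigned int cpu, bool stacking)
{
	if (stacking)
		return false;
	return sk_last_online_cpu(hctx_cpus, online_cpus, cpu);
}
```

For a queue mapped to CPUs 2 and 3, taking CPU 2 offline requires draining only if CPU 3 is already offline.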

Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need
to wait for completion of in-flight requests from these drivers to avoid
a potential dead-lock. It is safe to do this for stacking drivers as those
do not use interrupts at all and their I/O completions are triggered by
underlying devices I/O completion.

[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/

[hch: different retry mechanism, merged two patches, minor cleanups]

Signed-off-by: Ming Lei
Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Daniel Wagner
Signed-off-by: Jens Axboe

Ming Lei
2020-05-30 00:23:25 +0800

27 May, 2020

7 commits

86240d5b6 dm: use bio_{start,end}_io_acct

Switch dm to use the nicer bio accounting helpers.

Signed-off-by: Christoph Hellwig
Reviewed-by: Konstantin Khlebnikov
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-05-27 19:21:23 +0800
85750aeb7 bcache: use bio_{start,end}_io_acct

Switch bcache to use the nicer bio accounting helpers, and call the
routines where we also sample the start time to give coherent accounting
results.

Signed-off-by: Christoph Hellwig
Reviewed-by: Konstantin Khlebnikov
Acked-by: Coly Li
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-05-27 19:21:23 +0800
0c8d3fcea bcache: configure the asynchronous registertion to be experimental

To keep the experimental async registration interface from being
treated as new kernel ABI by common users, this patch puts it behind
an experimental kernel config option, BCACHE_ASYNC_REGISTRAION.

This interface is for situations with an extremely large amount of cached
data, to make sure the bcache device can always be created without hitting
the udev timeout. For normal users, async versus sync registration makes
no difference.

In future when we decide to use the asynchronous registration as default
behavior, this experimental interface may be removed.

Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Coly Li
2020-05-27 19:19:36 +0800
9e23ccf8f bcache: asynchronous devices registration

When there is a lot of data cached on the cache device, the bcache internal
btree can take a very long time to validate during backing device and
cache device registration. In my test, it may take 55+ minutes to check
all the internal btree nodes.

The problem is that the registration is invoked by udev rules and
udevd has a 180 second timeout by default. If the btree node checking
takes longer than the udevd timeout, the registering process will be
killed by udevd with SIGKILL. If the registering process has a pending
signal, creating the kthread for bcache will fail and the device
registration will fail. As a result, for a bcache device with a lot of
data cached on the cache device, the device node like /dev/bcache won't
always be created because of the very long btree checking time.

A solution to avoid the udevd 180 second timeout is to register devices
asynchronously. That is, after the cache or backing device path is
written into /sys/fs/bcache/register_async, the kernel code creates a
kworker and moves all the btree node checking (for a cache device) or dirty
data counting (for a cached device) into the kworker context. The kworker
is then scheduled on system_wq and the registration code returns
immediately to the user space udev rule task. This way, the udev task for
the bcache rule completes in seconds; no matter how much time is spent in
the kworker context, it won't be killed by udevd for a timeout.

After all the checking and counting are done asynchronously in the
kworker, the bcache device will eventually be created successfully.

This patch makes the above change and adds a register sysfs file
/sys/fs/bcache/register_async. Writing the registering device path into
this sysfs file will do the asynchronous registration.

The register_async interface is for a very rare condition and won't be
used by common users. In future I plan to make asynchronous
registration the default behavior, which depends on feedback for this
patch.

Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Coly Li
2020-05-27 19:19:36 +0800
86da9f736 bcache: fix refcount underflow in bcache_device_free()

The problematic code piece in bcache_device_free() is,

785 static void bcache_device_free(struct bcache_device *d)
786 {
787         struct gendisk *disk = d->disk;
[snipped]
799         if (disk) {
800                 if (disk->flags & GENHD_FL_UP)
801                         del_gendisk(disk);
802
803                 if (disk->queue)
804                         blk_cleanup_queue(disk->queue);
805
806                 ida_simple_remove(&bcache_device_idx,
807                                   first_minor_to_idx(disk->first_minor));
808                 put_disk(disk);
809         }
[snipped]
816 }

At line 808, put_disk(disk) may cause the kobject refcount of 'disk'
to underflow.

Here is how to reproduce the issue,
- Attach the backing device to a cache device and do random writes to
make the cache dirty.
- Stop the bcache device while the cache device has dirty data of the
backing device.
- Only register the backing device back, do NOT register the cache device.
- The bcache device node /dev/bcache0 won't show up, because the backing
device waits for the cache device to show up for the missing dirty
data.
- Now echo 1 into /sys/fs/bcache/pendings_cleanup, to stop the pending
backing device.
- After the pending backing device has stopped, use 'dmesg' to check the
kernel messages: a use-after-free warning from KASAN reports that the
refcount of the kobject linked to the 'disk' has underflowed.

The refcount dropped at line 808 in the above code piece is taken by
add_disk(d->disk) in bch_cached_dev_run(). But in the above condition
the cache device is not registered, so bch_cached_dev_run() has no chance
to be called and the refcount is never taken. Calling put_disk() on a
gendisk kobject whose refcount was never taken triggers an underflow warning.

This patch checks whether GENHD_FL_UP is set in disk->flags; if it is
not set then the bcache device was not added, so put_disk() is not called
and the underflow issue is avoided.
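
The effect of that check can be modeled in userspace with a minimal C sketch. Everything here is a mock: struct sk_disk, the SK_GENHD_FL_UP value, and the sk_ functions are hypothetical stand-ins for the gendisk, add_disk(), del_gendisk() and put_disk(), and the sketch assumes (as a modeling choice) that deleting the disk clears the "up" flag, so the flag is sampled once before any teardown.

```c
#include <stdbool.h>

#define SK_GENHD_FL_UP 0x10u  /* illustrative value, not the kernel's */

/* Mock disk: a flags word plus a plain integer standing in for the kobject refcount. */
struct sk_disk {
	unsigned int flags;
	int refcount;
};

/* Stand-in for add_disk(): takes a reference and marks the disk as up. */
void sk_add_disk(struct sk_disk *d)
{
	d->refcount++;
	d->flags |= SK_GENHD_FL_UP;
}

/* Old behaviour: the reference is dropped even if add_disk() never ran. */
void sk_device_free_old(struct sk_disk *d)
{
	if (d->flags & SK_GENHD_FL_UP)
		d->flags &= ~SK_GENHD_FL_UP;  /* stand-in for del_gendisk() */
	d->refcount--;                        /* stand-in for put_disk() */
}

/* Behaviour described by the patch: only drop what add_disk() took. */
void sk_device_free_fixed(struct sk_disk *d)
{
	bool disk_added = (d->flags & SK_GENHD_FL_UP) != 0;

	if (disk_added)
		d->flags &= ~SK_GENHD_FL_UP;  /* stand-in for del_gendisk() */
	if (disk_added)
		d->refcount--;                /* stand-in for put_disk() */
}
```

For a disk that was never added, the old path leaves the mock refcount at -1, while the fixed path leaves it at 0.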

Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Coly Li
2020-05-27 19:19:36 +0800
46f5aa880 bcache: Convert pr_<level> uses to a more typical style

Remove the trailing newline from the define of pr_fmt and add newlines
to the uses.
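
The convention can be illustrated with a small userspace C sketch. The macros below are hypothetical and only imitate the idea with snprintf(); they are not the kernel's pr_fmt or pr_info, and the "bcache: " prefix is chosen for illustration. The prefix macro carries no trailing newline, and each call site supplies its own.

```c
#include <stdio.h>

/* Hypothetical prefix, playing the role of pr_fmt: no trailing newline here. */
#define SK_PREFIX "bcache: "

/*
 * Hypothetical logging macro: the first variadic argument must be a string
 * literal, which is concatenated with the prefix at compile time. The
 * newline belongs to the caller's format string.
 */
#define sk_log(buf, size, ...) snprintf((buf), (size), SK_PREFIX __VA_ARGS__)
```

A call such as sk_log(buf, sizeof(buf), "registered %d devices\n", 3) produces one complete line, and a caller that wants to build a line in several pieces simply leaves the newline off until the last piece.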

Miscellanea:

o Convert bch_bkey_dump from multiple uses of pr_err to pr_cont
as the earlier conversion was inappropriately done, causing multiple
lines to be emitted where only a single output line was desired
o Use vsprintf extension %pV in bch_cache_set_error to avoid multiple
line output where only a single line output was desired
o Coalesce formats

Fixes: 6ae63e3501c4 ("bcache: replace printk() by pr_*() routines")

Signed-off-by: Joe Perches
Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Joe Perches
2020-05-27 19:19:36 +0800
3b5b7b1f7 bcache: remove redundant variables i and n

Variables i and n are being assigned but are never used. They are
redundant and can be removed.

Signed-off-by: Colin Ian King
Signed-off-by: Coly Li
Addresses-Coverity: ("Unused value")
Signed-off-by: Jens Axboe

Colin Ian King
2020-05-27 19:19:36 +0800

23 May, 2020

1 commit

b4756d43a dm zoned: remove leftover hunk for switching to sequential zones

Remove a leftover hunk to switch from random zones to sequential
zones when selecting a reclaim zone; the logic has moved into the
caller and this hunk is now pointless.

Fixes: 34f5affd04c4 ("dm zoned: separate random and cache zones")
Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer

Hannes Reinecke
2020-05-23 00:07:14 +0800

22 May, 2020

1 commit

9398554fb block: remove the error_sector argument to blkdev_issue_flush

The argument isn't used by any caller, and drivers don't fill out
bi_sector for flush requests either.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-05-22 22:45:46 +0800