Eric Lee / smarc-fsl-linux-kernel

09 Oct, 2018

33 commits

45dcf29b9 lightnvm: pblk: encapsulate rqd dma allocations

DMA allocations for ppa_list and meta_list in rqd are replicated in
several places across the pblk codebase. Add helpers to encapsulate
creation and deletion to simplify the code.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:07 +0800
090ee26fd lightnvm: use internal allocation for chunk log page

The lightnvm subsystem provides helpers to retrieve chunk metadata,
where the target needs to provide a buffer to store the metadata. An
implicit assumption is that this buffer is contiguous and can be used to
retrieve the data from the device. If the device exposes too many
chunks, then kmalloc might fail, thus failing instance creation.

This patch removes this assumption by implementing an internal buffer in
the lightnvm subsystem to retrieve chunk metadata. Targets can then
use virtual memory allocations. Since this is a target API change, adapt
pblk accordingly.

Signed-off-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:07 +0800
7325b4bbe lightnvm: pblk: fix two sleep-in-atomic-context bugs

The driver may sleep while holding a spinlock.

The function call paths (from bottom to top) in Linux-4.16 are:

[FUNC] nvm_dev_dma_alloc(GFP_KERNEL)
drivers/lightnvm/pblk-core.c, 754:
nvm_dev_dma_alloc in pblk_line_submit_smeta_io
drivers/lightnvm/pblk-core.c, 1048:
pblk_line_submit_smeta_io in pblk_line_init_bb
drivers/lightnvm/pblk-core.c, 1434:
pblk_line_init_bb in pblk_line_replace_data
drivers/lightnvm/pblk-recovery.c, 980:
pblk_line_replace_data in pblk_recov_l2p
drivers/lightnvm/pblk-recovery.c, 976:
spin_lock in pblk_recov_l2p

[FUNC] bio_map_kern(GFP_KERNEL)
drivers/lightnvm/pblk-core.c, 762:
bio_map_kern in pblk_line_submit_smeta_io
drivers/lightnvm/pblk-core.c, 1048:
pblk_line_submit_smeta_io in pblk_line_init_bb
drivers/lightnvm/pblk-core.c, 1434:
pblk_line_init_bb in pblk_line_replace_data
drivers/lightnvm/pblk-recovery.c, 980:
pblk_line_replace_data in pblk_recov_l2p
drivers/lightnvm/pblk-recovery.c, 976:
spin_lock in pblk_recov_l2p

To fix these bugs, the call to pblk_line_replace_data()
is moved out of the spinlock protection.
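
As a rough userspace illustration of this kind of fix (not the patch itself), the sketch below records the decision while the lock is held and performs the possibly sleeping work only after the lock is released. All names here (fake_spin_lock, may_sleep_alloc, struct line, recover_buggy, recover_fixed) are hypothetical stand-ins, and a counter models "a spinlock is held".

```c
#include <stdbool.h>

/* Hypothetical model: a counter stands in for "a spinlock is held". */
static int atomic_depth;
static int sleep_violations;

static void fake_spin_lock(void)   { atomic_depth++; }
static void fake_spin_unlock(void) { atomic_depth--; }

/* Models a GFP_KERNEL-style allocation: only legal outside the lock. */
static void may_sleep_alloc(void)
{
    if (atomic_depth > 0)
        sleep_violations++;     /* sleeping in atomic context */
}

struct line { bool needs_replace; };

/* Buggy shape: the sleeping call runs inside the locked region. */
static void recover_buggy(struct line *l)
{
    fake_spin_lock();
    if (l->needs_replace)
        may_sleep_alloc();
    fake_spin_unlock();
}

/* Fixed shape: decide under the lock, do the sleeping work after it. */
static void recover_fixed(struct line *l)
{
    bool replace;

    fake_spin_lock();
    replace = l->needs_replace;
    l->needs_replace = false;
    fake_spin_unlock();

    if (replace)
        may_sleep_alloc();
}
```

Calling recover_buggy() on a line that needs replacement increments sleep_violations, while recover_fixed() does not.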

These bugs were found by my static analysis tool DSAC.

Signed-off-by: Jia-Ju Bai
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Jia-Ju Bai
2018-10-09 22:25:07 +0800
bf82fa2f5 lightnvm: pblk: fix mapping issue on failed writes

On 1.2 devices, mapping out the remaining sectors in the
failed write's block can result in an infinite loop,
stalling the write pipeline. Fix this.

Fixes: 6a3abf5beef6 ("lightnvm: pblk: rework write error recovery path")
Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
1864de94e lightnvm: pblk: stop recreating global caches

Pblk should not create a set of global caches every time
a pblk instance is created. The global caches should be
made available only when there is one or more pblk instances.

This patch bundles the global caches together with a kref
keeping track of whether the caches should be available or not.

Also, turn the global pblk lock into a mutex that explicitly
protects the caches (as this was the only purpose of the lock).
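
The reference-counting idea can be sketched in userspace as follows. This is only an illustration under assumptions: a plain integer plays the role of the kref, a pthread mutex plays the role of the kernel mutex, and the names (struct global_caches, caches_get, caches_put, creations) are hypothetical rather than taken from pblk.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical bundle: the caches plus a count of live instances. */
struct global_caches {
    int refs;       /* plays the role of the kref */
    void *cache;    /* placeholder for the real set of caches */
};

static struct global_caches caches;
static pthread_mutex_t caches_lock = PTHREAD_MUTEX_INITIALIZER;
static int creations;   /* how many times the caches were built */

/* Called on instance creation: build the caches only for the first user. */
static int caches_get(void)
{
    int ret = 0;

    pthread_mutex_lock(&caches_lock);
    if (caches.refs == 0) {
        caches.cache = malloc(64);
        if (!caches.cache) {
            ret = -1;
            goto out;
        }
        creations++;
    }
    caches.refs++;
out:
    pthread_mutex_unlock(&caches_lock);
    return ret;
}

/* Called on instance teardown: destroy the caches with the last user. */
static void caches_put(void)
{
    pthread_mutex_lock(&caches_lock);
    if (caches.refs > 0 && --caches.refs == 0) {
        free(caches.cache);
        caches.cache = NULL;
    }
    pthread_mutex_unlock(&caches_lock);
}
```

With two instances alive, the caches are created once and survive until the second caches_put().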

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
63dee3a6c lightnvm: pblk: calculate line pad distance in helper

If a line is padded, calculate the pad distance directly in the helper
used for this purpose.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:07 +0800
7f985f9a6 lightnvm: move ppa transformations to core

Continuing the effort of moving 1.2 and 2.0 specific code to core, move
64_to_32 and 32_to_64 ppa helpers from pblk to core.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:07 +0800
4209c31c0 lightnvm: pblk: add tracing for chunk resets

Trace state of chunk resets.

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
1b0dd0bf3 lightnvm: pblk: add trace events for pblk state changes

Add trace events for tracking pblk state changes.

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
f29372322 lightnvm: pblk: add trace events for line state changes

Add trace events for logging line state changes.

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
4c44abf43 lightnvm: pblk: add trace events for chunk states

Introduce trace points for tracking chunk states in pblk. This is
useful for inspecting the entire state of the drive, and very handy
for both fw and pblk debugging.

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
43241cfe4 lightnvm: pblk: remove debug from pblk_[down/up]_page

Remove the debug-only iteration within __pblk_down_page, which
then allows us to reduce the number of arguments down to pblk and
the parallel unit in the functions that call it, simplifying the
callers' logic considerably.

Also, rename the functions pblk_[down/up]_page to
pblk_[down/up]_chunk, to communicate that it manages the write
pointer of the chunk. Note that it also protects the parallel unit
such that at most one chunk is active per parallel unit.

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:07 +0800
765462fa4 lightnvm: pblk: fix write amplificiation calculation

When the user data counter exceeds 32 bits, the write amplification
calculation does not provide the right value. Fix this by using
div64_u64 instead of div64.
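
To see why a 32-bit divisor gives the wrong answer, consider the userspace sketch below. It is an illustration only: the two division helpers merely mimic a helper with a 32-bit divisor and one with a 64-bit divisor, and the formula (total sectors written times 100, divided by user sectors) is an assumed simplification of the write amplification calculation, not the pblk code.

```c
#include <stdint.h>

/* Mimics a division helper whose divisor is only 32 bits wide. */
static uint64_t div_u64_by_u32(uint64_t dividend, uint32_t divisor)
{
    return dividend / divisor;
}

/* Mimics a division helper that accepts a full 64-bit divisor. */
static uint64_t div_u64_by_u64(uint64_t dividend, uint64_t divisor)
{
    return dividend / divisor;
}

/*
 * Assumed formula: write amplification in percent, i.e. all sectors
 * written (user + gc + pad) relative to the user sectors.
 * The cast silently drops the upper 32 bits of the user counter.
 */
static uint64_t wa_percent_truncating(uint64_t user, uint64_t gc, uint64_t pad)
{
    return div_u64_by_u32((user + gc + pad) * 100, (uint32_t)user);
}

/* Same formula, but the divisor keeps all 64 bits. */
static uint64_t wa_percent_full(uint64_t user, uint64_t gc, uint64_t pad)
{
    return div_u64_by_u64((user + gc + pad) * 100, user);
}
```

For example, with user = 6442450944 (0x180000000) and gc equal to half of that, the full-width version yields 150, while the truncating version divides by 0x80000000 and yields 450.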

Fixes: 76758390f83e ("lightnvm: pblk: export write amplification counters to sysfs")
Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
ea1d24bc3 lightnvm: pblk: fix up prints in pblk_read_check_rand

The prefix when printing ppas in pblk_read_check_rand should be "rnd",
not "seq". Fix this so we can differentiate between lba mismatches
in random and sequential reads. Also change the print order so
we align with pblk_read_check_seq, printing the read lba first.

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
e99e802fc lightnvm: pblk: remove unused parameters in pblk_up_rq

The parameters nr_ppas and ppa_list are not used, so remove them.

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
53d82db69 lightnvm: pblk: allocate line map bitmaps using a mempool

Line map bitmap allocations are fairly large and can fail. Allocation
failures are fatal to pblk, stopping the write pipeline. To avoid this,
allocate the bitmaps using a mempool instead.

Mempool allocations never fail if called from a process context,
and pblk *should* only allocate map bitmaps in process context,
but keep the failure handling for robustness' sake.
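
The general idea behind a memory pool can be sketched in userspace as below. This is a simplified model under assumptions, not the kernel mempool API: a few elements are reserved up front and handed out when the regular allocator fails. The names (struct pool, pool_init, pool_alloc, pool_free, simulate_oom) are hypothetical, and where a real mempool would sleep until an element is returned, this sketch simply returns NULL.

```c
#include <stdbool.h>
#include <stdlib.h>

#define POOL_MIN 2  /* number of elements kept in reserve */

struct pool {
    void *reserve[POOL_MIN];
    int nr_free;
    size_t elem_size;
};

static bool simulate_oom;   /* test knob: make the allocator fail */

static void *backing_alloc(size_t size)
{
    return simulate_oom ? NULL : malloc(size);
}

/* Pre-allocate the reserve so later allocations can fall back on it. */
static int pool_init(struct pool *p, size_t elem_size)
{
    p->elem_size = elem_size;
    p->nr_free = 0;
    for (int i = 0; i < POOL_MIN; i++) {
        void *e = malloc(elem_size);
        if (!e) {
            while (p->nr_free)
                free(p->reserve[--p->nr_free]);
            return -1;
        }
        p->reserve[p->nr_free++] = e;
    }
    return 0;
}

/* Try the regular allocator first, then dip into the reserve. */
static void *pool_alloc(struct pool *p)
{
    void *e = backing_alloc(p->elem_size);

    if (e)
        return e;
    if (p->nr_free > 0)
        return p->reserve[--p->nr_free];
    return NULL;    /* a real mempool would wait here in process context */
}

/* Refill the reserve before giving memory back to the allocator. */
static void pool_free(struct pool *p, void *e)
{
    if (p->nr_free < POOL_MIN)
        p->reserve[p->nr_free++] = e;
    else
        free(e);
}
```

Even with the allocator failing, the first POOL_MIN allocations succeed from the reserve, and freeing an element makes it available again.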

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
d68a93440 lightnvm: introduce nvm_rq_to_ppa_list

There are a number of places in the lightnvm subsystem where the user
iterates over the ppa list. Before iterating, the user must know if it
is a single or multiple LBAs due to vector commands using either the
nvm_rq ->ppa_addr or ->ppa_list fields on command submission, which
leads to open-coding the if/else statement.

Instead of having multiple if/else's, move it into a function that can
be called by its users.

A nice side effect of this cleanup is that this patch fixes up a
bunch of cases where we don't consider the single-ppa case in pblk.
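
A minimal sketch of such a helper is shown below, using simplified stand-in structures rather than the real lightnvm definitions. The field names follow the description above (ppa_addr, ppa_list, nr_ppas); the helper name rq_to_ppa_list and the sum_ppas caller are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Simplified stand-ins for the real structures. */
struct ppa_addr { uint64_t ppa; };

struct nvm_rq {
    struct ppa_addr ppa_addr;   /* used when the request has one ppa */
    struct ppa_addr *ppa_list;  /* used when the request has several */
    unsigned int nr_ppas;
};

/* One place that hides the single-vs-multiple distinction. */
static struct ppa_addr *rq_to_ppa_list(struct nvm_rq *rqd)
{
    return (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
}

/* A caller can now iterate uniformly without its own if/else. */
static uint64_t sum_ppas(struct nvm_rq *rqd)
{
    struct ppa_addr *list = rq_to_ppa_list(rqd);
    uint64_t sum = 0;

    for (unsigned int i = 0; i < rqd->nr_ppas; i++)
        sum += list[i].ppa;
    return sum;
}
```

Because the single-ppa case is handled inside the helper, a caller such as sum_ppas() cannot forget it.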

Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Hans Holmberg
2018-10-09 22:25:07 +0800
9cc85bc76 lightnvm: pblk: guarantee emeta on line close

If a line is recovered from open chunks, the memory structures for
emeta have not necessarily been properly set on line initialization.
When closing a line, make sure that emeta is consistent so that the line
can be recovered on the fast path on next reboot.

Also, remove a couple of empty lines at the end of the function.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:06 +0800
7a7d6f9b4 lightnvm: pblk: remove unused variable.

Removed unused struct ppa_addr variable.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:06 +0800
2e696f909 lightnvm: pblk: fix comment typo

Fix comment typo Decrese -> Decrease

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:06 +0800
cb21665c8 lightnvm: pblk: improve line helpers

The current helper to obtain a line from a ppa returns the line id,
which requires its users to explicitly retrieve the pointer to the line
with the id.

Provide two different helpers: one returning the line id and one returning
the line directly.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:06 +0800
2cf99bbd1 lightnvm: pblk: add helpers for chunk addresses

Implement helpers to go from ppas to a chunk within a line and an
address within a chunk.

These helpers will be used on the patches adding trace support in pblk,
which will be sent in this window.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:06 +0800
ae14cc044 lightnvm: pblk: refactor put line fn on read completion

The read completion path uses the put_line variable to decide whether
the reference on a line should be released. The function name used for
that is pblk_read_put_rqd_kref, which could lead one to believe that it
is the rqd that is releasing the reference, while it is the line
reference that is put.

Rename and also split the function in two to account for either rqd or
single ppa callers and move it to core, such that it later can be used
in the write path as well.

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Reviewed-by: Heiner Litz
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:06 +0800
d20be90ae lightnvm: pblk: remove size and out of bounds read check

The I/O size and capacity checks are already done by the block layer.

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:06 +0800
8bbd45d02 lightnvm: pblk: fix incorrect min_write_pgs

The calculation of pblk->min_write_pgs should only use the optimal
write size attribute provided by the drive. It does not correlate to
the memory page size of the system, which can be smaller or larger
than the LBA size reported.

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:06 +0800
afdc23c91 lightnvm: pblk: unify vector max req constants

Both NVM_MAX_VLBA and PBLK_MAX_REQ_ADDRS define how many LBAs
are available in a vector command. pblk uses them interchangeably
in its implementation. Use NVM_MAX_VLBA as the main one and remove
usages of PBLK_MAX_REQ_ADDRS.

Also remove the power representation that only has one user, and
instead calculate it at runtime.

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:06 +0800
aff3fb18f lightnvm: move bad block and chunk state logic to core

pblk implements two data paths for recovering line state: one for 1.2
and another for 2.0. Instead of having pblk implement these, combine
them in the core to reduce complexity and make them available to other
targets.

The new interface will adhere to the 2.0 chunk definition,
including managing open chunks with an active write pointer. To provide
this interface, a 1.2 device recovers the state of the chunks by
manually detecting if a chunk is either free/open/closed/offline, and if
open, scanning the flash pages sequentially to find the next writeable
page. This process takes on average ~10 seconds on a device with 64 dies,
1024 blocks and 60us read access time. The process can be parallelized
but is left out for maintenance simplicity, as the 1.2 specification is
deprecated. For 2.0 devices, the logic is maintained internally in the
drive and retrieved through the 2.0 interface.
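
The sequential scan for the next writeable page can be illustrated with the userspace sketch below. It is a simplified model under assumptions: a boolean array stands in for reading each flash page, the offline (bad block) case is left out, and the names (enum chunk_state, struct chunk_info, scan_chunk) are hypothetical rather than taken from the lightnvm core.

```c
#include <stdbool.h>
#include <stddef.h>

enum chunk_state { CHUNK_FREE, CHUNK_OPEN, CHUNK_CLOSED };

struct chunk_info {
    enum chunk_state state;
    size_t wp;  /* write pointer: index of the next writeable page */
};

/*
 * written[i] tells whether page i already holds data; in a real device
 * this would be the outcome of reading that flash page.
 */
static struct chunk_info scan_chunk(const bool *written, size_t nr_pages)
{
    struct chunk_info info = { CHUNK_FREE, 0 };
    size_t wp = 0;

    /* Pages are written in order, so stop at the first unwritten one. */
    while (wp < nr_pages && written[wp])
        wp++;

    info.wp = wp;
    if (wp == 0)
        info.state = CHUNK_FREE;
    else if (wp == nr_pages)
        info.state = CHUNK_CLOSED;
    else
        info.state = CHUNK_OPEN;
    return info;
}
```

A chunk with only its first two pages written is reported as open with a write pointer of 2.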

Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:06 +0800
d8adaa3b8 lightnvm: pblk: fix race condition on metadata I/O

In pblk, when a new line is allocated, metadata for the previously
written line is scheduled. This is done through a fixed memory region
that is shared through time and contexts across different lines and
therefore protected by a lock. Unfortunately, this lock does not properly
cover all the metadata used for sharing this memory region,
resulting in a race condition.

This patch fixes this race condition by protecting this metadata
properly.

Fixes: dd2a43437337 ("lightnvm: pblk: sched. metadata on write thread")
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2018-10-09 22:25:06 +0800
656e33ca3 lightnvm: move device L2P detection to core

A 1.2 device is able to manage the logical to physical mapping
table internally or leave it to the host.

A target only supports one of those approaches, and therefore must
check on initialization. Move this check to core so that each target
does not have to implement it.

Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:06 +0800
4b5d56edb lightnvm: pblk: fix rqd.error return value in pblk_blk_erase_sync

rqd.error is masked by the return value of pblk_submit_io_sync.
The rqd structure is then passed on to the end_io function, which
assumes that any error should lead to a chunk being marked
offline/bad. Since pblk_submit_io_sync can fail before the
command is issued to the device, the error value may not correspond
to a media failure, leading to chunks being prematurely retired.

Also, the pblk_blk_erase_sync function prints an error message in case
the erase fails. Since the caller prints an error message by itself,
remove the error message in this function.
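
The difference between a submission failure and a device-reported failure can be illustrated with the userspace sketch below. It is a model under assumptions, not the pblk code: struct rq, submit_sync, end_io, erase_buggy and erase_fixed are hypothetical names, and the counter chunks_retired stands in for marking a chunk offline/bad.

```c
#include <stdbool.h>

struct rq { int error; };   /* status reported by the device */

static int chunks_retired;

/*
 * Hypothetical submit: returns nonzero if the command never reached the
 * device; otherwise stores the device status in rqd->error.
 */
static int submit_sync(struct rq *rqd, bool submit_fails, int device_error)
{
    if (submit_fails)
        return -5;
    rqd->error = device_error;
    return 0;
}

/* Completion handler: treats any error as a media failure. */
static void end_io(struct rq *rqd)
{
    if (rqd->error)
        chunks_retired++;
}

/* Buggy shape: the submission status overwrites the device status. */
static int erase_buggy(bool submit_fails, int device_error)
{
    struct rq rqd = { 0 };
    int ret = submit_sync(&rqd, submit_fails, device_error);

    rqd.error = ret;
    end_io(&rqd);
    return ret;
}

/* Fixed shape: a failed submission never reaches the completion handler. */
static int erase_fixed(bool submit_fails, int device_error)
{
    struct rq rqd = { 0 };
    int ret = submit_sync(&rqd, submit_fails, device_error);

    if (ret)
        return ret;
    end_io(&rqd);
    return 0;
}
```

In this model, a failed submission retires a chunk in the buggy version but not in the fixed one, while a genuine device error still retires the chunk.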

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:06 +0800
d7b680167 lightnvm: combine 1.2 and 2.0 command flags

Add nvm_set_flags helper to enable core to appropriately
set the command flags for read/write/erase depending on which version
a drive supports.

The flags arguments can be distilled into the access hint,
scrambling, and program/erase suspend. Replace the access hint with
a "is_seq" parameter. The rest of the flags are dependent on the
command opcode, which is trivial to detect and set.

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:05 +0800
73569e110 lightnvm: remove dependencies on BLK_DEV_NVME and PCI

There is no need to force the NVMe device driver to be compiled in if the
lightnvm subsystem is selected. There is also no need for PCI to be
selected, as it would be selected by the device driver that hooks into
the subsystem.

Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe

Matias Bjørling
2018-10-09 22:25:05 +0800
36e765392 blk-mq: complete req in softirq context in case of single queue

Many controllers have only one irq vector for completing IO
requests. The affinity of that single irq vector usually covers all
possible CPUs; however, on most architectures only one specific CPU
ends up handling the interrupt.

So if all IOs are completed in hardirq context, IO performance
inevitably degrades because of increased irq latency.

This patch tries to address this issue by allowing requests to be
completed in softirq context, like the legacy IO path.

An IOPS improvement of ~13% or more is observed in the following
randread test on raid0 over virtio-scsi.

mdadm --create --verbose /dev/md0 --level=0 --chunk=1024 --raid-devices=8 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi

fio --time_based --name=benchmark --runtime=30 --filename=/dev/md0 --nrfiles=1 --ioengine=libaio --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=32 --rw=randread --blocksize=4k

Cc: Dongli Zhang
Cc: Zach Marano
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Jianchao Wang
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2018-10-09 00:50:43 +0800

08 Oct, 2018

7 commits

3a646fd77 bcache: panic fix for making cache device

When the nbuckets of a cache device is smaller than 1024, making the cache
device triggers a BUG_ON in the kernel. Add a condition to avoid this.

Reported-by: nitroxis
Signed-off-by: Dongbo Cao
Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Dongbo Cao
2018-10-08 22:19:59 +0800
f6027bca9 bcache: split combined if-condition code into separate ones

Split the combined '||' statements in the if() check to make the code
easier to debug.

Signed-off-by: Dongbo Cao
Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Dongbo Cao
2018-10-08 22:19:57 +0800
8792099f9 bcache: use MAX_CACHES_PER_SET instead of magic number 8 in __bch_bucket_alloc_set

Currently a cache_set has at most MAX_CACHES_PER_SET caches, and the macro
is used for
"
struct cache *cache_by_alloc[MAX_CACHES_PER_SET];
"
in the definition of struct cache_set.

Use MAX_CACHES_PER_SET instead of magic number 8 in
__bch_bucket_alloc_set.

Signed-off-by: Shenghui Wang
Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Shenghui Wang
2018-10-08 22:19:56 +0800
149d0efad bcache: replace hard coded number with BUCKET_GC_GEN_MAX

In extents.c:bch_extent_bad(), number 96 is used as parameter to call
btree_bug_on(). The purpose is to check whether stale gen value exceeds
BUCKET_GC_GEN_MAX, so it is better to use macro BUCKET_GC_GEN_MAX to
make the code more understandable.

Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Coly Li
2018-10-08 22:19:55 +0800
91bafdf08 bcache: remove useless parameter of bch_debug_init()

Parameter "struct kobject *kobj" in bch_debug_init() is useless,
remove it in this patch.

Signed-off-by: Dongbo Cao
Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Dongbo Cao
2018-10-08 22:19:53 +0800
3fd3c5c02 bcache: remove unused bch_passthrough_cache

struct kmem_cache *bch_passthrough_cache is not used in
bcache code. Remove it.

Signed-off-by: Shenghui Wang
Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Shenghui Wang
2018-10-08 22:19:52 +0800
46010141d bcache: recal cached_dev_sectors on detach

Recalculate cached_dev_sectors when a cached_dev is detached, as is
already done when a cached_dev is attached.

Update cached_dev_sectors before bcache_device_detach is called,
as bcache_device_detach will set bcache_device->c to NULL.
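
The ordering constraint can be illustrated with the userspace sketch below. It is a simplified model under assumptions, not the bcache code: the structures, the fixed-size device array, and the names calc_sectors, device_detach and detach_and_recalc are hypothetical. The only point it makes is that the recalculation has to use the cache set pointer before the detach step clears it.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS 4

struct cache_set;

struct bcache_device {
    struct cache_set *c;    /* cleared on detach */
    uint64_t sectors;
};

struct cache_set {
    struct bcache_device *devs[MAX_DEVS];   /* attached devices */
    uint64_t cached_dev_sectors;
};

/* Recompute the total from the devices still attached to the set. */
static void calc_sectors(struct cache_set *c)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < MAX_DEVS; i++)
        if (c->devs[i])
            sum += c->devs[i]->sectors;
    c->cached_dev_sectors = sum;
}

/* Models the detach step that sets the device's set pointer to NULL. */
static void device_detach(struct bcache_device *d)
{
    d->c = NULL;
}

static void detach_and_recalc(struct bcache_device *d, size_t slot)
{
    d->c->devs[slot] = NULL;
    calc_sectors(d->c);     /* must come first: d->c is still valid */
    device_detach(d);       /* after this, d->c can no longer be used */
}
```

Swapping the last two calls in detach_and_recalc() would dereference a NULL pointer in this model.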

Signed-off-by: Shenghui Wang
Signed-off-by: Coly Li
Signed-off-by: Jens Axboe

Shenghui Wang
2018-10-08 22:19:50 +0800