11 Feb, 2019
3 commits
-
This patch fixes a race condition where a write is mapped to the last
sectors of a line. The write is synced to the device but the L2P is not
updated yet. When the line is garbage collected before the L2P update
is performed, the sectors are ignored by the GC logic and the line is
freed before all sectors are moved. When the L2P is finally updated, it
contains a mapping to a freed line, subsequent reads of the
corresponding LBAs fail.This patch introduces a per line counter specifying the number of
sectors that are synced to the device but have not been updated in the
L2P. Lines with a counter of greater than zero will not be selected
for GC.Signed-off-by: Heiner Litz
Reviewed-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
There are new types and helpers that are supposed to be used in new code.
As a preparation to get rid of legacy types and API functions do
the conversion here.Signed-off-by: Andy Shevchenko
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
As chunk metadata is allocated using vmalloc, we need to free it
using vfree.Fixes: 090ee26fd512 ("lightnvm: use internal allocation for chunk log page")
Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
12 Dec, 2018
6 commits
-
pblk performs recovery of open lines by storing the LBA in the per LBA
metadata field. Recovery therefore only works for drives that has this
field.This patch adds support for packed metadata, which store l2p mapping
for open lines in last sector of every write unit and enables drives
without per IO metadata to recover open lines.After this patch, drives with OOB size
Signed-off-by: Igor Konopko
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Currently lightnvm and pblk uses single DMA pool, for which the entry
size always is equal to PAGE_SIZE. The contents of each entry allocated
from the DMA pool consists of a PPA list (8bytes * 64), leaving
56bytes * 64 space for metadata. Since the metadata field can be bigger,
such as 128 bytes, the static size does not cover this use-case.This patch adds support for I/O metadata above 56 bytes by changing DMA
pool size based on device meta size and allows pblk to use OOB metadata
>=16B.Reviewed-by: Javier González
Signed-off-by: Igor Konopko
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
pblk currently assumes that size of OOB metadata on drive is always
equal to size of pblk_sec_meta struct. This commit add helpers which will
allow to handle different sizes of OOB metadata on drive in the future.After this patch only OOB metadata equal to 16 bytes is supported.
Reviewed-by: Javier González
Signed-off-by: Igor Konopko
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
pblk's recovery path is single threaded and therefore a number of
assumptions regarding concurrency can be made. To avoid confusion, make
this explicit with a couple of comments in the code.Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Protect the list_add on the pblk_line_init_bb() error
path in case this code is used for some other purpose
in the future.Signed-off-by: Hua Su
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
The check for chunk closes suffers from an off-by-one issue, leading
to chunk close events not being traced.Fixes: 4c44abf43d00 ("lightnvm: pblk: add trace events for chunk states")
Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe
09 Oct, 2018
19 commits
-
pblk exposes a sysfs interface that represents its internal state. Part
of this state is the map bitmap for the current open line, which should
be protected by the line lock to avoid a race when freeing the line
metadata. Currently, it is not.This patch makes sure that the line state is consistent and NULL
bitmap pointers are not dereferenced.Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Add GLP-2.0 SPDX license tag to all pblk files
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
pblk guarantees write ordering at a chunk level through a per open chunk
semaphore. At this point, since we only have an open I/O stream for both
user and GC data, the semaphore is per parallel unit.For the metadata I/O that is synchronous, the semaphore is not needed as
ordering is guaranteed. However, if the metadata scheme changes or
multiple streams are open, this guarantee might not be preserved.This patch makes sure that all writes go through the semaphore, even for
synchronous I/O. This is consistent with pblk's write I/O model. It also
simplifies maintenance since changes in the metadata scheme could cause
ordering issues.Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
pblk maintains two different metadata paths for smeta and emeta, which
store metadata at the start of the line and at the end of the line,
respectively. Until now, these path has been common for writing and
retrieving metadata, however, as these paths diverge, the common code
becomes less clear and unnecessary complicated.In preparation for further changes to the metadata write path, this
patch separates the write and read paths for smeta and emeta and
removes the synchronous emeta path as it not used anymore (emeta is
scheduled asynchronously to prevent jittering due to internal I/Os).Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
dma allocations for ppa_list and meta_list in rqd are replicated in
several places across the pblk codebase. Make helpers to encapsulate
creation and deletion to simplify the code.Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
The lightnvm subsystem provides helpers to retrieve chunk metadata,
where the target needs to provide a buffer to store the metadata. An
implicit assumption is that this buffer is contiguous and can be used to
retrieve the data from the device. If the device exposes too many
chunks, then kmalloc might fail, thus failing instance creation.This patch removes this assumption by implementing an internal buffer in
the lightnvm subsystem to retrieve chunk metadata. Targets can then
use virtual memory allocations. Since this is a target API change, adapt
pblk accordingly.Signed-off-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Trace state of chunk resets.
Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Add trace events for tracking pblk state changes.
Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Add trace events for logging for line state changes.
Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Introduce trace points for tracking chunk states in pblk - this is
useful for inspection of the entire state of the drive, and real handy
for both fw and pblk debugging.Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Remove the debug only iteration within __pblk_down_page, which
then allows us to reduce the number of arguments down to pblk and
the parallel unit from the functions that calls it. Simplifying the
callers logic considerably.Also, rename the functions pblk_[down/up]_page to
pblk_[down/up]_chunk, to communicate that it manages the write
pointer of the chunk. Note that it also protects the parallel unit
such that at most one chunk is active per parallel unit.Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe -
The parameters nr_ppas and ppa_list are not used, so remove them.
Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Line map bitmap allocations are fairly large and can fail. Allocation
failures are fatal to pblk, stopping the write pipeline. To avoid this,
allocate the bitmaps using a mempool instead.Mempool allocations never fail if called from a process context,
and pblk *should* only allocate map bitmaps in process context,
but keep the failure handling for robustness sake.Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
If a line is recovered from open chunks, the memory structures for
emeta have not necessarily been properly set on line initialization.
When closing a line, make sure that emeta is consistent so that the line
can be recovered on the fast path on next reboot.Also, remove a couple of empty lines at the end of the function.
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
The current helper to obtain a line from a ppa returns the line id,
which requires its users to explicitly retrieve the pointer to the line
with the id.Make 2 different helpers: one returning the line id and one returning
the line directly.Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
The read completion path uses the put_line variable to decide whether
the reference on a line should be released. The function name used for
that is pblk_read_put_rqd_kref, which could lead one to believe that it
is the rqd that is releasing the reference, while it is the line
reference that is put.Rename and also split the function in two to account for either rqd or
single ppa callers and move it to core, such that it later can be used
in the write path as well.Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Reviewed-by: Heiner Litz
Signed-off-by: Jens Axboe -
pblk implements two data paths for recovery line state. One for 1.2
and another for 2.0, instead of having pblk implement these, combine
them in the core to reduce complexity and make available to other
targets.The new interface will adhere to the 2.0 chunk definition,
including managing open chunks with an active write pointer. To provide
this interface, a 1.2 device recovers the state of the chunks by
manually detecting if a chunk is either free/open/close/offline, and if
open, scanning the flash pages sequentially to find the next writeable
page. This process takes on average ~10 seconds on a device with 64 dies,
1024 blocks and 60us read access time. The process can be parallelized
but is left out for maintenance simplicity, as the 1.2 specification is
deprecated. For 2.0 devices, the logic is maintained internally in the
drive and retrieved through the 2.0 interface.Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
rqd.error is masked by the return value of pblk_submit_io_sync.
The rqd structure is then passed on to the end_io function, which
assumes that any error should lead to a chunk being marked
offline/bad. Since the pblk_submit_io_sync can fail before the
command is issued to the device, the error value maybe not correspond
to a media failure, leading to chunks being immaturely retired.Also, the pblk_blk_erase_sync function prints an error message in case
the erase fails. Since the caller prints an error message by itself,
remove the error message in this function.Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Reviewed-by: Hans Holmberg
Signed-off-by: Jens Axboe -
Add nvm_set_flags helper to enable core to appropriately
set the command flags for read/write/erase depending on which version
a drive supports.The flags arguments can be distilled into the access hint,
scrambling, and program/erase suspend. Replace the access hint with
a "is_seq" parameter. The rest of the flags are dependent on the
command opcode, which is trivial to detect and set.Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe
13 Jul, 2018
3 commits
-
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.Signed-off-by: Gustavo A. R. Silva
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
The error messages in pblk does not say which pblk instance that
a message occurred from. Update each error message to reflect the
instance it belongs to, and also prefix it with pblk, so we know
the message comes from the pblk module.Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe -
There is no users of CONFIG_NVM_DEBUG in the LightNVM subsystem. All
users are in pblk. Rename NVM_DEBUG to NVM_PBLK_DEBUG and enable
only for pblk.Also fix up the CONFIG_NVM_PBLK entry to follow the code style for
Kconfig files.Signed-off-by: Matias Bjørling
Reviewed-by: Javier González
Signed-off-by: Jens Axboe
01 Jun, 2018
9 commits
-
pblk allocates line bitmaps within the line lock unnecessarily. In order
to take pressure out of the fast patch, allocate line bitmaps outside
of this lock and refactor accordingly.Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Unless we kick the writer directly when setting a new flush point, the
user risks having to wait for up to one second (the default timeout for
the write thread to be kicked) for the IO to complete.Signed-off-by: Hans Holmberg
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Currently in case of error caused by bio_pc_add_page in
pblk_bio_add_pages two issues occur when calling from
pblk_rb_read_to_bio(). First one is in pblk_bio_free_pages, since we
are trying to free pages not allocated from our mempool. Second one
is the warn from dma_pool_free, that we are trying to free NULL
pointer dma.This commit fix both issues.
Signed-off-by: Igor Konopko
Signed-off-by: Marcin Dziegielewski
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Smeta write errors were previously ignored. Skip these
lines instead and throw them back on the free
list, so the chunks will go through a reset cycle
before we attempt to use the line again.Signed-off-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Write failures should not happen under normal circumstances,
so in order to bring the chunk back into a known state as soon
as possible, evacuate all the valid data out of the line and let the
fw judge if the block can be written to in the next reset cycle.Do this by introducing a new gc list for lines with failed writes,
and ensure that the rate limiter allocates a small portion of
the write bandwidth to get the job done.The lba list is saved in memory for use during gc as we
cannot gurantee that the emeta data is readable if a write
error occurred.Signed-off-by: Hans Holmberg
Reviewed-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Remove dead function for manual sync. I/O
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
If the namespace is unregistered before the LightNVM target is removed
(e.g., on hot unplug) it is too late for the target to store any metadata
on the device - any attempt to write to the device will fail. In this
case, pass on a "gracefull teardown" flag to the target to let it know
when this happens.In the case of pblk, we pad the open line (close all open chunks) to
improve data retention. In the event of an ungraceful shutdown, avoid
this part and just clean up.Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Remove unnecessary argument on pblk_line_free()
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe -
Return a meaningful error when the sanity vector I/O check fails.
Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe