11 Feb, 2019

3 commits

  • This patch fixes a race condition where a write is mapped to the last
    sectors of a line. The write is synced to the device but the L2P is not
    updated yet. When the line is garbage collected before the L2P update
    is performed, the sectors are ignored by the GC logic and the line is
    freed before all sectors are moved. When the L2P is finally updated, it
    contains a mapping to a freed line, and subsequent reads of the
    corresponding LBAs fail.

    This patch introduces a per-line counter specifying the number of
    sectors that are synced to the device but have not been updated in the
    L2P. Lines with a counter greater than zero will not be selected
    for GC.

    Signed-off-by: Heiner Litz
    Reviewed-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Heiner Litz
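
The counter described in this commit can be illustrated with a minimal userspace sketch. It is not the pblk implementation: `struct line`, `sec_to_update`, and the three helper functions are hypothetical names chosen for this example, and C11 atomics stand in for the kernel's own atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified line descriptor; not pblk's real line struct. */
struct line {
    atomic_int sec_to_update; /* sectors synced to media, L2P not yet updated */
};

/* Called when a write to this line has been synced to the device. */
static void line_write_synced(struct line *l, int nr_secs)
{
    atomic_fetch_add(&l->sec_to_update, nr_secs);
}

/* Called once the L2P table has been updated for those sectors. */
static void line_l2p_updated(struct line *l, int nr_secs)
{
    int prev = atomic_fetch_sub(&l->sec_to_update, nr_secs);

    assert(prev >= nr_secs); /* cannot update more sectors than were synced */
}

/* GC must skip any line that still has pending L2P updates. */
static bool line_gc_eligible(struct line *l)
{
    return atomic_load(&l->sec_to_update) == 0;
}
```

In this sketch a line becomes ineligible for GC as soon as a synced write is recorded, and eligible again only after the matching L2P update has been accounted for.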
     
  • There are new types and helpers that are supposed to be used in new code.

    In preparation for getting rid of legacy types and API functions, do
    the conversion here.

    Signed-off-by: Andy Shevchenko
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Andy Shevchenko
     
  • As chunk metadata is allocated using vmalloc, we need to free it
    using vfree.

    Fixes: 090ee26fd512 ("lightnvm: use internal allocation for chunk log page")
    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     

12 Dec, 2018

6 commits

  • pblk performs recovery of open lines by storing the LBA in the per-LBA
    metadata field. Recovery therefore only works for drives that have this
    field.

    This patch adds support for packed metadata, which stores the L2P
    mapping for open lines in the last sector of every write unit and
    enables drives without per-I/O metadata to recover open lines.

    After this patch, drives with OOB size
    Signed-off-by: Igor Konopko
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
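
A rough sketch of the packed-metadata idea follows, assuming a write unit of 8 sectors of 4096 bytes each and 64-bit LBAs. These sizes, the `struct write_unit` type, and the `pack_lbas`/`unpack_lba` helpers are all assumptions made for illustration; the on-media layout pblk actually uses may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096            /* assumed sector size in bytes */
#define UNIT_SECS   8               /* assumed sectors per write unit */
#define DATA_SECS   (UNIT_SECS - 1) /* last sector is reserved for LBAs */

/* Hypothetical in-memory image of one write unit. */
struct write_unit {
    uint8_t sec[UNIT_SECS][SECTOR_SIZE];
};

/* Store the LBAs of the data sectors in the last sector of the unit. */
static void pack_lbas(struct write_unit *u, const uint64_t lbas[DATA_SECS])
{
    memset(u->sec[UNIT_SECS - 1], 0, SECTOR_SIZE);
    memcpy(u->sec[UNIT_SECS - 1], lbas, DATA_SECS * sizeof(uint64_t));
}

/* Recovery: read the LBA of data sector 'idx' back from the last sector. */
static uint64_t unpack_lba(const struct write_unit *u, unsigned int idx)
{
    uint64_t lba;

    assert(idx < DATA_SECS);
    memcpy(&lba, u->sec[UNIT_SECS - 1] + idx * sizeof(uint64_t), sizeof(lba));
    return lba;
}
```

The trade-off shown here is that one sector per write unit carries mapping information instead of user data, which is what lets a drive without per-sector OOB metadata still recover an open line.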
     
  • Currently lightnvm and pblk use a single DMA pool, for which the entry
    size is always equal to PAGE_SIZE. The contents of each entry allocated
    from the DMA pool consist of a PPA list (8 bytes * 64), leaving
    56 bytes * 64 of space for metadata. Since the metadata field can be
    bigger, such as 128 bytes, the static size does not cover this use case.

    This patch adds support for I/O metadata above 56 bytes by changing the
    DMA pool size based on the device metadata size, and allows pblk to use
    OOB metadata >=16B.

    Reviewed-by: Javier González
    Signed-off-by: Igor Konopko
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
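
The size arithmetic in this commit message can be checked with a short sketch. It assumes a 4096-byte page, which is what (8 + 56) * 64 implies; the macro and function names are hypothetical, and the real patch may round or align the pool entry size differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PPAS    64   /* entries per request, from the commit message */
#define PPA_SIZE    8    /* bytes per PPA list entry, from the commit message */
#define PAGE_SIZE_B 4096 /* assumed page size: (8 + 56) * 64 */

/* Bytes needed for one pool entry: PPA list plus per-sector metadata. */
static size_t pool_entry_size(size_t meta_size)
{
    return (size_t)MAX_PPAS * (PPA_SIZE + meta_size);
}

/* Largest per-sector metadata that still fits a fixed PAGE_SIZE entry. */
static size_t max_meta_in_page(void)
{
    assert(PAGE_SIZE_B / MAX_PPAS > PPA_SIZE);
    return PAGE_SIZE_B / MAX_PPAS - PPA_SIZE;
}
```

With these assumptions, 56 bytes of metadata per sector fills exactly one page, while 128 bytes per sector needs 64 * (8 + 128) = 8704 bytes, which is why a fixed PAGE_SIZE entry is too small.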
     
  • pblk currently assumes that the size of OOB metadata on the drive is
    always equal to the size of the pblk_sec_meta struct. This commit adds
    helpers that will allow handling different sizes of OOB metadata on the
    drive in the future.

    After this patch only OOB metadata equal to 16 bytes is supported.

    Reviewed-by: Javier González
    Signed-off-by: Igor Konopko
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • pblk's recovery path is single threaded and therefore a number of
    assumptions regarding concurrency can be made. To avoid confusion, make
    this explicit with a couple of comments in the code.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Protect the list_add on the pblk_line_init_bb() error
    path in case this code is used for some other purpose
    in the future.

    Signed-off-by: Hua Su
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hua Su
     
  • The check for chunk closes suffers from an off-by-one issue, leading
    to chunk close events not being traced.

    Fixes: 4c44abf43d00 ("lightnvm: pblk: add trace events for chunk states")
    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     

09 Oct, 2018

19 commits

  • pblk exposes a sysfs interface that represents its internal state. Part
    of this state is the map bitmap for the current open line, which should
    be protected by the line lock to avoid a race when freeing the line
    metadata. Currently, it is not.

    This patch makes sure that the line state is consistent and NULL
    bitmap pointers are not dereferenced.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Add GPL-2.0 SPDX license tag to all pblk files

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • pblk guarantees write ordering at a chunk level through a per open chunk
    semaphore. At this point, since we only have an open I/O stream for both
    user and GC data, the semaphore is per parallel unit.

    For the metadata I/O that is synchronous, the semaphore is not needed as
    ordering is guaranteed. However, if the metadata scheme changes or
    multiple streams are open, this guarantee might not be preserved.

    This patch makes sure that all writes go through the semaphore, even for
    synchronous I/O. This is consistent with pblk's write I/O model. It also
    simplifies maintenance since changes in the metadata scheme could cause
    ordering issues.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • pblk maintains two different metadata paths for smeta and emeta, which
    store metadata at the start of the line and at the end of the line,
    respectively. Until now, these paths have been common for writing and
    retrieving metadata. However, as these paths diverge, the common code
    becomes less clear and unnecessarily complicated.

    In preparation for further changes to the metadata write path, this
    patch separates the write and read paths for smeta and emeta and
    removes the synchronous emeta path as it is not used anymore (emeta is
    scheduled asynchronously to prevent jittering due to internal I/Os).

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • DMA allocations for ppa_list and meta_list in rqd are replicated in
    several places across the pblk codebase. Make helpers to encapsulate
    creation and deletion to simplify the code.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • The lightnvm subsystem provides helpers to retrieve chunk metadata,
    where the target needs to provide a buffer to store the metadata. An
    implicit assumption is that this buffer is contiguous and can be used to
    retrieve the data from the device. If the device exposes too many
    chunks, then kmalloc might fail, thus failing instance creation.

    This patch removes this assumption by implementing an internal buffer in
    the lightnvm subsystem to retrieve chunk metadata. Targets can then
    use virtual memory allocations. Since this is a target API change, adapt
    pblk accordingly.

    Signed-off-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Trace state of chunk resets.

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Add trace events for tracking pblk state changes.

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Add trace events for logging line state changes.

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Introduce trace points for tracking chunk states in pblk - this is
    useful for inspection of the entire state of the drive, and real handy
    for both fw and pblk debugging.

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Remove the debug-only iteration within __pblk_down_page, which
    then allows us to reduce the number of arguments down to pblk and
    the parallel unit in the functions that call it, simplifying the
    callers' logic considerably.

    Also, rename the functions pblk_[down/up]_page to
    pblk_[down/up]_chunk, to communicate that it manages the write
    pointer of the chunk. Note that it also protects the parallel unit
    such that at most one chunk is active per parallel unit.

    Signed-off-by: Matias Bjørling
    Reviewed-by: Javier González
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The parameters nr_ppas and ppa_list are not used, so remove them.

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Line map bitmap allocations are fairly large and can fail. Allocation
    failures are fatal to pblk, stopping the write pipeline. To avoid this,
    allocate the bitmaps using a mempool instead.

    Mempool allocations never fail if called from a process context,
    and pblk *should* only allocate map bitmaps in process context,
    but keep the failure handling for robustness' sake.

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • If a line is recovered from open chunks, the memory structures for
    emeta have not necessarily been properly set on line initialization.
    When closing a line, make sure that emeta is consistent so that the line
    can be recovered on the fast path on next reboot.

    Also, remove a couple of empty lines at the end of the function.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • The current helper to obtain a line from a ppa returns the line id,
    which requires its users to explicitly retrieve the pointer to the line
    with the id.

    Make 2 different helpers: one returning the line id and one returning
    the line directly.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
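
The split into two helpers might look roughly like the sketch below. The types `struct ppa`, `struct line`, and `struct target`, and both function names, are simplified stand-ins invented for this example rather than pblk's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; the real pblk structures differ. */
struct ppa    { unsigned int line_id; unsigned int sector; };
struct line   { unsigned int id; };
struct target { struct line *lines; size_t nr_lines; };

/* Helper 1: return only the line id encoded in the address. */
static unsigned int ppa_to_line_id(struct ppa p)
{
    return p.line_id;
}

/* Helper 2: return the line itself, built on top of helper 1. */
static struct line *ppa_to_line(struct target *t, struct ppa p)
{
    unsigned int id = ppa_to_line_id(p);

    assert(id < t->nr_lines); /* the id must index an existing line */
    return &t->lines[id];
}
```

Callers that only need the id keep using the first helper, while callers that previously looked up the pointer by hand can use the second one directly.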
     
  • The read completion path uses the put_line variable to decide whether
    the reference on a line should be released. The function name used for
    that is pblk_read_put_rqd_kref, which could lead one to believe that it
    is the rqd that is releasing the reference, while it is the line
    reference that is put.

    Rename and also split the function in two to account for either rqd or
    single ppa callers and move it to core, such that it later can be used
    in the write path as well.

    Signed-off-by: Matias Bjørling
    Reviewed-by: Javier González
    Reviewed-by: Heiner Litz
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • pblk implements two data paths for recovering line state: one for 1.2
    and another for 2.0. Instead of having pblk implement these, combine
    them in the core to reduce complexity and make them available to other
    targets.

    The new interface will adhere to the 2.0 chunk definition,
    including managing open chunks with an active write pointer. To provide
    this interface, a 1.2 device recovers the state of the chunks by
    manually detecting if a chunk is either free/open/closed/offline, and if
    open, scanning the flash pages sequentially to find the next writeable
    page. This process takes on average ~10 seconds on a device with 64 dies,
    1024 blocks and 60us read access time. The process can be parallelized
    but is left out for maintenance simplicity, as the 1.2 specification is
    deprecated. For 2.0 devices, the logic is maintained internally in the
    drive and retrieved through the 2.0 interface.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
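
The sequential scan used for 1.2 devices can be sketched as follows. This is a simplified model, not the core implementation: the `written` array is a hypothetical stand-in for reading each flash page from the device, all names are invented for the example, and the offline (bad block) case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum chunk_state { CHUNK_FREE, CHUNK_OPEN, CHUNK_CLOSED };

struct chunk_info {
    enum chunk_state state;
    size_t write_ptr; /* index of the next writable page */
};

/*
 * written[i] is true if page i of the chunk has been programmed.
 * In a real device this would be determined by reading the page.
 */
static struct chunk_info scan_chunk(const bool *written, size_t nr_pages)
{
    struct chunk_info info = { CHUNK_FREE, 0 };

    assert(nr_pages > 0);

    /* Pages are written sequentially, so stop at the first empty one. */
    while (info.write_ptr < nr_pages && written[info.write_ptr])
        info.write_ptr++;

    if (info.write_ptr == nr_pages)
        info.state = CHUNK_CLOSED; /* every page is programmed */
    else if (info.write_ptr > 0)
        info.state = CHUNK_OPEN;   /* partially written chunk */

    return info;
}
```

Because the scan is linear in the number of written pages per chunk, and runs once per chunk, it explains why recovery time grows with the number of dies and blocks when it is not parallelized.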
     
  • rqd.error is masked by the return value of pblk_submit_io_sync.
    The rqd structure is then passed on to the end_io function, which
    assumes that any error should lead to a chunk being marked
    offline/bad. Since pblk_submit_io_sync can fail before the
    command is issued to the device, the error value may not correspond
    to a media failure, leading to chunks being prematurely retired.

    Also, the pblk_blk_erase_sync function prints an error message in case
    the erase fails. Since the caller prints an error message by itself,
    remove the error message in this function.

    Signed-off-by: Matias Bjørling
    Reviewed-by: Javier González
    Reviewed-by: Hans Holmberg
    Signed-off-by: Jens Axboe

    Matias Bjørling
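
The distinction this commit draws, between a submission failure and an error reported by the device, can be sketched like this. `struct request` and `should_mark_chunk_bad` are hypothetical names for illustration only and do not mirror the pblk code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request: 'error' is only meaningful if the command was issued. */
struct request {
    int error;
};

/*
 * submit_ret: return value of the (hypothetical) submission call, non-zero
 *             if the command never reached the device.
 * Returns true only when the device itself reported a failure.
 */
static bool should_mark_chunk_bad(int submit_ret, const struct request *rqd)
{
    assert(rqd);

    if (submit_ret)
        return false; /* submission failed: not evidence of a media error */

    return rqd->error != 0;
}
```

Keeping the two error sources separate means a chunk is retired only on evidence from the media, not because an allocation or queueing step failed on the host.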
     
  • Add nvm_set_flags helper to enable core to appropriately
    set the command flags for read/write/erase depending on which version
    a drive supports.

    The flags arguments can be distilled into the access hint,
    scrambling, and program/erase suspend. Replace the access hint with
    an "is_seq" parameter. The rest of the flags are dependent on the
    command opcode, which is trivial to detect and set.

    Signed-off-by: Matias Bjørling
    Reviewed-by: Javier González
    Signed-off-by: Jens Axboe

    Matias Bjørling
     

13 Jul, 2018

3 commits


01 Jun, 2018

9 commits

  • pblk allocates line bitmaps within the line lock unnecessarily. To
    take pressure off the fast path, allocate line bitmaps outside
    of this lock and refactor accordingly.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Unless we kick the writer directly when setting a new flush point, the
    user risks having to wait for up to one second (the default timeout for
    the write thread to be kicked) for the IO to complete.

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Currently, when bio_pc_add_page fails in pblk_bio_add_pages, two
    issues occur when calling from pblk_rb_read_to_bio(). The first is in
    pblk_bio_free_pages, since we are trying to free pages not allocated
    from our mempool. The second is the warning from dma_pool_free that we
    are trying to free a NULL DMA pointer.

    This commit fixes both issues.

    Signed-off-by: Igor Konopko
    Signed-off-by: Marcin Dziegielewski
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • Smeta write errors were previously ignored. Skip these
    lines instead and throw them back on the free
    list, so the chunks will go through a reset cycle
    before we attempt to use the line again.

    Signed-off-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Write failures should not happen under normal circumstances,
    so in order to bring the chunk back into a known state as soon
    as possible, evacuate all the valid data out of the line and let the
    fw judge if the block can be written to in the next reset cycle.

    Do this by introducing a new gc list for lines with failed writes,
    and ensure that the rate limiter allocates a small portion of
    the write bandwidth to get the job done.

    The LBA list is saved in memory for use during GC as we
    cannot guarantee that the emeta data is readable if a write
    error occurred.

    Signed-off-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Remove dead function for manual synchronous I/O.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • If the namespace is unregistered before the LightNVM target is removed
    (e.g., on hot unplug), it is too late for the target to store any
    metadata on the device: any attempt to write to the device will fail.
    In this case, pass on a "graceful teardown" flag to the target to let
    it know when this happens.

    In the case of pblk, we pad the open line (close all open chunks) to
    improve data retention. In the event of an ungraceful shutdown, avoid
    this part and just clean up.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Remove unnecessary argument on pblk_line_free()

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Return a meaningful error when the sanity vector I/O check fails.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González