03 Jun, 2018

2 commits

  • If we end up splitting a bio and the queue goes away between
    the initial submission and the later split submission, then we
    can block forever in blk_queue_enter() waiting for the reference
    to drop to zero. This will never happen, since we already hold
    a reference.

    Mark a split bio as already having entered the queue, so we can
    just use the live non-blocking queue enter variant.

    Thanks to Tetsuo Handa for the analysis.

    Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The counter for the number of allocated pages includes pages in the
    mempool's reserve, so checking that the number of allocated pages is 0
    needs to happen after we exit the mempool.

    Fixes: 6f1c819c219f ("dm: convert to bioset_init()/mempool_init()")
    Signed-off-by: Kent Overstreet
    Reported-by: Krzysztof Kozlowski
    Acked-by: Mike Snitzer

    Fixed to always just use percpu_counter_sum()

    Signed-off-by: Jens Axboe

    Kent Overstreet
     

01 Jun, 2018

38 commits

  • pblk allocates line bitmaps within the line lock unnecessarily. In order
    to take pressure out of the fast patch, allocate line bitmaps outside
    of this lock and refactor accordingly.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Unless we kick the writer directly when setting a new flush point, the
    user risks having to wait for up to one second (the default timeout for
    the write thread to be kicked) for the IO to complete.

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • When switching between different lun configurations, there is no
    guarantee that all lines that contain closed/open chunks have some
    valid data to recover.

    Check that the smeta chunk has been written to instead. Also
    skip bad lines (that does not have enough good chunks).

    Signed-off-by: Hans Holmberg
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • In the read path, pblk gets a reference to the incoming bio and puts it
    after ending the bio. Though this behavior is correct, it is unnecessary
    since pblk is the one putting the bio, therefore, it cannot disappear
    underneath it.

    Removing this reference, allows to clean up rqd->bio and avoids pointer
    bouncing for the different read paths. Now, the incoming bio always
    resides in the read context and pblk's internal bios (if any) reside in
    rqd->bio.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • In some cases, users can want set write buffer size manually, e.g. to
    adjust it to specific workload. This patch provides the possibility
    to set write buffer size via module parameter feature.

    Signed-off-by: Marcin Dziegielewski
    Signed-off-by: Igor Konopko
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Marcin Dziegielewski
     
  • When error occurs during bio_add_page on partial read path, pblk
    tries to free pages twice.

    Signed-off-by: Igor Konopko
    Signed-off-by: Marcin Dziegielewski
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • Currently in case of error caused by bio_pc_add_page in
    pblk_bio_add_pages two issues occur when calling from
    pblk_rb_read_to_bio(). First one is in pblk_bio_free_pages, since we
    are trying to free pages not allocated from our mempool. Second one
    is the warn from dma_pool_free, that we are trying to free NULL
    pointer dma.

    This commit fix both issues.

    Signed-off-by: Igor Konopko
    Signed-off-by: Marcin Dziegielewski
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Igor Konopko
     
  • Smeta write errors were previously ignored. Skip these
    lines instead and throw them back on the free
    list, so the chunks will go through a reset cycle
    before we attempt to use the line again.

    Signed-off-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Write failures should not happen under normal circumstances,
    so in order to bring the chunk back into a known state as soon
    as possible, evacuate all the valid data out of the line and let the
    fw judge if the block can be written to in the next reset cycle.

    Do this by introducing a new gc list for lines with failed writes,
    and ensure that the rate limiter allocates a small portion of
    the write bandwidth to get the job done.

    The lba list is saved in memory for use during gc as we
    cannot gurantee that the emeta data is readable if a write
    error occurred.

    Signed-off-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • The write error recovery path is incomplete, so rework
    the write error recovery handling to do resubmits directly
    from the write buffer.

    When a write error occurs, the remaining sectors in the chunk are
    mapped out and invalidated and the request inserted in a resubmit list.

    The writer thread checks if there are any requests to resubmit,
    scans and invalidates any lbas that have been overwritten by later
    writes and resubmits the failed entries.

    Signed-off-by: Hans Holmberg
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Hans Holmberg
     
  • Remove dead function for manual sync. I/O

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • If the namespace is unregistered before the LightNVM target is removed
    (e.g., on hot unplug) it is too late for the target to store any metadata
    on the device - any attempt to write to the device will fail. In this
    case, pass on a "gracefull teardown" flag to the target to let it know
    when this happens.

    In the case of pblk, we pad the open line (close all open chunks) to
    improve data retention. In the event of an ungraceful shutdown, avoid
    this part and just clean up.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Do the check for the chunk state after making sure that the chunk type
    is supported.

    Fixes: 32ef9412c114 ("lightnvm: pblk: implement get log report chunk")
    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Remove unnecessary argument on pblk_line_free()

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Call nvm_submit_io directly and remove an unnecessary indirection on the
    read path.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Return a meaningful error when the sanity vector I/O check fails.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • When cleaning up buffer entries as we wrap up, their state should be
    "completed". If any of the entries is in "submitted" state, it means
    that something bad has happened. Trigger a warning immediately instead of
    waiting for the state flag to eventually be updated, thus hiding the
    issue.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • In the event of a mismatch between the read LBA and the metadata pointer
    reported by the device, improve the error message to be able to detect
    the offending physical address (PPA) mapped to the corrupted LBA.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Check that the lba stored in the LBA metadata is correct in the GC path
    too. This requires a new helper function to check random reads in the
    vector read.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Bad blocks can grow at runtime. Check that the number of valid blocks in
    a line are within the sanity threshold before allocating the line for
    new writes.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • In the event of a line failing to allocate, fail gracefully and stop the
    pipeline to avoid more write failing in the same place.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Pull NVMe changes from Christoph:

    "Below is another set of NVMe updates for 4.18. Besides the usual bug
    fixes this includes more feature completness in terms of AEN and log
    page handling on the target."

    * 'nvme-4.18' of git://git.infradead.org/nvme:
    nvme: use the changed namespaces list log to clear ns data changed AENs
    nvme: mark nvme_queue_scan static
    nvme: submit AEN event configuration on startup
    nvmet: mask pending AENs
    nvmet: add AEN configuration support
    nvmet: implement the changed namespaces log
    nvmet: split log page implementation
    nvmet: add a new nvmet_zero_sgl helper
    nvme.h: add AEN configuration symbols
    nvme.h: add the changed namespace list log
    nvme.h: untangle AEN notice definitions
    nvmet: fix error return code in nvmet_file_ns_enable()
    nvmet: fix a typo in nvmet_file_ns_enable()
    nvme-fabrics: allow internal passthrough command on deleting controllers
    nvme-loop: add support for multiple ports
    nvme-pci: simplify __nvme_submit_cmd
    nvme-pci: Rate limit the nvme timeout warnings
    nvme: allow duplicate controller if prior controller being deleted

    Jens Axboe
     
  • There is almost no shared logic, which leads to a very confusing code
    flow.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Both callers take just around so function call, so move it in.
    Also remove the now pointless blk_mq_sched_init wrapper.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Reported-by: Damien Le Moal
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • These are only used by the block core. Also move the declarations to
    block/blk.h.

    Reported-by: Damien Le Moal
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • No point in doing this in elevator_init.

    Signed-off-by: Christoph Hellwig
    Reported-by: Damien Le Moal
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Per section 5.2 we need to issue the corresponding log page to clear an
    AEN, so for a namespace data changed AEN we need to read the changed
    namespace list log. And once we read that log anyway we might as well
    use it to optimize the rescan.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg

    Christoph Hellwig
     
  • And move it toward the top of the file to avoid a forward declaration.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Johannes Thumshirn

    Christoph Hellwig
     
  • We should register for AEN events; some law-abiding targets might
    not be sending us AENs otherwise.

    Signed-off-by: Hannes Reinecke
    [hch: slight cleanups]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Sagi Grimberg

    Hannes Reinecke
     
  • Per section 5.2 of the NVMe 1.3 spec:

    "When the controller posts a completion queue entry for an outstanding
    Asynchronous Event Request command and thus reports an asynchronous
    event, subsequent events of that event type are automatically masked by
    the controller until the host clears that event. An event is cleared by
    reading the log page associated with that event using the Get Log Page
    command (see section 5.14)."

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg

    Christoph Hellwig
     
  • AEN configuration via the 'Get Features' and 'Set Features' admin
    command is mandatory, so we should be implemeting handling for it.

    Signed-off-by: Hannes Reinecke
    [hch: use WRITE_ONCE, check for invalid values]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Daniel Verkamp

    Christoph Hellwig
     
  • Just keep a per-controller buffer of changed namespaces and copy it out
    in the get log page implementation.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Daniel Verkamp

    Christoph Hellwig
     
  • Remove the common code to allocate a buffer and copy it into the SGL.
    Instead the two no-op implementations just zero the SGL directly, and
    the smart log allocates a buffer on its own. This prepares for the
    more elaborate ANA log page.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Johannes Thumshirn

    Christoph Hellwig
     
  • Zeroes the SGL in the payload.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Johannes Thumshirn

    Christoph Hellwig
     
  • Signed-off-by: Hannes Reinecke
    [hch: split from a larger patch]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Johannes Thumshirn

    Hannes Reinecke
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Johannes Thumshirn

    Christoph Hellwig
     
  • Stop including the event type in the definitions for the notice type.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Johannes Thumshirn

    Christoph Hellwig