30 Aug, 2019

3 commits


10 Jul, 2019

1 commit

  • Several new fields have been introduced in version 1.4 of the NVMe spec
    at offsets that were defined as reserved in version 1.3d of the NVMe
    spec. Update the definition of the nvme_id_ns data structure such that
    it is in sync with version 1.4 of the NVMe spec. This change preserves
    backwards compatibility.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Keith Busch
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    Bart Van Assche
     

21 Jun, 2019

3 commits


14 May, 2019

1 commit


11 Apr, 2019

1 commit

  • The nvme target hadn't been taking the Get Log Page offset parameter
    into consideration, and so has been returning corrupted log pages when
    offsets are used. Since many tools, including nvme-cli, split the log
    request to 4k, we've been breaking discovery log responses when more
    than 3 subsystems exist.

    Fix the returned data by internally generating the entire discovery
    log page and copying only the requested bytes into the user buffer. The
    command log page offset type has been modified to a native __le64 to
    make it easier to extract the value from a command.

    Signed-off-by: Keith Busch
    Tested-by: Minwoo Im
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Reviewed-by: James Smart
    Signed-off-by: Christoph Hellwig

    Keith Busch
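
    A minimal sketch of the approach described above: generate the whole
    log page first, then copy only the requested window into the caller's
    buffer. The function name and the zero-fill behaviour past the end of
    the log are assumptions made for illustration, not the target code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the window [offset, offset + len) of a fully generated log page
 * into dst. Bytes requested past the end of the log are zero-filled
 * (a choice made for this sketch). Returns the number of log bytes copied.
 */
size_t copy_log_window(const uint8_t *log, size_t log_len,
                       uint64_t offset, uint8_t *dst, size_t len)
{
    size_t avail, n;

    memset(dst, 0, len);
    if (offset >= log_len)
        return 0;

    avail = log_len - (size_t)offset;
    n = len < avail ? len : avail;
    memcpy(dst, log + offset, n);
    return n;
}
```

    A host that splits a request into 4k chunks would call this once per
    chunk with an increasing offset, and each chunk then holds the matching
    slice of the same log.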
     

20 Feb, 2019

1 commit


13 Dec, 2018

3 commits


08 Dec, 2018

8 commits

  • A controller may have an internal state that is not able to successfully
    process commands for a short duration. In such states, an immediate
    command requeue is expected to fail. The driver may exceed its max
    retry count, which permanently ends the command in failure when the same
    command would succeed after waiting for the controller to be ready.

    NVMe ratified TP 4033 provides a delay hint in the completion status
    code for failed commands. Implement the retry delay based on the command
    completion status and the controller's requested delay.

    Note that requeued commands are handled per request_queue, not per
    individual request. If multiple commands fail, the controller should
    consistently report the desired delay time for retryable commands in
    all CQEs, otherwise the requeue list may be kicked too soon.

    Signed-off-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
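
    The delay selection can be sketched roughly as follows. The mask and
    shift assume a status value with the phase bit already shifted out, and
    the delay table is assumed to be in units of 100 ms; all names are
    hypothetical and this is not the driver's code.

```c
#include <stdint.h>

/* Assumed position of the 2-bit retry delay hint in the status value. */
#define SKETCH_SC_CRD_MASK  0x1800u
#define SKETCH_SC_CRD_SHIFT 11

/*
 * Return the retry delay in milliseconds for a failed command.
 * crdt[] holds the three delay times advertised by the controller,
 * assumed to be in units of 100 ms. A hint of 0 means no delay.
 */
unsigned int sketch_retry_delay_ms(uint16_t status, const uint16_t crdt[3])
{
    unsigned int crd = (status & SKETCH_SC_CRD_MASK) >> SKETCH_SC_CRD_SHIFT;

    if (crd == 0)
        return 0;
    return (unsigned int)crdt[crd - 1] * 100;
}
```

    Because the requeue list is kicked per request_queue, the smallest
    delay among concurrently failed commands effectively wins, which is why
    the controller should report a consistent value.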
     
  • Technical Proposal introduces an indication for SQ flow control
    disable support. Expose it since we are able to operate in this mode.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • Only override the allowed parts of it.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Sagi Grimberg
    [hch: slight tweak to the NVME_TREQ_SECURE_CHANNEL_MASK definition]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • Technical proposal 8005 "fabrics SQ flow control" introduces a mode
    where a host and controller agree to omit sq_head pointer updates
    when sending nvme completions.

    If the host indicates that it wants to operate in this mode (via a
    connect attribute), the controller returns a connect completion with an
    sq_head value of 0xffff to indicate that it will omit sq_head pointer
    updates.

    This mode saves us an atomic update in the I/O path.

    Reviewed-by: Hannes Reinecke
    [hch: suggested better implementation]
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
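
    A rough controller-side sketch of the negotiation described above. The
    struct and function names are hypothetical; only the 0xffff marker
    comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SQHD_DISABLED 0xffffu

struct sketch_sq {
    bool sqhd_disabled;   /* true once both sides agreed to skip updates */
    uint16_t sqhd;        /* current submission queue head */
    uint16_t size;        /* number of queue entries */
};

/* Handle connect: returns the sq_head value for the connect completion. */
uint16_t sketch_connect(struct sketch_sq *sq, bool host_wants_disable)
{
    sq->sqhd_disabled = host_wants_disable;
    sq->sqhd = 0;
    return sq->sqhd_disabled ? SKETCH_SQHD_DISABLED : sq->sqhd;
}

/* Per-completion bookkeeping: skipped entirely when updates are disabled. */
void sketch_complete(struct sketch_sq *sq)
{
    if (sq->sqhd_disabled)
        return;
    sq->sqhd = (uint16_t)((sq->sqhd + 1) % sq->size);
}
```

    In the disabled case the completion path never touches sqhd, which is
    the update the commit message says is saved.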
     
  • Add AEN/AER values as defined by the specification

    Signed-off-by: Jay Sternberg
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jay Sternberg
     
  • The functions nvmet_aen_disabled and nvmet_clear_aen were passing bit
    values rather than bit numbers (i.e. 1 << 9 instead of 9) to the bit
    functions clear_bit and test_and_set_bit.

    Signed-off-by: Jay Sternberg
    Reviewed-by: Phil Cayton
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jay Sternberg
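
    The distinction matters because bit helpers of this kind take a bit
    number and build the mask themselves. The userspace stand-ins below are
    hypothetical and non-atomic; they only illustrate the calling
    convention.

```c
#include <stdbool.h>

/* Set bit number nr in *word and return its previous state. */
bool sketch_test_and_set_bit(unsigned int nr, unsigned long *word)
{
    unsigned long mask = 1UL << nr;
    bool old = (*word & mask) != 0;

    *word |= mask;
    return old;
}

/* Clear bit number nr in *word. */
void sketch_clear_bit(unsigned int nr, unsigned long *word)
{
    *word &= ~(1UL << nr);
}
```

    Passing 9 sets the bit with value 1 << 9. Passing 1 << 9 (512) as the
    bit number would instead address a bit far outside the intended word.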
     
  • If the controller supports traffic based keep alive, we restart the keep
    alive timer if any admin or I/O command was completed during the kato
    period. This prevents possible starvation of keep alive commands under
    heavy traffic, since in that case we already have a health indication
    from the host perspective.

    Only set a comp_seen indicator in case the controller supports keep
    alive to minimize the overhead for pci controllers.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
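
    The timer logic can be sketched as below. The comp_seen flag is named
    in the commit message; the struct, the tbkas field, and the function
    names are assumptions for illustration.

```c
#include <stdbool.h>

struct sketch_ctrl {
    bool tbkas;      /* controller supports traffic based keep alive */
    bool comp_seen;  /* a command completed during the current period */
};

/* Called on every admin or I/O completion. */
void sketch_on_completion(struct sketch_ctrl *ctrl)
{
    /* Skip the store when the feature is unsupported. */
    if (ctrl->tbkas)
        ctrl->comp_seen = true;
}

/*
 * Called when the keep alive period expires. Returns true if a keep
 * alive command must be sent, false if the timer is simply restarted.
 */
bool sketch_keep_alive_expired(struct sketch_ctrl *ctrl)
{
    if (ctrl->tbkas && ctrl->comp_seen) {
        ctrl->comp_seen = false;
        return false;
    }
    return true;
}
```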
     
  • We are growing more controller attributes, so use a proper enumeration
    for it. For now just add the 128-bit hostid which we support.

    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     

02 Oct, 2018

1 commit

  • When an io is rejected by nvmf_check_ready() due to validation of the
    controller state, the nvmf_fail_nonready_command() will normally return
    BLK_STS_RESOURCE to requeue and retry. However, if the controller is
    dying or the I/O is marked for NVMe multipath, the I/O is failed so that
    the controller can terminate or so that the io can be issued on a
    different path. Unfortunately, as this reject point is before the
    transport has accepted the command, blk-mq ends up completing the I/O
    and never calls nvme_complete_rq(), which is where multipath may preserve
    or re-route the I/O. The end result is, the device user ends up seeing an
    EIO error.

    Example: single path connectivity, controller is under load, and a reset
    is induced. An I/O is received:

    a) while the reset state has been set but the queues have yet to be
    stopped; or
    b) after queues are started (at end of reset) but before the reconnect
    has completed.

    The I/O finishes with an EIO status.

    This patch makes the following changes:

    - Adds the HOST_PATH_ERROR pathing status from TP4028
    - Modifies the reject point such that it appears to queue successfully,
    but actually completes the io with the new pathing status and calls
    nvme_complete_rq().
    - nvme_complete_rq() recognizes the new status, avoids resetting the
    controller (likely was already done in order to get this new status),
    and calls the multipather to clear the current path that errored.
    This allows the next command (retry or new command) to select a new
    path if there is one.

    Signed-off-by: James Smart
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    James Smart
     

08 Aug, 2018

2 commits


28 Jul, 2018

2 commits

  • Add various definitions from NVMe 1.3 TP 4004.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn

    Christoph Hellwig
     
  • NVMe 1.3 added a new log specific field to the get log page CQ
    definition; add it to our get_log_page SQ structure.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn

    Christoph Hellwig
     

23 Jul, 2018

1 commit


01 Jun, 2018

3 commits


18 Jan, 2018

1 commit

  • Define the bit positions instead of macros using the magic values,
    and move the expanded helpers to calculate the size and size unit into
    the implementation C file.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Logan Gunthorpe

    Christoph Hellwig
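
    A sketch of the pattern, assuming a register that packs a size field
    and a size-unit field. The commit message does not name the register,
    so the shifts, masks, and unit encoding below (4 KiB scaled by 16 per
    step) are illustrative assumptions with hypothetical names.

```c
#include <stdint.h>

/* Bit positions and masks in the header (assumed layout). */
#define SKETCH_SZU_SHIFT 8
#define SKETCH_SZU_MASK  0xfu
#define SKETCH_SZ_SHIFT  12
#define SKETCH_SZ_MASK   0xfffffu

/* Helpers that would live in the implementation C file. */

/* Size unit in bytes; returns 0 for unit codes above 6 (treated as invalid). */
uint64_t sketch_size_unit(uint32_t reg)
{
    uint32_t szu = (reg >> SKETCH_SZU_SHIFT) & SKETCH_SZU_MASK;

    if (szu > 6)
        return 0;
    return 1ULL << (12 + 4 * szu);
}

/* Size expressed in multiples of the size unit. */
uint32_t sketch_size(uint32_t reg)
{
    return (reg >> SKETCH_SZ_SHIFT) & SKETCH_SZ_MASK;
}
```

    The total size in bytes is then sketch_size_unit(reg) multiplied by
    sketch_size(reg).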
     

11 Nov, 2017

3 commits

  • This will give udev a chance to observe and handle asynchronous event
    notifications and clear the log to unmask future events of the same type.
    The driver will create a change uevent of the asynchronous event result
    before submitting the next AEN request to the device if a completed AEN
    event is of type error, smart, command set or vendor specific.

    Signed-off-by: Keith Busch
    Reviewed-by: Guan Junxiong
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • All the transports were unnecessarily duplicating the AEN request
    accounting. This patch defines everything in one place.

    Signed-off-by: Keith Busch
    Reviewed-by: Guan Junxiong
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • The NVMe standard provides a command effects log page so the host may
    be aware of special requirements it may need to do for a particular
    command. For example, the command may need to run with IO quiesced to
    prevent timeouts or undefined behavior, or it may change the logical block
    formats that determine how the host needs to construct future commands.

    This patch saves the nvme command effects log page if the controller
    supports it, and performs appropriate actions before and after an admin
    passthrough command is completed. If the controller does not support the
    command effects log page, the driver will define the effects for known
    opcodes. The nvme format and sanitize are the only commands in this patch
    with known effects.

    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
     

25 Sep, 2017

2 commits


12 Sep, 2017

1 commit

  • Adds support for the new Host Memory Buffer Minimum Descriptor Entry Size
    and Host Memory Maximum Descriptors Entries fields that were added in
    TP 4002 HMB Enhancements. These allow the controller to advertise
    limits for the usual number of segments in the host memory buffer, as
    well as a minimum usable per-segment size.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch

    Christoph Hellwig
     

10 Sep, 2017

1 commit

  • Pull followup block layer updates from Jens Axboe:
    "I ended up splitting the main pull request for this series into two,
    mainly because of clashes between NVMe fixes that went into 4.13 after
    the for-4.14 branches were split off. This pull request is mostly
    NVMe, but not exclusively. In detail, it contains:

    - Two pull request for NVMe changes from Christoph. Nothing new on
    the feature front, basically just fixes all over the map for the
    core bits, transport, rdma, etc.

    - Series from Bart, cleaning up various bits in the BFQ scheduler.

    - Series of bcache fixes, which has been lingering for a release or
    two. Coly sent this in, but patches from various people in this
    area.

    - Set of patches for BFQ from Paolo himself, updating both
    documentation and fixing some corner cases in performance.

    - Series from Omar, attempting to now get the 4k loop support
    correct. Our confidence level is higher this time.

    - Series from Shaohua for loop as well, improving O_DIRECT
    performance and fixing a use-after-free"

    * 'for-4.14/block-postmerge' of git://git.kernel.dk/linux-block: (74 commits)
    bcache: initialize dirty stripes in flash_dev_run()
    loop: set physical block size to logical block size
    bcache: fix bch_hprint crash and improve output
    bcache: Update continue_at() documentation
    bcache: silence static checker warning
    bcache: fix for gc and write-back race
    bcache: increase the number of open buckets
    bcache: Correct return value for sysfs attach errors
    bcache: correct cache_dirty_target in __update_writeback_rate()
    bcache: gc does not work when triggering by manual command
    bcache: Don't reinvent the wheel but use existing llist API
    bcache: do not subtract sectors_to_gc for bypassed IO
    bcache: fix sequential large write IO bypass
    bcache: Fix leak of bdev reference
    block/loop: remove unused field
    block/loop: fix use after free
    bfq: Use icq_to_bic() consistently
    bfq: Suppress compiler warnings about comparisons
    bfq: Check kstrtoul() return value
    bfq: Declare local functions static
    ...

    Linus Torvalds
     

30 Aug, 2017

1 commit

  • The NVMe 1.3 specification defines the Optional Admin Command Support
    feature flags: if bit 8 is set to '1', the controller supports the
    Doorbell Buffer Config command. Bit 7 is used for the Virtualization
    Management command.

    Signed-off-by: Changpeng Liu
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Christoph Hellwig
    Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
    Cc: stable@vger.kernel.org

    Changpeng Liu
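
    The bit assignment above can be expressed as follows. The macro and
    function names are hypothetical; only the bit numbers come from the
    commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Optional Admin Command Support bits, per the description above. */
#define SKETCH_OACS_VIRT_MGMT (1u << 7)  /* Virtualization Management */
#define SKETCH_OACS_DBBUF     (1u << 8)  /* Doorbell Buffer Config */

/* True if the controller advertises the Doorbell Buffer Config command. */
bool sketch_supports_dbbuf(uint16_t oacs)
{
    return (oacs & SKETCH_OACS_DBBUF) != 0;
}

/* True if the controller advertises Virtualization Management. */
bool sketch_supports_virt_mgmt(uint16_t oacs)
{
    return (oacs & SKETCH_OACS_VIRT_MGMT) != 0;
}
```

    A controller that sets only bit 7 is therefore not treated as
    supporting Doorbell Buffer Config.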
     

29 Aug, 2017

1 commit