29 Jul, 2020

1 commit

  • Add passthru command handling capability for the NVMeOF target and
    export passthru APIs which are used to integrate passthru
    code with nvmet-core.

    The new file passthru.c handles passthru cmd parsing and execution.
    In the passthru mode, we create a block layer request from the nvmet
    request and map the data on to the block layer request.

    Admin commands and features are on an allow list as there are a number
    of each that don't make too much sense with passthrough. We use an
    allow list such that new commands can be considered before being blindly
    passed through. In both cases, vendor specific commands are always
    allowed.

    We also reject reservation IO commands as the underlying device cannot
    differentiate between multiple hosts behind a fabric.

    Based-on-a-patch-by: Chaitanya Kulkarni
    Signed-off-by: Logan Gunthorpe
    Reviewed-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Logan Gunthorpe
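
    The allow-list idea described above can be sketched as follows. This is
    only an illustration, not the list used in passthru.c: the function and
    enumerator names are hypothetical, the set of allowed admin opcodes is an
    assumption, and the opcode values and the vendor specific ranges (0xc0 to
    0xff for admin, 0x80 to 0xff for I/O) are taken from the NVMe base
    specification.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Opcode values as defined by the NVMe base specification. */
    enum {
        SKETCH_ADMIN_GET_LOG_PAGE = 0x02,
        SKETCH_ADMIN_IDENTIFY     = 0x06,
        SKETCH_ADMIN_SET_FEATURES = 0x09,
        SKETCH_ADMIN_GET_FEATURES = 0x0a,
        SKETCH_ADMIN_VENDOR_START = 0xc0, /* admin vendor specific: 0xc0..0xff */
        SKETCH_IO_RESV_REGISTER   = 0x0d,
        SKETCH_IO_RESV_REPORT     = 0x0e,
        SKETCH_IO_RESV_ACQUIRE    = 0x11,
        SKETCH_IO_RESV_RELEASE    = 0x15,
    };

    /* Admin commands: deny by default, allow only what was reviewed. */
    static bool sketch_admin_cmd_allowed(uint8_t opcode)
    {
        /* Vendor specific commands are always passed through. */
        if (opcode >= SKETCH_ADMIN_VENDOR_START)
            return true;

        switch (opcode) {
        case SKETCH_ADMIN_GET_LOG_PAGE:
        case SKETCH_ADMIN_IDENTIFY:
        case SKETCH_ADMIN_SET_FEATURES:
        case SKETCH_ADMIN_GET_FEATURES:
            return true;   /* assumed allow list, for illustration only */
        default:
            return false;  /* new commands need review before being allowed */
        }
    }

    /* I/O commands: pass through everything except reservations. */
    static bool sketch_io_cmd_allowed(uint8_t opcode)
    {
        switch (opcode) {
        case SKETCH_IO_RESV_REGISTER:
        case SKETCH_IO_RESV_REPORT:
        case SKETCH_IO_RESV_ACQUIRE:
        case SKETCH_IO_RESV_RELEASE:
            /* The device cannot tell the fabric hosts apart. */
            return false;
        default:
            return true;
        }
    }
    ```

    With this shape, an admin opcode that is added to the specification later
    is rejected until someone adds it to the switch explicitly.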
     

08 Jul, 2020

3 commits

  • Add support for the NVM Express Zoned Namespaces (ZNS) Command Set defined
    in NVM Express TP4053. Zoned namespaces are discovered based on their
    Command Set Identifier reported in the namespace's Namespace
    Identification Descriptor list. A successfully discovered Zoned
    Namespace will be registered with the block layer as a host managed
    zoned block device with Zone Append command support. A namespace that
    does not support append is not supported by the driver.

    Reviewed-by: Martin K. Petersen
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Javier González
    Reviewed-by: Himanshu Madhani
    Signed-off-by: Hans Holmberg
    Signed-off-by: Dmitry Fomichev
    Signed-off-by: Ajay Joshi
    Signed-off-by: Aravind Ramesh
    Signed-off-by: Niklas Cassel
    Signed-off-by: Matias Bjørling
    Signed-off-by: Damien Le Moal
    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig

    Keith Busch
     
  • The Commands Supported and Effects log page was extended with a CSI
    field that enables the host to query the log page for each command set
    supported. Retrieve this log page for each command set that an attached
    namespace supports, and save a pointer to that log in the namespace head.

    Reviewed-by: Matias Bjørling
    Reviewed-by: Javier González
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Daniel Wagner
    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig

    Keith Busch
     
  • Implement support for multiple I/O Command Sets. NVMe TP 4056
    introduces a method to enumerate multiple command sets per namespace. If
    the command set is exposed, this method for enumeration will be used
    instead of the traditional method that uses the CC.CSS register field
    for command set identification.

    A namespace whose Command Set Identifier is not supported or not
    recognized will not be created.

    Reviewed-by: Javier González
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Matias Bjørling
    Reviewed-by: Daniel Wagner
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Niklas Cassel
    Signed-off-by: Christoph Hellwig

    Niklas Cassel
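
    Taken together with the Zoned Namespaces entry above, the namespace
    creation rules can be sketched roughly like this. The type and function
    names are hypothetical; the CSI values (0 for NVM, 2 for Zoned
    Namespaces) are the ones assigned by the NVMe specification, and the
    real driver performs many more checks than this.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Command Set Identifier values assigned by the NVMe specification. */
    enum sketch_csi {
        SKETCH_CSI_NVM = 0,
        SKETCH_CSI_ZNS = 2,
    };

    enum sketch_ns_action {
        SKETCH_NS_SKIP,   /* do not create the namespace */
        SKETCH_NS_BLOCK,  /* regular block device */
        SKETCH_NS_ZONED,  /* host managed zoned block device */
    };

    /* Decide what to do with a namespace based on its reported CSI. */
    static enum sketch_ns_action sketch_classify_ns(uint8_t csi, bool zone_append)
    {
        switch (csi) {
        case SKETCH_CSI_NVM:
            return SKETCH_NS_BLOCK;
        case SKETCH_CSI_ZNS:
            /* A zoned namespace without Zone Append is not supported. */
            return zone_append ? SKETCH_NS_ZONED : SKETCH_NS_SKIP;
        default:
            /* Unsupported or unrecognized command set. */
            return SKETCH_NS_SKIP;
        }
    }
    ```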
     

27 May, 2020

2 commits

  • The enumerations will be used to expose the namespace metadata format by
    the target.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Israel Rukshin
    Signed-off-by: Max Gurtovoy
    Reviewed-by: James Smart
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Christoph Hellwig

    Israel Rukshin
     
  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    sizeof(flexible-array-member) triggers a warning because flexible array
    members have incomplete type[1]. There are some instances of code in
    which the sizeof operator is being incorrectly/erroneously applied to
    zero-length arrays and the result is zero. Such instances may be hiding
    some bugs. So, this work (flexible-array member conversions) will also
    help to get rid of those sorts of issues completely.

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Christoph Hellwig

    Gustavo A. R. Silva
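
    As a small userspace sketch of why allocations are unaffected: the size
    of the header is still taken with sizeof on the structure, and the
    trailing elements are added on top. The definition of struct boo and the
    helper foo_alloc() are hypothetical, and the overflow check is a plain
    assert rather than the kernel's own size helpers.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    struct boo {
        int value;
    };

    struct foo {
        int stuff;
        struct boo array[];  /* flexible array member: must come last */
    };

    /* Allocate a struct foo with room for n trailing elements. */
    static struct foo *foo_alloc(size_t n)
    {
        struct foo *f;

        /* Guard the size computation against overflow. */
        assert(n <= (SIZE_MAX - sizeof(*f)) / sizeof(f->array[0]));

        /* sizeof(*f) covers the header only, not the flexible array. */
        f = malloc(sizeof(*f) + n * sizeof(f->array[0]));
        if (f)
            f->stuff = (int)n;
        return f;
    }
    ```

    Applying sizeof to f->array itself would be rejected by the compiler,
    which is the diagnostic the message above refers to.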
     

10 May, 2020

2 commits

  • Improve code readability by defining the specification's constants that
    the driver is using when decoding identification payloads.

    Signed-off-by: Keith Busch
    Reviewed-by: Bart van Assche
    Reviewed-by: Chaitanya Kulkarni
    Acked-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • With reference to the NVMeOF Specification (page 44, Figure 38), the
    discovery log page entry provides an address family field. We set the
    transport type field, but the adrfam field is not set when using the
    loop transport, and nvme-cli has no support for it either. As a result,
    reading the discovery log page with a loop transport leads to confusing
    output.

    As per the spec, adrfam value 254 is reserved for the Intra Host
    Transport (i.e. loopback). We add the required macro in the protocol
    header file, set the default port disc addr entry's adrfam to
    NVMF_ADDR_FAMILY_MAX, and update the nvmet_addr_family configfs array
    for the show/store attribute.

    Without this patch, setting adrfam to (ipv4/ipv6/ib/fc/loop/" ") gives
    the following confusing output for the nvme discover command from
    nvme-cli:

    trtype: loop
    adrfam: ipv4
    trtype: loop
    adrfam: ipv6
    trtype: loop
    adrfam: infiniband
    trtype: loop
    adrfam: fibre-channel
    trtype: loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
    adrfam: pci #

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     

22 Nov, 2019

1 commit

  • According to the NVMe specification, the over temperature threshold and
    under temperature threshold features shall be implemented for Composite
    Temperature if a non-zero WCTEMP field value is reported in the Identify
    Controller data structure. The features are also implemented for all
    implemented temperature sensors (i.e., all Temperature Sensor fields that
    report a non-zero value).

    This provides the over temperature threshold and under temperature
    threshold for each sensor as temperature min and max values of hwmon
    sysfs attributes.

    The WCTEMP is already provided as a temperature max value for Composite
    Temperature, but this change is not incompatible with that, because the
    default value of the over temperature threshold for Composite
    Temperature is the WCTEMP.

    The alarm attribute for Composite Temperature now indicates that one of
    the temperatures is outside of a temperature threshold, because there is
    only a single bit in the Critical Warning field that indicates a
    temperature is outside of a threshold.

    Example output from the "sensors" command:

    nvme-pci-0100
    Adapter: PCI adapter
    Composite:    +33.9°C  (low  = -273.1°C, high = +69.8°C)
                           (crit = +79.8°C)
    Sensor 1:     +34.9°C  (low  = -273.1°C, high = +65261.8°C)
    Sensor 2:     +31.9°C  (low  = -273.1°C, high = +65261.8°C)
    Sensor 5:     +47.9°C  (low  = -273.1°C, high = +65261.8°C)

    This also adds helper macros for kelvin from/to milli Celsius conversion,
    and replaces the repeated code in hwmon.c.

    Cc: Keith Busch
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Sagi Grimberg
    Cc: Jean Delvare
    Reviewed-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Signed-off-by: Akinobu Mita
    Signed-off-by: Keith Busch

    Akinobu Mita
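
    The kelvin to milli Celsius conversion mentioned above can be sketched as
    below. The function names are hypothetical (the commit adds macros, whose
    exact names are not shown here); the only fixed fact is that 0 °C equals
    273.15 K. The low and high limits in the sample output appear to be
    consistent with thresholds of 0 K and 65535 K.

    ```c
    #include <assert.h>

    /* NVMe reports kelvin; hwmon sysfs attributes use millidegrees Celsius. */
    static long kelvin_to_millicelsius(long kelvin)
    {
        return kelvin * 1000L - 273150L;
    }

    /* Inverse conversion, rounded to the nearest whole kelvin. */
    static long millicelsius_to_kelvin(long millicelsius)
    {
        /* Values below absolute zero cannot be represented in kelvin. */
        assert(millicelsius >= -273150L);
        return (millicelsius + 273150L + 500L) / 1000L;
    }
    ```

    For example, kelvin_to_millicelsius(65535) gives 65261850, which a tool
    printing one decimal place would show as roughly +65261.8°C.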
     

05 Nov, 2019

2 commits

  • Update enumerations and structures in include/linux/nvme.h
    to resync with nvme-cli.

    All the updates are described in the ratified NVMe 1.4 spec:
    https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Revanth Rajashekar
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Revanth Rajashekar
     
  • Fix the status code of canceled requests initiated by the host according
    to TP4028 (Status Code 0x371):
    "Command Aborted By host: The command was aborted as a result of host
    action (e.g., the host disconnected the Fabric connection)."

    Also in a multipath environment, unless otherwise specified, errors of
    this type (path related) should be retried using a different path, if
    one is available.

    Signed-off-by: Max Gurtovoy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Max Gurtovoy
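
    A rough sketch of how a status such as 0x371 can be recognised as path
    related: in this encoding the status code type sits in bits 10:8, and
    type 3h covers path related errors. The macro and function names are
    hypothetical, and 0x370 (host pathing error) is an assumption taken from
    the NVMe status code tables rather than from the text above.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define SKETCH_SCT_MASK  0x700u  /* status code type, bits 10:8 */
    #define SKETCH_SCT_PATH  0x300u  /* type 3h: path related status */

    /* True if the status belongs to the path related status code type. */
    static bool sketch_is_path_error(uint16_t status)
    {
        return (status & SKETCH_SCT_MASK) == SKETCH_SCT_PATH;
    }
    ```

    A multipath layer could use such a check to decide that a failed command
    is worth retrying on a different path.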
     

30 Aug, 2019

3 commits


10 Jul, 2019

1 commit

  • Several new fields have been introduced in version 1.4 of the NVMe spec
    at offsets that were defined as reserved in version 1.3d of the NVMe
    spec. Update the definition of the nvme_id_ns data structure such that
    it is in sync with version 1.4 of the NVMe spec. This change preserves
    backwards compatibility.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Keith Busch
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    Bart Van Assche
     

21 Jun, 2019

3 commits


14 May, 2019

1 commit


11 Apr, 2019

1 commit

  • The nvme target hadn't been taking the Get Log Page offset parameter
    into consideration, and so has been returning corrupted log pages when
    offsets are used. Since many tools, including nvme-cli, split the log
    request to 4k, we've been breaking discovery log responses when more
    than 3 subsystems exist.

    Fix the returned data by internally generating the entire discovery
    log page and copying only the requested bytes into the user buffer. The
    command log page offset type has been modified to a native __le64 to
    make it easier to extract the value from a command.

    Signed-off-by: Keith Busch
    Tested-by: Minwoo Im
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Reviewed-by: James Smart
    Signed-off-by: Christoph Hellwig

    Keith Busch
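
    The fix described above amounts to building the whole log first and then
    copying a window out of it. A minimal userspace sketch, with a
    hypothetical helper name and plain buffers instead of the target's
    scatterlists:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /*
     * Copy the bytes [offset, offset + buf_len) of a fully generated log
     * page into buf. Bytes past the end of the log are zero filled.
     * Returns the number of log bytes actually copied.
     */
    static size_t copy_log_window(const unsigned char *log_page, size_t log_len,
                                  size_t offset, unsigned char *buf,
                                  size_t buf_len)
    {
        size_t n = 0;

        assert(log_page != NULL && buf != NULL);

        memset(buf, 0, buf_len);
        if (offset < log_len) {
            n = log_len - offset;
            if (n > buf_len)
                n = buf_len;
            memcpy(buf, log_page + offset, n);
        }
        return n;
    }
    ```

    The "more than 3 subsystems" limit follows from the layout of the
    discovery log: with a 1024 byte header and 1024 byte entries, a single
    4k request holds the header plus three entries, so a fourth entry needs
    a second request with a non-zero offset.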
     

20 Feb, 2019

1 commit


13 Dec, 2018

3 commits


08 Dec, 2018

8 commits

  • A controller may have an internal state that is not able to successfully
    process commands for a short duration. In such states, an immediate
    command requeue is expected to fail. The driver may exceed its max
    retry count, which permanently ends the command in failure when the same
    command would succeed after waiting for the controller to be ready.

    NVMe ratified TP 4033 provides a delay hint in the completion status
    code for failed commands. Implement the retry delay based on the command
    completion status and the controller's requested delay.

    Note that requeued commands are handled per request_queue, not per
    individual request. If multiple commands fail, the controller should
    consistently report the desired delay time for retryable commands in
    all CQEs, otherwise the requeue list may be kicked too soon.

    Signed-off-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
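
    The delay computation can be sketched as follows. It assumes the status
    word has the phase bit already stripped, so that the Command Retry Delay
    (CRD) field occupies bits 12:11, and that the controller advertises
    three delay times in units of 100 milliseconds. The names are
    hypothetical and this is not the driver's actual code.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define SKETCH_SC_CRD_MASK   0x1800u  /* CRD field, status bits 12:11 */
    #define SKETCH_SC_CRD_SHIFT  11

    /*
     * crdt[] holds the controller's three retry delay times, each in units
     * of 100 ms. Returns the delay before requeueing in milliseconds;
     * 0 means the command may be retried immediately.
     */
    static unsigned int sketch_retry_delay_ms(uint16_t status,
                                              const uint16_t crdt[3])
    {
        unsigned int crd = (status & SKETCH_SC_CRD_MASK) >> SKETCH_SC_CRD_SHIFT;

        assert(crdt != NULL);

        if (crd == 0)
            return 0;
        /* CRD values 1..3 select one of the three advertised delays. */
        return crdt[crd - 1] * 100u;
    }
    ```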
     
  • Technical Proposal introduces an indication for SQ flow control
    disable support. Expose it since we are able to operate in this mode.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • Only override the allowed parts of it.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Sagi Grimberg
    [hch: slight tweak to the NVME_TREQ_SECURE_CHANNEL_MASK definition]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • Technical proposal 8005 "fabrics SQ flow control" introduces a mode
    where a host and controller agree to omit sq_head pointer updates
    when sending nvme completions.

    If the host indicates a desire to operate in this mode (via a connect
    attribute), the controller returns a connect completion with an sq_head
    value of 0xffff as an indication that it will omit sq_head pointer updates.

    This mode saves us an atomic update in the I/O path.

    Reviewed-by: Hannes Reinecke
    [hch: suggested better implementation]
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • Add AEN/AER values as defined by the specification

    Signed-off-by: Jay Sternberg
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jay Sternberg
     
  • Functions nvmet_aen_disabled and nvmet_clear_aen were passing bit masks
    rather than bit numbers (i.e. 1 << 9 instead of 9) to the bit functions
    clear_bit and test_and_set_bit.

    Signed-off-by: Jay Sternberg
    Reviewed-by: Phil Cayton
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jay Sternberg
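
    To illustrate the difference, here are non-atomic userspace stand-ins
    for the two bit helpers (the sketch_ names are hypothetical). Like the
    kernel functions, they take a bit number, so passing 1 << 9 would ask
    for bit 512, which lies outside a single word.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdbool.h>

    #define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

    /* Set bit number nr and return its previous value (not atomic). */
    static bool sketch_test_and_set_bit(unsigned int nr, unsigned long *word)
    {
        unsigned long mask;
        bool old;

        assert(nr < SKETCH_BITS_PER_WORD);  /* catches mask-for-number bugs */
        mask = 1UL << nr;
        old = (*word & mask) != 0;
        *word |= mask;
        return old;
    }

    /* Clear bit number nr (not atomic). */
    static void sketch_clear_bit(unsigned int nr, unsigned long *word)
    {
        assert(nr < SKETCH_BITS_PER_WORD);
        *word &= ~(1UL << nr);
    }
    ```

    Calling sketch_test_and_set_bit(9, &w) sets the same bit that the mask
    1 << 9 describes, which is what the corrected code intends.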
     
  • If the controller supports traffic based keep alive, we restart the keep
    alive timer if any admin or I/O command was completed during the kato
    period. This prevents a possible starvation of keep alive commands in
    the presence of heavy traffic, since in that case we already have a
    health indication from the host perspective.

    Only set a comp_seen indicator in case the controller supports keep
    alive to minimize the overhead for pci controllers.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
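
    The logic above can be sketched with a simplified, hypothetical
    controller structure. The field and function names are assumptions made
    for illustration, and real code would also have to deal with
    concurrency.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical, simplified controller state. */
    struct sketch_ctrl {
        bool keep_alive_enabled;  /* controller supports keep alive */
        bool tbkas;               /* traffic based keep alive supported */
        bool comp_seen;           /* a completion arrived this kato period */
    };

    /* Completion path: only touch the flag when keep alive is in use. */
    static void sketch_on_completion(struct sketch_ctrl *ctrl)
    {
        assert(ctrl != NULL);
        if (ctrl->keep_alive_enabled)
            ctrl->comp_seen = true;
    }

    /*
     * Keep alive timer expiry. Returns true if a Keep Alive command must be
     * sent, false if recent traffic lets us simply restart the timer.
     */
    static bool sketch_keep_alive_expired(struct sketch_ctrl *ctrl)
    {
        assert(ctrl != NULL);
        if (ctrl->tbkas && ctrl->comp_seen) {
            ctrl->comp_seen = false;  /* start a fresh observation period */
            return false;
        }
        return true;
    }
    ```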
     
  • We are growing more controller attributes, so use a proper enumeration
    for it. For now just add the 128-bit hostid which we support.

    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     

02 Oct, 2018

1 commit

  • When an I/O is rejected by nvmf_check_ready() due to validation of the
    controller state, nvmf_fail_nonready_command() will normally return
    BLK_STS_RESOURCE to requeue and retry. However, if the controller is
    dying or the I/O is marked for NVMe multipath, the I/O is failed so that
    the controller can terminate or so that the I/O can be issued on a
    different path. Unfortunately, as this reject point is before the
    transport has accepted the command, blk-mq ends up completing the I/O
    and never calls nvme_complete_rq(), which is where multipath may preserve
    or re-route the I/O. As a result, the device user ends up seeing an
    EIO error.

    Example: single path connectivity, controller is under load, and a reset
    is induced. An I/O is received:

    a) while the reset state has been set but the queues have yet to be
    stopped; or
    b) after queues are started (at end of reset) but before the reconnect
    has completed.

    The I/O finishes with an EIO status.

    This patch makes the following changes:

    - Adds the HOST_PATH_ERROR pathing status from TP4028.
    - Modifies the reject point such that it appears to queue successfully,
      but actually completes the I/O with the new pathing status and calls
      nvme_complete_rq().
    - nvme_complete_rq() recognizes the new status, avoids resetting the
      controller (likely was already done in order to get this new status),
      and calls the multipather to clear the current path that errored.
      This allows the next command (retry or new command) to select a new
      path if there is one.

    Signed-off-by: James Smart
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    James Smart
     

08 Aug, 2018

2 commits


28 Jul, 2018

2 commits

  • Add various definitions from NVMe 1.3 TP 4004.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn

    Christoph Hellwig
     
  • NVMe 1.3 added a new log specific field to the get log page CQ
    definition, add it to our get_log_page SQ structure.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn

    Christoph Hellwig
     

23 Jul, 2018

1 commit


01 Jun, 2018

2 commits