29 Jul, 2020
1 commit
-
Add passthru command handling capability for the NVMeOF target and
export passthru APIs which are used to integrate passthru
code with nvmet-core.

The new file passthru.c handles passthru cmd parsing and execution.
In the passthru mode, we create a block layer request from the nvmet
request and map the data onto the block layer request.

Admin commands and features are on an allow list as there are a number
of each that don't make much sense with passthrough. We use an
allow list such that new commands can be considered before being blindly
passed through. In both cases, vendor specific commands are always
allowed.

We also reject reservation IO commands as the underlying device cannot
differentiate between multiple hosts behind a fabric.

Based-on-a-patch-by: Chaitanya Kulkarni
Signed-off-by: Logan Gunthorpe
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
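A minimal sketch of the allow-list policy described in the passthru commit above, assuming the decision depends only on the opcode. The function names and the particular admin allow list are hypothetical; the opcode values come from the NVMe base specification (vendor-specific admin opcodes C0h-FFh, vendor-specific I/O opcodes 80h-FFh, reservation commands 0Dh, 0Eh, 11h and 15h). The real passthru.c makes more detailed decisions, for example per feature identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode values from the NVMe base specification. */
enum {
	ADMIN_GET_LOG_PAGE = 0x02,
	ADMIN_IDENTIFY     = 0x06,
	ADMIN_SET_FEATURES = 0x09,
	ADMIN_GET_FEATURES = 0x0a,
	ADMIN_VENDOR_START = 0xc0,	/* C0h-FFh: vendor specific */

	IO_RESV_REGISTER   = 0x0d,
	IO_RESV_REPORT     = 0x0e,
	IO_RESV_ACQUIRE    = 0x11,
	IO_RESV_RELEASE    = 0x15,
	IO_VENDOR_START    = 0x80,	/* 80h-FFh: vendor specific */
};

/* Hypothetical admin check: only explicitly listed opcodes pass. */
static bool passthru_admin_allowed(uint8_t opcode)
{
	if (opcode >= ADMIN_VENDOR_START)
		return true;		/* vendor specific: always allowed */

	switch (opcode) {
	case ADMIN_GET_LOG_PAGE:
	case ADMIN_IDENTIFY:
	case ADMIN_SET_FEATURES:
	case ADMIN_GET_FEATURES:
		return true;		/* on the (illustrative) allow list */
	default:
		return false;		/* new or unknown: rejected by default */
	}
}

/* Hypothetical I/O check: everything except reservations is forwarded. */
static bool passthru_io_allowed(uint8_t opcode)
{
	if (opcode >= IO_VENDOR_START)
		return true;		/* vendor specific: always allowed */

	switch (opcode) {
	case IO_RESV_REGISTER:
	case IO_RESV_REPORT:
	case IO_RESV_ACQUIRE:
	case IO_RESV_RELEASE:
		return false;		/* device cannot tell fabric hosts apart */
	default:
		return true;
	}
}
```

Defaulting to false in the admin path is what makes it an allow list: a new opcode is rejected until someone deliberately adds it. The I/O path inverts this and only blocks the reservation commands.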
08 Jul, 2020
3 commits
-
Add support for the NVM Express Zoned Namespaces (ZNS) Command Set defined
in NVM Express TP4053. Zoned namespaces are discovered based on their
Command Set Identifier reported in the namespace's Namespace
Identification Descriptor list. A successfully discovered Zoned
Namespace will be registered with the block layer as a host-managed
zoned block device with Zone Append command support. A namespace that
does not support append is not supported by the driver.

Reviewed-by: Martin K. Petersen
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Reviewed-by: Sagi Grimberg
Reviewed-by: Javier González
Reviewed-by: Himanshu Madhani
Signed-off-by: Hans Holmberg
Signed-off-by: Dmitry Fomichev
Signed-off-by: Ajay Joshi
Signed-off-by: Aravind Ramesh
Signed-off-by: Niklas Cassel
Signed-off-by: Matias Bjørling
Signed-off-by: Damien Le Moal
Signed-off-by: Keith Busch
Signed-off-by: Christoph Hellwig -
The Commands Supported and Effects log page was extended with a CSI
field that enables the host to query the log page for each command set
supported. Retrieve this log page for each command set that an attached
namespace supports, and save a pointer to that log in the namespace head.

Reviewed-by: Matias Bjørling
Reviewed-by: Javier González
Reviewed-by: Himanshu Madhani
Reviewed-by: Martin K. Petersen
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Reviewed-by: Daniel Wagner
Signed-off-by: Keith Busch
Signed-off-by: Christoph Hellwig -
Implements support for multiple I/O Command Sets. NVMe TP 4056
introduces a method to enumerate multiple command sets per namespace. If
the command set is exposed, this method for enumeration will be used
instead of the traditional method that uses the CSS field of the CC
register for command set identification.

For namespaces where the Command Set Identifier is not supported or
recognized, the specific namespace will not be created.

Reviewed-by: Javier González
Reviewed-by: Martin K. Petersen
Reviewed-by: Johannes Thumshirn
Reviewed-by: Matias Bjørling
Reviewed-by: Daniel Wagner
Reviewed-by: Himanshu Madhani
Reviewed-by: Hannes Reinecke
Signed-off-by: Niklas Cassel
Signed-off-by: Christoph Hellwig
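To illustrate the CSI-based enumeration described in the multiple command set commit above, here is a sketch that walks a Namespace Identification Descriptor list (Identify CNS 03h) and extracts the Command Set Identifier. The descriptor framing (NIDT, NIDL, two reserved bytes, then NIDL bytes of data), the CSI descriptor type 04h and the CSI values 00h (NVM) and 02h (Zoned Namespace) follow the NVMe specifications; the function names are hypothetical, and treating a missing CSI descriptor as the traditional NVM case is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NS_DESC_LIST_SIZE 4096	/* Identify data is 4 KiB */
#define NIDT_CSI          0x04	/* Command Set Identifier descriptor */
#define CSI_NVM           0x00
#define CSI_ZNS           0x02

/*
 * Return the CSI from a descriptor list, or -1 if none is present.
 * Each descriptor: NIDT (1 byte), NIDL (1 byte), 2 reserved bytes, NIDL data bytes.
 */
static int find_csi(const uint8_t *list, size_t size)
{
	size_t pos = 0;

	while (pos + 4 <= size) {
		uint8_t nidt = list[pos];
		uint8_t nidl = list[pos + 1];

		if (nidt == 0 || nidl == 0)
			break;			/* end of list */
		if (pos + 4 + nidl > size)
			break;			/* truncated descriptor */
		if (nidt == NIDT_CSI && nidl == 1)
			return list[pos + 4];
		pos += 4 + (size_t)nidl;	/* skip to the next descriptor */
	}
	return -1;
}

/* Hypothetical policy: create the namespace only for recognized command sets. */
static bool should_create_ns(int csi)
{
	if (csi < 0)
		return true;	/* no CSI reported: traditional NVM path (assumption) */
	return csi == CSI_NVM || csi == CSI_ZNS;
}
```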
27 May, 2020
2 commits
-
The enumerations will be used to expose the namespace metadata format by
the target.

Suggested-by: Christoph Hellwig
Signed-off-by: Israel Rukshin
Signed-off-by: Max Gurtovoy
Reviewed-by: James Smart
Reviewed-by: Martin K. Petersen
Signed-off-by: Christoph Hellwig -
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kinds of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a compile-time diagnostic because
flexible array members have incomplete type[1]. There are some instances
of code in which the sizeof operator is being incorrectly/erroneously
applied to zero-length arrays and the result is zero. Such instances may
be hiding some bugs. So, this work (flexible-array member conversions)
will also help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
Signed-off-by: Christoph Hellwig
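A small sketch of the allocation pattern the flexible-array conversion relies on, reusing the struct foo/struct boo example from the commit message. sizeof(struct foo) covers only the fixed part, and each element is sized through sizeof(f->array[0]) because sizeof(f->array) itself is not allowed. Kernel code would normally use the struct_size() helper, which also guards against overflow; this plain-C version spells the arithmetic out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo array[];	/* C99 flexible array member: must be last */
};

/* Allocate a struct foo followed by room for n array elements. */
static struct foo *foo_alloc(size_t n)
{
	/* Fixed part plus n elements; no overflow check in this sketch. */
	struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

	if (f)
		f->stuff = (int)n;
	return f;
}
```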
10 May, 2020
2 commits
-
Improve code readability by defining the specification's constants that
the driver is using when decoding identification payloads.

Signed-off-by: Keith Busch
Reviewed-by: Bart van Assche
Reviewed-by: Chaitanya Kulkarni
Acked-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
With reference to the NVMeOF Specification (page 44, Figure 38), the
discovery log page entry provides an address family field. We do set the
transport type field, but the adrfam field is not set when using the loop
transport, and nvme-cli also lacks support for it. So reading the
discovery log page with a loop transport leads to confusing output.

As per the spec, adrfam value 254 is reserved for Intra Host Transport
(i.e., loopback), so we add the required macro in the protocol header
file, set the default port disc addr entry's adrfam to
NVMF_ADDR_FAMILY_MAX, and update the nvmet_addr_family configfs array for
the show/store attribute.

Without this patch, setting adrfam to (ipv4/ipv6/ib/fc/loop/" ") we get the
following output for the nvme discover command from nvme-cli, which is
confusing:
trtype: loop
adrfam: ipv4
trtype: loop
adrfam: ipv6
trtype: loop
adrfam: infiniband
trtype: loop
adrfam: fibre-channel
trtype: loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
adrfam: pci #
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
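As a sketch of why the reserved value matters to consumers of the discovery log, here is a hypothetical pretty-printer for the adrfam byte. Values 1 to 4 (IPv4, IPv6, InfiniBand, Fibre Channel) are the NVMe over Fabrics address families, and 254 is the intra-host value described in the adrfam commit above; the enum, function name and output strings are illustrative and do not reproduce the kernel header or nvme-cli.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Address family values; 254 is the intra-host (loopback) value. */
enum adrfam {
	ADRFAM_IPV4       = 1,
	ADRFAM_IPV6       = 2,
	ADRFAM_IB         = 3,
	ADRFAM_FC         = 4,
	ADRFAM_INTRA_HOST = 254,
};

/* Hypothetical printer for a discovery log entry's adrfam byte. */
static const char *adrfam_str(uint8_t adrfam)
{
	switch (adrfam) {
	case ADRFAM_IPV4:       return "ipv4";
	case ADRFAM_IPV6:       return "ipv6";
	case ADRFAM_IB:         return "infiniband";
	case ADRFAM_FC:         return "fibre-channel";
	case ADRFAM_INTRA_HOST: return "intra-host";
	default:                return "unrecognized";
	}
}
```

A printer that knows about 254 no longer has to fall back to some unrelated family name for a loop port.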
22 Nov, 2019
1 commit
-
According to the NVMe specification, the over temperature threshold and
under temperature threshold features shall be implemented for Composite
Temperature if a non-zero WCTEMP field value is reported in the Identify
Controller data structure. The features are also implemented for all
implemented temperature sensors (i.e., all Temperature Sensor fields that
report a non-zero value).

This provides the over temperature threshold and under temperature
threshold for each sensor as temperature min and max values of hwmon
sysfs attributes.

The WCTEMP is already provided as a temperature max value for Composite
Temperature, but this change isn't incompatible, because the default
value of the over temperature threshold for Composite Temperature is
the WCTEMP.

Now the alarm attribute for Composite Temperature indicates that one of
the temperatures is outside of a temperature threshold, because there is
only a single bit in the Critical Warning field that indicates a
temperature is outside of a threshold.

Example output from the "sensors" command:

nvme-pci-0100
Adapter: PCI adapter
Composite: +33.9°C (low = -273.1°C, high = +69.8°C)
                   (crit = +79.8°C)
Sensor 1: +34.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 5: +47.9°C (low = -273.1°C, high = +65261.8°C)

This also adds helper macros for kelvin from/to milli Celsius conversion,
and replaces the repeated code in hwmon.c.

Cc: Keith Busch
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Sagi Grimberg
Cc: Jean Delvare
Reviewed-by: Guenter Roeck
Tested-by: Guenter Roeck
Signed-off-by: Akinobu Mita
Signed-off-by: Keith Busch
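The conversion helpers mentioned in the hwmon commit can be sketched as follows. NVMe reports temperatures in kelvins while hwmon uses millidegrees Celsius, so the conversion is an offset of 273150 millidegrees. The function names are illustrative and may not match the helpers the patch actually added. The numbers in the sample sensors output are consistent with this arithmetic when shown to one decimal place: -273.1°C is a threshold of 0 K, +65261.8°C is 0xFFFF K, and +69.8°C is 343 K.

```c
#include <assert.h>

/* 0 K expressed in millidegrees Celsius. */
#define ABSOLUTE_ZERO_MILLICELSIUS (-273150L)

/* Whole kelvins (as reported by NVMe) to millidegrees Celsius (hwmon). */
static long kelvin_to_millicelsius(long k)
{
	return k * 1000 + ABSOLUTE_ZERO_MILLICELSIUS;
}

/* Millidegrees Celsius back to the nearest kelvin; valid at or above 0 K. */
static long millicelsius_to_kelvin(long mc)
{
	return (mc - ABSOLUTE_ZERO_MILLICELSIUS + 500) / 1000;
}
```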
05 Nov, 2019
2 commits
-
Update enumerations and structures in include/linux/nvme.h
to resync with nvme-cli.

All the updates are mentioned in the ratified NVMe 1.4 spec
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf

Reviewed-by: Christoph Hellwig
Signed-off-by: Revanth Rajashekar
Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe -
Fix the status code of canceled requests initiated by the host according
to TP4028 (Status Code 0x371):
"Command Aborted By host: The command was aborted as a result of host
action (e.g., the host disconnected the Fabric connection)."

Also, in a multipath environment, unless otherwise specified, errors of
this type (path related) should be retried using a different path, if
one is available.

Signed-off-by: Max Gurtovoy
Reviewed-by: Christoph Hellwig
Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe
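A sketch of how the 0x371 value from the host-aborted commit above decomposes, assuming the driver's usual packing of the completion status with the phase tag stripped: Status Code in bits 7:0, Status Code Type in bits 10:8, and Do Not Retry in bit 14. Status Code Type 3h is Path Related Status, which is what makes this error a candidate for retry on another path. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SC(s)  ((s) & 0xff)		/* Status Code */
#define STATUS_SCT(s) (((s) >> 8) & 0x7)	/* Status Code Type */
#define STATUS_DNR    0x4000			/* Do Not Retry */

#define SCT_PATH_RELATED    0x3
#define SC_HOST_ABORTED_CMD 0x371		/* TP4028: Command Aborted By host */

/* Sketch: retryable path-related errors may be retried on another path. */
static bool retry_on_other_path(uint16_t status)
{
	if (status & STATUS_DNR)
		return false;
	return STATUS_SCT(status) == SCT_PATH_RELATED;
}
```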
30 Aug, 2019
3 commits
-
The size of a submission queue entry should always be 6 (2^6 = 64
bytes) by spec.

However, some controllers, such as Apple's, do not properly implement
the standard and require a different size.

This provides the groundwork for the subsequent quirks for these
controllers.

Signed-off-by: Benjamin Herrenschmidt
Reviewed-by: Minwoo Im
Reviewed-by: Christoph Hellwig
Signed-off-by: Sagi Grimberg -
This patch adds the Get LBA Status command's opcode to the macro that is
used by the trace feature. Now we can see "get_lba_status" instead of
the opcode value itself.

Signed-off-by: Minwoo Im
Signed-off-by: Sagi Grimberg -
NVMe 1.4 added Get LBA Status command with opcode 0x86.
Signed-off-by: Minwoo Im
Signed-off-by: Sagi Grimberg
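The trace change in the Get LBA Status commits above amounts to adding a name for admin opcode 0x86 to a lookup table. A hypothetical stand-in for such a table, which falls back to the raw value for opcodes without a name, might look like this (only a few opcodes are listed; 0x02 and 0x06 are Get Log Page and Identify, and the real trace code uses its own macros):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ADMIN_GET_LBA_STATUS 0x86	/* added in NVMe 1.4 */

/* Hypothetical opcode-to-name table for tracing. */
static const char *admin_opcode_name(uint8_t opcode)
{
	switch (opcode) {
	case 0x02:                 return "get_log_page";
	case 0x06:                 return "identify";
	case ADMIN_GET_LBA_STATUS: return "get_lba_status";
	default:                   return NULL;
	}
}

/* Print the name if known, otherwise the raw opcode in hex. */
static void format_opcode(char *buf, size_t len, uint8_t opcode)
{
	const char *name = admin_opcode_name(opcode);

	if (name)
		snprintf(buf, len, "%s", name);
	else
		snprintf(buf, len, "0x%02x", opcode);
}
```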
10 Jul, 2019
1 commit
-
Several new fields have been introduced in version 1.4 of the NVMe spec
at offsets that were defined as reserved in version 1.3d of the NVMe
spec. Update the definition of the nvme_id_ns data structure such that
it is in sync with version 1.4 of the NVMe spec. This change preserves
backwards compatibility.

Signed-off-by: Bart Van Assche
Reviewed-by: Keith Busch
Reviewed-by: Chaitanya Kulkarni
Reviewed-by: Martin K. Petersen
Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
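A sketch of why naming previously reserved bytes preserves backwards compatibility, using an illustrative fragment of Identify Namespace that starts at byte 64 (offsets in the comments are relative to the full structure). The field names follow the NVMe 1.4 names (NPWG, NPWA, NPDG, NPDA, NOWS, ANAGRPID); the split into old and new fragment structs, and the assumption that bytes 64 to 91 were previously reserved, are simplifications for illustration. The compile-time checks confirm that neither the size nor the offset of the following field moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes 64..95 with the new range still reserved (simplified older view). */
struct id_ns_tail_old {
	uint8_t  rsvd64[28];
	uint32_t anagrpid;	/* byte 92 */
};

/* The same bytes with NVMe 1.4 fields (little-endian on the wire). */
struct id_ns_tail_new {
	uint16_t npwg;		/* byte 64 */
	uint16_t npwa;		/* byte 66 */
	uint16_t npdg;		/* byte 68 */
	uint16_t npda;		/* byte 70 */
	uint16_t nows;		/* byte 72 */
	uint8_t  rsvd74[18];
	uint32_t anagrpid;	/* byte 92, unchanged */
};

/* Compatibility: the fragment size and later offsets must not move. */
static_assert(sizeof(struct id_ns_tail_old) == sizeof(struct id_ns_tail_new),
	      "fragment size changed");
static_assert(offsetof(struct id_ns_tail_old, anagrpid) ==
	      offsetof(struct id_ns_tail_new, anagrpid),
	      "anagrpid moved");
```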
21 Jun, 2019
3 commits
-
This patch introduces a fabrics command tracing feature on the host side.
This patch does not include any changes to the previous host-side
tracing, but just adds fabrics command parsing in the cmd=() format.

Signed-off-by: Minwoo Im
[hch: fixed some whitespace damage]
Signed-off-by: Christoph Hellwig -
The following patches are going to provide target-side tracing, which
might need these kinds of macros. It would be good if they could be
shared between both the host and target sides.

Signed-off-by: Minwoo Im
Signed-off-by: Christoph Hellwig -
This patch introduces an nvme_is_fabrics() inline function to check
whether or not the given command structure is for fabrics.

Signed-off-by: Minwoo Im
Reviewed-by: Sagi Grimberg
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
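A sketch of the nvme_is_fabrics() idea and of the kind of name lookup the fabrics tracing commits describe. All fabrics commands share opcode 7Fh and are distinguished by the fctype byte; Property Set (00h), Connect (01h) and Property Get (04h) are shown. The struct below is a simplified, hypothetical layout rather than the kernel's struct nvme_command, and the trace names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FABRICS_OPCODE 0x7f	/* shared by all fabrics commands */

enum {
	FCTYPE_PROPERTY_SET = 0x00,
	FCTYPE_CONNECT      = 0x01,
	FCTYPE_PROPERTY_GET = 0x04,
};

/* Simplified command header: only the bytes this sketch needs. */
struct cmd {
	uint8_t  opcode;	/* byte 0 */
	uint8_t  flags;		/* byte 1 */
	uint16_t command_id;	/* bytes 2-3 */
	uint8_t  fctype;	/* byte 4, meaningful for fabrics commands only */
};

static bool is_fabrics(const struct cmd *c)
{
	return c->opcode == FABRICS_OPCODE;
}

/* Trace name for a fabrics command, or NULL for non-fabrics commands. */
static const char *fabrics_cmd_name(const struct cmd *c)
{
	if (!is_fabrics(c))
		return NULL;
	switch (c->fctype) {
	case FCTYPE_PROPERTY_SET: return "property_set";
	case FCTYPE_CONNECT:      return "connect";
	case FCTYPE_PROPERTY_GET: return "property_get";
	default:                  return "unknown";
	}
}
```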
14 May, 2019
1 commit
-
Fix typos in enumeration names for nvme status:
s/ACIVATE/ACTIVATE/
s/INSUFFICENT/INSUFFICIENT/

Signed-off-by: Minwoo Im
Reviewed-by: Sagi Grimberg
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Christoph Hellwig
11 Apr, 2019
1 commit
-
The nvme target hadn't been taking the Get Log Page offset parameter
into consideration, and so has been returning corrupted log pages when
offsets are used. Since many tools, including nvme-cli, split the log
request into 4k chunks, we've been breaking discovery log responses when
more than 3 subsystems exist.

Fix the returned data by internally generating the entire discovery
log page and copying only the requested bytes into the user buffer. The
command log page offset type has been modified to a native __le64 to
make it easier to extract the value from a command.

Signed-off-by: Keith Busch
Tested-by: Minwoo Im
Reviewed-by: Chaitanya Kulkarni
Reviewed-by: Hannes Reinecke
Reviewed-by: James Smart
Signed-off-by: Christoph Hellwig
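A sketch of the fix in the Get Log Page offset commit above, assuming the whole page has already been generated into a buffer. It also shows why the breakage started at the fourth subsystem: the discovery log page header is 1024 bytes and each entry is 1024 bytes, so a 4 KiB read at offset 0 holds the header plus three entries, and the fourth entry is only reachable at offset 4096. The function name is hypothetical, and zero-filling reads past the end of the page is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DISC_HDR_SIZE   1024	/* discovery log page header */
#define DISC_ENTRY_SIZE 1024	/* one discovery log entry */

/*
 * Copy the [offset, offset + len) window of a fully generated log page
 * into out; bytes past the end of the page are zero-filled.
 */
static void copy_log_window(const uint8_t *page, size_t page_len,
			    uint64_t offset, uint8_t *out, size_t len)
{
	size_t avail = 0;

	if (offset < page_len) {
		avail = page_len - (size_t)offset;
		if (avail > len)
			avail = len;
		memcpy(out, page + (size_t)offset, avail);
	}
	memset(out + avail, 0, len - avail);
}
```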
20 Feb, 2019
1 commit
-
We already have an SPDX header, so there is no need to duplicate the information.
Signed-off-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
13 Dec, 2018
3 commits
-
This patch adds the NVMe error slot definition from the spec.
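For reference, a sketch of the 64-byte Error Information log entry that the error slot definition describes. Field names are illustrative, multi-byte fields are little-endian on the wire (plain host types are used here), and the layout shown is the NVMe 1.3 one, with reserved bytes where later revisions added fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One Error Information log entry (illustrative names). */
struct error_slot {
	uint64_t error_count;		/* bytes 0-7 */
	uint16_t sqid;			/* bytes 8-9 */
	uint16_t cmdid;			/* bytes 10-11 */
	uint16_t status_field;		/* bytes 12-13 */
	uint16_t param_error_location;	/* bytes 14-15 */
	uint64_t lba;			/* bytes 16-23 */
	uint32_t nsid;			/* bytes 24-27 */
	uint8_t  vs;			/* byte 28: vendor specific info available */
	uint8_t  resv[3];
	uint64_t cs;			/* bytes 32-39: command specific info */
	uint8_t  resv2[24];
};

static_assert(sizeof(struct error_slot) == 64, "log entries are 64 bytes");
static_assert(offsetof(struct error_slot, cs) == 32, "cs at byte 32");
```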
Signed-off-by: Chaitanya Kulkarni
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig -
This is a preparation patch which removes the nvme common command cdw10
array and replaces it with individual fields. This is needed for the
nvmet error log page implementation, to make the error log page entry
offset assignment easier.

Signed-off-by: Chaitanya Kulkarni
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig -
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
08 Dec, 2018
8 commits
-
A controller may have an internal state that is not able to successfully
process commands for a short duration. In such states, an immediate
command requeue is expected to fail. The driver may exceed its max
retry count, which permanently ends the command in failure when the same
command would succeed after waiting for the controller to be ready.

NVMe ratified TP 4033 provides a delay hint in the completion status
code for failed commands. Implement the retry delay based on the command
completion status and the controller's requested delay.

Note that requeued commands are handled per request_queue, not per
individual request. If multiple commands fail, the controller should
consistently report the desired delay time for retryable commands in
all CQEs, otherwise the requeue list may be kicked too soon.

Signed-off-by: Keith Busch
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Technical proposal 8005 introduces an indication for SQ flow control
disable support. Expose it since we are able to operate in this mode.

Reviewed-by: Hannes Reinecke
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Only override the allowed parts of it.
Reviewed-by: Hannes Reinecke
Signed-off-by: Sagi Grimberg
[hch: slight tweak to the NVME_TREQ_SECURE_CHANNEL_MASK definition]
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Technical proposal 8005 "fabrics SQ flow control" introduces a mode
where a host and controller agree to omit sq_head pointer updates
when sending nvme completions.

If the host indicates a desire to operate in this mode (connect
attribute), the controller will return a connect completion with an
sq_head value of 0xffff as an indication that it will omit sq_head
pointer updates.

This mode saves us an atomic update in the I/O path.
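A target-side sketch of where the saved atomic update comes from, assuming a simplified queue structure. When the host asks for SQ flow control to be disabled, the Connect completion reports 0xffff and later completions skip the compare-and-swap on the shared head. The Connect attribute bit position (bit 2), the struct and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CONNECT_ATTR_DISABLE_SQFLOW (1u << 2)	/* assumed bit position */
#define SQHD_DISABLED               0xffff

/* Hypothetical target-side submission queue state. */
struct target_sq {
	_Atomic uint32_t sqhd;
	uint32_t size;
	bool sqhd_disabled;
};

/* Connect: record the host's choice and report it in sq_head. */
static uint16_t target_connect(struct target_sq *sq, uint8_t cattr)
{
	sq->sqhd_disabled = cattr & CONNECT_ATTR_DISABLE_SQFLOW;
	return sq->sqhd_disabled ? SQHD_DISABLED
				 : (uint16_t)atomic_load(&sq->sqhd);
}

/* Per-command completion: the atomic head update is skipped when disabled. */
static void target_complete(struct target_sq *sq, uint16_t *cqe_sq_head)
{
	if (sq->sqhd_disabled)
		return;		/* no atomic update, sq_head left untouched */

	uint32_t old = atomic_load(&sq->sqhd), next;
	do {
		next = (old + 1) % sq->size;
	} while (!atomic_compare_exchange_weak(&sq->sqhd, &old, next));
	*cqe_sq_head = (uint16_t)next;
}
```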
Reviewed-by: Hannes Reinecke
[hch: suggested better implementation]
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
Add AEN/AER values as defined by the specification
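A sketch of decoding the AEN values this commit refers to from completion dword 0 of an Asynchronous Event Request: event type in bits 2:0, event information in bits 15:8 and the associated log page identifier in bits 23:16. Only a subset of the specification's values is shown, and the names are illustrative rather than the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Asynchronous Event Type (bits 2:0). */
enum {
	AER_ERROR  = 0x0,
	AER_SMART  = 0x1,
	AER_NOTICE = 0x2,
	AER_CSS    = 0x6,	/* I/O command set specific */
	AER_VS     = 0x7,	/* vendor specific */
};

/* A few Notice-type event information values (bits 15:8). */
enum {
	AEN_NOTICE_NS_CHANGED   = 0x00,
	AEN_NOTICE_FW_ACT_START = 0x01,
	AEN_NOTICE_ANA_CHANGE   = 0x03,
	AEN_NOTICE_DISC_CHANGED = 0xf0,
};

struct aen {
	uint8_t type;
	uint8_t info;
	uint8_t lid;	/* log page to read to clear the event */
};

static struct aen decode_aen(uint32_t dw0)
{
	struct aen e = {
		.type = dw0 & 0x7,
		.info = (dw0 >> 8) & 0xff,
		.lid  = (dw0 >> 16) & 0xff,
	};
	return e;
}
```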
Signed-off-by: Jay Sternberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
The functions nvmet_aen_disabled and nvmet_clear_aen were passing
values rather than bit numbers (i.e., 1 << 9 instead of 9) to the bit
functions clear_bit and test_and_set_bit.

Signed-off-by: Jay Sternberg
Reviewed-by: Phil Cayton
Reviewed-by: Christoph Hellwig
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
If the controller supports traffic based keep alive, we restart the keep
alive timer if any admin or io commands were completed during the kato
period. This prevents a possible starvation of keep alive commands in
the presence of heavy traffic, as in such a case we already have a
health indication from the host perspective.

Only set a comp_seen indicator in case the controller supports keep
alive to minimize the overhead for pci controllers.

Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
We are growing more controller attributes, so use a proper enumeration
for them. For now just add the 128-bit hostid which we support.

Reviewed-by: Chaitanya Kulkarni
Reviewed-by: Hannes Reinecke
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
02 Oct, 2018
1 commit
-
When an io is rejected by nvmf_check_ready() due to validation of the
controller state, nvmf_fail_nonready_command() will normally return
BLK_STS_RESOURCE to requeue and retry. However, if the controller is
dying or the I/O is marked for NVMe multipath, the I/O is failed so that
the controller can terminate or so that the io can be issued on a
different path. Unfortunately, as this reject point is before the
transport has accepted the command, blk-mq ends up completing the I/O
and never calls nvme_complete_rq(), which is where multipath may preserve
or re-route the I/O. The end result is that the device user sees an
EIO error.

Example: single path connectivity, controller is under load, and a reset
is induced. An I/O is received:

a) while the reset state has been set but the queues have yet to be
   stopped; or
b) after queues are started (at end of reset) but before the reconnect
   has completed.

The I/O finishes with an EIO status.

This patch makes the following changes:
- Adds the HOST_PATH_ERROR pathing status from TP4028
- Modifies the reject point such that it appears to queue successfully,
  but actually completes the io with the new pathing status and calls
  nvme_complete_rq().
- nvme_complete_rq() recognizes the new status, avoids resetting the
  controller (likely was already done in order to get this new status),
  and calls the multipather to clear the current path that errored.
  This allows the next command (retry or new command) to select a new
  path if there is one.

Signed-off-by: James Smart
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
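A simplified model of the decision described in the HOST_PATH_ERROR commit above. The enum and function names are hypothetical, the two booleans stand in for the controller-state and REQ_NVME_MPATH checks, and 0x370 is the Host Path Error status from TP4028.

```c
#include <assert.h>
#include <stdbool.h>

#define SC_HOST_PATH_ERROR 0x370	/* TP4028 path-related status */

/* Outcomes at the "controller not ready" reject point. */
enum reject_action {
	ACTION_REQUEUE,			/* like returning BLK_STS_RESOURCE */
	ACTION_COMPLETE_PATH_ERROR,	/* complete through nvme_complete_rq() */
};

/* Hypothetical model of the reject-point decision. */
static enum reject_action not_ready_action(bool ctrl_dying, bool multipath_io)
{
	if (ctrl_dying || multipath_io)
		return ACTION_COMPLETE_PATH_ERROR;
	return ACTION_REQUEUE;
}

/* In completion, a path error on multipath I/O clears the path instead of resetting. */
static bool should_failover(unsigned status, bool multipath_io)
{
	return multipath_io && status == SC_HOST_PATH_ERROR;
}
```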
08 Aug, 2018
2 commits
-
Add various definitions from NVMe 1.3 TP 4005.
Signed-off-by: Chaitanya Kulkarni
Signed-off-by: Christoph Hellwig -
ANA Phase 3 draft had the 'reserved' field in the group descriptor
format set to '23:17' (so that the first namespace identifier started
at byte 24), but that got moved with the approved TP to '31:17'
(so that the first namespace identifier started at byte 32).

Signed-off-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
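A sketch of the descriptor layout after the fix in the ANA commit above, with a compile-time check that the first namespace identifier lands at byte 32 (with the draft's 23:17 reserved range it would have been at byte 24). Names are illustrative and multi-byte fields are little-endian on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ANA group descriptor per the approved TP (illustrative names). */
struct ana_group_desc {
	uint32_t grpid;		/* bytes 3:0 */
	uint32_t nnsids;	/* bytes 7:4 */
	uint64_t chgcnt;	/* bytes 15:8 */
	uint8_t  state;		/* byte 16 */
	uint8_t  rsvd17[15];	/* bytes 31:17, reserved */
	uint32_t nsids[];	/* first NSID at byte 32 */
};

static_assert(offsetof(struct ana_group_desc, nsids) == 32,
	      "first namespace identifier must start at byte 32");

/* Bytes occupied by one descriptor, used to step to the next one. */
static size_t ana_desc_size(const struct ana_group_desc *d)
{
	return sizeof(*d) + (size_t)d->nnsids * sizeof(d->nsids[0]);
}
```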
28 Jul, 2018
2 commits
-
Add various definitions from NVMe 1.3 TP 4004.
Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Reviewed-by: Martin K. Petersen
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn -
NVMe 1.3 added a new log specific field to the get log page CQ
definition; add it to our get_log_page SQ structure.

Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Reviewed-by: Martin K. Petersen
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
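A sketch of where the log specific field sits in Get Log Page command dword 10 as of NVMe 1.3: LID in bits 7:0, LSP in bits 11:8, Retain Asynchronous Event (RAE) in bit 15 and the lower 16 bits of the 0's based dword count (NUMDL) in bits 31:16. The helper name is hypothetical, and the sketch assumes the transfer length fits in NUMDL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build Get Log Page CDW10 from its fields (hypothetical helper). */
static uint32_t get_log_cdw10(uint8_t lid, uint8_t lsp, bool rae,
			      uint32_t len_bytes)
{
	assert(len_bytes >= 4 && len_bytes % 4 == 0);

	uint32_t numd = len_bytes / 4 - 1;	/* 0's based dword count */

	return (uint32_t)lid |
	       ((uint32_t)(lsp & 0xf) << 8) |	/* LSP: bits 11:8 */
	       ((uint32_t)rae << 15) |		/* RAE: bit 15 */
	       ((numd & 0xffff) << 16);		/* NUMDL: bits 31:16 */
}
```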
23 Jul, 2018
1 commit
-
Add some feature IDs that are present in nvme-cli but not in the kernel.
Signed-off-by: Revanth Rajashekar
Signed-off-by: Christoph Hellwig
01 Jun, 2018
2 commits
-
Signed-off-by: Hannes Reinecke
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Reviewed-by: Johannes Thumshirn -
Signed-off-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Reviewed-by: Johannes Thumshirn