21 Sep, 2021

3 commits

  • Remove the freeze/unfreeze around changes to the number of hardware
    queues. Study and retest have indicated there are no I/Os that can be
    active at this point, so there is nothing to freeze.

    nvme-fc is draining the queues in the shutdown and error recovery path
    in __nvme_fc_abort_outstanding_ios.

    This patch primarily reverts 88e837ed0f1f "nvme-fc: wait for queues to
    freeze before calling update_hr_hw_queues". It's not an exact revert, as
    it keeps the adjustment of hw queues, but only when the count changes
    (see the sketch after this entry).

    Signed-off-by: James Smart
    [dwagner: added explanation why no IO is pending]
    Signed-off-by: Daniel Wagner
    Reviewed-by: Ming Lei
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
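
    A minimal sketch of the update path this change leaves behind; the
    variable names are illustrative, not taken verbatim from the driver:

        /* Sketch: resize the tagset only when the hw queue count changed.
         * No freeze/unfreeze is needed because no I/O can be active here. */
        if (prior_ioq_cnt != nr_io_queues)
                blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);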
     
  • To avoid a race between timeout and teardown, the teardown process
    first quiesces the queue, then deletes the timer and cancels the
    timeout work for the queue.

    This patch merges the admin and I/O sync ops into the queue teardown
    logic, as done in the RDMA patch 3017013dcc "nvme-rdma: avoid race
    between time out and tear down". There is no teardown_lock in nvme-fc.
    A sketch of the ordering follows this entry.

    Signed-off-by: James Smart
    Tested-by: Daniel Wagner
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Daniel Wagner
    Signed-off-by: Christoph Hellwig

    James Smart
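
    A minimal sketch of the ordering described above, using the standard
    block layer helpers (illustrative, not the patch itself):

        /* Quiesce first so no new requests are dispatched, then sync the
         * queue, which deletes the timeout timer and cancels timeout work. */
        blk_mq_quiesce_queue(q);
        blk_sync_queue(q);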
     
  • In case the number of hardware queues changes, we need to update the
    tagset and the mapping of ctx to hctx first.

    If we try to create and connect the I/O queues first, the operation
    will fail (the target will reject the connect call due to the wrong
    number of queues) and we bail out of the recreate function. We then
    retry the very same operation and make no progress. A sketch of the
    corrected ordering follows this entry.

    Signed-off-by: Daniel Wagner
    Reviewed-by: Ming Lei
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Reviewed-by: James Smart
    Signed-off-by: Christoph Hellwig

    Daniel Wagner
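
    An illustrative sketch of the corrected ordering (names are
    illustrative, not verbatim from the driver):

        /* 1) resize the tagset and the ctx-to-hctx mapping first ... */
        if (prior_ioq_cnt != nr_io_queues)
                blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
        /* 2) ... only then create and connect the I/O queues, so the
         *    target sees a connect for the queue count it granted. */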
     

10 Jul, 2021

1 commit

  • Pull more block updates from Jens Axboe:
    "A combination of changes that ended up depending on both the driver
    and core branch (and/or the IDE removal), and a few late arriving
    fixes. In detail:

    - Fix io ticks wrap-around issue (Chunguang)

    - nvme-tcp sock locking fix (Maurizio)

    - s390-dasd fixes (Kees, Christoph)

    - blk_execute_rq polling support (Keith)

    - blk-cgroup RCU iteration fix (Yu)

    - nbd backend ID addition (Prasanna)

    - Partition deletion fix (Yufen)

    - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph)

    - Removal of now dead block request types due to IDE removal
    (Christoph)

    - Loop probing and control device cleanups (Christoph)

    - Device uevent fix (Christoph)

    - Misc cleanups/fixes (Tetsuo, Christoph)"

    * tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits)
    blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs
    block: fix the problem of io_ticks becoming smaller
    nvme-tcp: can't set sk_user_data without write_lock
    loop: remove unused variable in loop_set_status()
    block: remove the bdgrab in blk_drop_partitions
    block: grab a device refcount in disk_uevent
    s390/dasd: Avoid field over-reading memcpy()
    dasd: unexport dasd_set_target_state
    block: check disk exist before trying to add partition
    ubd: remove dead code in ubd_setup_common
    nvme: use return value from blk_execute_rq()
    block: return errors from blk_execute_rq()
    nvme: use blk_execute_rq() for passthrough commands
    block: support polling through blk_execute_rq
    block: remove REQ_OP_SCSI_{IN,OUT}
    block: mark blk_mq_init_queue_data static
    loop: rewrite loop_exit using idr_for_each_entry
    loop: split loop_lookup
    loop: don't allow deleting an unspecified loop device
    loop: move loop_ctl_mutex locking into loop_add
    ...

    Linus Torvalds
     

03 Jul, 2021

1 commit

  • Pull SCSI updates from James Bottomley:
    "This series consists of the usual driver updates (ufs, ibmvfc,
    megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with
    elx and mpi3mr being new drivers.

    The major core change is a rework to drop the status byte handling
    macros and the old bit shifted definitions and the rest of the updates
    are minor fixes"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits)
    scsi: aha1740: Avoid over-read of sense buffer
    scsi: arcmsr: Avoid over-read of sense buffer
    scsi: ips: Avoid over-read of sense buffer
    scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe()
    scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame()
    scsi: elx: libefc: Fix less than zero comparison of a unsigned int
    scsi: elx: efct: Fix pointer error checking in debugfs init
    scsi: elx: efct: Fix is_originator return code type
    scsi: elx: efct: Fix link error for _bad_cmpxchg
    scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel()
    scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session()
    scsi: elx: efct: Fix error handling in efct_hw_init()
    scsi: elx: efct: Remove redundant initialization of variable lun
    scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected"
    scsi: lpfc: Fix build error in lpfc_scsi.c
    scsi: target: iscsi: Remove redundant continue statement
    scsi: qla4xxx: Remove redundant continue statement
    scsi: ppa: Switch to use module_parport_driver()
    scsi: imm: Switch to use module_parport_driver()
    scsi: mpt3sas: Fix error return value in _scsih_expander_add()
    ...

    Linus Torvalds
     

01 Jul, 2021

2 commits

  • The generic blk_execute_rq() knows how to handle polled completions. Use
    that instead of implementing an nvme specific handler.

    Signed-off-by: Keith Busch
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Chaitanya Kulkarni
    Link: https://lore.kernel.org/r/20210610214437.641245-3-kbusch@kernel.org
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • Pull block driver updates from Jens Axboe:
    "Pretty calm round, mostly just NVMe and a bit of MD:

    - NVMe updates (via Christoph)
    - improve the APST configuration algorithm (Alexey Bogoslavsky)
    - look for StorageD3Enable on companion ACPI device
    (Mario Limonciello)
    - allow selecting the network interface for TCP connections
    (Martin Belanger)
    - misc cleanups (Amit Engel, Chaitanya Kulkarni, Colin Ian King,
    Christoph)
    - move the ACPI StorageD3 code to drivers/acpi/ and add quirks
    for certain AMD CPUs (Mario Limonciello)
    - zoned device support for nvmet (Chaitanya Kulkarni)
    - fix the rules for changing the serial number in nvmet
    (Noam Gottlieb)
    - various small fixes and cleanups (Dan Carpenter, JK Kim,
    Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert
    Uytterhoeven, Daniel Wagner)

    - MD updates (Via Song)
    - iostats rewrite (Guoqing Jiang)
    - raid5 lock contention optimization (Gal Ofri)

    - Fall through warning fix (Gustavo)

    - Misc fixes (Gustavo, Jiapeng)"

    * tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block: (78 commits)
    nvmet: use NVMET_MAX_NAMESPACES to set nn value
    loop: Fix missing discard support when using LOOP_CONFIGURE
    nvme.h: add missing nvme_lba_range_type endianness annotations
    nvme: remove zeroout memset call for struct
    nvme-pci: remove zeroout memset call for struct
    nvmet: remove zeroout memset call for struct
    nvmet: add ZBD over ZNS backend support
    nvmet: add Command Set Identifier support
    nvmet: add nvmet_req_bio put helper for backends
    nvmet: add req cns error complete helper
    block: export blk_next_bio()
    nvmet: remove local variable
    nvmet: use nvme status value directly
    nvmet: use u32 type for the local variable nsid
    nvmet: use u32 for nvmet_subsys max_nsid
    nvmet: use req->cmd directly in file-ns fast path
    nvmet: use req->cmd directly in bdev-ns fast path
    nvmet: make ver stable once connection established
    nvmet: allow mn change if subsys not discovered
    nvmet: make sn stable once connection was established
    ...

    Linus Torvalds
     

10 Jun, 2021

1 commit

  • Add a new sysfs attribute, appid_store, which can be used to set the
    application identifier in the blkcg associated with a cgroup id.

    Below is the interface provided to set the app_id:

    echo "<cgroupid>:<appid>" >> /sys/class/fc/fc_udev_device/appid_store

    echo "457E:100000109b521d27" >> /sys/class/fc/fc_udev_device/appid_store

    Link: https://lore.kernel.org/r/20210608043556.274139-4-muneendra.kumar@broadcom.com
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Muneendra Kumar
    Signed-off-by: Martin K. Petersen

    Muneendra Kumar
     

25 May, 2021

1 commit

  • Returning an nvme status from nvme_fc_create_association() indicates
    that the association is established, and we should honour the DNR bit.
    If it's set a reconnect attempt will just return the same error, so
    we can short-circuit the reconnect attempts and fail the connection
    directly.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Himanshu Madhani
    Reviewed-by: James Smart
    Signed-off-by: Christoph Hellwig

    Hannes Reinecke
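
    An illustrative sketch of the check described above (status handling
    simplified, variable names illustrative):

        status = nvme_fc_create_association(ctrl);
        if (status > 0 && (status & NVME_SC_DNR))
                recon = false;    /* don't reschedule the reconnect work */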
     

19 May, 2021

1 commit

  • The __nvmf_check_ready() routine used to bounce all filesystem I/O if
    the controller state isn't LIVE. However, a later patch changed the
    logic so that the rejection ends up being based on the queue live
    check. The FC transport has a slightly different sequence from rdma
    and tcp for shutting down queues/marking them non-live. FC marks its
    queue non-live after aborting all I/Os and waiting for their
    termination, leaving a rather large window for filesystem I/O to
    continue to hit the transport. Unfortunately this resulted in
    filesystem I/O or applications seeing I/O errors.

    Change the FC transport to mark the queues non-live at the first sign of
    teardown for the association (when I/O is initially terminated).

    Fixes: 73a5379937ec ("nvme-fabrics: allow to queue requests for live queues")
    Signed-off-by: James Smart
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
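
    An illustrative sketch of marking the queues non-live when I/O is
    first terminated (loop bounds and field names illustrative):

        for (i = 0; i < ctrl->ctrl.queue_count; i++)
                clear_bit(NVME_FC_Q_LIVE, &ctrl->queues[i].flags);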
     

04 May, 2021

1 commit

  • queue_rq() in pci only checks whether the dispatched queue (nvmeq) is
    ready, e.g. not being suspended. Since nvme_alloc_admin_tags() in the
    reset flow restarts the admin queue, users are able to submit admin
    commands to a controller before reset_work() completes. Commands
    submitted under this condition may interfere with the identify and
    I/O queue setup performed by reset_work(), and may result in the hang
    described in the following patch.

    As seen in the fabrics drivers, user commands are prevented from being
    executed under improper controller states. We can reuse this logic to
    keep the admin queue clear during reset_work(). See the sketch after
    this entry.

    Signed-off-by: Tao Chiu
    Signed-off-by: Cody Wong
    Reviewed-by: Leon Chien
    Reviewed-by: Keith Busch
    Signed-off-by: Christoph Hellwig

    Tao Chiu
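
    An illustrative sketch of reusing the fabrics-style readiness check in
    the pci queue_rq() path (helper names as used by the core/fabrics
    code, exact placement illustrative):

        if (unlikely(!nvme_check_ready(&dev->ctrl, req, true)))
                return nvme_fail_nonready_command(&dev->ctrl, req);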
     

29 Apr, 2021

1 commit

  • Pull block driver updates from Jens Axboe:

    - MD changes via Song:
    - raid5 POWER fix
    - raid1 failure fix
    - UAF fix for md cluster
    - mddev_find_or_alloc() clean up
    - Fix NULL pointer deref with external bitmap
    - Performance improvement for raid10 discard requests
    - Fix missing information of /proc/mdstat

    - rsxx const qualifier removal (Arnd)

    - Expose allocated brd pages (Calvin)

    - rnbd via Gioh Kim:
    - Change maintainer
    - Change domain address of maintainers' email
    - Add polling IO mode and document update
    - Fix memory leak and some bug detected by static code analysis
    tools
    - Code refactoring

    - Series of floppy cleanups/fixes (Denis)

    - s390 dasd fixes (Julian)

    - kerneldoc fixes (Lee)

    - null_blk double free (Lv)

    - null_blk virtual boundary addition (Max)

    - Remove xsysace driver (Michal)

    - umem driver removal (Davidlohr)

    - ataflop fixes (Dan)

    - Revalidate disk removal (Christoph)

    - Bounce buffer cleanups (Christoph)

    - Mark lightnvm as deprecated (Christoph)

    - mtip32xx init cleanups (Shixin)

    - Various fixes (Tian, Gustavo, Coly, Yang, Zhang, Zhiqiang)

    * tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-block: (143 commits)
    async_xor: increase src_offs when dropping destination page
    drivers/block/null_blk/main: Fix a double free in null_init.
    md/raid1: properly indicate failure when ending a failed write request
    md-cluster: fix use-after-free issue when removing rdev
    nvme: introduce generic per-namespace chardev
    nvme: cleanup nvme_configure_apst
    nvme: do not try to reconfigure APST when the controller is not live
    nvme: add 'kato' sysfs attribute
    nvme: sanitize KATO setting
    nvmet: avoid queuing keep-alive timer if it is disabled
    brd: expose number of allocated pages in debugfs
    ataflop: fix off by one in ataflop_probe()
    ataflop: potential out of bounds in do_format()
    drbd: Fix fall-through warnings for Clang
    block/rnbd: Use strscpy instead of strlcpy
    block/rnbd-clt-sysfs: Remove copy buffer overlap in rnbd_clt_get_path_name
    block/rnbd-clt: Remove max_segment_size
    block/rnbd-clt: Generate kobject_uevent when the rnbd device state changes
    block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev
    Documentation/ABI/rnbd-clt: Add description for nr_poll_queues
    ...

    Linus Torvalds
     

03 Apr, 2021

4 commits

  • SGL support is mandatory for NVMe/FC, so make sure that the target is
    aligned with the specification.

    Signed-off-by: Max Gurtovoy
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Max Gurtovoy
     
  • All nvme transport drivers preallocate an nvme command for each request.
    Assume that command is used for nvme_setup_cmd() instead of requiring
    drivers to pass a pointer to it. All nvme drivers must initialize the
    generic nvme_request 'cmd' to point to the transport's preallocated
    nvme_command.

    The generic nvme_request cmd pointer had previously been used only as a
    temporary copy for passthrough commands. Since it now points to the
    command that gets dispatched, passthrough commands must directly set it
    up prior to executing the request.

    Signed-off-by: Keith Busch
    Reviewed-by: Jens Axboe
    Reviewed-by: Himanshu Madhani
    Signed-off-by: Christoph Hellwig

    Keith Busch
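
    An illustrative sketch of the new contract: the transport points the
    generic request at its preallocated command before calling
    nvme_setup_cmd() (field names illustrative):

        nvme_req(rq)->cmd = &op->cmd_iu.sqe;   /* transport's preallocated command */
        ret = nvme_setup_cmd(ns, rq);          /* no separate cmd pointer passed */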
     
  • The first argument of nvme_fc_rcv_ls_req() is a pointer to the
    remoteport, named portptr, but the kernel-doc comment refers to it as
    remoteport. Fix that to get rid of the compilation warning.

    drivers/nvme//host/fc.c:1724: warning: Function parameter or member 'portptr' not described in 'nvme_fc_rcv_ls_req'
    drivers/nvme//host/fc.c:1724: warning: Excess function parameter 'remoteport' description in 'nvme_fc_rcv_ls_req'

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: James Smart
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
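
    The fix amounts to using the real parameter name in the kernel-doc
    block, roughly as follows (description wording illustrative):

        /**
         * nvme_fc_rcv_ls_req - transport entry point called by an LLDD
         *                      upon receipt of an NVME LS request.
         * @portptr: pointer to the (registered) remote port the LS was
         *           received on.
         */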
     
  • This is a prep patch so that we can move the identify data structure
    related code initialization from nvme_init_identify() into a helper.

    Rename the function nvme_init_identify() to nvme_init_ctrl_finish().

    The next patch will move the nvme_id_ctrl related initialization from
    the newly renamed function nvme_init_ctrl_finish() into the
    nvme_init_identify() helper.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
     

18 Mar, 2021

1 commit

  • Fabrics drivers currently reserve two tags on the admin queue. But
    given that the connect command is only run on a freshly created queue
    or after all commands have been force aborted we only need to reserve
    a single tag.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Daniel Wagner

    Christoph Hellwig
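
    An illustrative sketch of what this means for a fabrics admin tag set
    (field shown in isolation):

        /* one reserved tag is enough: only the connect command uses it */
        ctrl->admin_tag_set.reserved_tags = 1;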
     

11 Mar, 2021

3 commits

  • A recent patch to prevent calling __nvme_fc_abort_outstanding_ios in
    interrupt context introduced a possible race condition. A controller
    reset results in errored I/O completions, which schedule error work.
    Moving the error handling to a work element allows it to fire after
    the ctrl state transition to NVME_CTRL_CONNECTING, causing any
    outstanding I/O (used to initialize the controller) to fail and cause
    problems for connect_work.

    Add a state check to only schedule error work if not in the RESETTING
    state.

    Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context")
    Signed-off-by: Nigel Kirkland
    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
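
    An illustrative sketch of the added check (work item and workqueue
    names illustrative):

        if (ctrl->ctrl.state != NVME_CTRL_RESETTING)
                queue_work(nvme_reset_wq, &ctrl->ioerr_work);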
     
  • When a command has been aborted we should return NVME_SC_HOST_ABORTED_CMD
    to be consistent with the other transports.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Sagi Grimberg
    Reviewed-by: James Smart
    Reviewed-by: Daniel Wagner
    Signed-off-by: Christoph Hellwig

    Hannes Reinecke
     
  • nvme_fc_terminate_exchange() is being called when exchanges are
    being deleted, and as such we should be setting the NVME_REQ_CANCELLED
    flag to have identical behaviour on all transports.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Reviewed-by: James Smart
    Reviewed-by: Daniel Wagner
    Signed-off-by: Christoph Hellwig

    Hannes Reinecke
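
    An illustrative sketch of what the exchange-termination path now does
    per request (surrounding iteration omitted):

        /* mark the request as cancelled by the host during exchange teardown */
        nvme_req(rq)->flags |= NVME_REQ_CANCELLED;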
     

06 Jan, 2021

1 commit

  • Recent patches changed calling sequences. nvme_fc_abort_outstanding_ios
    used to be called from a timeout or work context. Now it is being called
    in an I/O completion context, which can be an interrupt handler.
    Unfortunately, the abort outstanding ios routine attempts to stop nvme
    queues and calls nested routines that may try to sleep, which conflicts
    with the interrupt handler.

    Correct this by replacing the direct call with the scheduling of a work
    element; the abort outstanding ios routine is then called from the work
    element.

    Fixes: 95ced8a2c72d ("nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery")
    Signed-off-by: James Smart
    Reported-by: Daniel Wagner
    Tested-by: Daniel Wagner
    Signed-off-by: Christoph Hellwig

    James Smart
     

27 Oct, 2020

4 commits

  • __nvme_fc_terminate_io() is now called from only one place, reset_work.
    Consolidate and move the functionality of terminate_io into reset_work.

    In reset_work, rather than calling create_association directly,
    schedule the connect work element instead. After scheduling, flush the
    connect work element to preserve the semantic of not returning until
    connect has been attempted at least once.

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • nvme_fc_error_recovery() special cases handling when in CONNECTING state
    and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself
    special cases CONNECTING state and calls the routine to abort outstanding
    ios.

    Simplify the sequence by putting the call to abort outstanding I/Os
    directly in nvme_fc_error_recovery.

    Move the location of __nvme_fc_abort_outstanding_ios(), and
    nvme_fc_terminate_exchange() which is called by it, to avoid adding
    function prototypes for nvme_fc_error_recovery().

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • err_work was created to handle errors (mainly I/O timeouts) while in
    CONNECTING state. The flag for err_work_active is also unneeded.

    Remove err_work_active and err_work. The actions to abort I/Os are moved
    inline to nvme_error_recovery().

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • Whenever there are errors during CONNECTING, the driver recovers by
    aborting all outstanding ios and counts on the io completion to fail them
    and thus the connection/association they are on. However, the connection
    failure depends on a failure state from the core routines. Not all
    commands that are issued by the core routine are guaranteed to cause a
    failure of the core routine. They may be treated as a failure status and
    the status is then ignored.

    As such, whenever the transport enters error_recovery while CONNECTING,
    it will set a new flag indicating an association failed. The
    create_association routine which creates and initializes the controller,
    will monitor the state of the flag as well as the core routine error
    status and ensure the association fails if there was an error.

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     

23 Oct, 2020

4 commits

  • We've had several complaints about a 10s reconnect delay (the default)
    when there was an error while there is connectivity to a subsystem.
    The max_reconnects and reconnect_delay are set in common code prior to
    calling the transport to create the controller.

    This change checks if the default reconnect delay is being used, and if
    so, it adjusts it to a shorter period (2s) for the nvme-fc transport.
    It does so by calculating the controller loss tmo window, changing the
    value of the reconnect delay, and then recalculating the maximum number
    of reconnect attempts allowed.

    Signed-off-by: James Smart
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
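
    An illustrative sketch of the recalculation (the 2s value and helper
    use are illustrative):

        if (opts->reconnect_delay == NVMF_DEF_RECONNECT_DELAY) {
                ctrl_loss_tmo = opts->max_reconnects * opts->reconnect_delay;
                opts->reconnect_delay = 2;    /* shorter default for FC */
                opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
                                                    opts->reconnect_delay);
        }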
     
  • On reconnect, the code currently does not freeze the controller before
    possibly updating the number of hw queues for the controller.

    Add the freeze before updating the number of hw queues. Note: the
    queues are already started and remain started through the reconnect.

    Signed-off-by: James Smart
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
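
    An illustrative sketch of the pattern this commit adds (the
    freeze/unfreeze was removed again by the 21 Sep, 2021 entry above);
    names illustrative:

        nvme_start_freeze(&ctrl->ctrl);
        nvme_wait_freeze(&ctrl->ctrl);
        blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
        nvme_unfreeze(&ctrl->ctrl);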
     
  • The loop that backs out of hw io queue creation continues through index
    0, which corresponds to the admin queue as well.

    Fix the loop so it only proceeds through indexes 1..n which correspond to
    I/O queues.

    Signed-off-by: James Smart
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
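
    An illustrative sketch of the corrected back-out loop, stopping before
    index 0 (the admin queue); helper and variable names illustrative:

        for (i = created_queues; i >= 1; i--)
                __nvme_fc_delete_hw_queue(ctrl, &ctrl->queues[i], i);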
     
  • Currently, an I/O timeout unconditionally invokes
    nvme_fc_error_recovery(), which checks for LIVE or CONNECTING state. If
    LIVE, the routine resets the controller, which initiates a reconnect -
    which is valid. If CONNECTING, err_work is scheduled. err_work then
    calls the terminate_io routine, which also checks for CONNECTING and
    no-ops any further action on outstanding I/O. The result is that
    nothing happens to the timed-out I/O. As such, if the command was
    dropped on the wire, it will never time out or complete, and the
    connect process will hang.

    Change the behavior of the I/O timeout routine to unconditionally abort
    the I/O. I/O completion handling will note that an I/O failed due to an
    abort and will terminate the connection/association as needed. If the
    abort could not be issued, continue with a call to
    nvme_fc_error_recovery(). To ensure something different happens in
    nvme_fc_error_recovery(), rework it so that it aborts all I/Os on the
    association to force a failure.

    As I/O aborts may now occur outside of delete_association, counting for
    completion must be wary and only count those aborted during
    delete_association when TERMIO is set on the controller.

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     

22 Sep, 2020

1 commit

  • The lldd may have made calls to delete a remote port or local port and
    the delete is in progress when the cli then attempts to create a new
    controller. Currently, this proceeds without error although it can't be
    very successful.

    Fix this by validating that both the host port and remote port are
    present when a new controller is to be created.

    Signed-off-by: James Smart
    Reviewed-by: Himanshu Madhani
    Signed-off-by: Christoph Hellwig

    James Smart
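
    An illustrative sketch of the validation at controller-create time
    (state names from the FC transport API, placement illustrative):

        if (lport->localport.port_state != FC_OBJSTATE_ONLINE ||
            rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
                return ERR_PTR(-ENOENT);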
     

22 Aug, 2020

2 commits

  • nvme_end_request is a bit misnamed, as it wraps around the
    blk_mq_complete_* API. Its semantics are also non-trivial, so give it
    a more descriptive name and add a comment explaining the semantics.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Mike Snitzer
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • On an error exit path, a negative error code should be returned
    instead of a positive return value.

    Fixes: e399441de9115 ("nvme-fabrics: Add host support for FC transport")
    Cc: James Smart
    Signed-off-by: Tianjia Zhang
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Tianjia Zhang
     

29 Jul, 2020

2 commits

  • Currently the FC transport sets max_hw_sectors based on the lldd's
    max sgl segment count. However, the block queue max segments is
    set based on the controller's max_segments count, which the transport
    does not set. As such, the lldd is receiving sgl lists that exceed
    its max segment count.

    Set the controller max segment count and derive max_hw_sectors from
    the max segment count.

    Signed-off-by: James Smart
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Ewan D. Milne
    Signed-off-by: Christoph Hellwig

    James Smart
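
    An illustrative sketch of the derivation, assuming 4k-sized segments
    (names illustrative):

        ctrl->ctrl.max_segments = lport->ops->max_sgl_segments;
        ctrl->ctrl.max_hw_sectors =
                ctrl->ctrl.max_segments << (ilog2(SZ_4K) - 9);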
     
  • A deadlock happens in the following scenario with multipath:
    1) scan_work(nvme0) detects a new nsid while nvme0
    is an optimized path to it, path nvme1 happens to be
    inaccessible.

    2) Before scan_work is complete nvme0 disconnect is initiated
    nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING

    3) scan_work(1) attempts to submit IO,
    but nvme_path_is_optimized() observes nvme0 is not LIVE.
    Since nvme1 is a possible path IO is requeued and scan_work hangs.

    --
    Workqueue: nvme-wq nvme_scan_work [nvme_core]
    kernel: Call Trace:
    kernel: __schedule+0x2b9/0x6c0
    kernel: schedule+0x42/0xb0
    kernel: io_schedule+0x16/0x40
    kernel: do_read_cache_page+0x438/0x830
    kernel: read_cache_page+0x12/0x20
    kernel: read_dev_sector+0x27/0xc0
    kernel: read_lba+0xc1/0x220
    kernel: efi_partition+0x1e6/0x708
    kernel: check_partition+0x154/0x244
    kernel: rescan_partitions+0xae/0x280
    kernel: __blkdev_get+0x40f/0x560
    kernel: blkdev_get+0x3d/0x140
    kernel: __device_add_disk+0x388/0x480
    kernel: device_add_disk+0x13/0x20
    kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core]
    kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
    kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
    kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core]
    kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core]
    kernel: nvme_validate_ns+0x396/0x940 [nvme_core]
    kernel: nvme_scan_work+0x24f/0x380 [nvme_core]
    kernel: process_one_work+0x1db/0x380
    kernel: worker_thread+0x249/0x400
    kernel: kthread+0x104/0x140
    --

    4) Delete also hangs in flush_work(ctrl->scan_work)
    from nvme_remove_namespaces().

    Similarly, a deadlock with ana_work may happen: if ana_work has started
    and calls nvme_mpath_set_live and device_add_disk, it will
    trigger I/O. When we trigger disconnect, I/O will block because
    our accessible (optimized) path is disconnecting, but the alternate
    path is inaccessible, so I/O blocks. Then disconnect tries to flush
    the ana_work and hangs.

    [ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core]
    [ 605.552087] Call Trace:
    [ 605.552683] __schedule+0x2b9/0x6c0
    [ 605.553507] schedule+0x42/0xb0
    [ 605.554201] io_schedule+0x16/0x40
    [ 605.555012] do_read_cache_page+0x438/0x830
    [ 605.556925] read_cache_page+0x12/0x20
    [ 605.557757] read_dev_sector+0x27/0xc0
    [ 605.558587] amiga_partition+0x4d/0x4c5
    [ 605.561278] check_partition+0x154/0x244
    [ 605.562138] rescan_partitions+0xae/0x280
    [ 605.563076] __blkdev_get+0x40f/0x560
    [ 605.563830] blkdev_get+0x3d/0x140
    [ 605.564500] __device_add_disk+0x388/0x480
    [ 605.565316] device_add_disk+0x13/0x20
    [ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core]
    [ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
    [ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core]
    [ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core]
    [ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core]
    [ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core]
    [ 605.573330] process_one_work+0x1db/0x380
    [ 605.574144] worker_thread+0x4d/0x400
    [ 605.574896] kthread+0x104/0x140
    [ 605.577205] ret_from_fork+0x35/0x40
    [ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds.
    [ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830
    [ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 605.582320] nvme D 0 14044 14043 0x00000000
    [ 605.583424] Call Trace:
    [ 605.583935] __schedule+0x2b9/0x6c0
    [ 605.584625] schedule+0x42/0xb0
    [ 605.585290] schedule_timeout+0x203/0x2f0
    [ 605.588493] wait_for_completion+0xb1/0x120
    [ 605.590066] __flush_work+0x123/0x1d0
    [ 605.591758] __cancel_work_timer+0x10e/0x190
    [ 605.593542] cancel_work_sync+0x10/0x20
    [ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core]
    [ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core]
    [ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core]
    [ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core]
    [ 605.598320] dev_attr_store+0x17/0x30

    Fix this by introducing a new state: NVME_CTRL_DELETING_NOIO, which
    indicates the phase of controller deletion where I/O cannot be allowed
    to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to
    be issued to the bottom device, and only after we flush the ana_work
    and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces)
    do we change the state to NVME_CTRL_DELETING_NOIO. We also prevent
    ana_work from re-firing by aborting early if we are not LIVE, so we
    should be safe here.

    In addition, change the transport drivers to follow the updated state
    machine.

    Fixes: 0d0b660f214d ("nvme: add ANA support")
    Reported-by: Anton Eidelman
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Sagi Grimberg
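
    An illustrative sketch of the early abort described for ana_work
    (field access per the core driver of that era):

        if (ctrl->state != NVME_CTRL_LIVE)
                return;    /* don't re-fire ana_work while not LIVE */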
     

11 Jun, 2020

1 commit

  • Asynchronous event notifications do not have an associated request.
    When fcp_io() fails we unconditionally call nvme_cleanup_cmd() which
    leads to a crash.

    Fixes: 16686f3a6c3c ("nvme: move common call to nvme_cleanup_cmd to core layer")
    Signed-off-by: Daniel Wagner
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Reviewed-by: James Smart
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Daniel Wagner
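
    An illustrative sketch of the fix: only clean up the command when the
    operation has an associated request (flag name per the FC transport,
    placement illustrative):

        if (!(op->flags & FCOP_FLAGS_AEN))
                nvme_cleanup_cmd(op->rq);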