18 Dec, 2019
1 commit
-
commit 324e1c402069e8d277d2a2b18ce40bde1265b96a upstream.
In cases where I/O may be aborted, such as driver unload or link bounces,
the system will crash based on a bad ndlp pointer.

Example:
RIP: 0010:lpfc_sli4_abts_err_handler+0x15/0x140 [lpfc]
...
lpfc_sli4_io_xri_aborted+0x20d/0x270 [lpfc]
lpfc_sli4_sp_handle_abort_xri_wcqe.isra.54+0x84/0x170 [lpfc]
lpfc_sli4_fp_handle_cqe+0xc2/0x480 [lpfc]
__lpfc_sli4_process_cq+0xc6/0x230 [lpfc]
__lpfc_sli4_hba_process_cq+0x29/0xc0 [lpfc]
process_one_work+0x14c/0x390

Crash was caused by a bad ndlp address passed to I/O indicated by the XRI
aborted CQE. The address was not NULL so the routine dereferenced the ndlp
ptr. The bad ndlp also caused the lpfc_sli4_io_xri_aborted to call an
erroneous io handler. Root cause for the bad ndlp was an lpfc_ncmd that
was aborted, put on the abort_io list, completed, taken off the abort_io
list, sent to lpfc_release_nvme_buf where it was put back on the abort_io
list because the lpfc_ncmd->flags setting LPFC_SBUF_XBUSY was not cleared
on the final completion.

Rework the exchange busy handling to ensure the flags are properly set for
both scsi and nvme.

Fixes: c490850a0947 ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing")
Cc: # v5.1+
Link: https://lore.kernel.org/r/20191018211832.7917-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
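
The flag handling described in this commit can be pictured with a minimal userspace sketch. It is not the driver's code: `struct io_buf`, `SBUF_XBUSY`, `io_complete` and `io_release` are hypothetical stand-ins, and the two lists are reduced to an enum. The point it illustrates is that the release path consults the busy flag, so the final completion has to clear it.

```c
#include <assert.h>
#include <stdbool.h>

#define SBUF_XBUSY 0x1u /* hypothetical stand-in for LPFC_SBUF_XBUSY */

enum buf_list { LIST_FREE, LIST_ABORT };

struct io_buf {
    unsigned int flags;
    enum buf_list on_list;
};

/* Completion: record whether the hardware still owns the exchange. */
static void io_complete(struct io_buf *buf, bool exchange_busy)
{
    if (exchange_busy)
        buf->flags |= SBUF_XBUSY;
    else
        buf->flags &= ~SBUF_XBUSY; /* must be cleared on the final completion */
}

/* Release: park the buffer on the abort list only while XBUSY is set. */
static enum buf_list io_release(struct io_buf *buf)
{
    buf->on_list = (buf->flags & SBUF_XBUSY) ? LIST_ABORT : LIST_FREE;
    return buf->on_list;
}
```

If the clearing branch in `io_complete` were missing, a buffer that had been aborted once would keep landing on the abort list after every later release, which is the failure pattern the commit message describes.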
20 Aug, 2019
1 commit
-
Typical SLI-4 hardware supports up to two 4KB pages to be registered per XRI
to contain the exchange's Scatter/Gather List. This caps the number of SGL
elements that can be in the SGL. There are no extensions to extend the
list out of the 2 pages.

The G7 hardware adds an SGE type that allows the SGL to be vectored to a
different scatter/gather list segment. And that segment can contain an SGE
to go to another segment and so on. The initial segment must still be
pre-registered for the XRI, but it can be a much smaller amount (256 bytes)
as it can now be dynamically grown. This much smaller allocation can
handle the SG list for most normal I/O, and the dynamic aspect allows it to
support many MBs if needed.

The implementation creates a pool which contains "segments" and which is
initially sized to hold the initial small segment per xri. If an I/O
requires additional segments, they are allocated from the pool. If the
pool has no more segments, the pool is grown based on what is now
needed. After the I/O completes, the additional segments are returned to
the pool for use by other I/Os. Once allocated, the additional segments are
not released under the assumption of "if needed once, it will be needed
again". Pools are kept on a per-hardware queue basis, which is typically
1:1 per cpu, but may be shared by multiple cpus.

The switch to the smaller initial allocation significantly reduces the
memory footprint of the driver (which only grows if large ios are
issued). With several thousand XRIs per adapter, the 8KB->256B
reduction can conserve 32MB or more.

It has been observed with per-cpu resource pools that a resource allocated
on CPU A may be put back on CPU B. While the get routines are distributed
evenly, only a limited subset of CPUs may be handling the put routines.
This can put a strain on the lpfc_put_cmd_rsp_buf_per_cpu routine because
all the resources are being put on a limited subset of CPUs.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen
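
The grow-on-demand, never-shrink behaviour of the segment pool can be sketched as follows. This is an illustration under assumptions, not the driver's implementation: `struct seg`, `struct seg_pool`, `seg_get` and `seg_put` are hypothetical names, the pool grows one segment at a time rather than "based on what is now needed", and per-hardware-queue locking is left out.

```c
#include <assert.h>
#include <stdlib.h>

#define SEG_SIZE 256 /* bytes per segment, matching the size quoted above */

struct seg {
    struct seg *next;
    unsigned char data[SEG_SIZE];
};

struct seg_pool {
    struct seg *free; /* segments available for reuse */
    size_t total;     /* segments ever allocated; never decreases */
};

/* Take a segment; grow the pool by one when the free list is empty. */
static struct seg *seg_get(struct seg_pool *p)
{
    struct seg *s = p->free;

    if (s) {
        p->free = s->next;
    } else {
        s = malloc(sizeof(*s));
        if (!s)
            return NULL;
        p->total++;
    }
    s->next = NULL;
    return s;
}

/* Return a segment after I/O completion; it stays in the pool for reuse. */
static void seg_put(struct seg_pool *p, struct seg *s)
{
    s->next = p->free;
    p->free = s;
}
```

Because `seg_put` never frees memory, a burst of large I/Os raises `total` once and later bursts are served from the free list.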
20 Mar, 2019
2 commits
-
The driver periodically checks for adapter error in a background thread. If
the thread detects an error, the adapter will be reset including the
deletion and reallocation of workqueues on the adapter. Simultaneously,
there may be a user-space request to offline the adapter which may try to
do many of the same steps, in parallel, on a different thread. As memory
was deallocated when the offline thread did not expect it, the parallel
offline request hit a bad pointer.

Add coordination between the two threads. The error recovery thread has
precedence. So, when an error is detected, a flag is set on the adapter to
indicate the error thread is terminating the adapter. But, before doing
that work, it will look for a flag that is set by the offline flow, and if
set, will wait for it to complete before then processing the error handling
path. Similarly, in the offline thread, it first checks for whether the
error thread is resetting the adapter, and if so, will then wait for the
error thread to finish. Only after it has finished, will it set its flag
and offline the adapter.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen -
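
The two-flag handshake between the error thread and the offline thread can be sketched in userspace with a mutex and a condition variable. Everything here is an assumption made for illustration: `struct adapter`, its fields, and the two functions are hypothetical, and the driver's actual flags and wait mechanism are not shown in this log.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct adapter {
    pthread_mutex_t lock;
    pthread_cond_t done;
    bool err_active;     /* error thread is terminating the adapter */
    bool offline_active; /* offline request is in progress */
    int resets;
    int offlines;
};

/* Error recovery has precedence: set the flag first, then wait out an offline. */
static void adapter_error_recover(struct adapter *a)
{
    pthread_mutex_lock(&a->lock);
    a->err_active = true;
    while (a->offline_active)
        pthread_cond_wait(&a->done, &a->lock);
    pthread_mutex_unlock(&a->lock);

    /* ... reset work would run here, with no offline in flight ... */

    pthread_mutex_lock(&a->lock);
    a->resets++;
    a->err_active = false;
    pthread_cond_broadcast(&a->done);
    pthread_mutex_unlock(&a->lock);
}

/* Offline waits for error recovery to finish before setting its own flag. */
static void adapter_offline(struct adapter *a)
{
    pthread_mutex_lock(&a->lock);
    while (a->err_active)
        pthread_cond_wait(&a->done, &a->lock);
    a->offline_active = true;
    pthread_mutex_unlock(&a->lock);

    /* ... offline work would run here, with no reset in flight ... */

    pthread_mutex_lock(&a->lock);
    a->offlines++;
    a->offline_active = false;
    pthread_cond_broadcast(&a->done);
    pthread_mutex_unlock(&a->lock);
}
```

The asymmetry is deliberate: the error path raises its flag before waiting, so a new offline request that arrives while the error thread is waiting will queue behind it.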
The debug ktime counters that trace an io were inadvertently not placed in
the common section of an io buffer. Thus, they generate an invalid opcode
error when accessed.

Move the ktime counters into the common area.

Fixes: 0794d601d174 ("scsi: lpfc: Implement common IO buffers between NVME and SCSI")
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen
06 Feb, 2019
4 commits
-
For files modified as part of 12.2.0.0 patches, update copyright to 2019
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen -
A scsi host lock is taken on every io completion to check whether the abort
handler is waiting on the io completion. This is an expensive lock to take
on every completion when an abort condition is rare.

Replace scsi host lock with command-specific lock. Synchronize completion
and abort paths by new cmd lock. Ensure all flag changing and nulling of
context pointers are done under the lock. When adding lock to task management
abort, realized it was missing other synchronization locks. Added that
synchronization to match normal paths.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen -
The XRI get/put lists were partitioned per hardware queue. However, the
adapter rarely had sufficient resources to give a large number of resources
per queue. As such, it became common for a cpu to encounter a lack of XRI
resource and request the upper io stack to retry after returning a BUSY
condition. This occurred even though other cpus were idle and not using
their resources.

Create as efficient a scheme as possible to move resources to the cpus that
need them. Each cpu maintains a small private pool which it allocates from
for io. There is a watermark that the cpu attempts to keep in the private
pool. The private pool, when empty, pulls from the cpu's global pool.
When the cpu's global pool is empty it will pull from another cpu's
global pool. As there are many cpu global pools (1 per cpu or hardware queue
count) and as each cpu selects what cpu to pull from at different rates and
at different times, it creates a randomizing effect that minimizes the
number of cpus that will contend with each other when they steal XRIs from
another cpu's global pool.

On io completion, a cpu will push the XRI back on to its private pool. A
watermark level is maintained for the private pool such that when it is
exceeded it will move XRIs to the CPU global pool so that other cpus may
allocate them.

On NVME, as heartbeat commands are critical to get placed on the wire, a
single expedite pool is maintained. When a heartbeat is to be sent, it will
allocate an XRI from the expedite pool rather than the normal cpu
private/global pools. On any io completion, if a reduction in the expedite
pool is seen, it will be replenished before the XRI is placed on the cpu
private pool.

Statistics are added to aid understanding the XRI levels on each cpu and
their behaviors.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen -
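
The private/global pool scheme can be sketched with simple counters. This is a simplified model, not the driver's code: `NCPU`, `PVT_HIGH`, `BATCH` and all function names are hypothetical, pools are plain counts rather than lists, there is no locking, the victim cpu is scanned in fixed order instead of the staggered selection described above, and the NVME expedite pool is omitted.

```c
#include <assert.h>

#define NCPU 4
#define PVT_HIGH 8 /* hypothetical private-pool high watermark */
#define BATCH 4    /* hypothetical number of XRIs moved per refill */

struct cpu_pools {
    int pvt;  /* private pool: allocated from directly */
    int glob; /* this cpu's global pool: other cpus may steal from it */
};

static struct cpu_pools pools[NCPU];

/* Move up to `want` XRIs from src's global pool into dst's private pool. */
static int refill(struct cpu_pools *dst, struct cpu_pools *src, int want)
{
    int n = src->glob < want ? src->glob : want;

    src->glob -= n;
    dst->pvt += n;
    return n;
}

/* Returns 0 on success, -1 when no XRI is available anywhere (BUSY). */
static int xri_get(int cpu)
{
    struct cpu_pools *me = &pools[cpu];

    if (me->pvt == 0 && refill(me, me, BATCH) == 0) {
        /* Own global pool is empty: try the other cpus' global pools. */
        for (int i = 1; i < NCPU; i++)
            if (refill(me, &pools[(cpu + i) % NCPU], BATCH) > 0)
                break;
    }
    if (me->pvt == 0)
        return -1;
    me->pvt--;
    return 0;
}

/* On completion, push to the private pool; spill the excess to global. */
static void xri_put(int cpu)
{
    struct cpu_pools *me = &pools[cpu];

    me->pvt++;
    if (me->pvt > PVT_HIGH) {
        me->glob += me->pvt - PVT_HIGH;
        me->pvt = PVT_HIGH;
    }
}
```

A cpu that only completes I/O keeps at most `PVT_HIGH` XRIs to itself; everything beyond that becomes visible to cpus that are short.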
Once the IO buff allocations were made shared, there was a single XRI
buffer list shared by all hardware queues. A single list isn't great for
performance when shared across the per-cpu hardware queues.

Create a separate XRI IO buffer get/put list for each Hardware Queue. As
SGLs and associated IO buffers get allocated/posted to the firmware; round
robin their assignment across all available hardware Queues so that there
is an equitable assignment.

Modify SCSI and NVME IO submit code paths to use the Hardware Queue logic
for XRI allocation.

Add a debugfs interface to display hardware queue statistics.

Added new empty_io_bufs counter to track if a cpu runs out of XRIs.

Replace common_ variables/names with io_ to make meanings clearer.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen
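
The round-robin assignment and the out-of-buffers counter can be sketched as below. The structure and function names are hypothetical; only the counter name `empty_io_bufs` comes from the commit message, and buffers are modelled as a count per queue.

```c
#include <assert.h>
#include <stddef.h>

struct hdwq {
    size_t io_bufs;       /* buffers currently on this queue's get list */
    size_t empty_io_bufs; /* times an allocation found the list empty */
};

/* Hand out nbufs buffers across nq queues in round-robin order. */
static void distribute_io_bufs(struct hdwq *q, size_t nq, size_t nbufs)
{
    for (size_t i = 0; i < nbufs; i++)
        q[i % nq].io_bufs++;
}

/* Allocate from one queue; count the miss when it has run dry. */
static int hdwq_get_buf(struct hdwq *q)
{
    if (q->io_bufs == 0) {
        q->empty_io_bufs++;
        return -1;
    }
    q->io_bufs--;
    return 0;
}
```

With 10 buffers and 4 queues the split is 3, 3, 2, 2, so no queue is more than one buffer ahead of another.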
08 Dec, 2018
1 commit
-
The driver data structure for managing a mailbox command contained two
context fields. Unfortunately, the contexts were considered "generic" to be
used at the whim of the command code. Of course, one section of code used
fields this way, while another did it that way, and eventually there were
mixups.

Refactored the structure so that the generic contexts become a node context
and a buffer context and all code standardizes on their use.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen
11 Jul, 2018
1 commit
-
Change references from "Broadcom Limited" to "Broadcom Inc." in the
copyright message. Update copyright duration if not yet updated for 2018.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen
27 Jun, 2018
1 commit
-
The get_seconds() function suffers from a possible overflow in 2038 or
2106, as well as jitter due to settimeofday or leap second updates, and is
deprecated.

As we are interested in elapsed time only, using ktime_get_seconds() to
read the CLOCK_MONOTONIC timebase is ideal here. This also lets us remove
the hack that tries to deal with get_seconds() going slightly backwards,
which cannot happen with monotonic timestamps.

Signed-off-by: Arnd Bergmann
Reviewed-by: Johannes Thumshirn
Signed-off-by: Martin K. Petersen
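
As a userspace analogue of the change (ktime_get_seconds() itself is kernel-only), elapsed time can be measured against CLOCK_MONOTONIC with POSIX clock_gettime(). The helper names `monotonic_seconds` and `elapsed_since` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Seconds on the monotonic clock; unaffected by settimeofday(). */
static time_t monotonic_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec;
}

/* Elapsed seconds since `start`; never negative on a monotonic clock. */
static time_t elapsed_since(time_t start)
{
    return monotonic_seconds() - start;
}
```

Because the clock cannot step backwards, callers need no special case for a negative difference.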
13 Mar, 2018
2 commits
-
POST_SGL_PAGES mailbox command failed with status (timeout).
wait_event_interruptible_timeout, when called from the mailbox wait
interface, gets interrupted and will randomly fail. Behavior seems very
specific to one particular server type.

Fix by changing from wait_event_interruptible_timeout to
wait_for_completion_timeout.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen -
The driver is very sloppy about the WQE structure passed between routines.
The base struct type is a 64byte wqe. But in many routines they typecast and
access 128byte wqes. There were a couple of cases in the past (corrected
already) where the typecasts were incorrectly done and the 64byte buffer was
accessed as a 128 byte buffer.

Clean this up by properly declaring wqe's as 128byte wqe's and removing the
typecasts. 64byte wqes are considered a subset of the 128byte wqes.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen
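
One way to picture "64byte wqes are a subset of the 128byte wqes" is a 128-byte union whose first half can be viewed as the 64-byte form. The type and function names below are hypothetical and the word layout is a placeholder; the driver's real definitions are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct wqe64 {
    uint32_t words[16]; /* 64 bytes */
};

union wqe128 {
    uint32_t words[32]; /* 128 bytes */
    struct wqe64 wqe64; /* 64-byte view of the first half */
};

_Static_assert(sizeof(struct wqe64) == 64, "wqe64 must be 64 bytes");
_Static_assert(sizeof(union wqe128) == 128, "wqe128 must be 128 bytes");

/* Routines take the full-size type, so no cast can overrun a 64-byte buffer. */
static void wqe_clear(union wqe128 *wqe)
{
    memset(wqe, 0, sizeof(*wqe));
}
```

Declaring every buffer as the larger type trades a little memory for the guarantee that any access through the 128-byte view stays in bounds.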
13 Jun, 2017
1 commit
-
Administrator intervention is currently required to get good numbers
when switching from running latency tests to IOPS tests.

The configured interrupt coalescing values will greatly affect the
results of these tests. Currently, the driver has a single coalescing
value set by values of the module attribute. This patch changes the
driver to support auto-configuration of the coalescing value based on
the total number of outstanding IOs and average number of CQEs processed
per interrupt for an EQ. Values are checked every 5 seconds.

The driver defaults to the automatic selection. Automatic selection can
be disabled by the new lpfc_auto_imax module parameter.

Older hardware can only change interrupt coalescing by mailbox
command. Newer hardware supports change via a register. The patch
supports both.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen
23 Feb, 2017
4 commits
-
Update copyrights to 2017 for all files touched in this patch set
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen -
NVME Target: Base modifications
This set of patches adds the base modifications for NVME target support.

The base modifications consist of:
- Additional module parameters or configuration tuning
- Enablement of configuration mode for NVME target. Ties into the
  queueing model put into place by the initiator basemods patches.
- Target-specific buffer pools, dma pools, sgl pools

[mkp: fixed space at end of file]
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen -
NVME Initiator: Base modifications
This patch adds base modifications for NVME initiator support.

The base modifications consist of:
- Formal split of SLI3 rings from SLI-4 WQs (sometimes referred to as
  rings as well) as implementation now widely varies between the two.
- Addition of configuration modes:
  SCSI initiator only; NVME initiator only; NVME target only; and
  SCSI and NVME initiator.
  The configuration mode drives overall adapter configuration,
  offloads enabled, and resource splits.
  NVME support is only available on SLI-4 devices and newer fw.
- Implements the following based on configuration mode:
  - Exchange resources are split by protocol; obviously, if only
    1 mode, then no split occurs. Default is 50/50. A module attribute
    allows tuning.
  - Pools and config parameters are separated per-protocol.
  - Each protocol has its own set of queues, but they share interrupt
    vectors.
    SCSI:
      SLI3 devices have few queues and the original style of queue
      allocation remains.
      SLI4 devices piggy back on an "io-channel" concept that
      eventually needs to merge with scsi-mq/blk-mq support (it is
      underway). For now, the paradigm continues as it existed
      prior. io channel allocates N msix and N WQs (N=4 default)
      and either round robins or uses cpu # modulo N for scheduling.
      A bunch of module parameters allow the configuration to be
      tuned.
    NVME (initiator):
      Allocates an msix per cpu (or whatever pci_alloc_irq_vectors
      gets)
      Allocates a WQ per cpu, and maps the WQs to msix on a WQ #
      modulo msix vector count basis.
      Module parameters exist to cap/control the config if desired.
  - Each protocol has its own buffer and dma pools.

I apologize for the size of the patch.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen -
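
Two of the arithmetic rules in this commit message, the "WQ # modulo msix vector count" mapping and the percentage split of exchange resources, can be sketched as below. The function names and the `pct_nvme` parameter are hypothetical; the log does not name the module attribute that tunes the split.

```c
#include <assert.h>

/* Map a work queue index onto an MSI-X vector: WQ # modulo vector count. */
static unsigned int wq_to_vector(unsigned int wq, unsigned int nvec)
{
    return wq % nvec;
}

/* Split exchange resources by protocol; pct_nvme = 50 is the 50/50 default. */
static void split_xri(unsigned int total, unsigned int pct_nvme,
                      unsigned int *nvme, unsigned int *scsi)
{
    *nvme = total * pct_nvme / 100;
    *scsi = total - *nvme;
}
```

When fewer vectors are granted than there are cpus, the modulo mapping makes several WQs share a vector instead of failing the setup.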
This contains code cleanups that were in the prior patch set.
This allows better review of real changes later.

minor code cleanups:
fix indentation, punctuation, line length
addition/reduction of whitespace
remove unneeded parens, braces
lpfc_debugfs_nodelist_data: print as u64 rather than byte by byte
convert printk(KERN_ERR to pr_err
small print string deltas
use num_present_cpus() rather than count them
comment updates
rctl/type names moved to module variable, not on stack

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen
16 Jul, 2016
2 commits
-
Copyright updates
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen -
Add support for XLane LUN priority
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: Martin K. Petersen
10 Apr, 2015
2 commits
-
Update copyright to 2015
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: James Bottomley -
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Reviewed-by: Hannes Reinecke
Signed-off-by: James Bottomley
17 Sep, 2014
1 commit
-
Fix locking issues with abort data paths
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Christoph Hellwig
03 Jun, 2014
1 commit
-
Update Copyright on changed files from 8.3.45 patches
Missed this in the 8.3.45 push
Signed-off-by: James Smart
Reviewed-By: Dick Kennedy
Signed-off-by: Christoph Hellwig
16 Mar, 2014
1 commit
-
Signed-off-by: James Smart
Signed-off-by: James Bottomley
11 Sep, 2013
2 commits
-
Signed-off-by: James Smart
Signed-off-by: James Bottomley -
Signed-off-by: James Smart
Signed-off-by: James Bottomley
24 Aug, 2013
2 commits
-
Signed-off-by: James Smart
Signed-off-by: James Bottomley -
Signed-off-by: James Smart
Signed-off-by: James Bottomley
27 Nov, 2012
1 commit
-
Signed-off-by: James Smart
Signed-off-by: James Bottomley
14 Sep, 2012
3 commits
-
Signed-off-by: James Smart
Signed-off-by: James Bottomley -
Commonize SLI-3/4 Ring/Queue framework, to keep SLI-3 compatibility
Parallelize SLI-4 Q distribution - to use multiple posting/completion queues

Signed-off-by: James Smart
Signed-off-by: James Bottomley -
Signed-off-by: James Smart
Signed-off-by: James Bottomley
17 May, 2012
1 commit
-
Signed-off-by: Alex Iannicelli
Signed-off-by: James Smart
Signed-off-by: James Bottomley
19 Feb, 2012
1 commit
-
T10 DIF fixes and enhancements:
- Add SLI4 Lancer support for T10 DIF / BlockGuard (121980)
- Fix SLI4 BlockGuard behavior when protection data is generated by HBA (121980)
- Enhance debugfs for injecting T10 DIF errors (123966, 132966)
- Fix incorrect usage of bghm for BlockGuard errors (127022)

Signed-off-by: Alex Iannicelli
Signed-off-by: James Smart
Signed-off-by: James Bottomley
17 Oct, 2011
1 commit
-
Changed the timeout value for flash-based SLI_CONFIG (0x9B)
mailbox command to 300 seconds for worst case flash delays.

Signed-off-by: Alex Iannicelli
Signed-off-by: James Smart
Signed-off-by: James Bottomley
27 May, 2011
1 commit
-
This patch adds support for hardware that returns resource ids via
extents rather than contiguous ranges.

[jejb: checkpatch.pl fixes]
Signed-off-by: Alex Iannicelli
Signed-off-by: James Smart
Signed-off-by: James Bottomley
22 Dec, 2010
2 commits
-
Implement the FC and SLI async event handlers:
- Updated MQ_CREATE_EXT mailbox structure to include fc and SLI async events.
- Added the SLI trailer code.
- Split physical field into type and number to reflect latest SLI spec.
- Changed lpfc_acqe_fcoe to lpfc_acqe_fip to reflect latest Spec changes.
- Added lpfc_acqe_fc_la structure for FC link attention async events.
- Added lpfc_acqe_sli structure for sli async events.
- Added lpfc_sli4_async_fc_evt routine to handle fc la async events.
- Added lpfc_sli4_async_sli routine to handle sli async events.
- Moved LPFC_TRAILER_CODE_FC to be handled by its own handler function.

Signed-off-by: Alex Iannicelli
Signed-off-by: James Smart
Signed-off-by: James Bottomley -
Added support for ELS RRQ command
- Add new routine lpfc_set_rrq_active() to track XRI qualifier state.
- Add new module parameter lpfc_enable_rrq to control RRQ operation.
- Add logic to ELS RRQ completion handler and xri qualifier timeout
to clear XRI qualifier state.
- Use OX_ID from XRI_ABORTED_CQE for RRQ payload.
- Tie abort and XRI_ABORTED_CQE handler to RRQ generation.

Signed-off-by: Alex Iannicelli
Signed-off-by: James Smart
Signed-off-by: James Bottomley
28 Jul, 2010
1 commit
-
Signed-off-by: Alex Iannicelli
Signed-off-by: James Smart
Signed-off-by: James Bottomley