03 Dec, 2006

2 commits


05 Sep, 2006

1 commit


20 Aug, 2006

1 commit

  • Stall error handler if attempting resets/aborts while an rport is blocked.
    This avoids device offline scenarios due to errors in the error handler.

    Background:
    Although the transport is using the scsi_timed_out functionality to
    restart the timeout if the rport is blocked, if the timeout has already
    fired before the block occurs, the eh handler still runs and can take
    the device offline. Ultimately, this window cannot be resolved without
    significant work in the error handler thread. Christoph pointed out the
    first level of these issues when he noted the poor error response
    handling by the error thread.

    We found, under heavy load and error testing, that the time window from
    when scsi_times_out() adds the io to the queue to when the
    scsi_error_handler gets around to servicing it can be several seconds. In most
    cases, these test conditions are highly unusual, but possible.
    As a result, we're stalling the error handler in this race window so that
    we can avoid the device_offline transitions.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James Smart
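
The stall described in the commit above can be pictured with a small user-space model. This is only a sketch under stated assumptions: the type and function names (remote_port, wait_one_tick, stall_while_blocked, abort_handler) are hypothetical, and the "tick" stands in for a real sleep that a driver would take with its locks dropped. It is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transport's remote port states. */
enum port_state { PORT_ONLINE, PORT_BLOCKED, PORT_NOT_PRESENT };

struct remote_port {
    enum port_state state;
    int blocked_ticks;  /* simulated: ticks left until the block resolves */
};

/* Simulated sleep: each tick moves a blocked port closer to ONLINE. */
static void wait_one_tick(struct remote_port *rp)
{
    if (rp->state == PORT_BLOCKED && --rp->blocked_ticks <= 0)
        rp->state = PORT_ONLINE;
}

/* Stall until the port leaves BLOCKED; returns the number of ticks waited. */
static int stall_while_blocked(struct remote_port *rp)
{
    int waited = 0;

    while (rp->state == PORT_BLOCKED) {
        wait_one_tick(rp);
        waited++;
    }
    return waited;
}

/*
 * Hypothetical abort/reset entry point: stall first, then act on the
 * settled state. Returns 1 if the handler may proceed, 0 if the port
 * is no longer present.
 */
static int abort_handler(struct remote_port *rp)
{
    stall_while_blocked(rp);
    return rp->state == PORT_ONLINE;
}
```

The point of the design is that the error handler never issues an abort or reset against a port whose state is still in flux, so a failed recovery step cannot push the device offline during the race window.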
     

07 Aug, 2006

1 commit


09 Jul, 2006

3 commits


27 Jun, 2006

1 commit


04 May, 2006

1 commit


06 Mar, 2006

1 commit


01 Mar, 2006

2 commits


13 Jan, 2006

1 commit


14 Dec, 2005

5 commits

  • Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • - Add functionality to run in polled mode only. Includes run time
    attribute to enable mode.
    - Enable runtime writable hba settings for coalescing and delay parameters

    Customers have requested a mode in the driver to run strictly polled.
    This is generally to support an environment where the server is extremely
    loaded and is looking to reclaim some cpu cycles from adapter interrupt
    handling.

    This patch adds a new "poll" attribute, and the following behavior:

    if value is 0 (default):
    The driver uses the normal method for i/o completion. It uses the
    firmware feature of interrupt coalescing. The firmware allows a
    minimum number of i/o completions before an interrupt, or a maximum
    time delay between interrupts. By default, the driver sets these
    to no delay (disabled) or 1 i/o - meaning coalescing is disabled.

    Attributes were provided to change the coalescing values, but they could
    be set only at module load time and were global across all adapters.
    This patch allows them to be writable on a per-adapter basis.

    if value is 1 :
    Interrupts are left enabled, expecting that the user has tuned the
    interrupt coalescing values. When this setting is enabled, the driver
    will attempt to service completed i/o whenever new i/o is submitted
    to the adapter. If the coalescing values are large, and the i/o
    generation rate steady, an interrupt will be avoided by servicing
    completed i/o prior to the coalescing thresholds kicking in. However,
    if the i/o completion load is high enough or i/o generation slow, the
    coalescing values will ensure that completed i/o is serviced in a timely
    fashion.

    if value is 3 :
    Turns off FCP i/o interrupts altogether. The coalescing values now have
    no effect. A new attribute "poll_tmo" (default 10ms) exists to set
    the polling interval for i/o completion. When this setting is enabled,
    the driver will attempt to service completed i/o and restart the
    interval timer whenever new i/o is submitted. This behavior allows for
    servicing of completed i/o sooner than the interval timer, but ensures
    that if no i/o is being issued, then the interval timer will kick in
    to service the outstanding i/o.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • - Release task management command before counting outstanding commands.
    TMF was being erroneously counted as an active outstanding command.
    - Serialize EH calls and block requests when EH function is running.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • Remove locking wrappers around error handlers. Wrappers were added in
    an early 2.6.13 API change.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • Miscellaneous Cleanups:
    - Remove ProgType READ_REV mailbox command value check in lpfc_config_port_prep.
    - Convert simple printk to an lpfc_printf_log in queuecommand.
    - Modify lpfc_abort_handler message 0749 to display more accurate text and data.
    - Minor style cleanup: fix 3 long lines in lpfc_hw.h

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
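
The three "poll" values listed in the polled-mode commit above (0, 1 and 3) read naturally as two flag bits. The sketch below decodes them that way. The bit assignment, the macro names and the poll_policy structure are assumptions made for illustration; the commit message only gives the three values and their behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, chosen to match the values 0, 1 and 3. */
#define POLL_ON_SUBMIT   0x1u  /* service completions when new i/o is issued */
#define POLL_NO_FCP_INT  0x2u  /* FCP i/o interrupts turned off */

struct poll_policy {
    bool service_on_submit;  /* reap completed i/o on every submission */
    bool fcp_interrupts;     /* adapter still raises FCP interrupts */
    bool interval_timer;     /* poll_tmo timer covers the idle case */
};

/* Only the three values described in the commit message are accepted. */
static bool poll_value_valid(unsigned int poll)
{
    return poll == 0 || poll == 1 || poll == 3;
}

static struct poll_policy decode_poll(unsigned int poll)
{
    struct poll_policy p;

    p.service_on_submit = (poll & POLL_ON_SUBMIT) != 0;
    p.fcp_interrupts    = (poll & POLL_NO_FCP_INT) == 0;
    /* With interrupts off, only the interval timer guarantees progress. */
    p.interval_timer    = !p.fcp_interrupts;
    return p;
}
```

For example, decode_poll(1) keeps interrupts on as a backstop while servicing completions on submit, and decode_poll(3) replaces the interrupt backstop with the interval timer.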
     

07 Nov, 2005

1 commit


29 Oct, 2005

5 commits

  • Return FAILED from eh_ routines if command(s) is(are) not completed

    There were scenarios where we may have returned from the error
    handlers prior to all affected commands being flushed to the midlayer.
    Add changes to ensure this doesn't happen.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • Adjust lpfc_scsi_buf allocation to account for lun_queue_depth and
    error handling

    Under high load and high duress, the error handler could steal some
    command resources from the normal i/o path. Rework to allocate
    additional resources to avoid this scenario.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • Replace lpfc_sli_issue_iocb_wait_high_priority with lpfc_sli_issue_iocb_wait.

    Simplify code paths, as there really wasn't a "priority"

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • From: James Smart

    There were scenarios where the error handlers could reuse an iotag
    value of an active io. Remove all possibility of this by
    pre-assigning iotag resources to command resources.

    Signed-off-by: James Smart

    Rejections fixed up and
    Signed-off-by: James Bottomley

    James Bottomley
     
  • We recently went back to implement a board reset. When we perform the
    reset, we wanted to tear down the internal data structures and rebuild
    them. Unfortunately, when it came to the rport structure, things were
    odd. If we deleted them, the scsi targets and sdevs would be
    torn down. Not a good thing for a temporary reset. We could block the
    rports, but we either maintain the internal structures to keep the
    rport reference (perhaps even replicating what's in the transport),
    or we have to fatten the fc transport with new search routines to find
    the rport (and deal with a case of a dangling rport that the driver
    forgets).

    It dawned on me that we had actually reached this state incorrectly.
    When the fc transport first started, we did the block/unblock first, then
    added the rport interface. The purpose of block/unblock is to hide the
    temporary disappearance of the rport (e.g. being deleted, then readded).
    Why are we making the driver do the block/unblock ? We should be making
    the transport have only an rport add/delete, and then let the transport
    handle the block/unblock.

    So... This patch removes the existing fc_remote_port_block/unblock
    functions. It moves the block/unblock functionality into the
    fc_remote_port_add/delete functions. Updates for the lpfc driver are
    included. Qlogic driver updates are also enclosed, thanks to the
    contributions of Andrew Vasquez. [Note: the qla2xxx changes are
    relative to the scsi-misc-2.6 tree as of this morning - which does
    not include the recent patches sent by Andrew]. The zfcp driver does
    not use the block/unblock functions.

    One last comment: The resulting behavior feels very clean. The LLDD is
    concerned only with add/delete, which corresponds to the physical
    disappearance. However, the fact that the scsi target and sdevs are
    not immediately torn down after the LLDD calls delete causes an
    interesting scenario... the midlayer can call the xxx_slave_alloc and
    xxx_queuecommand functions with a sdev that is at the location the
    rport used to be. The driver must validate the device exists when it
    first enters these functions. In thinking about it, this has always
    been the case for the LLDD and these routines. The existing drivers
    already check for existence. However, this highlights that simple
    validation via data structure dereferencing needs to be watched.
    To deal with this, a new transport function, fc_remote_port_chkready()
    was created that LLDDs should call when they first enter these two
    routines. It validates the rport state, and returns a scsi result
    which could be returned. In addition to solving the above, it also
    creates consistent behavior from the LLDD's when the block and deletes
    are occurring.

    Rejections fixed up and
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
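
The readiness check described in the last commit of this group can be modeled in user space as follows. This is a sketch, not the transport's code: rport_chkready and model_queuecommand are hypothetical stand-ins for fc_remote_port_chkready() and an LLDD's queuecommand routine, the HB_* constants mirror the midlayer's DID_* host byte values, and the choice of a busy status for a blocked port is an assumption.

```c
#include <assert.h>
#include <stddef.h>

enum rport_state { RPORT_ONLINE, RPORT_BLOCKED, RPORT_GONE };

/* Host byte codes, mirroring the midlayer's DID_* values. */
#define HB_OK          0x00
#define HB_NO_CONNECT  0x01
#define HB_BUS_BUSY    0x02

struct rport {
    enum rport_state state;
};

/* Returns 0 if i/o may proceed, else a SCSI result (host byte << 16). */
static int rport_chkready(const struct rport *rp)
{
    if (rp == NULL)
        return HB_NO_CONNECT << 16;

    switch (rp->state) {
    case RPORT_ONLINE:
        return HB_OK;
    case RPORT_BLOCKED:
        return HB_BUS_BUSY << 16;    /* retryable: the port may return */
    default:
        return HB_NO_CONNECT << 16;  /* the port is gone */
    }
}

struct command {
    int result;  /* SCSI result reported back to the midlayer */
    int issued;  /* 1 if the command was handed to the adapter */
};

/* Validate the rport first, before dereferencing any driver state. */
static void model_queuecommand(const struct rport *rp, struct command *cmd)
{
    int err = rport_chkready(rp);

    cmd->result = err;
    cmd->issued = (err == 0);
}
```

Routing every entry point through one check is what gives the consistent behavior the commit message mentions: each LLDD reports the same result for a blocked or deleted port.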
     

13 Aug, 2005

4 commits

  • Replace use of lpfc_put_lun with midlayer's int_to_scsilun

    Remove driver's local definition of lpfc_put_lun (which converts an
    int back to a 64-bit LUN) and replace its use with the recently added
    int_to_scsilun function provided by the midlayer.

    Note: Embedding midlayer structure in our structure caused
    need for more files to include midlayer headers.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • Fix handling of the dev_loss and nodev timeouts.

    Symptoms: when remote port disappears for a period of time longer than
    either nodev_tmo or dev_loss_tmo, the lpfc driver worker thread will
    stall removing that remote port.

    Cause: removing remote port involves un-blocking and sync-ing
    corresponding block device queue. But corresponding node in the lpfc
    driver is still in the NPR ("node port recovery") state and mid-layer
    gets SCSI_MLQUEUE_HOST_BUSY as a return value when it is trying to call
    queuecommand() with command for that node (AKA remote port).

    Fix: Instead of returning SCSI_MLQUEUE_HOST_BUSY from queuecommand() for
    nodes in NPR states, complete it with retry-able error code DID_BUS_BUSY.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • Clear task management bits when preparing SCSI commands

    In lpfc_scsi_prep_cmnd, clear the task management bits (fcpCntl2 member
    in the fcp_cmd structure) when preparing regular SCSI commands.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
     
  • IOCB BDE not getting fully initialized during reuse

    Symptoms: Driver gets Status 3 and Reason 0x13 on IOCB completions.

    Cause: The IOCB bpl.bdeSize and bdeFlags are not getting initialized on reuse.

    Fix: Reinitialize these fields in prep_dma each time an IOCB is used.

    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James.Smart@Emulex.Com
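
The first commit in this group replaces a driver-local helper with the midlayer's int_to_scsilun. The conversion itself is small: the integer LUN is split into 16-bit levels, and each level is stored big-endian in successive byte pairs of the 8-byte LUN. The user-space sketch below illustrates that layout. The names lun8 and int_to_lun8 are hypothetical, and this is a re-implementation for illustration rather than the kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the midlayer's 8-byte LUN structure. */
struct lun8 {
    uint8_t bytes[8];
};

/*
 * Pack an integer LUN into 8 bytes: each 16-bit chunk of the integer,
 * lowest chunk first, is written high byte then low byte.
 */
static void int_to_lun8(uint32_t lun, struct lun8 *out)
{
    size_t i;

    memset(out->bytes, 0, sizeof(out->bytes));
    for (i = 0; i < sizeof(lun); i += 2) {
        out->bytes[i]     = (lun >> 8) & 0xFF;
        out->bytes[i + 1] = lun & 0xFF;
        lun >>= 16;
    }
}
```

With this layout, LUN 1 becomes the bytes 00 01 00 00 00 00 00 00.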
     

03 Jul, 2005

4 commits


18 Jun, 2005

3 commits


19 Apr, 2005

1 commit