14 Sep, 2008

1 commit

  • Josip Rodin noted
    (http://article.gmane.org/gmane.linux.ports.sparc/10152) the
    driver oopsing during registration of an rport to the
    FC-transport layer with a backtrace indicating a dereferencing of
    an shost->shost_data equal to NULL. David Miller identified a
    small window in driver logic where this could happen:

    > Look at how the driver registers the IRQ handler before the host has
    > been registered with the SCSI layer.
    >
    > That leads to a window of time where the shost hasn't been setup
    > fully, yet ISRs can come in and trigger DPC thread events, such as
    > loop resyncs, which expect the transport area to be setup.
    >
    > But it won't be setup, because scsi_add_host() hasn't finished yet.
    >
    > Note that in Josip's crash log, we don't even see the
    >
    > qla_printk(KERN_INFO, ha, "\n"
    > " QLogic Fibre Channel HBA Driver: %s\n"
    > " QLogic %s - %s\n"
    > " ISP%04X: %s @ %s hdma%c, host#=%ld, fw=%s\n",
    > ...
    >
    > message yet.
    >
    > Which means that the crash occurs between qla2x00_request_irqs()
    > and printing that message.

    Close this window by enabling RISC interrupts after the host has
    been registered with the SCSI midlayer.

    Reported-by: Josip Rodin
    Cc: Stable Tree
    Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez
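
The ordering problem described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `model_*` name below is hypothetical and stands in for driver or midlayer behaviour (host registration, RISC interrupt enablement); none of it is taken from the qla2xxx source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are driver symbols. */
struct model_adapter {
	void *shost_data;        /* NULL until the host is registered */
	bool  risc_irqs_enabled; /* true once the ISP may raise interrupts */
	int   transport_area;    /* placeholder for FC-transport state */
};

/* Models scsi_add_host() completing: transport data becomes valid. */
static void model_add_host(struct model_adapter *ha)
{
	ha->shost_data = &ha->transport_area;
}

static void model_enable_risc_irqs(struct model_adapter *ha)
{
	ha->risc_irqs_enabled = true;
}

/*
 * Would an interrupt-driven event (e.g. a loop resync) be safe right now?
 * It is unsafe only if interrupts can arrive while shost_data is NULL.
 */
static bool model_event_is_safe(const struct model_adapter *ha)
{
	return !ha->risc_irqs_enabled || ha->shost_data != NULL;
}

/* Old ordering: interrupts first. Returns the safety seen in the window. */
static bool model_probe_old_order(struct model_adapter *ha)
{
	bool safe_in_window;

	model_enable_risc_irqs(ha);
	safe_in_window = model_event_is_safe(ha); /* window is open here */
	model_add_host(ha);
	return safe_in_window;
}

/* Corrected ordering: register the host, then enable interrupts. */
static bool model_probe_new_order(struct model_adapter *ha)
{
	bool safe_before_irqs;

	model_add_host(ha);
	safe_before_irqs = model_event_is_safe(ha);
	model_enable_risc_irqs(ha);
	return safe_before_irqs && model_event_is_safe(ha);
}
```

In this model the old ordering reports an unsafe window while the new ordering never does, which mirrors the fix of enabling RISC interrupts only after registration with the SCSI midlayer.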
     

24 Aug, 2008

1 commit


16 Aug, 2008

7 commits

  • Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez
     
  • During internal testing, we've seen issues (hangs) with the
    'deferred' vport tear-down processing that typically accompanies
    the fc_remove_host() call. This is due to the current
    implementation's back-end vport handling being performed by the
    physical-HA's DPC thread, where premature shutdown could leave
    latent vport requests without a processor.

    This should also address a problem reported by Gal Rosen
    (http://marc.info/?l=linux-scsi&m=121731664417358&w=2) where the
    driver would attempt to awaken a previously torn-down DPC thread
    from interrupt context by implicitly calling wake_up_process()
    rather than the driver's qla2xxx_wake_dpc() helper. Rather than
    reshuffle the remove_one() device-removal code, during unload,
    depend on the driver's timer to wake up the DPC process, by
    limiting wake-ups based on an 'unloading' flag.

    Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez
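
A minimal sketch of gating DPC wake-ups on an 'unloading' flag, as the commit above describes. The struct and function names are hypothetical user-space stand-ins, and the counter increment only stands in for wake_up_process(); how the driver's timer path is wired is not shown in the commit message, so it is left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the adapter state relevant to DPC wake-ups. */
struct model_ha {
	bool     unloading;  /* set once driver removal has begun */
	void    *dpc_thread; /* NULL after the DPC thread is torn down */
	unsigned wakeups;    /* wake-ups actually delivered */
};

/*
 * Hypothetical helper in the spirit of qla2xxx_wake_dpc(): deliver a
 * wake-up only if the driver is not unloading and the thread still exists.
 * Returns true when a wake-up was delivered.
 */
static bool model_wake_dpc(struct model_ha *ha)
{
	if (ha->unloading || ha->dpc_thread == NULL)
		return false;

	ha->wakeups++; /* stands in for wake_up_process() */
	return true;
}
```

Routing every caller through one helper like this means the unloading check lives in a single place, so an interrupt-context caller cannot bypass it by waking the thread directly.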
     
  • The executing-HA of an SRB can be referenced from the sp->fcport.
    Use this correct value while processing status-continuation data
    and abort processing.

    Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez
     
  • Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Mike Hernandez
     
  • The original code inadvertently cleared an SRB's 'flags' while
    aborting, potentially causing a follow-on scsi_dma_unmap() to be
    missed.

    Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez
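
The class of bug described above can be sketched as follows. The commit message does not show the offending statement, so this is an assumed illustration: the flag names, the struct, and both abort variants are hypothetical, and the counter stands in for a call to scsi_dma_unmap().

```c
#include <stdbool.h>

/* Hypothetical flag bits; not the driver's definitions. */
#define MODEL_SRB_DMA_VALID (1u << 0) /* buffers are currently DMA-mapped */
#define MODEL_SRB_ABORTING  (1u << 1) /* an abort is in progress */

struct model_srb {
	unsigned flags;
	unsigned unmap_calls; /* counts stand-in scsi_dma_unmap() calls */
};

/* Buggy variant: plain assignment wipes every other flag bit. */
static void model_abort_buggy(struct model_srb *sp)
{
	sp->flags = MODEL_SRB_ABORTING;
}

/* Fixed variant: OR the bit in so existing state is preserved. */
static void model_abort_fixed(struct model_srb *sp)
{
	sp->flags |= MODEL_SRB_ABORTING;
}

/*
 * Completion path: unmap only if the SRB is still marked as mapped.
 * Returns true if the unmap happened.
 */
static bool model_complete(struct model_srb *sp)
{
	if (!(sp->flags & MODEL_SRB_DMA_VALID))
		return false; /* mapping is leaked if the flag was wiped */

	sp->unmap_calls++;
	sp->flags &= ~MODEL_SRB_DMA_VALID;
	return true;
}
```

With the buggy variant the completion path skips the unmap because the "mapped" bit is gone; with the fixed variant the unmap runs exactly once.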
     
  • * Use correct 'ha' to mark a device lost from ISR.
    I/Os will always be returned on the physical-HA.
    qla2x00_mark_device_lost() should be called with the HA bound
    to the fcport.
    * Mark *all* devices lost during ISP-ABORT (bighammer).

    These fixes correct issues discovered locally where, during
    link perturbation and heavy vport I/O, fcport/rport states would
    stray and an rport's scsi-target would be lost (timed out).

    Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez
     
  • Greg Wettstein (greg@enjellic.com) noted:

    http://article.gmane.org/gmane.linux.scsi/43409

    on a reboot of a previously recognized SCST target, the initiator
    driver would be unable to re-recognize the device as a target.
    It turns out that prior to the SCST software reloading and
    returning its "target-capable" abilities in the PRLI payload,
    the HBA would be re-initialized as an initiator-only type port.
    Since initiators typically classify themselves as an FCP-2
    capable device, both software and firmware do not perform an
    explicit logout during port-loss. Unfortunately, as can be seen
    by the failure case, when the port (now target-capable) returns,
    firmware performs an ADISC without a follow-on PRLI, leaving
    stale 'initiator-only' data in the firmware's port database.

    Correct the discrepancy by performing the explicit logout during
    the transport's request to terminate-rport-io, thus synchronizing
    port states and ensuring a follow-on PRLI is performed.

    Reported-by: Greg Wettstein
    Signed-off-by: Andrew Vasquez
    Cc: Stable Tree
    Signed-off-by: James Bottomley

    Andrew Vasquez
     

28 Jul, 2008

1 commit

  • drivers/scsi/qla2xxx/qla_attr.c: In function 'qla24xx_vport_delete':
    drivers/scsi/qla2xxx/qla_attr.c:1184: error: implicit declaration of function 'msleep'
    make[3]: *** [drivers/scsi/qla2xxx/qla_attr.o] Error 1
    make[3]: *** Waiting for unfinished jobs....

    Reported-by: David Miller
    Signed-off-by: FUJITA Tomonori
    Acked-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    FUJITA Tomonori
     

27 Jul, 2008

23 commits


31 May, 2008

3 commits


15 May, 2008

4 commits

  • Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez
     
  • This reverts commit 8084fe168a5252548cdddf2ed181c337fecd0523.
    The midlayer should be given the opportunity to interpret the
    check-condition and based on scsi_cmnd->resid determine if a
    transfer should be retried or failed.

    Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez
     
  • Matthew Wilcox reported the following lockdep
    warning:

    > =================================
    > [ INFO: inconsistent lock state ]
    > 2.6.26-rc1-00115-g0340eda-dirty #60
    > ---------------------------------
    > inconsistent {hardirq-on-W} -> {in-hardirq-W} usage.
    > swapper/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
    > (&ha->hardware_lock){+-..}, at: [] qla2300_intr_handler+0x35/0x1f5
    > {hardirq-on-W} state was registered at:
    > [] __lock_acquire+0x459/0xb1d
    > [] __lock_acquire+0xad4/0xb1d
    > [] lock_acquire+0x68/0x82
    > [] qla2300_intr_handler+0x35/0x1f5
    > [] _spin_lock+0x24/0x4d
    > [] qla2300_intr_handler+0x35/0x1f5
    > [] qla2300_intr_handler+0x35/0x1f5
    > [] trace_hardirqs_on+0xe7/0x10e
    > [] qla2x00_mailbox_command+0x1c6/0x433
    ...
    > other info that might help us debug this:
    > no locks held by swapper/1.
    >
    > stack backtrace:
    > Pid: 1, comm: swapper Not tainted 2.6.26-rc1-00115-g0340eda-dirty #60
    > [] print_usage_bug+0x100/0x10a
    > [] mark_lock+0xaa/0x395
    > [] __lock_acquire+0x3f2/0xb1d
    > [] __lock_acquire+0xad4/0xb1d
    > [] lock_acquire+0x68/0x82
    > [] qla2300_intr_handler+0x35/0x1f5
    > [] _spin_lock+0x24/0x4d
    > [] qla2300_intr_handler+0x35/0x1f5
    > [] qla2300_intr_handler+0x35/0x1f5
    > [] handle_IRQ_event+0x13/0x3d
    > [] handle_fasteoi_irq+0x76/0xab

    This shows lockdep detecting that the driver's interrupt handler
    is run in both process and interrupt context, with IRQs enabled
    in the former case.

    During init-time and error-recovery (after a RISC reset), the
    driver disables interrupts and 'polls' for completions by calling
    qla2x00_poll():

    static inline void
    qla2x00_poll(scsi_qla_host_t *ha)
    {
            ha->isp_ops->intr_handler(0, ha);
    }

    which in turn calls the ISP's registered interrupt handler. This
    patch corrects it by disabling local interrupts during polling.

    Reviewed-by: Matthew Wilcox
    Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez
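
The idea of disabling local interrupts around the polled handler call can be sketched in user space as below. This is an assumed illustration, not the patch itself: the `model_*` functions are hypothetical stand-ins for the kernel's local_irq_save()/local_irq_restore() and for the ISP interrupt handler, and a plain boolean stands in for the CPU's interrupt-enable state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's local interrupt-enable state. */
static bool model_irqs_enabled = true;

/* Records what the handler observed, so the effect can be checked. */
static bool model_handler_saw_irqs_enabled;

/*
 * Stand-in for local_irq_save(): remember the previous state, then
 * disable. (The kernel version is a macro that fills in its argument;
 * returning the value here keeps the model plain C.)
 */
static unsigned long model_local_irq_save(void)
{
	unsigned long flags = model_irqs_enabled ? 1UL : 0UL;

	model_irqs_enabled = false;
	return flags;
}

/* Stand-in for local_irq_restore(): put back the saved state. */
static void model_local_irq_restore(unsigned long flags)
{
	model_irqs_enabled = (flags != 0);
}

/* Stand-in for the ISP interrupt handler, which takes hardware_lock. */
static void model_intr_handler(void)
{
	model_handler_saw_irqs_enabled = model_irqs_enabled;
}

/* Polling path: call the handler with local interrupts disabled. */
static void model_poll(void)
{
	unsigned long flags = model_local_irq_save();

	model_intr_handler();
	model_local_irq_restore(flags);
}
```

Saving and restoring the previous state, rather than unconditionally re-enabling, keeps the polling path correct when it is called with interrupts already disabled.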
     
  • The user-initiated dump can be a useful tool in triaging complex
    ISP and FC issues.

    Signed-off-by: Andrew Vasquez
    Signed-off-by: James Bottomley

    Andrew Vasquez