20 Jul, 2012

40 commits

  • When recovering failed eh-cmnds let the lldd attempt an abort via
    scsi_abort_eh_cmnd before escalating.

    Reviewed-by: Jacek Danecki
    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • The strategy handlers may be called in places that are problematic for
    libsas (i.e. sata resets outside of domain revalidation filtering /
    libata link recovery), or problematic for userspace (non-blocking ioctl
    to sleeping reset functions). However, these routines are also called
    for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
    as long as we are running in the host's error handler, otherwise arrange
    for them to be triggered in eh_context.

    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • On a quick reading of scsi_error_handler() one could come away with the
    impression that it does its wakeup event check while the task state is
    TASK_RUNNING. In fact it sets TASK_INTERRUPTIBLE at the bottom of the
    loop, but that is ~50 lines down.

    Just set TASK_INTERRUPTIBLE at the top of the loop and be done.

    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • eh is woken up automatically by the presence of failed commands;
    scsi_schedule_eh is reserved for cases where there are no failed
    commands. This guarantees that host_eh_scheduled is only incremented
    when an explicit eh request is made.

    Reviewed-by: Jacek Danecki
    Signed-off-by: Maciej Trela
    [fixed spurious delete of sas_ata_task_abort]
    Signed-off-by: Artur Wojcik
    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Maciej Trela
     
  • Rapid ata hotplug on a libsas controller results in cases where libsas
    is waiting indefinitely on eh to perform an ata probe.

    A race exists between scsi_schedule_eh() and scsi_restart_operations()
    in the case when scsi_restart_operations() issues i/o to other devices
    in the sas domain. When this happens the host state transitions from
    SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
    ->host_busy is non-zero so we put the eh thread to sleep even though
    ->host_eh_scheduled is active.

    Before putting the error handler to sleep we need to check if the
    host_state needs to return to SHOST_RECOVERY for another trip through
    eh. Since i/o that is released by scsi_restart_operations has been
    blocked for at least one eh cycle, this implementation allows those
    i/o's to run before another eh cycle starts to discourage hung task
    timeouts.

    Cc:
    Reported-by: Tom Jackson
    Tested-by: Tom Jackson
    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • When managing shost->host_eh_scheduled libata assumes that there is a
    1:1 shost-to-ata_port relationship. libsas creates a 1:N relationship
    so it needs to manage host_eh_scheduled cumulatively at the host level.
    The sched_eh and end_eh port ops allow libsas to track when domain
    devices enter/leave the "eh-pending" state under ha->lock (previously
    named ha->state_lock, but it is no longer just a lock for ha->state
    changes).

    Since host_eh_scheduled indicates eh without backing commands pinning
    the device, the device can be deallocated at any time. Move the taking
    of the domain_device reference under the port_lock to guarantee that
    the ata_port stays around for the duration of eh.

    Reviewed-by: Jacek Danecki
    Acked-by: Jeff Garzik
    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
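
    The cumulative, host-level accounting described in the entry above can be
    sketched in userspace roughly as follows. This is only an illustration:
    fake_host, fake_port, port_sched_eh() and port_end_eh() are hypothetical
    names, and the locking that the commit performs under ha->lock is omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the host and its N ports (not kernel types). */
struct fake_host {
    int eh_scheduled;           /* counts ports that are eh-pending */
};

struct fake_port {
    struct fake_host *host;
    bool eh_pending;
};

/* A port enters the eh-pending state; count it once at the host level. */
static void port_sched_eh(struct fake_port *p)
{
    if (!p->eh_pending) {
        p->eh_pending = true;
        p->host->eh_scheduled++;
    }
}

/* A port leaves the eh-pending state; drop its contribution. */
static void port_end_eh(struct fake_port *p)
{
    if (p->eh_pending) {
        p->eh_pending = false;
        p->host->eh_scheduled--;
    }
}
```

    With two ports on one host, scheduling the same port twice still counts it
    once, and the host counter only returns to zero after every pending port
    has ended eh.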
     
  • The following crash results from cases where the end_device has been
    removed before scsi_sysfs_add_sdev has had a chance to run.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
    IP: [] sysfs_create_dir+0x32/0xb6
    ...
    Call Trace:
    [] kobject_add_internal+0x120/0x1e3
    [] ? trace_hardirqs_on+0xd/0xf
    [] kobject_add_varg+0x41/0x50
    [] kobject_add+0x64/0x66
    [] device_add+0x12d/0x63a
    [] ? _raw_spin_unlock_irqrestore+0x47/0x56
    [] ? module_refcount+0x89/0xa0
    [] scsi_sysfs_add_sdev+0x4e/0x28a
    [] do_scan_async+0x9c/0x145

    ...teach scsi_sysfs_add_devices() to check for deleted devices before
    trying to add them, and teach scsi_remove_target() how to remove targets
    that have not been added via device_add().

    Cc:
    Reported-by: Dariusz Majchrzak
    Signed-off-by: Dan Williams
    Signed-off-by: James Bottomley

    Dan Williams
     
  • This may not fix all endian issues in this driver, but it does get the
    driver working on PowerPC for a PMC SRC card. So it should at least fix
    all the problems in the core and in the SRC support.

    [jejb: fix >> 32 breakage reported by Fengguang Wu]
    Signed-off-by: Ben Collins
    Acked-by: Achim Leubner
    Signed-off-by: James Bottomley

    Ben Collins
     
  • The loop that waited for synchronous fib commands was causing a CPU stall
    when a timeout actually occurred.

    1) Switch to using a more accurate timeout mechanism.
    2) Do not pace the loop with udelay(). Use cpu_relax() to allow for
    scheduling to occur.

    Signed-off-by: Ben Collins
    Acked-by: Achim Leubner
    Signed-off-by: James Bottomley

    Ben Collins
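
    A deadline-based polling loop of the kind described in the entry above
    might look like the following userspace sketch. It is not the driver's
    code: wait_for_flag(), relax_cpu() and the injected tick source are
    hypothetical stand-ins (the kernel would use its own time source and
    cpu_relax()).

```c
#include <stdbool.h>

/* Caller-supplied monotonic tick source (an assumption of this sketch). */
typedef unsigned long (*clock_fn)(void);

/* Hypothetical stand-in for the kernel's cpu_relax(); a no-op here. */
static inline void relax_cpu(void) { }

/* Fake tick source for demonstration: advances by one on every read. */
static unsigned long fake_ticks;
static unsigned long fake_clock(void) { return fake_ticks++; }

/*
 * Poll *done until it becomes true or `timeout` ticks have elapsed.
 * Returns 0 on completion, -1 on timeout. The elapsed time is computed
 * from the clock rather than from an iteration count, so the timeout
 * does not depend on how long each pass through the loop takes.
 */
static int wait_for_flag(const volatile bool *done, unsigned long timeout,
                         clock_fn now)
{
    unsigned long start = now();

    while (!*done) {
        if (now() - start >= timeout)
            return -1;
        relax_cpu();
    }
    return 0;
}
```

    Unsigned subtraction keeps the elapsed-time check correct even if the tick
    counter wraps around.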
     
  • When an error occurred that would shut down the driver, some in-flight
    events were getting caught up, deadlocking a CPU or two.

    Signed-off-by: Ben Collins
    Acked-by: Achim Leubner
    Signed-off-by: James Bottomley

    Ben Collins
     
  • This also stops using the "legacy crap" in Scsi_Host (shost->base is an
    unsigned long).

    This affected 32-bit systems that have 64-bit resource sizes, causing the
    IO address to be truncated.

    Signed-off-by: Ben Collins
    Acked-by: Achim Leubner
    Signed-off-by: James Bottomley

    Ben Collins
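
    The truncation described in the entry above can be reproduced in a small
    sketch. A 32-bit unsigned long is modelled here with uint32_t, and the
    example address is made up for illustration.

```c
#include <stdint.h>

/* Models unsigned long on a 32-bit system (an assumption for the demo). */
typedef uint32_t ulong32;

/* Storing a 64-bit resource start in a 32-bit field drops the upper half. */
static uint64_t store_in_ulong32(uint64_t resource_start)
{
    ulong32 base = (ulong32)resource_start;
    return base;
}

/* A 64-bit field keeps the full address. */
static uint64_t store_in_u64(uint64_t resource_start)
{
    uint64_t base = resource_start;
    return base;
}
```

    For a hypothetical resource at 0x1fe000000, the 32-bit field holds
    0xfe000000, which is a different address.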
     
  • Introduce scsi_dh_attached_handler_name() to retrieve the name of the
    scsi_dh that is attached to the scsi_device associated with the provided
    request queue. Returns NULL if a scsi_dh is not attached.

    Also, fix scsi_dh_{attach,detach} function header comments to document
    @q rather than @sdev.

    Signed-off-by: Mike Snitzer
    Tested-by: Babu Moger
    Reviewed-by: Chandra Seetharaman
    Acked-by: Hannes Reinecke
    Signed-off-by: James Bottomley

    Mike Snitzer
     
  • Fixed the parentheses so the tcp push bit would be sent properly.

    Signed-off-by: Karen Xie
    Reviewed-by: Mike Christie
    Signed-off-by: James Bottomley

    Karen Xie
     
  • Prevent the code that requeues SCSI requests from triggering a
    crash by making sure that it is no longer scheduled after a
    device has been removed.

    Also, source code inspection of __scsi_remove_device() revealed
    a race condition in this function: no new SCSI requests must be
    accepted for a SCSI device after device removal started.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Mike Christie
    Acked-by: Tejun Heo
    Signed-off-by: James Bottomley

    Bart Van Assche
     
  • The return value of scsi_queue_insert() is ignored by all its
    callers, hence change the return type of this function to
    void.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Mike Christie
    Reviewed-by: Tejun Heo
    Signed-off-by: James Bottomley

    Bart Van Assche
     
  • When we call scsi_unprep_request() the command associated with the request
    gets destroyed and therefore drops its reference on the device. If this was
    the only reference, the device may get released and we end up with a NULL
    pointer deref when we call blk_requeue_request.

    Reported-by: Mike Christie
    Signed-off-by: Bart Van Assche
    Reviewed-by: Mike Christie
    Reviewed-by: Tejun Heo
    Cc:
    [jejb: enhance comment and add commit log for stable]
    Signed-off-by: James Bottomley

    Bart Van Assche
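
    The log entry above describes the failure but not the remedy. One common
    way to avoid this class of bug is to hold an extra reference across the
    two calls; the sketch below assumes that approach and uses a hypothetical
    fake_dev with a plain counter instead of real kernel objects.

```c
#include <stdbool.h>

/* Hypothetical reference-counted device (not a kernel structure). */
struct fake_dev {
    int refs;
    bool released;
};

static void dev_get(struct fake_dev *d) { d->refs++; }

static void dev_put(struct fake_dev *d)
{
    if (--d->refs == 0)
        d->released = true;     /* last reference gone: device is freed */
}

/* Destroying the command drops the command's reference on the device. */
static void unprep(struct fake_dev *d) { dev_put(d); }

/* Returns true if the device was still alive when the requeue ran. */
static bool requeue(struct fake_dev *d) { return !d->released; }

/* Problem case: the requeue may run after the device was released. */
static bool unprep_then_requeue_unsafe(struct fake_dev *d)
{
    unprep(d);
    return requeue(d);
}

/* Pin the device for the duration of both calls. */
static bool unprep_then_requeue_safe(struct fake_dev *d)
{
    bool ok;

    dev_get(d);
    unprep(d);
    ok = requeue(d);
    dev_put(d);
    return ok;
}
```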
     
  • Use blk_queue_dead() to test whether the queue is dead instead
    of !sdev. Since scsi_prep_fn() may be invoked concurrently with
    __scsi_remove_device(), keep the queuedata (sdev) pointer in
    __scsi_remove_device(). This patch fixes a kernel oops that
    can be triggered by USB device removal. See also
    http://www.spinics.net/lists/linux-scsi/msg56254.html.

    Other changes included in this patch:
    - Swap the blk_cleanup_queue() and kfree() calls in
    scsi_host_dev_release() to make that code easier to grasp.
    - Remove the queue dead check from scsi_run_queue() since the
    queue state can change anyway at any point in that function
    where the queue lock is not held.
    - Remove the queue dead check from the start of scsi_request_fn()
    since it is redundant with the scsi_device_online() check.

    Reported-by: Jun'ichi Nomura
    Signed-off-by: Bart Van Assche
    Reviewed-by: Mike Christie
    Reviewed-by: Tejun Heo
    Cc:
    Signed-off-by: James Bottomley

    Bart Van Assche
     
  • If the queue is dead blk_execute_rq_nowait() doesn't invoke the done()
    callback function. That will result in blk_execute_rq() being stuck
    in wait_for_completion(). Avoid this by initializing rq->end_io to the
    done() callback before we check the queue state. Also, make sure the
    queue lock is held around the invocation of the done() callback. Found
    this through source code review.

    Signed-off-by: Muthukumar Ratty
    Signed-off-by: Bart Van Assche
    Reviewed-by: Tejun Heo
    Acked-by: Jens Axboe
    Signed-off-by: James Bottomley

    Muthukumar Ratty
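
    The ordering fix described in the entry above can be sketched as follows.
    fake_rq, fake_queue and execute_nowait() are hypothetical userspace
    stand-ins, the -ENXIO error code is an assumption of this sketch, and the
    queue locking mentioned in the commit message is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_rq;
typedef void (*end_io_fn)(struct fake_rq *rq, int error);

/* Hypothetical request and queue (not the block layer's structures). */
struct fake_rq {
    end_io_fn end_io;
    bool completed;
    int error;
};

struct fake_queue {
    bool dead;
    int queued;
};

/* Completion callback: records that the waiter would have been woken. */
static void mark_done(struct fake_rq *rq, int error)
{
    rq->completed = true;
    rq->error = error;
}

static void execute_nowait(struct fake_queue *q, struct fake_rq *rq,
                           end_io_fn done)
{
    /* Install the callback before looking at the queue state ... */
    rq->end_io = done;

    if (q->dead) {
        /* ... so that a dead queue still completes the request. */
        if (rq->end_io)
            rq->end_io(rq, -ENXIO);
        return;
    }
    q->queued++;
}
```

    Because the callback always runs on the dead-queue path, a caller waiting
    for completion is not left blocked.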
     
  • We took this lock with spin_lock() so we should unlock it with
    spin_unlock() instead of spin_unlock_irq(). This was introduced in
    f2c8dc402b "[SCSI] megaraid_mbox: remove scsi_assign_lock usage".

    Signed-off-by: Dan Carpenter
    Acked-by: Adam Radford
    Signed-off-by: James Bottomley

    Dan Carpenter
     
  • On 64 bit systems the current code sets 32 bits of "seg" and leaves the
    other 32 uninitialized. It doesn't matter since the variable is never
    used. But it's still messy and we should fix it.

    Signed-off-by: Dan Carpenter
    Acked-by: Adam Radford
    Signed-off-by: James Bottomley

    Dan Carpenter
     
  • If bfad_thread_workq(bfad) was not BFA_STATUS_OK then we freed "im"
    and then dereferenced it.

    I did a little clean up because it seemed nicer to return directly
    instead of doing a superfluous goto. I looked at other functions in
    this file and it seems like returning directly is standard.

    Signed-off-by: Dan Carpenter
    Acked-by: Krishna Gudipati
    Signed-off-by: James Bottomley

    Dan Carpenter
     
  • If mc == BFI_MC_MAX then we're reading past the end of the
    mod->mbhdlr[] array.

    Signed-off-by: Dan Carpenter
    Acked-by: Krishna Gudipati
    Signed-off-by: James Bottomley

    Dan Carpenter
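
    The off-by-one described in the entry above is the classic difference
    between ">" and ">=" in a bounds check. The sketch below uses a
    hypothetical FAKE_MC_MAX and handler table rather than the driver's
    definitions.

```c
#include <stddef.h>

enum { FAKE_MC_MAX = 4 };       /* hypothetical stand-in for BFI_MC_MAX */

typedef void (*handler_fn)(int *hits);

static void bump(int *hits) { (*hits)++; }

/* Valid indices are 0 .. FAKE_MC_MAX - 1. */
static handler_fn handlers[FAKE_MC_MAX] = { bump, bump, bump, bump };

/* Returns 0 if a handler was called, -1 if mc is out of range. */
static int dispatch(unsigned int mc, int *hits)
{
    /* A check written with ">" would let mc == FAKE_MC_MAX through. */
    if (mc >= FAKE_MC_MAX)
        return -1;

    handlers[mc](hits);
    return 0;
}
```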
     
  • Initialize atomic_t scsi_host_next_hn and ioerr_cnt as per the guidelines
    defined in Documentation/atomic_ops.txt.

    Signed-off-by: Josh Hunt
    Signed-off-by: James Bottomley

    Josh Hunt
     
  • A quote from SPC-4: "While in the unavailable primary target port
    asymmetric access state, the device server shall support those of
    the following commands that it supports while in the active/optimized
    state: [ ... ] d) SET TARGET PORT GROUPS; [ ... ]". Hence re-enable
    sending STPG to a target port group that is in the unavailable state.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Babu Moger
    Signed-off-by: James Bottomley

    Bart Van Assche
     
  • Signed-off-by: Vikas Chaudhary
    Signed-off-by: James Bottomley

    Vikas Chaudhary
     
  • Signed-off-by: Vikas Chaudhary
    Signed-off-by: James Bottomley

    Vikas Chaudhary
     
  • Fix the following message:
    drivers/scsi/qla4xxx/ql4_os.c:3266:5: error: symbol 'qla4xxx_post_aen_work' redeclared with different type (originally declared at drivers/scsi/qla4xxx/ql4_glbl.h:186) - incompatible argument 2 (different signedness)

    Signed-off-by: Vikas Chaudhary
    Signed-off-by: James Bottomley

    Vikas Chaudhary
     
  • Allow multi-session to a target (for flash ddbs) accessible via
    multiple network portals.

    Signed-off-by: Vikas Chaudhary
    Signed-off-by: James Bottomley

    Vikas Chaudhary
     
  • Currently the backoff algorithm for when to retry alua rtpg
    requests progresses geometrically as so:

    2, 4, 8, 16, 32, 64... seconds.

    This progression can lead to unneeded delay in retrying
    alua rtpg requests when the rtpgs are delayed. A less
    aggressive, additive backoff algorithm would not lead to
    such large jumps when delays start getting long, but
    would back off linearly:

    2, 4, 6, 8, 10... seconds.

    Signed-off-by: Martin George
    Signed-off-by: Rob Evers
    Reviewed-by: Babu Moger
    Signed-off-by: James Bottomley

    Rob Evers
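
    The two progressions quoted in the entry above can be generated with a
    few lines of C. The function names are hypothetical; only the starting
    value of 2 seconds and the two sequences come from the commit message.

```c
/* Fill out[0..n-1] with the doubling sequence 2, 4, 8, 16, ... seconds. */
static void geometric_backoff(unsigned int *out, int n)
{
    unsigned int interval = 2;

    for (int i = 0; i < n; i++) {
        out[i] = interval;
        interval *= 2;
    }
}

/* Fill out[0..n-1] with the additive sequence 2, 4, 6, 8, ... seconds. */
static void linear_backoff(unsigned int *out, int n)
{
    unsigned int interval = 2;

    for (int i = 0; i < n; i++) {
        out[i] = interval;
        interval += 2;
    }
}
```

    By the fifth retry the doubling scheme waits 32 seconds, while the
    additive one waits 10.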
     
  • Some storage arrays are known to return 'illegal request'
    when an rtpg extended header request is made. T10 says the
    array should ignore the bit and return the non-extended
    rtpg if the array doesn't support the request. Work around
    this by retrying the rtpg request without the extended
    header bit set when the extended rtpg request results in
    an illegal request.

    Signed-off-by: Rob Evers
    Reviewed-by: Babu Moger
    Signed-off-by: James Bottomley

    Rob Evers
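
    The fallback described in the entry above amounts to one retry with the
    extended header bit cleared. The following sketch models it with
    hypothetical status codes and two fake arrays; it does not issue real
    SCSI commands.

```c
#include <stdbool.h>

/* Hypothetical outcome codes for this sketch. */
enum rtpg_status { RTPG_OK, RTPG_ILLEGAL_REQUEST };

typedef enum rtpg_status (*rtpg_fn)(bool ext_hdr);

/* Fake array that rejects the extended header bit. */
static enum rtpg_status picky_array_rtpg(bool ext_hdr)
{
    return ext_hdr ? RTPG_ILLEGAL_REQUEST : RTPG_OK;
}

/* Fake array that accepts either form of the request. */
static enum rtpg_status compliant_array_rtpg(bool ext_hdr)
{
    (void)ext_hdr;
    return RTPG_OK;
}

/*
 * Try the extended request first; on 'illegal request', retry once
 * without the extended header bit. *used_ext_hdr reports which form
 * produced the returned status.
 */
static enum rtpg_status submit_rtpg_with_fallback(rtpg_fn submit,
                                                  bool *used_ext_hdr)
{
    enum rtpg_status st = submit(true);

    *used_ext_hdr = true;
    if (st == RTPG_ILLEGAL_REQUEST) {
        st = submit(false);
        *used_ext_hdr = false;
    }
    return st;
}
```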
     
  • During alua transitions, an array can return transitioning
    status in response to rtpg requests. These requests get
    retried for a maximum of 60 seconds by default before timing
    out. Sometimes this timeout isn't sufficient to allow the
    array to complete the transition. T10-spc4 addresses this
    under the 'Report Target Port Groups' command.

    This update retrieves the timeout value from the storage
    array if available and retries the transitioning rtpgs
    for up to the 'implied transitioning timeout' value.

    Signed-off-by: Rob Evers
    Reviewed-by: Babu Moger
    Signed-off-by: James Bottomley

    Rob Evers
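
    The timeout selection described in the entry above can be sketched as a
    single helper. It assumes that an array which does not report a value is
    represented by 0; that encoding and the names below are assumptions of
    this sketch, while the 60 second default comes from the commit message.

```c
/* Default retry window in seconds, as stated in the commit message. */
enum { DEFAULT_TRANSITION_TMO = 60 };

/*
 * Pick how long to keep retrying rtpgs that report 'transitioning'.
 * implied_tmo is the value reported by the array, or 0 if none
 * (the zero-means-unavailable convention is assumed here).
 */
static unsigned int transition_timeout(unsigned int implied_tmo)
{
    return implied_tmo ? implied_tmo : DEFAULT_TRANSITION_TMO;
}
```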
     
  • ARCMSR_ARC1880_DiagWrite_ENABLE is 0x00000080 so (x | 0x00000080) is
    never zero. The intent here was to loop until
    ARCMSR_ARC1880_DiagWrite_ENABLE was turned on, but because the test was
    wrong, we would do five loops regardless of whether it succeeded or not.

    Also I simplified the condition a little by removing the unused
    assignment.

    Signed-off-by: Dan Carpenter
    Acked-by: Nick Cheng
    Signed-off-by: James Bottomley

    Dan Carpenter
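
    The difference between the two tests in the entry above is easy to check
    in isolation. The helper names are hypothetical; the constant value is
    the one given in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARCMSR_ARC1880_DiagWrite_ENABLE 0x00000080

/* OR-ing in the bit always yields a non-zero value: true for any reg. */
static bool diag_write_enabled_buggy(uint32_t reg)
{
    return (reg | ARCMSR_ARC1880_DiagWrite_ENABLE) != 0;
}

/* AND-ing with the bit tests whether it is actually set in reg. */
static bool diag_write_enabled(uint32_t reg)
{
    return (reg & ARCMSR_ARC1880_DiagWrite_ENABLE) != 0;
}
```

    With a register value of 0 the OR form still reports the bit as set,
    which is why the original loop could not exit early.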
     
  • Due to a limitation of the RR312x's DMA engine, the HBA cannot access host
    memory above 12GB. This fixes

    https://bugzilla.kernel.org/show_bug.cgi?id=14311

    [alan: resurrected bug from 2009 and pushed upstream]
    Reported-by: Alan Cox
    Signed-off-by: HighPoint Linux Team
    Signed-off-by: James Bottomley

    HighPoint Linux Team
     
  • Signed-off-by: Alex Iannicelli
    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James Smart
     
  • Signed-off-by: Alex Iannicelli
    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James Smart
     
  • Fix System Panic During IO Test using Medusa tool

    Signed-off-by: Alex Iannicelli
    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James Smart
     
  • Fix fcp_imax module parameter to dynamically change FCP EQ delay multiplier

    Signed-off-by: Alex Iannicelli
    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James Smart
     
  • Signed-off-by: Alex Iannicelli
    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James Smart
     
  • Fixed system hold-up when performing resource provisioning through the
    same PCI function

    Signed-off-by: Alex Iannicelli
    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James Smart
     
  • Fix system hang due to bad protection module parameters (CR: 130769)

    Signed-off-by: Alex Iannicelli
    Signed-off-by: James Smart
    Signed-off-by: James Bottomley

    James Smart