03 Jul, 2011

40 commits

  • 1/ add OEM parameter support for mode_type (MPC vs APC)
    2/ add OEM parameter support for max_number_concurrent_device_spin_up
    3/ cleanup scic_sds_controller_start_next_phy

    todo: hook up the amp control afe parameters into the afe init code

    Signed-off-by: Henryk Dembkowski
    Signed-off-by: Jacek Danecki
    [cleaned up scic_sds_controller_start_next_phy]
    Signed-off-by: Dan Williams

    Henryk Dembkowski
     
  • Adding EFI variable retrieval for OEM parameters. Still need GUID and
    variable name.

    Also updated the data struct for oem parameters and hex file for firmware

    Signed-off-by: Dave Jiang
    [fix CONFIG_EFI=n compile error]
    Signed-off-by: Dan Williams

    Dave Jiang
     
  • We need to scan the OROM for signature and grab the OEM parameters. We
    also need to do the same for EFI. If all fails then we resort to user
    binary blob, and if that fails then we go to the defaults.

    Share the format with the create_fw utility so that all possible sources
    of the parameters are in-sync.

    Signed-off-by: Dave Jiang
    Signed-off-by: Dan Williams

    Dan Williams
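
The lookup order described in this commit (OROM, then EFI, then a user binary blob, then defaults) can be sketched as a priority-ordered list of probe functions. This is a minimal user-space illustration, not the driver's code: `struct oem_params`, the `probe_*` functions, the field values, and `oem_params_load()` are all hypothetical names and numbers chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the OEM parameter block. */
struct oem_params {
    unsigned char mode_type;              /* illustrative encoding only */
    unsigned char max_concurrent_spin_up; /* illustrative value only */
};

/* A probe returns 0 and fills *out on success, non-zero if unavailable. */
typedef int (*oem_probe_fn)(struct oem_params *out);

struct oem_source {
    const char *name;
    oem_probe_fn probe;
};

/* Illustrative probes: OROM and EFI lookups fail, the user blob succeeds. */
static int probe_orom(struct oem_params *out) { (void)out; return -1; }
static int probe_efi(struct oem_params *out)  { (void)out; return -1; }
static int probe_blob(struct oem_params *out)
{
    out->mode_type = 1;
    out->max_concurrent_spin_up = 4;
    return 0;
}

/* Assumed defaults, used only when every source fails. */
static const struct oem_params default_params = { 0, 1 };

/*
 * Try each source in priority order and stop at the first success.
 * Returns the name of the source that supplied the parameters.
 */
static const char *oem_params_load(const struct oem_source *srcs, size_t n,
                                   struct oem_params *out)
{
    for (size_t i = 0; i < n; i++) {
        struct oem_params tmp;

        if (srcs[i].probe(&tmp) == 0) {
            *out = tmp; /* copy only after a successful probe */
            return srcs[i].name;
        }
    }
    *out = default_params;
    return "defaults";
}
```

Keeping the sources in one table makes the priority order explicit, and a shared `struct oem_params` layout is what lets every source (including an offline tool) stay in sync.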
     
  • Don't assume the hardware is in a known state at init.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • A usage of "FALSE" leaked in as well as some checkpatch escapes.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • In the case where submitted I/Os fail with the status code
    SCI_FAILURE_REMOTE_DEVICE_RESET_REQUIRED, the execute function now waits
    until scic_lock is cleared before calling the helper function
    "isci_request_signal_device_reset" which sets the flag for the pending
    reset condition on the I/O.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • A domain_device has the same lifetime as its related scsi_target. The
    scsi_target is reference counted based on outstanding commands,
    therefore it is safe to assume that if we have a valid sas_task that the
    ->dev pointer is also valid.

    The asd_sas_port of a domain_device has the same lifetime as the driver
    so it can also never be NULL as long as the sas_task is valid and the
    driver is loaded.

    This also cleans up isci_task_complete_for_upper_layer(), renames it to
    isci_task_refuse() and notices that the isci_completion_selection
    parameter was set to isci_perform_normal_io_completion by all callers.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Allow each controller to be identified via sysfs.

    # cat /sys/class/scsi_host/host13/isci_id
    1

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Make sure all pending I/O, including any in the libsas error handler
    process, is cleaned up.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • In the case of I/O requests being failed because of a required device
    reset condition, set the response and status to indicate an I/O failure.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • Since libsas takes the domain device sata_dev.ap->lock before submitting
    a task, error completions in the submit path for SATA devices must
    unlock/relock when completing the sas_task back to libsas.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The request may be in the "aborted" or the "completed" state when
    performing a task management operation on it.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • In the case where a SAS or SATA LUN reset TMF is built a NULL pointer
    dereference occurred because of the (unused) callback data pointer.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams
    Signed-off-by: Jacek Danecki

    Jeff Skirvin
     
  • Added a request "dead" state for use when a termination wait times out.

    isci_terminate_pending_requests now detaches the device's pending list
    and terminates each entry on the detached list.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
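
The "detach, then terminate" pattern mentioned above can be illustrated with a small user-space sketch. It assumes a singly linked pending list and omits locking; `struct request`, `struct device`, and both functions are hypothetical names for this example, not the driver's types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request and device types for illustration. */
struct request {
    int tag;
    int terminated;
    struct request *next;
};

struct device {
    struct request *pending; /* head of the pending list */
};

/*
 * Detach the whole pending list in one step. In a real driver this
 * would be done under the lock that protects the list; afterwards the
 * detached entries are private to the caller.
 */
static struct request *detach_pending(struct device *dev)
{
    struct request *head = dev->pending;

    dev->pending = NULL;
    return head;
}

/* Terminate every entry on the detached list; returns how many. */
static int terminate_pending_requests(struct device *dev)
{
    struct request *req = detach_pending(dev);
    int count = 0;

    while (req) {
        struct request *next = req->next; /* save before unlinking */

        req->next = NULL;
        req->terminated = 1;
        count++;
        req = next;
    }
    return count;
}
```

Because the list is detached first, requests that arrive while termination is in progress land on a fresh pending list rather than being walked by the loop.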
     
  • Since the request structure contains a pointer to the completion to be
    used if the request is being aborted or terminated, there is no reason
    to pass the completion as a pointer to isci_terminate_request_core().

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Jacek Danecki
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • Made sure the device ready check accounts for all states.
    Moved the aborted task check into the loop of pulling task requests
    off of the submitted list.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Jacek Danecki
    [remove host and device starting state checks]
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • The pointer to the core representation of a request is marked NULL at
    completion, but we need to save the i/o tag for task management.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Jacek Danecki
    [revise changelog]
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • If there is a pending device reset, the I/O is used to accomplish the
    reset by setting the RESET bit in the task status, and then putting the
    task into the error handler path using sas abort task.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Jacek Danecki
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • Corrected use of the request state_lock in the completion callback.

    In the case where an abort (or reset) thread is trying to terminate an
    I/O request, it sets the request state to "aborting" (or "terminating")
    if the state is still "starting". One of the bugs was to never set the
    state to "completed". Another was to not correctly recognize the
    situation where the I/O had completed but the sas_task was still pending
    callback to task_done - this was typically a problem in the LUN and
    device reset cases.

    It is now possible that we leave isci_task_abort_task() with
    request->io_request_completion pointing to a locally allocated
    aborted_io_completion struct, which may result in a system crash.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Maciej Trela
    Signed-off-by: Jacek Danecki
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • Changes to move management of the reqs_in_process entry for the request here.
    Made changes to note when the task is already in the abort path and
    cannot be completed through callbacks.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Jacek Danecki
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • In the condition where outstanding I/Os are being cleaned from the device
    requests in process list, the cleanup function needs to check that the
    request is actually a sas-task and not a task management function.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • Reported-by: James Bottomley
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The remote_device_lock is currently used to protect a controller global
    resource (RNCs), but the remote_device_lock is per-port.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Until we synchronize against device removal this limits the damage of
    use after free bugs to the driver's own objects. Unless we implement
    reference counting we need to ensure at least a subset of a remote
    device is valid at all times. We follow the lead of other libsas
    drivers that also preallocate devices.

    This also enforces maximum remote device accounting at the lldd layer,
    but the core may still run out of RNC's before we hit this limit.

    Signed-off-by: Dan Williams

    Dan Williams
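
A minimal sketch of preallocated device slots, assuming a fixed per-host array with an in-use flag. The limit of 4 and all names (`struct remote_device`, `struct host`, `remote_device_alloc()`, `remote_device_free()`) are hypothetical and chosen only to show the idea.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REMOTE_DEVICES 4 /* illustrative limit, not the driver's value */

/* Hypothetical device slot; storage lives as long as the host does. */
struct remote_device {
    int in_use;
    int id;
};

struct host {
    struct remote_device devices[MAX_REMOTE_DEVICES];
};

/* Claim the first free slot, or return NULL when the limit is reached. */
static struct remote_device *remote_device_alloc(struct host *h)
{
    for (size_t i = 0; i < MAX_REMOTE_DEVICES; i++) {
        struct remote_device *d = &h->devices[i];

        if (!d->in_use) {
            d->in_use = 1;
            d->id = (int)i;
            return d;
        }
    }
    return NULL;
}

/* Release a slot; the memory itself is never freed. */
static void remote_device_free(struct remote_device *d)
{
    d->in_use = 0;
}
```

Since the slots are never returned to the allocator, a stale pointer still refers to a driver-owned object, and the array size doubles as the maximum device count.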
     
  • Replace the device completion infrastructure with the controller wide
    event queue. There was a potential for the stop and ready notifications
    to corrupt each other, now that cannot happen.

    The stop pending flag cannot be used until devices are statically
    allocated. We temporarily need to maintain a completion to handle
    waiting for an object that has disappeared, but we can at least stop
    scribbling on freed memory.

    A future change will also get rid of the "stopping" state as it should
    not be exposed to the rest of the driver.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • The midlayer is already throttling i/o in the places where host_quiesce
    was trying to prevent further i/o to the device. It's also problematic
    in that it holds a lock over GFP_KERNEL allocations.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • It belies the fact that isci_remote_device and scic_sds_remote_device
    are one and the same object with the same lifetime rules.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • isci_host_by_id() should have been a clue that an array would have been
    a simpler approach.

    Reported-by: James Bottomley
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Now that phys_to_virt() and virt_to_phys() have been removed we are no
    longer violating the dma mapping (or kmap) apis.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Ross says:
    "The memory allocation for these requests doesn’t take into account the
    additional memory needed when the code in
    scic_sds_s[mst]p_request_assign_buffers() shifts the struct
    scu_task_context so that it is cache line aligned:

    In an example from my machine, total buffer that I’ve given to SCIC goes
    from 0x410024566f84 to 0x410024567308. From this same example, this
    call shifts my task_context_buffer from 0x410024567208 to
    0x410024567240.

    This means that the task_context_buffer that used to range from
    0x410024567208 to 0x410024567308 instead now goes from 0x410024567240 to
    0x410024567340.

    When the memset() call at the end of scic_task_request_construct()
    clears out this task_context_buffer, it does so from 0x410024567240 to
    0x410024567340, effectively killing whatever buffer follows this
    allocation in memory."

    djbw:
    Use the kernel's PTR_ALIGN instead of
    scic_sds_request_align_task_context_buffer() and SMP_CACHE_BYTES instead of
    the local CACHE_LINE_SIZE definition.

    TODO: These allocations really want to be better defined in a union rather
    than opaque buffers carved up by macros.

    Reported-by: Ross Zwisler
    Signed-off-by: Jacek Danecki
    Signed-off-by: Dan Williams

    Dan Williams
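
The overrun Ross describes comes from aligning a pointer upward without reserving room for the shift (0x38 bytes in his example, from ...7208 to ...7240). The sketch below is a user-space illustration of the safe pattern, assuming a 64-byte cache line; `ptr_align_up()` is a hand-written stand-in for the kernel's PTR_ALIGN, and `alloc_aligned_ctx()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64 /* assumed cache line size for this example */

/* Round a pointer up to the next multiple of align (a power of two). */
static void *ptr_align_up(void *p, uintptr_t align)
{
    uintptr_t addr = (uintptr_t)p;

    return (void *)((addr + align - 1) & ~(align - 1));
}

/*
 * Allocate ctx_size bytes plus worst-case alignment slack, so that
 * clearing the aligned region cannot run past the end of the buffer.
 * *raw_out receives the pointer that must later be passed to free().
 */
static void *alloc_aligned_ctx(size_t ctx_size, void **raw_out)
{
    void *raw = malloc(ctx_size + CACHE_LINE - 1);
    void *ctx;

    if (!raw)
        return NULL;
    *raw_out = raw;
    ctx = ptr_align_up(raw, CACHE_LINE);
    memset(ctx, 0, ctx_size); /* stays inside the allocation */
    return ctx;
}
```

The extra `CACHE_LINE - 1` bytes cover the largest possible shift, which is the slack the original allocation size did not account for.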
     
  • When aborting a task context we need to be sure that the hardware has acted on
    this request (retrieved the task context) before invalidating the remote node
    context. In the case of the "dummy" task context and remote node we do not
    have the full state machine that goes through the complete tc abort and rnc
    invalidate states. Instead we ensure the hardware has seen and acted on

    Signed-off-by: Jacek Danecki
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Moving some of the chattiness of warning messages to debug so only the Linux
    system messages are shown.

    Signed-off-by: Dave Jiang
    Signed-off-by: Dan Williams

    Dave Jiang
     
  • Adding support for PHY_FUNC_LINK_RESET and PHY_FUNC_DISABLE. This allows the
    sysfs knobs enable (both 0 and 1) and link_reset to work properly.

    Signed-off-by: Dave Jiang
    Signed-off-by: Dan Williams

    Dave Jiang
     
  • Core reworks to support stopping and re-starting the controller, lays the
    groundwork for phy disable / re-enable and fixes other bugs around port/phy
    setup/teardown.

    Signed-off-by: Pawel Marek
    Signed-off-by: Dan Williams

    Pawel Marek
     
  • Observed that some devices return a d2h fis; treat it like an sdb error fis.

    Signed-off-by: Piotr Sawicki
    Signed-off-by: Dan Williams

    Piotr Sawicki
     
  • There is a condition whereby TCs (task contexts) can jump to the head of
    the round robin queue causing indefinite starvation of pending tasks.
    Posting a TC to a suspended RNC (remote node context) causes the
    hardware to select that task first, but since the RNC is suspended the
    scheduler proceeds to the next task in the expected round robin fashion,
    restoring TC arbitration fairness.

    Signed-off-by: Tomasz Chudy
    Signed-off-by: Dan Williams

    Tomasz Chudy
     
  • Prepare the timer api for the arrival of dynamic creation and
    destruction events from the core. It pretended to do this previously
    but the core to date only used it in a static init-time only fashion.
    This is an interim fix until a cleaner event queue can be developed.

    1/ make all locking external to the api (add WARN_ONCE to verify)
    2/ add a timer_destroy interface (to be used by the core)
    3/ use del_timer_sync() prior to deallocating timer data
    4/ delete the "timer_list" indirection, we only have timers allocated
    for the isci_host
    5/ fix detection of timer list allocation errors

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Undo the open coded and incorrect translation of the oem parameter sas
    address to its libsas expected format.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Removed all callbacks in the deprecated.c. Core will call the appropriate
    functions directly.

    Signed-off-by: Dave Jiang
    Signed-off-by: Dan Williams

    Dave Jiang
     
  • Renaming the callbacks to appropriate event notify calls for the LLDD.

    Signed-off-by: Dave Jiang
    Signed-off-by: Dan Williams

    Dave Jiang