04 Jul, 2011

2 commits


03 Jul, 2011

38 commits

  • The hard_reset parameter passed to the LLDD in the direct-attached
    phy control case allows the LLDD to filter link failure events
    while the direct-attached device reset is executing.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • The messages emitted from task.c and some from request.c likely
    duplicate (in a less understandable way) what is reported by the
    midlayer.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Perform checking per-pci device (even though all systems will only have
    1 pci device in this generation), and delete support for silicon that
    does not report a proper revision (i.e. A0).

    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Does not need its own file.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Undo some needless separation.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Most of these simple dereference macros are longer than their open coded
    equivalent. Deleting enum sci_controller_mode is thrown in for good
    measure.

    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The distinctions between scic_sds_, scic_, and sci_ are no longer relevant
    so just unify the prefixes on sci_. The distinction between isci_ and
    sci_ is historically significant, and useful for comparing the old
    'core' to the current Linux driver. 'sci_' represents the former core as
    well as the routines that are closer to the hardware and protocol than
    their 'isci_' brethren. sci == sas controller interface.

    Also unwind the 'sds1' out of the parameter structs.

    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Remove the distinction between these two implementations and unify on
    isci_host (local instances named ihost). Hmmm, we had two
    'oem_parameters' instances, one was unused... nice.

    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Remove the distinction between these two implementations and unify on
    isci_remote_device (local instances named idev).

    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Remove the distinction between these two implementations and unify on
    isci_port (local instances named iport). The duplicate '->owning_port' and
    '->isci_port' in both isci_phy and isci_remote_device will be fixed in a later
    patch... this is just the straightforward rename/unification.

    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Commit 0815632 "isci: unify remote_device stop_handlers" introduced the
    possibility that not all requests get terminated if we reach the
    request_count. Now that we properly reference count devices we don't
    need this self-defense and can do the straightforward scan of all active
    requests.

    Reported-by: Jeff Skirvin
    Acked-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Dan Williams
     
  • They are one and the same object so remove the distinction. The near
    duplicate fields (owning_port, and isci_port) will be cleaned up
    after the scic_sds_port isci_port unification.

    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • They are one and the same object so remove the distinction. The near
    duplicate fields (owning_controller, and isci_host) will be cleaned up
    after the scic_sds_controller isci_host unification.

    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • * Rename scic_sds_stp_request to isci_stp_request
    * Remove the unused fields and union indirection

    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The dma_pool interface is optimized for object_size << page_size which
    is not the case with isci_request objects and the dma_pool routines show
    up in the top of the profile.

    The old io_request_table which tracked whether tci slots were in-flight
    or not is replaced with an IREQ_ACTIVE flag per request.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Combine three bools into one unsigned long 'flags'. Doesn't increase the
    request size due to packing. (to do: optimize the structure layout).

    Signed-off-by: Dan Williams

    Dan Williams
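
    A minimal userspace sketch of folding three bools into one 'flags' word, as
    described in the entry above. The flag names and helpers are hypothetical
    stand-ins, not the driver's; in-kernel code would more likely use set_bit(),
    clear_bit() and test_bit() on the unsigned long.

```c
#include <stdbool.h>

/* Bit numbers within 'flags'. These names are hypothetical. */
enum req_flag_bit {
	REQ_FLAG_COMPLETE = 0,
	REQ_FLAG_TERMINATED = 1,
	REQ_FLAG_TMF = 2,
};

struct request_sketch {
	unsigned long flags;	/* replaces three separate bool members */
};

static inline void req_set_flag(struct request_sketch *r, enum req_flag_bit b)
{
	r->flags |= 1UL << b;
}

static inline void req_clear_flag(struct request_sketch *r, enum req_flag_bit b)
{
	r->flags &= ~(1UL << b);
}

static inline bool req_test_flag(const struct request_sketch *r,
				 enum req_flag_bit b)
{
	return (r->flags >> b) & 1UL;
}
```

    Adding a fourth state later costs one more enum value rather than another
    struct member.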
     
  • The tci_pool tracks our outstanding command slots which are also the 'index'
    portion of our tags. Grabbing the tag early in ->lldd_execute_task lets us
    drop the isci_host_can_queue() and ->was_tag_assigned_by_user infrastructure.
    ->was_tag_assigned_by_user required the task context to be duplicated in
    request-local buffer. With the tci established early we can build the
    task_context directly into its final location and skip a memcpy.

    With the task context buffer at a known address at request construction we
    have the opportunity/obligation to also fix sgl handling. This rework feels
    like it belongs in another patch but the sgl handling and task_context are too
    intertwined.
    1/ fix the 'ab' pair embedded in the task context to point to the 'cd' pair in
    the task context (previously we were prematurely linking to the staging
    buffer).
    2/ fix the broken iteration of pio sgls that assumes all sgls are relative to
    the request, and does a dangerous looking reverse lookup of physical
    address to virtual address.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • When the remote device transitions to a not-ready state because of
    an NCQ error condition, all outstanding requests to that device
    are terminated and completed to libsas on the normal path. The
    device then waits for a READ LOG EXT command to issue on the task
    management path.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • Updates to frame_rcvd need to be atomic with respect to when
    they are evaluated by libsas.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • scu_index is a parameter of isci_parse_oem_parameters and is an index
    into the controller table. The existing check, scu_index > SCI_MAX_CONTROLLERS,
    is insufficient and should be scu_index >= SCI_MAX_CONTROLLERS, because
    scu_index indexes a table whose size is SCI_MAX_CONTROLLERS.

    Signed-off-by: Maciej Patelczyk
    Signed-off-by: Dan Williams

    Maciej Patelczyk
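
    The off-by-one in the scu_index entry above can be shown in isolation. This
    is a userspace sketch only: the value of SCI_MAX_CONTROLLERS, the table and
    both helper names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration; the driver defines its own constant. */
#define SCI_MAX_CONTROLLERS 2

/* Hypothetical stand-in for the table that scu_index selects from. */
static int controller_table[SCI_MAX_CONTROLLERS];

/* Old check: rejects only index > MAX, so index == MAX slips through. */
static inline bool old_check_accepts(size_t scu_index)
{
	return !(scu_index > SCI_MAX_CONTROLLERS);
}

/* Fixed check: valid indices are 0 .. SCI_MAX_CONTROLLERS - 1. */
static inline int *lookup_controller(size_t scu_index)
{
	if (scu_index >= SCI_MAX_CONTROLLERS)
		return NULL;
	return &controller_table[scu_index];
}
```

    With the old comparison an index equal to SCI_MAX_CONTROLLERS would read one
    element past the end of the table.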
     
  • 1/ fix the timeout for wait_for_completion_timeout
    2/ In the tmf timeout case we need to wait for our termination callback
    3/ Once the request is successfully started it will be freed according to the
    normal lifetime for requests.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Instead of duplicating the smp request buffer reuse the one provided by
    libsas. This future proofs the driver to support arbitrarily large smp
    requests, and shrinks the request structure size by ~700 bytes.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • One bug and a cleanup:
    1/ Fix cases where we were unmapping invalid addresses (smp requests were
    being unmapped)

    [ 604.662770] ------------[ cut here ]------------
    [ 604.668026] WARNING: at lib/dma-debug.c:800 check_unmap+0x418/0x740()
    [ 604.675315] Hardware name: SandyBridge Platform
    [ 604.680465] isci 0000:03:00.0: DMA-API: device driver tries to free an invalid DMA memory address

    2/ The unmap routine is too large to be an inline function, and
    isci_request_io_request_get_next_sge is unused.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Due to a typo we currently copy way too much when copying over the
    response data, but since a request is likely backed by a full page
    allocation we don't corrupt live data.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Now that we have upleveled device reassignment protection to the
    isci_remote_device reference count we no longer need this level of
    self-defense.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Now that "stopping/stopped" are one and the same and signalled by a NULL device
    pointer the rest of the device status infrastructure can be removed (->status
    and ->state_lock). The "not ready for i/o state" is replaced with a state
    flag, and is evaluated under scic_lock so that we don't see transients from
    taking the device reference to submitting the i/o.

    This also fixes a potential leakage of can_queue slots in the rare case that
    SAS_TASK_ABORTED is set at submission.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • We have unsafe references to remote devices that are notified to
    disappear at lldd_dev_gone. In order to clean this up we need a single
    canonical source for device lookups and stable references once a lookup
    succeeds. Towards that end guarantee that domain_device.lldd_dev is
    NULL as soon as we start the process of stopping a device. Any code
    path that wants to safely lookup a remote device must do so through
    task->dev->lldd_dev (isci_lookup_device()).

    For in-flight references outside of scic_lock we need reference counting
    to ensure that the device is not recycled before we are done with it.
    Simplify device back references to just scic_sds_request.target_device
    which is now the only permissible internal reference that is maintained
    relative to the reference count.

    There were two occasions where we wanted new i/o's to be treated as
    SAS_TASK_UNDELIVERED but where the domain_dev->lldd_dev link is still
    intact. Introduce a 'gone' flag to prevent i/o while waiting for libsas
    to take action on the port down event.

    One 'core' leftover is that we currently call
    scic_remote_device_destruct() from isci_remote_device_deconstruct()
    which is called when the 'core' says the device is stopped. It would be
    more natural for the final put to trigger
    isci_remote_device_deconstruct() but this implementation is deferred as
    it requires other changes.

    Signed-off-by: Dan Williams

    Dan Williams
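
    A single-threaded sketch of the lookup-then-reference pattern described in
    the entry above. All struct and function names are hypothetical; the driver
    performs the lookup under scic_lock and would normally use the kernel's kref
    helpers rather than a bare counter.

```c
#include <stddef.h>

/* Hypothetical remote device with a plain reference count. */
struct dev_sketch {
	int refcount;
	int released;	/* set once the last reference is dropped */
};

/* Hypothetical stand-in for the libsas domain device and its lldd_dev link. */
struct domain_dev_sketch {
	struct dev_sketch *lldd_dev;
};

/* Canonical lookup: NULL once stopping has begun, else a counted reference. */
static inline struct dev_sketch *lookup_device(struct domain_dev_sketch *d)
{
	struct dev_sketch *idev = d->lldd_dev;

	if (idev)
		idev->refcount++;
	return idev;
}

/* Drop a reference; the device may be recycled only after the final put. */
static inline void put_device(struct dev_sketch *idev)
{
	if (idev && --idev->refcount == 0)
		idev->released = 1;
}

/* Begin stopping: clear the link first so new lookups fail immediately. */
static inline void stop_device(struct domain_dev_sketch *d)
{
	struct dev_sketch *idev = d->lldd_dev;

	d->lldd_dev = NULL;
	put_device(idev);	/* drop the reference held by the link */
}
```

    An in-flight user that looked the device up before stop_device() keeps it
    alive until its own put_device().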
     
  • In isci_task_request_complete() we save the response/sense data from the
    command. Make sure isci_tmf has enough space to hold the full response.

    [ it does not look like we actually use this data, and
    response_data_len/sense_data_len should be specifying the byte count;
    in any event, do the simple fix first so we don't corrupt memory ]

    Reported-by: Adam Gruchala
    Tested-by: Edmund Nadolski
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Rather than return an error code and update a pointer that was passed by
    reference just return the request object directly (or null if allocation
    failed).

    Signed-off-by: Dan Williams

    Dan Williams
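
    The two calling conventions contrasted in the entry above, sketched in
    userspace. The struct and both function names are hypothetical, and malloc()
    stands in for whatever allocator the driver uses.

```c
#include <stdlib.h>

/* Hypothetical request object. */
struct req_obj {
	int tag;
};

/* Before: status code plus a pointer updated by reference. */
static inline int req_alloc_by_ref(struct req_obj **out, int tag)
{
	struct req_obj *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->tag = tag;
	*out = r;
	return 0;
}

/* After: return the object directly, or NULL if allocation failed. */
static inline struct req_obj *req_alloc(int tag)
{
	struct req_obj *r = malloc(sizeof(*r));

	if (r)
		r->tag = tag;
	return r;
}
```

    The caller of the second form has one value to check instead of two that
    must agree.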
     
  • Every single i/o or event completion incurs a test and branch to see if
    the cycle bit changed. For power-of-2 queue sizes the cycle bit can be
    read directly from the rollover of the queue pointer.

    Likely premature optimization, but the hidden if() and hidden
    assignments / side-effects in the macros were already asking to be
    cleaned up.

    Signed-off-by: Dan Williams

    Dan Williams
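
    One way to read the cycle bit from pointer rollover, as the entry above
    describes, is to let the get pointer run freely over twice the queue size.
    This sketch assumes that layout; the queue size and helper names are
    hypothetical, not taken from the driver.

```c
#include <stdint.h>

/* Assumed queue size for illustration; it must be a power of two. */
#define QUEUE_ENTRIES 8u

/* Low bits of the free-running pointer select the queue slot. */
static inline uint32_t queue_index(uint32_t ptr)
{
	return ptr & (QUEUE_ENTRIES - 1);
}

/*
 * The bit just above the index toggles each time the index wraps,
 * so it serves as the expected cycle bit with no test and branch.
 */
static inline uint32_t queue_cycle(uint32_t ptr)
{
	return (ptr / QUEUE_ENTRIES) & 1u;
}
```

    Advancing is then a plain increment; the rollover from slot 7 to slot 0
    flips the cycle bit as a side effect of the arithmetic.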
     
  • A tag is a 16-bit number where the upper four bits are a sequence number
    and the remainder is the task context index (tci). Sanitize the macro
    names and shave 256 bytes out of scic_sds_controller by reducing the size of
    io_request_sequence.

    scic_sds_io_tag_construct --> ISCI_TAG
    scic_sds_io_tag_get_sequence --> ISCI_TAG_SEQ
    scic_sds_io_tag_get_index() --> ISCI_TAG_TCI
    scic_sds_io_sequence_increment() [delete / open code]

    Signed-off-by: Dan Williams

    Dan Williams
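
    The tag layout from the entry above (four sequence bits over a twelve-bit
    tci) can be sketched as follows. The macro names come from the commit
    message, but the bodies, the mask constants and the isci_seq_next() helper
    are assumptions written for illustration, not copied from the driver.

```c
#include <stdint.h>

/* Assumed layout: bits 15..12 sequence, bits 11..0 task context index. */
#define TAG_TCI_BITS	12
#define TAG_TCI_MASK	0x0FFFu
#define TAG_SEQ_MASK	0x000Fu

#define ISCI_TAG(seq, tci) \
	((uint16_t)((((seq) & TAG_SEQ_MASK) << TAG_TCI_BITS) | \
		    ((tci) & TAG_TCI_MASK)))
#define ISCI_TAG_SEQ(tag)	(((tag) >> TAG_TCI_BITS) & TAG_SEQ_MASK)
#define ISCI_TAG_TCI(tag)	((tag) & TAG_TCI_MASK)

/* Hypothetical helper: the open-coded sequence increment with wraparound. */
static inline uint8_t isci_seq_next(uint8_t seq)
{
	return (uint8_t)((seq + 1) & TAG_SEQ_MASK);
}
```

    Because the sequence is only four bits wide, a per-tci sequence array needs
    no more than one byte per entry.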
     
  • The circ_buf macros are ~6% faster, as measured by perf, because they take
    advantage of power-of-two math assumptions i.e. no test and branch for
    rollover. Their semantics are clearer than the hidden side effects in pool.h
    (like sci_pool_get() which hides an assignment).

    Signed-off-by: Dan Williams

    Dan Williams
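
    A userspace sketch of a tci pool built on circ_buf-style arithmetic, to
    illustrate the entry above. CIRC_CNT and CIRC_SPACE are reproduced here in
    the form used by the kernel's <linux/circ_buf.h> so the example stands
    alone; the pool struct, its size and the put/get helpers are hypothetical.

```c
#include <stdint.h>

/* Power-of-two index math: no test and branch for rollover. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Assumed pool size for illustration; it must be a power of two. */
#define POOL_SIZE 8u

struct tci_pool_sketch {
	unsigned int head;	/* next slot to write */
	unsigned int tail;	/* next slot to read */
	uint16_t slot[POOL_SIZE];
};

/* Return a tci to the pool; -1 if the pool is full. */
static inline int pool_put(struct tci_pool_sketch *p, uint16_t tci)
{
	if (CIRC_SPACE(p->head, p->tail, POOL_SIZE) == 0)
		return -1;
	p->slot[p->head] = tci;
	p->head = (p->head + 1) & (POOL_SIZE - 1);
	return 0;
}

/* Take a tci from the pool; -1 if the pool is empty. */
static inline int pool_get(struct tci_pool_sketch *p, uint16_t *tci)
{
	if (CIRC_CNT(p->head, p->tail, POOL_SIZE) == 0)
		return -1;
	*tci = p->slot[p->tail];
	p->tail = (p->tail + 1) & (POOL_SIZE - 1);
	return 0;
}
```

    Note that this scheme keeps one slot empty to tell full from empty, so a
    pool of size 8 holds at most 7 entries.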
     
  • Some targets exceed the hang detect timer. Use the OS timeout to
    catch hung tasks.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • In the case where the hard reset process fails, each link in
    the port is put through a link reset sequence.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • The remote node context should only signal a device reset condition
    in a suspended state.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • Walk through the list of pending requests being careful to consider that
    multiple requests can be terminated when the lock is dropped (i.e.
    invalidating the 'next' reference established by
    list_for_each_entry_safe).

    Also noticed that all callers of isci_terminate_pending_requests()
    specify terminating, so just drop the parameter.

    Signed-off-by: Dan Williams

    Dan Williams
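
    The hazard in the list walk above is that a cached 'next' pointer can go
    stale when terminating one request also removes others. A single-threaded
    sketch of the safer shape, restarting from the list head on every pass,
    follows. The list type, the 'group' field and both functions are
    hypothetical; the real code uses the kernel list API and drops scic_lock
    around the termination.

```c
#include <stddef.h>

/* Hypothetical pending request on a singly linked list. */
struct pending_req {
	struct pending_req *next;
	int group;	/* requests in one group complete together */
	int terminated;
};

/*
 * Unlink every request in 'group'. This models several requests being
 * terminated while the lock is dropped for a single one.
 */
static inline void terminate_group(struct pending_req **head, int group)
{
	struct pending_req **pp = head;

	while (*pp) {
		struct pending_req *r = *pp;

		if (r->group == group) {
			*pp = r->next;
			r->next = NULL;
			r->terminated = 1;
		} else {
			pp = &r->next;
		}
	}
}

/* Re-read the head each pass instead of trusting a saved 'next'. */
static inline int terminate_all_pending(struct pending_req **head)
{
	int passes = 0;

	while (*head) {
		terminate_group(head, (*head)->group);
		passes++;
	}
	return passes;
}
```

    The loop terminates because each pass removes at least the current head.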
     
  • In the situation where a termination of an I/O times out,
    make sure that the linkage from the request to the task
    is severed completely. Also make sure that the selection
    of tasks to terminate occurs under scic_lock.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin
     
  • Requests that fail at start because of a reset pending condition
    must be set to complete in order to allow for later cleanup.

    Signed-off-by: Jeff Skirvin
    Signed-off-by: Dan Williams

    Jeff Skirvin