18 Sep, 2006

16 commits

  • This is a follow-up to patch "ieee1394: sbp2: enable auto spin-up for
    Maxtor disks". When I 'ejected' an OXUF922 based HDD from a Mac OS X
    box, it was spun down by the Mac and did not spin up by itself when
    attached to a Linux box right after that. The first SCSI command that
    required the bridge to access the drive ended in
    sda:sd 18:0:0:0: Device not ready: : Current: sense key: Not Ready
    Additional sense: Logical unit not ready, initializing cmd. required

    Therefore the flag which instructs scsi_mod to send START STOP UNIT with
    START=1 ("make medium ready") after such a condition is now enabled
    unconditionally for all FireWire storage devices.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • In the old days, sbp2 used to coerce all MODE SENSE commands into the
    10 bytes version. When all command set conversions were removed from
    sbp2 several months ago, sdev->use_10_for_ms = 1 was added. Meaning,
    higher SCSI layers preferred the 10 bytes version but would try the 6
    bytes version if the former failed.

    Recently, a problem with the 10 bytes version was discovered. An Initio
    INIC-1530 firmware accepted the 10 bytes version but replied with bogus
    data, showing the HDD incorrectly as write-protected. Since RBC
    actually mandates MODE SENSE (6), I checked which version was sent by
    Windows XP and Mac OS X 10.3 to an SBP-2 target hosted by Linux --- it
    was the 6 bytes version. (Exception: OS X sent the 10 bytes version to
    an MMC target. RBC and SBC got MODE SENSE (6).)

    Therefore, drop the use_10_for_ms flag from sbp2. Now the upper layers
    will try MODE SENSE (6) before MODE SENSE (10) on all SBP-2 devices.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • Scsi_remove_device() may go into uninterruptible sleep if blocked.
    Therefore sbp2_remove() unblocks the Scsi_Host before the device is
    requested to be removed. But there could be another 1394 bus reset
    after that which would block the host again. The 1394 subsystem won't
    call sbp2_update() concurrently to sbp2_remove(), which is why there is
    no chance for sbp2_remove() to be unblocked by sbp2_update().

    The fix is to tell sbp2's bus reset handler when a device is to be shut
    down so that it skips scsi_block_requests() on that host. As before,
    any new commands after a reset without reconnect will be failed quickly
    by sbp2scsi_queuecommand().

    In the long term, means to go without scsi_block_requests() should be
    found.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • Remove unused includes. Add missing includes, i.e. explicitly include
    all used headers. Sort includes alphabetically. Replace one call to
    signal_pending(current) to avoid to include headers just for this line.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • If the target signals a transport failure via status block, complete the
    request with DID_BUSY to indicate to the SCSI subsystem that the command
    may succeed when retried.

    Also log diagnostic information if the status block shows a transport
    related problem. It may point to hardware faults.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • While sbp2_update() is doing its duties after a bus reset, another reset
    could happen. Don't accept new requests until the next undisturbed
    sbp2_update() or until sbp2_remove().

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • The scsi_host_template's eh_abort_handler and eh_device_reset_handler
    are allowed to sleep. Use this to run sbp2_agent_reset in the more
    reliable mode which returns _after_ its write transaction was finished.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • Fix for http://bugzilla.kernel.org/show_bug.cgi?id=6948

    Because sbp2 writes to the target's fetch agent's registers from within
    atomic context, it cannot sleep to guaranteedly get a free transaction
    label. This may repeatedly lead to "sbp2util_node_write_no_wait failed"
    and consequently to SCSI command abortion after timeout. A likely cause
    is that many queue_command softirqs may occur before khpsbpkt (the
    ieee1394 driver's thread which cleans up after finished transactions) is
    woken up to recycle tlabels.

    Sbp2 now schedules a workqueue job whenever sbp2_link_orb_command fails
    in sbp2util_node_write_no_wait. The job will reliably get a transaction
    label because it can sleep.

    We use the kernel-wide shared workqueue because it is unlikely that the
    job itself actually needs to sleep. In the improbable case that it has
    to sleep, it doesn't need to sleep long since the standard transaction
    timeout is 100ms.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • A deactivated macro, defined as "#define foo(bar)", will result in
    silent corruption if somebody forgets a semicolon after a call to foo.
    Replace it by "#define foo(bar) do {} while (0)" which will reveal any
    respective syntax errors.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • This patch contains the scheduled removal of the force_inquiry_hack
    module parameter.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Stefan Richter

    Adrian Bunk
     
  • The waitqueue API is used to replace a custom wait mechanism. Only one
    global waitqueue (instead of per-device waitqueues or completions) is
    added because there is usually just one waiter.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • - Add checks for the (very unlikely) cases that the target writes too
    little or too much status data or writes unsolicited status.
    - Indicate that these and similar conditions are unlikely().
    - Check the 'resp' and 'sbp_status' fields for possible failure status.
    - Slightly optimize access macros for the status block bitfields.
    - Unify a few related log messages.

    TODO: Check if 'src'==1, then withhold the respective ORB from reuse
    until status for any subsequent ORB was received. This is an old bug
    whose fix requires more complex command queue handling.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • Sbp2's copy of the status fifo was cleared when management ORBs or new
    command ORBs were prepared. The latter had potential for a race
    condition if the block layer's soft IRQ and the 1394 LLD's interrupt
    handler ran on different CPUs. It would also yield wrong status if a
    command was completed with non-zero completion status before other
    commands that had zero completion status, and no new command was
    enqueued in the meantime.

    Now, the status buffer is cleared right before it is written. Thus it
    ends up in the following simpler and safer access pattern:
    - sbp2_alloc_device: allocates and implicitly clears once,
    - sbp2_handle_status_write: clears, writes, and reads,
    - sbp2_query_logins, sbp2_login_device, sbp2_reconnect_device: read.
    The latter three do not race with sbp2_handle_status_write because of
    how the protocol works.

    As a tiny optimization, the first two quadlets of the status never need
    to be cleared.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • Only the driver writes ORBs, the device just reads them. Therefore
    PCI_DMA_BIDIRECTIONAL can be replaced by PCI_DMA_TODEVICE which may be
    cheaper on some architectures.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • Since sbp2 is at the moment unable to do anything with the return value
    of sbp2_link_orb_command, just discard it.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • The sbp2 initiator has two ways to tell a target's fetch agent about new
    command ORBs:
    - Write the ORB's address to the ORB_POINTER register. This must not
    be done while the fetch agent is active.
    - Put the ORB's address into the previously submitted ORB's next_ORB
    field and write to the DOORBELL register. This may be done while the
    fetch agent is active or suspended. It must not be done while the
    fetch agent is in reset state.
    Sbp2 has a last_orb pointer which indicates in what way a new command
    should be announced. That pointer is concurrently accessed at various
    occasions. Furthermore, initiator and target are accessing the next_ORB
    field of ORBs concurrently and asynchronously.

    This patch does:
    - Protect all initiator accesses to last_orb by sbp2_command_orb_lock.
    - Add pci_dma_sync_single_for_device before a previously submitted
    ORB's next_ORB field is overwritten.
    - Insert a memory barrier between when next_ORB_lo and next_ORB_hi are
    overwritten. Next_ORB_hi must not be updated before next_ORB_lo.
    - Remove the rather unspecific and now superfluous qualifier "volatile"
    from the next_ORB fields.
    - Add comments on how last_orb is connected with what is known about
    the target's fetch agent's state.

    Signed-off-by: Stefan Richter

    Stefan Richter
     

04 Jul, 2006

2 commits


01 Jul, 2006

1 commit


13 Jun, 2006

8 commits

  • Replace occurrences of the magic value ~(u64)0 for invalid
    CSR address spaces by a named constant for better readability.

    Signed-off-by: Stefan Richter
    Signed-off-by: Ben Collins

    Ben Collins
     
  • The proper designator of an invalid CSR address is ~(u64)0, not (u64)0.
    Use the correct value in initialization and deregistration.
    Also, scsi_id->sbp2_lun does not need to be initialized twice.
    (scsi_id was kzalloc'd.)

    Signed-off-by: Stefan Richter
    Signed-off-by: Ben Collins

    Ben Collins
     
  • If sbp2 is forced to move data via ARM handler, the maximum packet size
    allowed for S800 transfers exceeds ohci1394's buffer size on platforms
    where PAGE_SIZE is 4096.

    Signed-off-by: Stefan Richter
    Signed-off-by: Ben Collins

    Ben Collins
     
  • Signed-off-by: Stefan Richter
    Signed-off-by: Ben Collins

    Ben Collins
     
  • Since this is useful information, promote it from a debug macro to
    a regular log message. The message appears only if the user set
    exclusive_login=0, therefore won't clutter the logs in normal use.
    Also update the comment on exclusive_login.

    Signed-off-by: Stefan Richter
    Signed-off-by: Jody McIntyre
    Signed-off-by: Ben Collins

    Ben Collins
     
  • This code became ineffective a few Linux releases ago and is not
    required anyway.

    Note from Christoph Hellwig: scsi_cmnd.request_buffer is always a
    scatterlist these days. Checking random bites into it and then
    mangling the data in sbp2_check_sbp2_response will cause really bad
    memory corruption when you're not lucky enough to have the check not
    trigger by luck.

    Signed-off-by: Stefan Richter
    Signed-off-by: Ben Collins

    Ben Collins
     
  • Add support for the following types of hardware:
    + nodes that have a link speed < PHY speed
    + 1394b PHYs that are less than S800 capable
    + 1394b/1394a adapter cable between two 1394b PHYs
    Also, S1600 and S3200 are now supported if IEEE1394_SPEED_MAX is raised.

    A probing function is added to nodemgr's config ROM fetching routine
    which adjusts the allowable speed if an access problem was encountered.
    Pros and Cons of the approach:
    + minimum code footprint to support this less widely used hardware
    + nearly no overhead for unaffected hardware
    - ineffective before nodemgr began to read the ROM of affected nodes
    - ineffective if ieee1394 is loaded with disable_nodemgr=1
    The speed map CSRs which are published to the bus are not touched by the
    patch.

    Signed-off-by: Stefan Richter
    Cc: Hakan Ardo
    Cc: Calculex
    Cc: Robert J. Kosinski
    Signed-off-by: Ben Collins

    Ben Collins
     
  • The workarounds are not required for DViCO Momobay FX-3A and AFAIR not
    for Momobay CX-2. These contain an TSB42AA9A but feature the same
    firmware_revision value as the older DViCO Momobay CX-1.

    Signed-off-by: Stefan Richter
    Signed-off-by: Ben Collins

    Ben Collins
     

06 Jun, 2006

1 commit


18 May, 2006

4 commits

  • Re-enable posted writes for status FIFO.

    Besides bringing back a very minor bandwidth tweak from Linux 2.6.15.x
    and older, this also fixes an interoperability regression since 2.6.16:

    http://bugzilla.kernel.org/show_bug.cgi?id=6356
    (sbp2: scsi_add_device failed. IEEE1394 HD is not working anymore.)

    Signed-off-by: Stefan Richter
    Tested-by: Vanei Heidemann
    Tested-by: Martin Putzlocher (chip type unconfirmed)
    Signed-off-by: Linus Torvalds

    Stefan Richter
     
  • In case the blacklist with workarounds for device bugs yields a false
    positive, the module load parameter can now also be used as an override
    instead of an addition to the blacklist.

    Signed-off-by: Stefan Richter
    Signed-off-by: Linus Torvalds

    Stefan Richter
     
  • Apple decided to copy some USB stupidity over to FireWire.

    The sector number returned by iPods from read_capacity is one too many.
    This may cause I/O errors, especially if the kernel is configured for EFI
    partition support. We use the same workaround as usb-storage but have to
    check for different model IDs.

    http://marc.theaimsgroup.com/?t=114233262300001
    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187409

    Acknowledgements:
    Diagnosis and therapy by Mathieu Chouquet-Stringer ,
    additional data about affected and unaffected Apple hardware from
    Vladimir Kotal, Sander De Graaf, Bryan Olmstead and Hugh Dixon.

    Signed-off-by: Stefan Richter
    Signed-off-by: Linus Torvalds

    Stefan Richter
     
  • Grand unification of the three types of workarounds we have so far.

    The "skip mode page 8" workaround is now limited to devices which
    pretend to be of TYPE_DISK instead of TYPE_RBC. This workaround is no
    longer enabled for Initio bridges.

    Patch update in anticipation of more workarounds:
    - Add module parameter "workarounds".
    - Deprecate parameter "force_inquiry_hack".
    - Compose the blacklist of a compound type for better readability and
    extensibility.
    - Remove a now unused #define.

    Signed-off-by: Stefan Richter
    Signed-off-by: Linus Torvalds

    Stefan Richter
     

03 Apr, 2006

1 commit

  • sbp2util_mark_command_completed takes a lock which was already taken by
    sbp2scsi_complete_all_commands. This is a regression in Linux 2.6.15.

    Reported by Kristian Harms at
    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187394

    [ More complete commentary, as response to questions by Andrew: ]

    > This changes the call environment for all implementations of
    > ->Current_done(). Are they all safe to call under this lock?

    Short answer: Yes, trust me. ;-) Long answer:

    The done() callbacks are passed on to sbp2 from the SCSI stack along
    with each SCSI command via the queuecommand hook. The done() callback
    is safe to call in atomic context. So does
    Documentation/scsi/scsi_mid_low_api.txt say, and many if not all SCSI
    low-level handlers rely on this fact. So whatever this callback does,
    it is "self-contained" and it won't conflict with sbp2's internal ORB
    list handling. In particular, it won't race with the
    sbp2_command_orb_lock.

    Moreover, sbp2 already calls the done() handler with
    sbp2_command_orb_lock taken in sbp2scsi_complete_all_commands(). I
    admit this is ultimately no proof of correctness, especially since this
    portion of code introduced the spinlock recursion in the first place and
    we didn't realize it since this code's submission before 2.6.15 until
    now. (I have learned a lesson from this.)

    I stress-tested my patch on x86 uniprocessor with a preemptible SMP
    kernel (alas I have no SMP machine yet) and made sure that all code
    paths which involve the sbp2_command_orb_lock were gone through multiple
    times.

    Signed-off-by: Stefan Richter
    Signed-off-by: Linus Torvalds

    Stefan Richter
     

29 Mar, 2006

4 commits

  • - move call of scsi_print_command from sbp2_send_command to the beginning of
    sbp2_queue_command to show also commands which are not sent
    - put sbp2's name into scsi_print_sense
    - use __FUNCTION__ in log messages
    - remove a few less useful log messages and comments

    Signed-off-by: Stefan Richter
    Signed-off-by: Jody McIntyre

    Stefan Richter
     
  • Sbp2 relied on DID_OK to be defined as 0. Always shift DID_OK into the right
    position anyway, and explicitly return DID_OK together with CHECK_CONDITION.
    Also comment on some #if 0 code. The patch does not change current behaviour.

    Signed-off-by: Stefan Richter
    Signed-off-by: Jody McIntyre

    Stefan Richter
     
  • Sbp2 did not check for successful registration of the lower address range
    when CONFIG_IEEE1394_SBP2_PHYS_DMA was set. If hpsb_register_addrspace
    failed, a "login timed-out" would occur which is misleading. Now sbp2 logs
    a sensible error message.

    Signed-off-by: Stefan Richter
    Signed-off-by: Jody McIntyre

    Stefan Richter
     
  • When a new SBP-2 unit is added, sbp2 now takes a reference on the 1394
    low-level driver (ohci1394 or pcilynx). This prevents the 1394 host driver
    module from being unloaded, e.g. by an administrative routine cleanup of
    unused kernel modules or when another 1394 driver which depends on ohci1394
    is unloaded.

    The reference is dropped when the SBP-2 unit was disconnected, when sbp2 is
    unloaded or detached from the unit, or when addition of the SBP-2 unit failed.

    Signed-off-by: Stefan Richter
    Signed-off-by: Jody McIntyre

    Stefan Richter
     

23 Feb, 2006

3 commits

  • Since about Linux 2.6.14, sbp2's inquiry workaround did not work anymore
    due to changes in the SCSI layer. Update it to become effective again.
    Testing one of the two known affected bridges has shown that skip_ms_page_8
    is required as well.

    Also, make force_inquiry_hack tunable via /sys/module/sbp2/parameters.

    Signed-off-by: Stefan Richter
    Signed-off-by: Jody McIntyre
    (cherry picked from 99496037c6744fd938ffb8ccfc8fc91762322ff8 commit)

    Stefan Richter
     
  • Let the ieee1394 core select a suitable 1394 address range for sbp2's
    status FIFO instead of using a fixed range. Since the core only selects
    addresses which are guaranteed to be out of the "physical range" as per
    OHCI 1.1, this patch also fixes an old bug:

    OHCI controllers which implement a writeable PhysicalUpperBound register
    included sbp2's status FIFO in the physical range. That way sbp2 was
    never notified of a succesful login and always failed after timeout.
    Affected OHCI host adapters include ALi and Fujitsu controllers.

    As another side effect of this patch, the status FIFO is no longer
    located in a range for which OHCI chips perform "posted writes". Each
    status write now requires a response subaction. But since large data
    transfers involve only few status writes, there is no measurable
    decrease of I/O throughput. What's more, the status FIFO is now safe
    from potential host bus errors. Nevertheless, posted writes could be
    re-enabled by extensions to the ARM features of the 1394 stack.

    Signed-off-by: Stefan Richter
    Signed-off-by: Jody McIntyre
    (cherry picked from b2d38cccad4ef80d6b672b8f89aae5fe2907b113 commit)

    Stefan Richter
     
  • If there were commands enqueued but not completed before an SBP-2 unit
    was unplugged (or an attempt to reconnect failed), knodemgrd or any
    process which tried to remove the device would sleep uninterruptibly
    in blk_execute_rq(). Therefore make sure that all commands are
    completed when sbp2 retreats.

    Signed-off-by: Stefan Richter
    Signed-off-by: Jody McIntyre
    (cherry picked from 61daa34c132c5d4ed8630e2c46e9bf2f0c7b3428 commit)

    Stefan Richter