Doug / smarc-fsl-linux-kernel | Embedian Git Server

09 Oct, 2012

1 commit

329a402cb [SCSI] Shorten the path length of scsi_cmd_to_driver() ... Browse Code »

This patch tries to shorten the path length of scsi_cmd_to_driver(). As only
REQ_TYPE_BLOCK_PC commands can be submitted without a driver, so we could
avoid the related NULL checking, as long as we make sure we don't use it for
REQ_TYPE_BLOCK_PC type commands. Plus, this fixes a bug where you get
different behaviors from REQ_TYPE_BLOCK_PC commands when a driver is and isn't
attached.

Signed-off-by: Li Zhong
Reviewed-by: Martin K. Petersen
Signed-off-by: James Bottomley

Li Zhong
2012-10-09 19:04:42 +0800

22 Aug, 2012

1 commit

14216561e [SCSI] Fix 'Device not ready' issue on mpt2sas ... Browse Code »

This is a particularly nasty SCSI ATA Translation Layer (SATL) problem.

SAT-2 says (section 8.12.2)

if the device is in the stopped state as the result of
processing a START STOP UNIT command (see 9.11), then the SATL
shall terminate the TEST UNIT READY command with CHECK CONDITION
status with the sense key set to NOT READY and the additional
sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
REQUIRED;

mpt2sas internal SATL seems to implement this. The result is very confusing
standby behaviour (using hdparm -y). If you suspend a drive and then send
another command, usually it wakes up. However, if the next command is a TEST
UNIT READY, the SATL sees that the drive is suspended and proceeds to follow
the SATL rules for this, returning NOT READY to all subsequent commands. This
means that the ordering of TEST UNIT READY is crucial: if you send TUR and
then a command, you get a NOT READY to both back. If you send a command and
then a TUR, you get GOOD status because the preceeding command woke the drive.

This bit us badly because

commit 85ef06d1d252f6a2e73b678591ab71caad4667bb
Author: Tejun Heo
Date: Fri Jul 1 16:17:47 2011 +0200

block: flush MEDIA_CHANGE from drivers on close(2)

Changed our ordering on TEST UNIT READY commands meaning that SATA drives
connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas
SATL sees the suspend *before* the drives get awoken by the next ATA command)
resulting in lots of failed commands.

The standard is completely nuts forcing this inconsistent behaviour, but we
have to work around it.

The fix for this is twofold:

1. Set the allow_restart flag so we wake the drive when we see it has been
suspended

2. Return all TEST UNIT READY status directly to the mid layer without any
further error handling which prevents us causing error handling which
may offline the device just because of a media check TUR.

Reported-by: Matthias Prager
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley

James Bottomley
2012-08-22 13:42:54 +0800

20 Jul, 2012

2 commits

b9d5c6b7e [SCSI] cleanup setting task state in scsi_error_handler() ... Browse Code »

A quick reading of scsi_error_handler() one could come away with the
impression that it does its wakeup event check while the task state is
TASK_RUNNING. In fact it sets TASK_INTERRUPTIBLE at the bottom of the
loop, but that is ~50 lines down.

Just set TASK_INTERRUPTIBLE at the top of loop and be done.

Signed-off-by: Dan Williams
Signed-off-by: James Bottomley

Dan Williams
2012-07-20 15:58:47 +0800
57fc2e335 [SCSI] fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) ... Browse Code »

Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.

A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain. When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.

Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh. Since i/o that is released by scsi_restart_operations has been
blocked for at least one eh cycle, this implementation allows those
i/o's to run before another eh cycle starts to discourage hung task
timeouts.

Cc:
Reported-by: Tom Jackson
Tested-by: Tom Jackson
Signed-off-by: Dan Williams
Signed-off-by: James Bottomley

Dan Williams
2012-07-20 15:58:46 +0800

23 May, 2012

1 commit

e8650a082 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull trivial updates from Jiri Kosina:
"As usual, it's mostly typo fixes, redundant code elimination and some
documentation updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits)
edac, mips: don't change code that has been removed in edac/mips tree
xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer
lib: Change mail address of Oskar Schirmer
net: Change mail address of Oskar Schirmer
arm/m68k: Change mail address of Sebastian Hess
i2c: Change mail address of Oskar Schirmer
net: Fix tcp_build_and_update_options comment in struct tcp_sock
atomic64_32.h: fix parameter naming mismatch
Kconfig: replace "--- help ---" with "---help---"
c2port: fix bogus Kconfig "default no"
edac: Fix spelling errors.
qla1280: Remove redundant NULL check before release_firmware() call
remoteproc: remove redundant NULL check before release_firmware()
qla2xxx: Remove redundant NULL check before release_firmware() call.
aic94xx: Get rid of redundant NULL check before release_firmware() call
tehuti: delete redundant NULL check before release_firmware()
qlogic: get rid of a redundant test for NULL before call to release_firmware()
bna: remove redundant NULL test before release_firmware()
tg3: remove redundant NULL test before release_firmware() call
typhoon: get rid of redundant conditional before all to release_firmware()
...

Linus Torvalds
2012-05-23 10:22:50 +0800

16 Apr, 2012

2 commits

3b729f764 scsi: fix comment spelling fix recory->recovery ... Browse Code »

Signed-off-by: Santosh Y
Signed-off-by: Jiri Kosina

Santosh Y
2012-04-16 20:35:30 +0800
919f797a4 SCSI: Fix error handling when no ULD is attached ... Browse Code »

Commit 18a4d0a22ed6 ("[SCSI] Handle disk devices which can not process
medium access commands") introduced a bug in which we would attempt to
dereference the scsi driver even when the device had no ULD attached.

Ensure that a driver is registered and make the driver accessor function
more resilient to errors during device discovery.

Reported-by: Elric Fu
Reported-by: Bart Van Assche
Signed-off-by: Martin K. Petersen
Signed-off-by: Linus Torvalds

Martin K. Petersen
2012-04-16 02:08:53 +0800

20 Feb, 2012

1 commit

18a4d0a22 [SCSI] Handle disk devices which can not process medium access commands ... Browse Code »

We have experienced several devices which fail in a fashion we do not
currently handle gracefully in SCSI. After a failure these devices will
respond to the SCSI primary command set (INQUIRY, TEST UNIT READY, etc.)
but any command accessing the storage medium will time out.

The following patch adds an callback that can be used by upper level
drivers to inspect the results of an error handling command. This in
turn has been used to implement additional checking in the SCSI disk
driver.

If a medium access command fails twice but TEST UNIT READY succeeds both
times in the subsequent error handling we will offline the device. The
maximum number of failed commands required to take a device offline can
be tweaked in sysfs.

Also add a new error flag to scsi_debug which allows this scenario to be
easily reproduced.

[jejb: fix up integer parsing to use kstrtouint]
Signed-off-by: Martin K. Petersen
Signed-off-by: James Bottomley

Martin K. Petersen
2012-02-20 00:14:52 +0800

19 Feb, 2012

2 commits

47ac56db1 [SCSI] scsi_error: classify some ILLEGAL_REQUEST sense as a permanent TARGET_ERROR ... Browse Code »

Permanent target failures are non-retryable and should be classified as
TARGET_ERROR; otherwise dm-multipath will retry an IO request that will
always fail at the target.

A SCSI command that fails with ILLEGAL_REQUEST sense and Additional
sense 0x20, 0x21, 0x24 or 0x26 represents a permanent TARGET_ERROR.

Signed-off-by: Mike Snitzer
Signed-off-by: James Bottomley

Mike Snitzer
2012-02-19 23:39:59 +0800
2082ebc45 [SCSI] fix the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE) ... Browse Code »

This patch fixes the host byte settings DID_TARGET_FAILURE and
DID_NEXUS_FAILURE. The function __scsi_error_from_host_byte, tries to reset
the host byte to DID_OK. But that does not happen because of the OR operation.

Here is the flow.

scsi_softirq_done-> scsi_decide_disposition -> __scsi_error_from_host_byte

Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition,
result will be set as DID_NEXUS_FAILURE (=0x11). Then in
__scsi_error_from_host_byte, when we do OR with DID_OK. Purpose is to reset
it back to DID_OK. But that does not happen. This patch fixes this issue.

Signed-off-by: Babu Moger
Signed-off-by: James Bottomley

Moger, Babu
2012-02-19 22:08:59 +0800

09 Jan, 2012

1 commit

ae0751ffc [SCSI] add flag to skip the runtime PM calls on the host ... Browse Code »

With previous change, now the ata port runtime suspend will happen as:

disk suspend --> scsi target suspend --> scsi host suspend --> ata port
suspend

ata port(parent device) suspend need to schedule scsi EH which will resume
scsi host(child device). Then the child device resume will in turn make
parent device resume first. This is kind of recursive.

This patch adds a new flag Scsi_Host::eh_noresume.
ata port will set this flag to skip the runtime PM calls on scsi host.

Acked-by: Alan Stern
Signed-off-by: Lin Ming
Signed-off-by: Jeff Garzik

Lin Ming
2012-01-09 08:14:57 +0800

27 Aug, 2011

1 commit

dfcf77758 [SCSI] Fix out of spec CD-ROM problem with media change ... Browse Code »

Some CD-ROMs fail to report a media change correctly. The specific
one for this patch simply fails to respond to commands, then gives a
UNIT ATTENTION after being reset which returns ASC/ASCQ 28/00. This
is out of spec behaviour, but add a check in the eat CC/UA on reset
path to catch this case so the CD-ROM will function somewhat properly.

[jejb: fixed up white space and accepted without signoff]
Signed-off-by: James Bottomley

TARUISI Hiroaki
2011-08-27 22:36:41 +0800

25 May, 2011

1 commit

3eef6257d [SCSI] Reduce error recovery time by reducing use of TURs ... Browse Code »

In error recovery, most scsi error recovery stages will send a TUR command
for every bad command when a driver's error handler reports success. When
several bad commands to the same device, this results in a device
being probed multiple times.

This becomes very problematic if the device or connection is in a state
where the device still doesn't respond to commands even after a recovery
function returns success. The error handler must wait for the test
commands to time out. The time waiting for the redundant commands can
drastically lengthen error recovery.

This patch alters the scsi mid-layer's error routines to send test commands
once per device instead of once per bad command. This can drastically
lower error recovery time.

[jejb: fixed up whitespace and formatting]
Signed-of-by: David Jeffery
Signed-off-by: James Bottomley

David Jeffery
2011-05-25 00:51:53 +0800

16 Apr, 2011

1 commit

deb1cb63d [SCSI] Log thin provisioning threshold event ... Browse Code »

At least log the message that we received a THIN PROVISIONING SOFT
THRESHOLD REACHED Unit Attention. Also added it to unit attention
decodes.

Signed-off-by: Shyam Iyer
Signed-off-by: James Bottomley

Shyam Iyer
2011-04-16 05:29:25 +0800

22 Mar, 2011

1 commit

0bf8c8697 Reduce sequential pointer derefs in scsi_error.c and reduce size as well ... Browse Code »

This patch reduces the number of sequential pointer derefs in
drivers/scsi/scsi_error.c

This has been submitted a number of times over a couple of years. I
believe this version adresses all comments it has gathered over time.
Please apply or reject with a reason.

The benefits are:

- makes the code easier to read. Lots of sequential derefs of the same
pointers is not easy on the eye.

- theoretically at least, just dereferencing the pointers once can
allow the compiler to generally slightly faster code, so in theory
this could also be a micro speed optimization.

- reduces size of object file (tiny effect: on x86-64, in at least one
configuration, the text size decreased from 9439 bytes to 9400)

- removes some pointless (mostly trailing) whitespace.

Signed-off-by: Jesper Juhl
Signed-off-by: Linus Torvalds

Jesper Juhl
2011-03-22 06:54:35 +0800

13 Feb, 2011

1 commit

63583cca7 [SCSI] Add detailed SCSI I/O errors ... Browse Code »

Instead of just passing 'EIO' for any I/O error we should be
notifying the upper layers with more details about the cause
of this error.

Update the possible I/O errors to:

- ENOLINK: Link failure between host and target
- EIO: Retryable I/O error
- EREMOTEIO: Non-retryable I/O error
- EBADE: I/O error restricted to the I_T_L nexus

'Retryable' in this context means that an I/O error _might_ be
restricted to the I_T_L nexus (vulgo: path), so retrying on another
nexus / path might succeed.

'Non-retryable' in general refers to a target failure, so this
error will always be generated regardless of the I_T_L nexus
it was send on.

I/O errors restricted to the I_T_L nexus might be retried
on another nexus / path, but they should _not_ be queued
if no paths are available.

Signed-off-by: Hannes Reinecke
Signed-off-by: Mike Snitzer
Signed-off-by: James Bottomley

Hannes Reinecke
2011-02-13 00:33:08 +0800

22 Dec, 2010

1 commit

98db51957 [SCSI] fix id computation in scsi_eh_target_reset() ... Browse Code »

The current code in scsi_eh_target_reset() has an off by one error
that actually sends spurious extra resets. Since there's no real need
to reset the targets in numerical order, simply chunk up the command
recovery list doing target resets and pulling matching targets out of
the list (that also makes the loop O(N) instead of O(N^2).

[mike christie found and fixed a list_splice -> list_splice_init problem]

Reported-by: Hillf Danton
Signed-off-by: James Bottomley

James Bottomley
2010-12-22 02:23:56 +0800

09 Dec, 2010

1 commit

459dbf72e [SCSI] Eliminate error handler overload of the SCSI serial number ... Browse Code »

The error handler is using the test cmd->serial_number == 0 in the
abort routines to signal that the command to be aborted has already
completed normally. This design was to close a race window in the
original error handler where a command could go through the normal
completion routines after it timed out but before error handling was
started.

Mike Anderson pointed out that when we converted our timeout and
softirq completions, we picked up atomicity here because the block
layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees
that *either* the command times out or our done routine is called, but
ensures we can't get both occurring. That makes the serial number
zero check redundant and it can be removed.

Signed-off-by: James Bottomley

James Bottomley
2010-12-09 23:41:16 +0800

17 Nov, 2010

1 commit

f281233d3 SCSI host lock push-down ... Browse Code »

Move the mid-layer's ->queuecommand() invocation from being locked
with the host lock to being unlocked to facilitate speeding up the
critical path for drivers who don't need this lock taken anyway.

The patch below presents a simple SCSI host lock push-down as an
equivalent transformation. No locking or other behavior should change
with this patch. All existing bugs and locking orders are preserved.

Additionally, add one parameter to queuecommand,
struct Scsi_Host *
and remove one parameter from queuecommand,
void (*done)(struct scsi_cmnd *)

Scsi_Host* is a convenient pointer that most host drivers need anyway,
and 'done' is redundant to struct scsi_cmnd->scsi_done.

Minimal code disturbance was attempted with this change. Most drivers
needed only two one-line modifications for their host lock push-down.

Signed-off-by: Jeff Garzik
Acked-by: James Bottomley
Signed-off-by: Linus Torvalds

Jeff Garzik
2010-11-17 05:33:23 +0800

10 Nov, 2010

1 commit

02e031cbc block: remove REQ_HARDBARRIER ... Browse Code »

REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
at this point is:

- various checks inside the block layer.
- sanity checks in bio based drivers.
- now unused bio_empty_barrier helper.
- Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while,
but Xen really needs to sort out it's barrier situaton.
- setting of ordered tags in uas - dead code copied from old scsi
drivers.
- scsi different retry for barriers - it's dead and should have been
removed when flushes were converted to FS requests.
- blktrace handling of barriers - removed. Someone who knows blktrace
better should add support for REQ_FLUSH and REQ_FUA, though.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-11-10 21:54:09 +0800

15 Aug, 2010

1 commit

c29c08b59 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (28 commits)
[SCSI] qla4xxx: fix compilation warning
[SCSI] make error handling more robust in the face of reservations
[SCSI] tgt: fix warning
[SCSI] drivers/message/fusion: Adjust confusing if indentation
[SCSI] Return NEEDS_RETRY for eh commands with status BUSY
[SCSI] ibmvfc: Driver version 1.0.9
[SCSI] ibmvfc: Fix terminate_rport_io
[SCSI] ibmvfc: Fix rport add/delete race resulting in oops
[SCSI] lpfc 8.3.16: Change LPFC driver version to 8.3.16
[SCSI] lpfc 8.3.16: FCoE Discovery and Failover Fixes
[SCSI] lpfc 8.3.16: SLI Additions, updates, and code cleanup
[SCSI] pm8001: introduce missing kfree
[SCSI] qla4xxx: Update driver version to 5.02.00-k3
[SCSI] qla4xxx: Added AER support for ISP82xx
[SCSI] qla4xxx: Handle outstanding mbx cmds on hung f/w scenarios
[SCSI] qla4xxx: updated mbx_sys_info struct to sync with FW 4.6.x
[SCSI] qla4xxx: clear AF_DPC_SCHEDULED flage when exit from do_dpc
[SCSI] qla4xxx: Stop firmware before doing init firmware.
[SCSI] qla4xxx: Use the correct request queue.
[SCSI] qla4xxx: set correct value in sess->recovery_tmo
...

Linus Torvalds
2010-08-15 03:34:34 +0800

11 Aug, 2010

3 commits

67110dfd4 [SCSI] make error handling more robust in the face of reservations ... Browse Code »

commit 5f91bb050ecc4ff1d8d3d07edbe550c8f431c5e1
Author: Michael Reed
Date: Mon Aug 10 11:59:28 2009 -0500

[SCSI] reservation conflict after timeout causes device to be taken offline

Flipped us from always returning failed to always returning success in
the name of fixing the problem where reservation conflict returns from
test unit ready cause the device always to be taken offline.
Unfortuantely, it also introduced a problem whereby for commands other
than test unit ready, the eh dispatcher thinks they succeeded when
reservation conflict is returned, whereas in reality they failed. Fix
this by only returning success for the test unit ready case.

Signed-off-by: James Bottomley

James Bottomley
2010-08-11 12:58:33 +0800
3eb3a9285 [SCSI] Return NEEDS_RETRY for eh commands with status BUSY ... Browse Code »

When the transport is busy and we're sending an EH command drivers
occasionally return 'BUSY'. As this in most cases is the TUR
command sent as part of the error recovery this is a sure way
to make the error recovery escalate. Returning 'NEEDS_RETRY'
here will just retry the TUR command and eventually abort the
original command, thus making error handling far smoother.

Signed-off-by: Hannes Reinecke
Signed-off-by: James Bottomley

Hannes Reinecke
2010-08-11 12:51:20 +0800
2f9e825d3 Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.

Linus Torvalds
2010-08-11 06:22:42 +0800

08 Aug, 2010

2 commits

e96f6abe0 scsi: use REQ_TYPE_FS for flush request ... Browse Code »

scsi-ml uses REQ_TYPE_BLOCK_PC for flush requests from file
systems. The definition of REQ_TYPE_BLOCK_PC is that we don't retry
requests even when we can (e.g. UNIT ATTENTION) and we send the
response to the callers (then the callers can decide what they want).
We need a workaround such as the commit
77a4229719e511a0d38d9c355317ae1469adeb54 to retry BLOCK_PC flush
requests. We will need the similar workaround for discard requests too
since SCSI-ml handle them as BLOCK_PC internally.

This uses REQ_TYPE_FS for flush requests from file systems instead of
REQ_TYPE_BLOCK_PC.

scsi-ml retries only REQ_TYPE_FS requests that have data to
transfer when we can retry them (e.g. UNIT_ATTENTION). However, we
also need to retry REQ_TYPE_FS requests without data because the
callers don't.

This also changes scsi_check_sense() to retry all the REQ_TYPE_FS
requests when appropriate. Thanks to scsi_noretry_cmd(),
REQ_TYPE_BLOCK_PC requests don't be retried as before.

Note that basically, this reverts the commit
77a4229719e511a0d38d9c355317ae1469adeb54 since now we use REQ_TYPE_FS
for flush requests.

Signed-off-by: FUJITA Tomonori
Signed-off-by: Jens Axboe

FUJITA Tomonori
2010-08-08 00:52:41 +0800
33659ebba block: remove wrappers for request type/flags ... Browse Code »

Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-08-08 00:17:56 +0800

28 Jul, 2010

2 commits

bc4f24014 [SCSI] implement runtime Power Management ... Browse Code »

This patch (as1398b) adds runtime PM support to the SCSI layer. Only
the machanism is provided; use of it is up to the various high-level
drivers, and the patch doesn't change any of them. Except for sg --
the patch expicitly prevents a device from being runtime-suspended
while its sg device file is open.

The implementation is simplistic. In general, hosts and targets are
automatically suspended when all their children are asleep, but for
them the runtime-suspend code doesn't actually do anything. (A host's
runtime PM status is propagated up the device tree, though, so a
runtime-PM-aware lower-level driver could power down the host adapter
hardware at the appropriate times.) There are comments indicating
where a transport class might be notified or some other hooks added.

LUNs are runtime-suspended by calling the drivers' existing suspend
handlers (and likewise for runtime-resume). Somewhat arbitrarily, the
implementation delays for 100 ms before suspending an eligible LUN.
This is because there typically are occasions during bootup when the
same device file is opened and closed several times in quick
succession.

The way this all works is that the SCSI core increments a device's
PM-usage count when it is registered. If a high-level driver does
nothing then the device will not be eligible for runtime-suspend
because of the elevated usage count. If a high-level driver wants to
use runtime PM then it can call scsi_autopm_put_device() in its probe
routine to decrement the usage count and scsi_autopm_get_device() in
its remove routine to restore the original count.

Hosts, targets, and LUNs are not suspended while they are being probed
or removed, or while the error handler is running. In fact, a fairly
large part of the patch consists of code to make sure that things
aren't suspended at such times.

[jejb: fix up compile issues in PM config variations]
Signed-off-by: Alan Stern
Signed-off-by: James Bottomley

Alan Stern
2010-07-28 22:07:50 +0800
6e49949c5 [SCSI] Log msg when getting Unit Attention ... Browse Code »

If the user accidentally changes LUN mappings or it occurs
due to a bug, then it can cause data corruption that can take
months and months to track down. This patch adds a log
message when getting REPORT_LUNS_DATA_CHANGED and it adds
a generic message for other Unit Attentions with asc == 0x3f.

We are working on adding support for handling of these errors,
but I think until then we should at least log a message so
tracking down problems as a result of one of these changes
is a little easier.

Signed-off-by: Mike Christie
Signed-off-by: James Bottomley

Mike Christie
2010-07-28 01:03:51 +0800

18 May, 2010

1 commit

95bb335c0 [SCSI] Merge scsi-misc-2.6 into scsi-rc-fixes-2.6 ... Browse Code »

Signed-off-by: James Bottomley

James Bottomley
2010-05-18 22:37:41 +0800

06 May, 2010

1 commit

77a422971 [SCSI] Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error ... Browse Code »

There's nastyness in the way we currently handle barriers (and
discards): They're effectively filesystem commands, but they get
processed as BLOCK_PC commands. Unfortunately BLOCK_PC commands are
taken by SCSI to be SG_IO commands and the issuer expects to see and
handle any returned errors, however trivial. This leads to a huge
problem, because the block layer doesn't expect this to happen and any
trivially retryable error on a barrier causes an immediate I/O error
to the filesystem.

The only real way to hack around this is to take the usual class of
offending errors (unit attentions) and make them all retryable in the
case of a REQ_HARDBARRIER. A correct fix would involve a rework of
the entire block and SCSI submit system, and so is out of scope for a
quick fix.

Cc: Hannes Reinecke
Cc: Stable Tree
Signed-off-by: James Bottomley

James Bottomley
2010-05-06 00:15:57 +0800

01 May, 2010

1 commit

bf8162354 [SCSI] add scsi trace core functions and put trace points ... Browse Code »

Signed-off-by: Xiao Guangrong
Signed-off-by: Tomohiro Kusumi
Signed-off-by: Kei Tokunaga
Signed-off-by: James Bottomley

Kei Tokunaga
2010-05-01 01:51:10 +0800

11 Apr, 2010

1 commit

2f2eb5876 [SCSI] Allow FC LLD to fast-fail scsi eh by introducing new eh return ... Browse Code »

If the scsi eh is running and then a FC LLD calls
fc_remote_port_delete, the SCSI commands sent from the eh will fail.
To prevent this, a FC LLD can call fc_block_scsi_eh from the eh
callback, blocking the eh thread until the dev_loss_tmo fires or the
remote port is available again.

If (e.g. for a multipathing setup) the dev_loss_tmo is set to a very
large value, thus preventing the scsi device removal , the scsi eh can
block for a long time. For multipathing, the fast_io_fail_tmo is then
set to a low value to detect path problems sooner.

This patch introduces a new return code FAST_IO_FAIL. The function
fc_block_scsi_eh now returns FAST_IO_FAIL when the fast_io_fail_tmo
fires. This indicates that the LLD terminated all pending I/O requests
and there are no more pending SCSI commands for the scsi eh to wait
for. This return code can be passed back to the scsi eh to stop the
escalation and finish the recovery process for this device.

Signed-off-by: Christof Schmitt
Signed-off-by: James Bottomley

Christof Schmitt
2010-04-11 22:49:33 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

05 Dec, 2009

2 commits

4a84067db [SCSI] add queue_depth ramp up code ... Browse Code »

Current FC HBA queue_depth ramp up code depends on last queue
full time. The sdev already has last_queue_full_time field to
track last queue full time but stored value is truncated by
last four bits.

So this patch updates last_queue_full_time without truncating
last 4 bits to store full value and then updates its only
current usages in scsi_track_queue_full to ignore last four bits
to keep current usages same while also use this field
in added ramp up code.

Adds scsi_handle_queue_ramp_up to ramp up queue_depth on
successful completion of IO. The scsi_handle_queue_ramp_up will
do ramp up on all luns of a target, just same as ramp down done
on all luns on a target.

The ramp up is skipped in case the change_queue_depth is not
supported by LLD or already reached to added max_queue_depth.

Updates added max_queue_depth on every new update to default
queue_depth value.

The ramp up is also skipped if lapsed time since either last
queue ramp up or down is less than LLD specified
queue_ramp_up_period.

Adds queue_ramp_up_period to sysfs but only if change_queue_depth
is supported since ramp up and queue_ramp_up_period is needed only
in case change_queue_depth is supported first.

Initializes queue_ramp_up_period to 120HZ jiffies as initial
default value, it is same as used in existing lpfc and qla2xxx.

-v2
Combined all ramp code into this single patch.

-v3
Moves max_queue_depth initialization after slave_configure is
called from after slave_alloc calling done. Also adjusted
max_queue_depth check to skip ramp up if current queue_depth
is >= max_queue_depth.

-v4
Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f
to store or show its value.

Signed-off-by: Vasu Dev
Tested-by: Christof Schmitt
Tested-by: Giridhar Malavali
Signed-off-by: James Bottomley

Vasu Dev
2009-12-05 02:00:44 +0800
42a6a9183 [SCSI] scsi error: have scsi-ml call change_queue_depth to handle QUEUE_FULL ... Browse Code »

This has scsi-ml call the change_queue_depth functions when
we get a QUEUE_FULL. It will only change the queue depth if
change_queue_depth is set because the LLD may have to
modify some internal resources, so I thought this would
be the safest route.

Signed-off-by: Mike Christie

-v2
Limits change_queue_depth to only all luns of target by adding
channel check while iterating for all luns of Scsi_Host. This is
same as currently qla2xxx FC HBA does on QUEUE_FULL event.

Signed-off-by: Vasu Dev
Signed-off-by: James Bottomley

Mike Christie
2009-12-05 02:00:42 +0800

02 Oct, 2009

1 commit

6e883b0e4 [SCSI] Retry ADD_TO_MLQUEUE return value for EH commands ... Browse Code »

A target reset when I/O is ongoing might result
an eventual device offline, as scsi_eh_completed_normally()
might return ADD_TO_MLQUEUE in addition to the
advertised SUCCESS, FAILED, and NEEDS_RETRY.

Which is unfortunate as scsi_send_eh_cmnd() will
therefore map ADD_TO_MLQUEUE to FAILED instead of
the more appropriate NEEDS_RETRY.

Signed-off-by: Hannes Reinecke
Cc: Stable Tree
Signed-off-by: James Bottomley

Hannes Reinecke
2009-10-02 22:46:11 +0800

23 Aug, 2009

1 commit

5f91bb050 [SCSI] reservation conflict after timeout causes device to be taken offline ... Browse Code »

An IBM tape drive failed to complete a PERSISTENT RESERVE IN within the scsi
cmd timeout. Error recovery was initiated and it sequenced from abort through
taking the tape drive offline.

The device was taken offline because it repeatedly responded to the TUR command
issued by error recovery with a RESERVATION CONFLICT status. The tape drive
was reserved to another system. This is perfectly legitimate response to TUR,
and is one that an escalation of recovery is unlikely to clear. Further,
escalation of recovery can have undesirable side effects on the operation of
tape drives shared with other initiators.

Instead of escalating recovery, error recovery should treat the RESERVATION
CONFLICT response to the TUR as a good status, giving the issuer of the
command the opportunity to handle the timeout and reservation conflict.

Signed-off-by: Michael reed
Signed-off-by: James Bottomley

Michael Reed
2009-08-23 06:52:22 +0800

09 Jun, 2009

2 commits

91bc31fb3 [SCSI] fix up scsi_eh_lock_door() ... Browse Code »

The Documentation is incorrect (we removed some functions referred to), and
none of the bug warnings now apply. Additionally remove the spurious check on
the return from blk_get_request() which can't fail if __GFP_WAIT is passed in.

Signed-off-by: James Bottomley

James Bottomley
2009-06-09 01:47:40 +0800
477e608c0 [SCSI] fix documentation for two functions ... Browse Code »

Signed-off-by: Bartlomiej Zolnierkiewicz
Signed-off-by: James Bottomley

Bartlomiej Zolnierkiewicz
2009-06-09 01:23:35 +0800

13 Mar, 2009

1 commit

f078727b2 [SCSI] remove scsi_req_map_sg ... Browse Code »

No one uses scsi_execute_async with data transfer now. We can remove
scsi_req_map_sg.

Only scsi_eh_lock_door uses scsi_execute_async. scsi_eh_lock_door
doesn't handle sense and the callback. So we can remove
scsi_io_context too.

Signed-off-by: FUJITA Tomonori
Signed-off-by: James Bottomley

FUJITA Tomonori
2009-03-13 01:58:10 +0800