08 Aug, 2010
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
workqueue: mark init_workqueues() as early_initcall()
workqueue: explain for_each_*cwq_cpu() iterators
fscache: fix build on !CONFIG_SYSCTL
slow-work: kill it
gfs2: use workqueue instead of slow-work
drm: use workqueue instead of slow-work
cifs: use workqueue instead of slow-work
fscache: drop references to slow-work
fscache: convert operation to use workqueue instead of slow-work
fscache: convert object to use workqueue instead of slow-work
workqueue: fix how cpu number is stored in work->data
workqueue: fix mayday_mask handling on UP
workqueue: fix build problem on !CONFIG_SMP
workqueue: fix locking in retry path of maybe_create_worker()
async: use workqueue for worker pool
workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
workqueue: implement unbound workqueue
workqueue: prepare for WQ_UNBOUND implementation
libata: take advantage of cmwq and remove concurrency limitations
workqueue: fix worker management invocation without pending works
...
Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c.
02 Aug, 2010
1 commit
-
Signed-off-by: FUJITA Tomonori
Signed-off-by: Jeff Garzik
02 Jul, 2010
1 commit
-
libata has two concurrency-related limitations.
a. ata_wq, which is used for polling PIO, has a single thread per CPU.
If there are multiple devices doing polling PIO on the same CPU, they
can't be executed simultaneously.
b. ata_aux_wq, which is used for SCSI probing, has a single thread. In
cases where SCSI probing is stalled for an extended period of time,
which is possible for ATAPI devices, this will stall all probing.
#a is solved by increasing the maximum concurrency of ata_wq. Please
note that polling PIO might be used under the allocation path and thus
needs to be served by a separate wq with a rescuer.
#b is solved by using the default wq instead and achieving exclusion
via a per-port mutex.
Signed-off-by: Tejun Heo
Acked-by: Jeff Garzik
20 May, 2010
2 commits
-
Some of the error handling logic in ata_sff_error_handler() and all of
ata_sff_post_internal_cmd() are for BMDMA. Create
ata_bmdma_error_handler() and ata_bmdma_post_internal_cmd() and move
the BMDMA parts into them.
While at it, change the DMA protocol check to ata_is_dma(), fix
post_internal_cmd to call ap->ops->bmdma_stop instead of directly
calling ata_bmdma_stop(), and open code hardreset selection so that
ata_std_error_handler() doesn't have to know about sff hardreset.
As these two functions are BMDMA specific, there's no reason to check
for bmdma_addr before calling bmdma methods if the protocol of the
failed command is DMA. sata_mv and pata_mpc52xx now don't need to set
.post_internal_cmd to ATA_OP_NULL, and pata_icside and sata_qstor don't
need to set it to their bmdma_stop routines.
ata_sff_post_internal_cmd() becomes a no-op and is removed.
This fixes p3 described in the clean-up-BMDMA-initialization patch.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
port_task is tightly bound to the standard SFF PIO HSM implementation.
Using it for any other purpose would be error-prone, there's no such
user, and if a driver needs such a feature, it would be much better
off using its own. Move it inside CONFIG_ATA_SFF and rename it to
sff_pio_task.
The only function which is exposed to the core layer is
ata_sff_flush_pio_task(), which is renamed from ata_port_flush_task()
and now also takes care of resetting hsm_task_state to HSM_ST_IDLE,
which is possible as it's now specific to PIO HSM.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik
23 Apr, 2010
2 commits
-
before returning it via qc->result_tf.
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik -
blk_abort_request() expects the queue lock to be held by the caller.
Grab it before calling the function.
Lack of this synchronization led to an infinite loop on a corrupt
q->timeout_list.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik
21 Jan, 2010
1 commit
-
libata currently doesn't retry if a command fails with AC_ERR_INVALID,
assuming that retrying won't get it any further.
However, a failure may be classified as invalid through a hardware
glitch (incorrect reading of the error register or a firmware bug) and
there isn't a whole lot to gain by not retrying, as actually invalid
commands will be failed immediately. Also, commands serving FS IOs
are extremely unlikely to be invalid. Retry FS IOs even if they're
marked invalid.
A transient and incorrect invalid failure was seen while debugging a
firmware-related issue on Samsung n130 in bko#14314.
http://bugzilla.kernel.org/show_bug.cgi?id=14314
Signed-off-by: Tejun Heo
Reported-by: Johannes Stezenbach
Signed-off-by: Jeff Garzik
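The retry rule in the commit above can be pictured as a small predicate. This is a userspace sketch, not libata code: the SK_ERR_* names and values are hypothetical stand-ins for libata's AC_ERR_* flags, `is_fs_io` stands in for "this command serves filesystem I/O", and the condition for non-FS commands is modeled loosely on libata's autopsy logic and may not match it exactly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error bits standing in for libata's AC_ERR_* flags;
 * the names and values are made up for this sketch. */
enum {
	SK_ERR_DEV     = 1u << 0, /* the device itself reported an error */
	SK_ERR_ATA_BUS = 1u << 1, /* transmission-level error */
	SK_ERR_INVALID = 1u << 2, /* command classified as invalid */
};

/* Is a failed command worth retrying?
 * - FS I/O is always retried, even when classified INVALID, since the
 *   classification may come from a glitch and truly invalid commands
 *   fail again immediately anyway.
 * - Other commands are not retried if marked INVALID or if the only
 *   failure is a device-reported error. */
static bool sk_worth_retrying(unsigned int err_mask, bool is_fs_io)
{
	if (is_fs_io)
		return true;
	return !(err_mask & SK_ERR_INVALID) && err_mask != SK_ERR_DEV;
}
```

With this shape, an INVALID failure on a filesystem read is retried, while the same failure on, say, a passthrough command is not.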
03 Dec, 2009
1 commit
-
If an ATA device fails FLUSH, it means that the device failed to write
out some amount of data and the error needs to be reported to upper
layers. As retries can't recover the lost data, FLUSH failures need to
be reported immediately in general.
However, if FLUSH fails due to transmission errors, the FLUSH needs to
be retried; otherwise, filesystems may switch to RO mode and/or a RAID
array may drop a drive for a random transmission glitch.
This condition can be reproduced rather easily on the combination of
ext4 and certain ahci controllers which go through a PHY event after a
powersave mode switch. A powersave mode switch is often closely
followed by a flush from the filesystem, and the FLUSH then fails with
an ATA bus error, which makes the filesystem code believe that data is
lost and drop to RO mode. This was reported in the following bugzilla
bug.
http://bugzilla.kernel.org/show_bug.cgi?id=14543
This patch makes libata EH retry FLUSH if it wasn't failed by the
device.
Signed-off-by: Tejun Heo
Reported-by: Andrey Vihrov
Signed-off-by: Jeff Garzik
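The FLUSH policy described in the commit above reduces to "retry unless the device itself reported the failure". A minimal sketch, assuming hypothetical SK_ERR_* bits in place of libata's AC_ERR_* flags; it models only the decision, not how libata actually reissues the command.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error bits standing in for libata's AC_ERR_* flags. */
enum {
	SK_ERR_DEV     = 1u << 0, /* device says the write-out failed */
	SK_ERR_ATA_BUS = 1u << 1, /* transmission error on the link */
	SK_ERR_TIMEOUT = 1u << 2, /* command timed out */
};

/* Should EH retry a failed FLUSH?  A device-reported failure means data
 * was lost and must be reported upward at once; any other failure (for
 * example a bus error right after a PHY event) is worth another try. */
static bool sk_retry_flush(unsigned int err_mask)
{
	if (err_mask == 0)
		return false; /* nothing failed, nothing to retry */
	return !(err_mask & SK_ERR_DEV);
}
```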
16 Oct, 2009
1 commit
-
Commit 842faa6c1a1d6faddf3377948e5cf214812c6c90 fixed error handling
during attach by not committing the detected device class to dev->class
while attaching a new device. However, this change missed the PMP
class check in the configuration loop, causing a new PMP device to go
through ata_dev_configure() as if it were an ATA or ATAPI device.
As a PMP device doesn't have regular IDENTIFY data, this makes
ata_dev_configure() try to configure a PMP device using invalid
data. For the most part, it wasn't too harmful and went unnoticed, but
this ends up clearing dev->flags, which may have ATA_DFLAG_AN set by
sata_pmp_attach(). This means that SATA_PMP_FEAT_NOTIFY ends up being
disabled on PMPs, which breaks hotplug support on PMPs that honor the
flag.
This problem was discovered and reported by Ethan Hsiao.
Signed-off-by: Tejun Heo
Reported-by: Ethan Hsiao
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik
07 Oct, 2009
1 commit
-
While trying to work around spurious detection retries for
non-existent devices on slave links, commit
816ab89782ac139a8b65147cca990822bb7e8675 incorrectly added link
offline check logic before ata_eh_thaw() was called. This means that
if an occupied link goes down briefly at the time the offline check
is performed, the device class will be cleared to ATA_DEV_NONE and
libata won't retry, thus failing detection of the device.
The offline check should be done after the port is thawed, together
with the online check, so that such link glitches can be detected by
the interrupt handler and handled properly.
Signed-off-by: Tejun Heo
Reported-by: Tim Blechmann
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik
02 Sep, 2009
3 commits
-
This patch improves libata's output for error/notification messages
to allow easier comprehension and debugging:
When ATAPI commands issued through the SCSI layer fail, use SCSI
functions to print the CDB in human-readable form instead of just
dumping out the CDB in hex.
Print out the name of the failed command (as defined by the ATA
specification) in error handling output along with the raw register
contents.
When reporting status of ACPI taskfile commands executed on resume,
also output the names of the commands being executed (or not) in
readable form.
Since the extra data for printing command names increases kernel
size slightly, a config option has been added to allow disabling
command name output (as well as some of the error register parsing)
for those highly sensitive to kernel text size.
Signed-off-by: Robert Hancock
Signed-off-by: Jeff Garzik -
Resets are done with the port frozen, but some controllers still issue
interrupts during reset and they may end up recording error conditions
in ehi, leading to unnecessary EH retries.
This patch makes ata_eh_reset() clear ehi on reset completion. As
reset is the most severe recovery action, there's nothing to lose by
clearing ehi on its completion.
Signed-off-by: Tejun Heo
Reported-by: Zdenek Kaspar
Signed-off-by: Jeff Garzik -
Call the ->freeze() hook before aborting qc's, because some hardware
requires special handling prior to accessing the taskfile registers
(for diagnosis/analysis/reset). Most notably, hardware may wish to
disable the DMA engine or interrupts in the ->freeze() hook.
Signed-off-by: Jeff Garzik
29 Jul, 2009
1 commit
-
drivers/ata/libata-eh.c +2403 ata_eh_reset(80) warning: variable derefenced before check 'slave'
Please note that this is _not_ a real bug at the moment, since the
ata_eh_context structure is embedded into the ata_link structure and
the code always checks for 'slave' before accessing 'sehc'.
Anyway, let's add the missing check and always have a valid 'sehc'
pointer (which makes the code easier to understand and prevents
introducing possible bugs in the future).
Reported-by: Dan Carpenter
Cc: corbet@lwn.net
Cc: eteo@redhat.com
Signed-off-by: Bartlomiej Zolnierkiewicz
Signed-off-by: Jeff Garzik
15 Jul, 2009
1 commit
-
ata_eh_reset() was missing error return handling after follow-up SRST
allowing EH to continue the normal probing path after reset failure.
This was discovered while testing new WD 2TB drives which take longer
than 10 secs to spin up and cause the first follow-up SRST to time
out.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik
13 Jun, 2009
1 commit
-
Signed-off-by: Martin Olsson
Signed-off-by: Jiri Kosina
12 May, 2009
2 commits
-
Error timestamps are in jiffies, which don't advance while suspended,
and PHY events during resume aren't too uncommon. When the two are
combined, they can lead to unnecessary speed downs if the machine is
suspended and resumed repeatedly. Clear error history on resume.
This was reported and verified in bnc#486803 by Vladimir Botka.
Signed-off-by: Tejun Heo
Reported-by: Vladimir Botka
Signed-off-by: Jeff Garzik -
The new device attach path in ata_eh_revalidate_and_attach() is divided
into two separate loops because ATA requires IDENTIFY to be issued to
the slave first while the user expects to see device probe messages
from the master device. new_mask is used to track which devices are
the new ones between the first loop and the second.
This usually works well, but if an error occurs during the
configuration stage, ata_eh_revalidate_and_attach() returns with an
error code and forgets new_mask. On the retry run, dev->class is set
and new_mask for the device is clear, so the device just gets
revalidated and thus ends up skipping the post-configuration
procedure, including scheduling of SCSI_HOTPLUG for the device. When
this occurs, the ATA part of probing works fine but SCSI probing
usually doesn't happen, which makes the device unreachable.
The behavior has been around for a very long time, but it has been
uncovered by the recent addition of the 1_5_GBPS horkage, which uses
the -EAGAIN return value from ata_dev_configure() to restart the
probing sequence after forcing cable speed.
This can be fixed by making sure dev->class is permanently set only
after all configurations are successfully complete. Fix it.
Signed-off-by: Tejun Heo
Reported-by: Tim Connors
Signed-off-by: Jeff Garzik
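The fix in the commit above ("set dev->class only after configuration succeeds") can be modeled in a few lines. This is a hypothetical userspace sketch: `struct sk_dev`, `sk_attach()` and the `config_ok` flag are invented for illustration, and `hotplug_scheduled` merely stands in for scheduling SCSI_HOTPLUG.

```c
#include <assert.h>
#include <stdbool.h>

enum sk_class { SK_DEV_NONE, SK_DEV_ATA, SK_DEV_ATAPI };

struct sk_dev {
	enum sk_class class;    /* committed class; NONE = not attached */
	bool hotplug_scheduled; /* stands in for scheduling SCSI_HOTPLUG */
};

/* One attach attempt. config_ok simulates whether configuration
 * succeeded; returns true once the device is fully attached. */
static bool sk_attach(struct sk_dev *dev, enum sk_class detected,
		      bool config_ok)
{
	bool is_new = (dev->class == SK_DEV_NONE);

	if (!config_ok)
		return false; /* class left uncommitted: retry still sees "new" */

	if (is_new) {
		dev->class = detected;         /* commit only on success */
		dev->hotplug_scheduled = true; /* post-configuration step */
	}
	return true;
}
```

Had the class been committed before the failing configuration, the retry would see a known device, take the revalidate-only path and never reach the post-configuration step, which is the failure mode the commit describes.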
25 Mar, 2009
1 commit
-
On a timeout, call a device-specific handler early in the recovery so
that we can complete and process successful commands which timed out
due to IRQ loss or the like rather more elegantly.
[Revised to exclude the timeout handling on a few devices that inherit from
SFF but are not SFF enough to use the default timeout handler]
Signed-off-by: Alan Cox
Signed-off-by: Jeff Garzik
05 Mar, 2009
2 commits
-
When SCR access is available and the link is offline, softreset is
skipped as it only wastes time and some controllers don't respond very
well. However, the skip path forgot to thaw the port, which not only
blocks further event notification from the port but also causes
repeated EH invocations on the same event on drivers which rely on
->thaw() to clear events if the IRQ is shared with another device or
port.
This problem has always been there but is uncovered by the recent
sata_nv nf2/3 change, which dropped hardreset support while
maintaining SCR access. nf2/3 doesn't clear the hotplug event mask
from the interrupt handler but relies on ->thaw() to clear the events.
When the hardreset was there, the reset action was never skipped and
the port was always thawed but, with the hardreset gone, ->prereset()
determines that there's no need for softreset and both ->softreset()
and ->thaw() are skipped. This leads to a stuck hotplug event in the
IRQ status register, triggering a hotplug event whenever an IRQ is
delivered on the same IRQ line. As the controller shares the same IRQ
for both ports, this happens on every IO if one port is occupied and
the other isn't.
This patch fixes the problem by making sure that the port is thawed on
the reset-skip path.
bko#11615 reports this problem.
Signed-off-by: Tejun Heo
Cc: Robert Hancock
Reported-by: Dan Andresan
Reported-by: Arne Woerner
Reported-by: Stefan Lippers-Hollmann
Signed-off-by: Jeff Garzik -
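The thaw-on-skip fix above (bko#11615) is mostly about control flow: whether or not softreset runs, ->thaw() must run. A minimal model, assuming a hypothetical `struct sk_port` whose `stuck_events` counter represents events that, as with nf2/3, only the thaw callback clears.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port state for the sketch. */
struct sk_port {
	bool frozen;
	int stuck_events; /* events only ->thaw() clears (nf2/3-style) */
	int thaw_calls;
};

static void sk_thaw(struct sk_port *ap)
{
	ap->frozen = false;
	ap->stuck_events = 0; /* this driver clears hotplug events here */
	ap->thaw_calls++;
}

static void sk_softreset(struct sk_port *ap)
{
	(void)ap; /* the reset itself is not modeled */
}

/* Reset step; link_offline models "SCR access says link is offline". */
static void sk_eh_reset(struct sk_port *ap, bool link_offline)
{
	if (!link_offline)
		sk_softreset(ap); /* skipped when offline: it only wastes time */
	/* The fix: thaw on both paths, including the reset-skip path. */
	sk_thaw(ap);
}
```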
sense_buffer is used as a DMA target and shouldn't be allocated on the
stack. Use ap->sector_buf instead. This problem was spotted by Chuck
Ebbert.
Signed-off-by: Tejun Heo
Reported-by: Chuck Ebbert
Signed-off-by: Jeff Garzik
03 Feb, 2009
6 commits
-
Let -EAGAIN from EH device handling routines trigger EH retry without
consuming its tries count. This will be used to implement link SPD
horkage which requires hardreset to adjust SPD without affecting other
EH decisions. As it bypasses the forward progress guarantee provided
by the tries count, the requester is responsible for ensuring forward
progress.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
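The -EAGAIN rule in the commit above (retry EH without consuming its tries count) can be sketched as a loop. This is a hypothetical model: `sk_eh_run()` replays a scripted array of return codes instead of calling real EH routines, and attempts past the end of the script are treated as successes.

```c
#include <assert.h>
#include <errno.h>

/* Run EH recovery with a tries budget. results[i] is a hypothetical
 * script of what attempt i returns (0, -EAGAIN or another -errno);
 * attempts past the end of the script succeed. -EAGAIN restarts EH
 * without consuming a try, so whoever returns it must make sure it
 * eventually stops doing so. */
static int sk_eh_run(const int *results, int n, int max_tries,
		     int *attempts)
{
	int tries = max_tries;
	int rc = -EIO;

	*attempts = 0;
	while (tries > 0) {
		rc = *attempts < n ? results[*attempts] : 0;
		(*attempts)++;
		if (rc == 0)
			return 0;
		if (rc != -EAGAIN)
			tries--; /* only "real" failures use up the budget */
	}
	return rc;
}
```

Note that nothing in the loop bounds the number of -EAGAIN restarts; that is exactly why the commit puts the forward-progress burden on the requester.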
When the link is flaky at high speed, it isn't uncommon for a device to
repeatedly fail the probing sequence early after successfully
negotiating a high link speed. This often leads to consecutive hotplug
events without successful probing.
This patch improves libata EH such that it remembers probing trials
and, if there have been more than two unsuccessful trials in the past
60 seconds, slows down the link speed to 1.5Gbps.
As link speed negotiation is the duty of the PHY layer proper, the
goal of this fallback mechanism is to provide the last resort when
everything else fails, which unfortunately happens not too
infrequently, so there's no fancy 6->3->1.5 stepping down or
highest-successful-speed logic (yet).
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
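The probing-trial rule in the commit above (more than two failed trials within 60 seconds forces 1.5Gbps) can be sketched with a plain array of failure timestamps. This is an assumption-laden model: libata tracks failures in the per-device error ring (ering) and in jiffies, while the sketch uses hypothetical millisecond timestamps and invented function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_TRIAL_WINDOW_MS 60000UL /* "the past 60 seconds" */
#define SK_MAX_TRIALS      2u      /* more than this forces 1.5Gbps */

/* Count failed probe trials within the window ending at now_ms.
 * fail_ms[] holds hypothetical millisecond timestamps, all <= now_ms. */
static size_t sk_recent_failures(const unsigned long *fail_ms, size_t n,
				 unsigned long now_ms)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		assert(fail_ms[i] <= now_ms);
		if (now_ms - fail_ms[i] <= SK_TRIAL_WINDOW_MS)
			count++;
	}
	return count;
}

/* True when the link should be limited to 1.5Gbps. */
static bool sk_force_1_5gbps(const unsigned long *fail_ms, size_t n,
			     unsigned long now_ms)
{
	return sk_recent_failures(fail_ms, n, now_ms) > SK_MAX_TRIALS;
}
```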
Add @spd_limit to sata_down_spd_limit() so that the caller can specify
the SPD limit it wants. This parameter doesn't get in the way even
when it's too low. The closest possible limit is applied.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
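The @spd_limit behavior in the commit above ("the closest possible limit is applied") can be modeled on a bitmask of allowed SATA generations. This sketch covers only the limiting step; the real sata_down_spd_limit() also drops the current speed and touches link state, none of which is reproduced here, and `sk_apply_spd_limit()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set in mask = SATA generation i+1 allowed
 * (bit 0 = 1.5Gbps, bit 1 = 3Gbps, bit 2 = 6Gbps).
 * spd_limit is the highest generation the caller wants, 0 = none. */
static uint32_t sk_apply_spd_limit(uint32_t mask, unsigned int spd_limit)
{
	uint32_t limit_mask;

	if (spd_limit == 0 || mask == 0)
		return mask;
	assert(spd_limit < 32); /* keep the shift well-defined */
	limit_mask = (UINT32_C(1) << spd_limit) - 1;
	if (mask & limit_mask)
		return mask & limit_mask; /* requested limit is reachable */
	return mask & (~mask + 1);        /* too low: keep lowest allowed speed */
}
```

For example, asking for 1.5Gbps (limit 1) on a link that only allows 3 and 6Gbps yields 3Gbps, the closest speed still permitted.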
dev->ering used to be cleared together with the rest of ata_device in
ata_dev_init(), which is called whenever a probing event occurs.
dev->ering is about to be used to track probing failures, so it needs
to remain persistent over multiple probing events. This patch
achieves this by doing the following.
* Instead of CLEAR_OFFSET, define CLEAR_BEGIN and CLEAR_END and only
clear between BEGIN and END. ering is moved after END. The split
of the persistent area is to allow hotter items to remain at the head.
* ering is explicitly cleared on ata_dev_disable() and when device
attach succeeds. So, ering is persistent through a device's lifetime
(unless explicitly cleared, of course) and also through periods
in between disablement of an attached device and successful detection
of the next one.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
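The CLEAR_BEGIN/CLEAR_END scheme in the commit above boils down to a partial memset() bounded by two offsetof() values. The structure and macro names below are hypothetical; the real ata_device has many more fields, but the idea of keeping persistent members (like ering) outside the cleared range is the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down device structure. */
struct sk_dev {
	int port_no;           /* before BEGIN: never cleared by init */
	unsigned int flags;    /* --- cleared region starts here --- */
	int class;
	unsigned int pio_mode; /* --- cleared region ends here --- */
	int ering[4];          /* after END: persists across probes */
};

#define SK_DEV_CLEAR_BEGIN offsetof(struct sk_dev, flags)
#define SK_DEV_CLEAR_END   offsetof(struct sk_dev, ering)

/* Re-initialize a device for a new probing event, leaving the fields
 * outside [BEGIN, END) untouched. */
static void sk_dev_init(struct sk_dev *dev)
{
	memset((char *)dev + SK_DEV_CLEAR_BEGIN, 0,
	       SK_DEV_CLEAR_END - SK_DEV_CLEAR_BEGIN);
}
```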
ata_dev_disable() is about to be more tightly integrated into EH
logic. Move it to libata-eh.c.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
The dev->pio_mode > XFER_PIO_0 test is there to avoid unnecessary
speed down warning messages but it accidentally disabled SATA link spd
down during configuration phase after reset where PIO mode is always
zero.
This patch fixes the problem by moving the test to where it belongs.
This makes libata probing sequence behave better when the connection
is flaky at higher link speeds, which isn't too uncommon for eSATA
devices.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik
29 Dec, 2008
2 commits
-
ata_port_detach() first made sure EH saw ATA_PFLAG_UNLOADING and then
assumed EH context belongs to it and performed detach operation
itself. However, UNLOADING doesn't disable all of EH and this could
lead to problems including triggering WARN_ON()'s in the EH path.
This patch makes port detach behave more like other EH actions such
that ata_port_detach() requests EH to detach and waits for completion.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
There currently are the following looping constructs.
* __ata_port_for_each_link() for all available links
* ata_port_for_each_link() for edge links
* ata_link_for_each_dev() for all devices
* ata_link_for_each_dev_reverse() for all devices in reverse order
Now there's a need for a looping construct which is similar to
__ata_port_for_each_link() but iterates over PMP links before the host
link. Instead of adding another one with a long name, do the following
cleanup.
* Implement and export ata_link_next() and ata_dev_next() which take
@mode parameter and can be used to build custom loops.
* Implement ata_for_each_link() and ata_for_each_dev() which take
looping mode explicitly.
The following iteration modes are implemented.
* ATA_LITER_EDGE : loop over edge links
* ATA_LITER_HOST_FIRST : loop over all links, host link first
* ATA_LITER_PMP_FIRST : loop over all links, PMP links first
* ATA_DITER_ENABLED : loop over enabled devices
* ATA_DITER_ENABLED_REVERSE : loop over enabled devices in reverse order
* ATA_DITER_ALL : loop over all devices
* ATA_DITER_ALL_REVERSE : loop over all devices in reverse order
This change removes explicit device enabledness checks from many loops
and makes it clear which ones are iterated over in which direction.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik
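The link iteration modes listed in the commit above can be illustrated with a "next" function that takes a mode, plus a for-each macro built on it, which is the pattern ata_link_next()/ata_for_each_link() follow. This is a simplified, hypothetical model: links are integer ids (0 = host link, 1..nr_pmp = PMP links), the `sk_*` names are invented, and the ATA_DITER_* device modes are not modeled.

```c
#include <assert.h>

enum sk_liter { SK_LITER_EDGE, SK_LITER_HOST_FIRST, SK_LITER_PMP_FIRST };

/* Hypothetical port: link 0 is the host link, links 1..nr_pmp are
 * PMP fan-out links. */
struct sk_port {
	int nr_pmp;
};

/* Return the link after cur (-1 = start) in the given mode, or -1. */
static int sk_link_next(const struct sk_port *ap, int cur,
			enum sk_liter mode)
{
	int nr = ap->nr_pmp;

	switch (mode) {
	case SK_LITER_EDGE:       /* PMP links if present, else host */
		if (nr == 0)
			return cur < 0 ? 0 : -1;
		if (cur < 0)
			return 1;
		return cur < nr ? cur + 1 : -1;
	case SK_LITER_HOST_FIRST: /* 0, 1, ..., nr */
		return cur < nr ? cur + 1 : -1;
	case SK_LITER_PMP_FIRST:  /* 1, ..., nr, then 0 */
		if (cur < 0)
			return nr > 0 ? 1 : 0;
		if (cur == 0)
			return -1;
		return cur < nr ? cur + 1 : 0;
	}
	return -1;
}

#define sk_for_each_link(l, ap, mode)                        \
	for ((l) = sk_link_next((ap), -1, (mode)); (l) >= 0; \
	     (l) = sk_link_next((ap), (l), (mode)))

/* Record the visiting order in out[] (up to max); returns the count. */
static int sk_collect(const struct sk_port *ap, enum sk_liter mode,
		      int *out, int max)
{
	int l, n = 0;

	sk_for_each_link(l, ap, mode)
		if (n < max)
			out[n++] = l;
	return n;
}
```

With two PMP links, HOST_FIRST visits 0, 1, 2 while PMP_FIRST visits 1, 2, 0, and EDGE visits only 1, 2.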
11 Nov, 2008
1 commit
-
ehc->last_reset is used to ensure that resets are not issued too
close to each other. It's initialized to jiffies minus one minute
on EH entry. However, when new links are initialized after PMP is
probed, new links have zero for this timestamp, resulting in a long
wait depending on the current jiffies.
This patch makes last_reset considered iff ATA_EHI_DID_RESET is set, in
which case last_reset is always initialized. As an added precaution,
a WARN_ON() is added so that a warning is printed if last_reset is
in the future.
This problem was spotted and debugged by Shane Huang.
Signed-off-by: Tejun Heo
Cc: Shane Huang
Signed-off-by: Jeff Garzik
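The rule in the commit above, "only trust last_reset when ATA_EHI_DID_RESET is set", can be sketched as a delay computation. The cool-down value and all `sk_*` names are hypothetical. Plain comparisons are used for brevity; kernel code comparing jiffies would use time_before()/time_after() to cope with wraparound.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_RESET_COOL_DOWN 5000UL /* hypothetical minimum gap, in ticks */

struct sk_ehc {
	bool did_reset;           /* stands in for ATA_EHI_DID_RESET */
	unsigned long last_reset; /* only meaningful when did_reset is set */
};

/* How long to wait before issuing the next reset. */
static unsigned long sk_reset_delay(const struct sk_ehc *ehc,
				    unsigned long now)
{
	unsigned long deadline;

	if (!ehc->did_reset)
		return 0; /* timestamp may be stale or zero: ignore it */
	if (ehc->last_reset > now)
		return 0; /* future timestamp: the commit adds a WARN_ON()
			   * for this case; the sketch just skips the wait */
	deadline = ehc->last_reset + SK_RESET_COOL_DOWN;
	return deadline > now ? deadline - now : 0;
}
```

A freshly initialized PMP link with a zero timestamp and no DID_RESET flag thus waits zero ticks instead of a period that depends on the current jiffies value.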
28 Oct, 2008
2 commits
-
libata EH saves xfer_mode and ncq_enabled at start to later set
DUBIOUS_XFER flag if it has changed. These values need to be cleared
on device detach such that hot device swap doesn't accidentally miss
DUBIOUS_XFER.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
There were several places where only enabled devices should be
iterated over but device enabledness wasn't checked.
* IDENTIFY data 40 wire check in cable_is_40wire()
* xfer_mode/ncq_enabled saving in ata_scsi_error()
* DUBIOUS_XFER handling in ata_set_mode()
While at it, reformat comments in cable_is_40wire().
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik
23 Oct, 2008
3 commits
-
Reset methods don't have access to phys link status for slave links
and may incorrectly indicate device presence, causing unnecessary probe
failures for unoccupied links. This patch clears device class to NONE
during post-reset processing if phys link is offline.
As on/offlineness semantics are strictly defined and used in multiple
places by the core layer, this won't change behavior for drivers which
don't use slave links.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
Slave link action mask is transferred to master link and all the EH
actions are taken by the master link. ata_eh_about_to_do() and
ata_eh_done() are called with ATA_EH_ALL_ACTIONS to clear the slave
link actions during transfer. This always sets ATA_PFLAG_RECOVERED
flag, causing spurious "EH complete" messages.
Don't set ATA_PFLAG_RECOVERED for slave link actions.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
ATA_EHI_NO_AUTOPSY and ATA_EHI_QUIET are used to control the behavior
of EH. As only the master link is visible outside EH, these flags are
set only for the master link although they should also apply to the
slave link, which causes spurious EH messages during probe and
suspend/resume.
This patch transfers those two flags to slave ehc.i before performing
slave autopsy and reporting.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik
11 Oct, 2008
1 commit
-
* 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block: (132 commits)
doc/cdrom: Trvial documentation error, file not present
block_dev: fix kernel-doc in new functions
block: add some comments around the bio read-write flags
block: mark bio_split_pool static
block: Find bio sector offset given idx and offset
block: gendisk integrity wrapper
block: Switch blk_integrity_compare from bdev to gendisk
block: Fix double put in blk_integrity_unregister
block: Introduce integrity data ownership flag
block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1
bio.h: Remove unused conditional code
block: remove end_{queued|dequeued}_request()
block: change elevator to use __blk_end_request()
gdrom: change to use __blk_end_request()
memstick: change to use __blk_end_request()
virtio_blk: change to use __blk_end_request()
blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure
block: add lld busy state exporting interface
block: Fix blk_start_queueing() to not kick a stopped queue
include blktrace_api.h in headers_install
...
09 Oct, 2008
1 commit
-
Right now SCSI and others do their own command timeout handling.
Move those bits to the block layer.
Instead of having a timer per command, we try to be a bit more clever
and simply have one per-queue. This avoids the overhead of having to
tear down and setup a timer for each command, so it will result in a lot
less timer fiddling.
Signed-off-by: Mike Anderson
Signed-off-by: Jens Axboe
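The per-queue timer idea in the commit above can be illustrated with a toy queue: each request records its own deadline, a single timer is only ever pulled earlier on submission, completion never touches the timer, and the timer handler scans for overdue requests and re-arms for the next one. This is a hypothetical model with invented `sk_*` names, not the block layer's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_MAX_RQ 8

/* Hypothetical queue: per-request deadlines, but only ONE timer. */
struct sk_queue {
	unsigned long deadline[SK_MAX_RQ];
	bool active[SK_MAX_RQ];
	unsigned long timer_expires;
	bool timer_armed;
};

/* Start a request; move the single timer earlier only if needed. */
static void sk_start(struct sk_queue *q, size_t slot, unsigned long now,
		     unsigned long timeout)
{
	unsigned long d = now + timeout;

	q->deadline[slot] = d;
	q->active[slot] = true;
	if (!q->timer_armed || d < q->timer_expires) {
		q->timer_expires = d;
		q->timer_armed = true;
	}
}

/* Completion does not touch the timer at all. */
static void sk_complete(struct sk_queue *q, size_t slot)
{
	q->active[slot] = false;
}

/* Timer handler: expire overdue requests, re-arm for the next one.
 * Returns how many requests timed out. */
static int sk_timer_fired(struct sk_queue *q, unsigned long now)
{
	int expired = 0;

	q->timer_armed = false;
	for (size_t i = 0; i < SK_MAX_RQ; i++) {
		if (!q->active[i])
			continue;
		if (q->deadline[i] <= now) {
			q->active[i] = false; /* would run the timeout handler */
			expired++;
		} else if (!q->timer_armed || q->deadline[i] < q->timer_expires) {
			q->timer_expires = q->deadline[i];
			q->timer_armed = true;
		}
	}
	return expired;
}
```

The trade-off is that the timer may fire when nothing is overdue (the request it was armed for already completed), in exchange for never re-arming on the completion path.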
29 Sep, 2008
2 commits
-
Resets make ATAPI devices raise UNIT ATTENTION, which fails the next
command. As resets can happen asynchronously for unrelated reasons,
this sometimes disrupts innocent users. For example, reading a DVD
fails after the system wakes up from suspend or after the other device
sharing the channel went through a bus error.
Clearing UA has some problems as it might clear a UA which userland
needs to know about. However, a UA after a reset can only be about the
reset itself, and the benefits of clearing it outweigh the cons. A
missed UA can only delay the failure to one of the following commands
anyway. For example, a timeout while burning is in progress will
trigger a reset, which resets the device state and probably corrupts
the burning run. Although the userland application won't get the UA,
its pending writes will fail.
Signed-off-by: Tejun Heo
Signed-off-by: Jeff Garzik -
On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop
expected to hit the floor). In fact, the whole port stops processing
commands until the timeout has expired in order to avoid any resets due
to failed commands on another device.
Signed-off-by: Elias Oltmanns
Signed-off-by: Jeff Garzik