09 Nov, 2011

1 commit

  • Reenable sending SRST to devices connected behind a Sil3726 PMP.
    This allows staggered spinups and handles drives that spin up slowly.

    While the drives spin up, the PMP will not accept SRST.
    Most controllers reissue the reset until the drive is ready, while
    some [Sil3124] return an error.
    In ata_eh_error, wait 10s before resetting the ATA port and trying again.
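
    The wait-and-retry idea can be sketched in plain C as follows. This is
    only an illustration under stated assumptions: struct sim_port,
    sim_srst(), sim_wait() and reset_with_retry() are hypothetical names,
    the drive is simulated, and no real time passes.

```c
/* Hypothetical simulated port; not a libata structure. */
struct sim_port {
    unsigned int spinup_ms_left;  /* time until the drive accepts SRST */
    unsigned int waited_ms;       /* total simulated time spent waiting */
};

/* Pretend SRST: succeeds (0) only once the drive has spun up. */
static int sim_srst(const struct sim_port *p)
{
    return p->spinup_ms_left == 0 ? 0 : -1;
}

/* Pretend delay: advances simulated time instead of sleeping. */
static void sim_wait(struct sim_port *p, unsigned int ms)
{
    p->spinup_ms_left = ms >= p->spinup_ms_left ? 0 : p->spinup_ms_left - ms;
    p->waited_ms += ms;
}

/*
 * Issue SRST up to max_tries times, waiting wait_ms after each failure.
 * Returns the attempt number that succeeded, or -1 if none did.
 */
static int reset_with_retry(struct sim_port *p, unsigned int wait_ms,
                            int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (sim_srst(p) == 0)
            return attempt;
        if (attempt < max_tries)
            sim_wait(p, wait_ms);
    }
    return -1;
}
```

    With a 10000 ms wait, a simulated drive needing 15 s to spin up would
    be reset successfully on the third attempt.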

    Signed-off-by: Gwendal Grignou
    Acked-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Gwendal Grignou
     

01 Nov, 2011

1 commit


08 Oct, 2011

1 commit


24 Jul, 2011

2 commits

  • libata EH intentionally left a port frozen if it failed
    ata_eh_reset(). The intention was to avoid a continuous loop of resets
    when the controller or attached device is flaky and reporting spurious
    hotplug events. Once a port enters this state, it can be recovered with
    a manual rescan, which seemed reasonable.

    However, outside of my convoluted test setup, there have been very few
    reports justifying this choice while there have been more cases where
    the automatic freezing of the port after hotplug attempt of a faulty
    device caused confusion and led to unnecessary resets.

    This patch changes the behavior so that the port is thawed after reset
    failure. This change doesn't necessarily solve but makes it easier
    and more intuitive to work around hotplug related problems
    (i.e. re-plugging or power cycling the device) as reported in the
    following.

    https://bugzilla.kernel.org/show_bug.cgi?id=34712
    http://thread.gmane.org/gmane.linux.kernel/1123265/focus=49548

    Signed-off-by: Tejun Heo
    Reported-by: Reartes Guillermo
    Reported-by: Bruce Stenning
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Saves text by removing nearly duplicated text format strings by
    creating ata__printk functions and printf extension %pV.

    ata defconfig size shrinks ~5% (~8KB), allyesconfig ~2.5% (~13KB)

    Format string duplication comes from:

    #define ata_link_printk(link, lv, fmt, args...) do { \
        if (sata_pmp_attached((link)->ap) || (link)->ap->slave_link) \
            printk("%sata%u.%02u: "fmt, lv, (link)->ap->print_id, \
                   (link)->pmp , ##args); \
        else \
            printk("%sata%u: "fmt, lv, (link)->ap->print_id , ##args); \
    } while(0)

    Coalesce long formats.

    $ size drivers/ata/built-in.*
       text    data     bss     dec     hex filename
     544969   73893  116584  735446   b38d6 drivers/ata/built-in.allyesconfig.ata.o
     558429   73893  117864  750186   b726a drivers/ata/built-in.allyesconfig.dev_level.o
     141328   14689    4220  160237   271ed drivers/ata/built-in.defconfig.ata.o
     149567   14689    4220  168476   2921c drivers/ata/built-in.defconfig.dev_level.o
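
    The saving comes from keeping the prefix formats in one out-of-line
    function instead of pasting them into every call site's format string.
    A userspace sketch of that idea, using vsnprintf() in place of the
    kernel's %pV, might look like this. struct mini_link and
    mini_link_printk() are hypothetical names chosen for illustration.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical miniature of a link; fields are illustrative only. */
struct mini_link {
    unsigned int print_id;  /* port number used in the "ataN" prefix */
    unsigned int pmp;       /* link number behind a port multiplier */
    int behind_pmp;         /* nonzero when the ".NN" suffix applies */
};

/*
 * The two prefix formats live here exactly once; call sites pass only
 * their own message format. Returns the total length written, or a
 * negative value on a formatting error.
 */
static int mini_link_printk(char *buf, size_t len,
                            const struct mini_link *link,
                            const char *level, const char *fmt, ...)
{
    va_list args;
    int n, m;

    if (link->behind_pmp)
        n = snprintf(buf, len, "%sata%u.%02u: ", level,
                     link->print_id, link->pmp);
    else
        n = snprintf(buf, len, "%sata%u: ", level, link->print_id);
    if (n < 0 || (size_t)n >= len)
        return n;  /* error, or prefix alone filled the buffer */

    va_start(args, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, args);
    va_end(args);
    return m < 0 ? m : n + m;
}
```

    Each caller now contributes only its message text, which is where the
    duplicated string data in the macro version came from.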

    Signed-off-by: Joe Perches
    Signed-off-by: Jeff Garzik

    Joe Perches
     

08 Jun, 2011

1 commit

  • To work around controllers which can't properly plug events during
    reset, ata_eh_reset() clears error states and ATA_PFLAG_EH_PENDING
    after reset but before RESET is marked done. As reset is the final
    recovery action and full verification of devices including onlineness
    and classification match is done afterwards, this shouldn't lead to
    lost devices or missed hotplug events.

    Unfortunately, it forgot to thaw the port when clearing EH_PENDING, so
    if the condition happens after resetting an empty port, the port could
    be left frozen and EH will end without thawing it, making the port
    unresponsive to further hotplug events.

    Thaw if the port is frozen after clearing EH_PENDING. This problem is
    reported by Bruce Stenning in the following thread.

    http://thread.gmane.org/gmane.linux.kernel/1123265

    stable: I think we should weather this patch a bit longer in -rcX
    before sending it to -stable. Please wait at least a month
    after this patch makes upstream. Thanks.

    -v2: Fixed spelling in the comment per Dave Howorth.

    Signed-off-by: Tejun Heo
    Reported-by: Bruce Stenning
    Cc: stable@kernel.org
    Cc: Dave Howorth
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

20 May, 2011

1 commit

  • Give users the option of completely powering off unoccupied
    SATA ports using the existing min_power link_power_management_policy
    option. When the user selects this option on an empty port, we
    will power the port off by setting DET to off. For occupied ports,
    behavior is unchanged.
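
    As a rough sketch of "setting DET to off": DET is the low four bits of
    the SControl register, and the value 4h asks the interface to go
    offline. The helper name scontrol_for_min_power() and the
    device_present parameter are assumptions made for this example, not
    names taken from the patch.

```c
#include <stdint.h>

#define SCONTROL_DET_MASK    0xfu  /* SControl bits 3:0 hold the DET field */
#define SCONTROL_DET_OFFLINE 0x4u  /* DET = 4h: disable interface, PHY offline */

/*
 * Compute the SControl value for the min_power policy in this sketch.
 * Occupied ports keep their DET field; empty ports get DET = 4h.
 */
static uint32_t scontrol_for_min_power(uint32_t scontrol, int device_present)
{
    if (device_present)
        return scontrol;

    scontrol &= ~SCONTROL_DET_MASK;
    scontrol |= SCONTROL_DET_OFFLINE;
    return scontrol;
}
```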

    Signed-off-by: Kristen Carlson Accardi
    Signed-off-by: Jeff Garzik

    Kristen Carlson Accardi
     

15 May, 2011

1 commit

  • ae01b2493c (libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65)
    added ATA_FLAG_NO_DIPM and made ata_eh_set_lpm() check the flag.
    However, @ap is NULL if @link points to a PMP link and thus the
    unconditional @ap->flags dereference leads to the following oops.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    IP: [] ata_eh_recover+0x9a1/0x1510
    ...
    Pid: 295, comm: scsi_eh_4 Tainted: P 2.6.38.5-core2 #1 System76, Inc. Serval Professional/Serval Professional
    RIP: 0010:[] [] ata_eh_recover+0x9a1/0x1510
    RSP: 0018:ffff880132defbf0 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff880132f40000 RCX: 0000000000000000
    RDX: ffff88013377c000 RSI: ffff880132f40000 RDI: 0000000000000000
    RBP: ffff880132defce0 R08: ffff88013377dc58 R09: ffff880132defd98
    R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000
    R13: 0000000000000000 R14: ffff88013377c000 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000018 CR3: 0000000001a03000 CR4: 00000000000406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process scsi_eh_4 (pid: 295, threadinfo ffff880132dee000, task ffff880133b416c0)
    Stack:
    0000000000000000 ffff880132defcc0 0000000000000000 ffff880132f42738
    ffffffff813ee8f0 ffffffff813eefe0 ffff880132defd98 ffff88013377f190
    ffffffffa00b3e30 ffffffff813ef030 0000000032defc60 ffff880100000000
    Call Trace:
    [] sata_pmp_error_handler+0x607/0xc30
    [] ahci_error_handler+0x1f/0x70 [libahci]
    [] ata_scsi_error+0x5be/0x900
    [] scsi_error_handler+0x124/0x650
    [] kthread+0x96/0xa0
    [] kernel_thread_helper+0x4/0x10
    Code: 8b 95 70 ff ff ff b8 00 00 00 00 48 3b 9a 10 2e 00 00 48 0f 44 c2 48 89 85 70 ff ff ff 48 8b 8d 70 ff ff ff f6 83 69 02 00 00 01 8b 41 18 0f 85 48 01 00 00 48 85 c9 74 12 48 8b 51 08 48 83
    RIP [] ata_eh_recover+0x9a1/0x1510
    RSP
    CR2: 0000000000000018

    Fix it by testing @link->ap->flags instead.
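
    The shape of the bug can be reproduced in a few lines. In this sketch
    the local pointer ap is NULL for PMP links, matching the description
    above; the structures, the flag value and dipm_forbidden() are
    simplified stand-ins rather than the real libata definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define MINI_FLAG_NO_DIPM (1u << 0)  /* stand-in flag; value is arbitrary */

struct mini_port { unsigned int flags; };

struct mini_link {
    struct mini_port *ap;  /* always valid: the port this link hangs off */
    bool is_host_link;     /* false for links behind a port multiplier */
};

static bool dipm_forbidden(const struct mini_link *link)
{
    /* Local pointer that is deliberately NULL for PMP links. */
    const struct mini_port *ap = link->is_host_link ? link->ap : NULL;

    /* Unsafe form: reading ap->flags here would crash for a PMP link. */
    (void)ap;

    /* Safe form: the link's own port pointer is valid for every link. */
    return (link->ap->flags & MINI_FLAG_NO_DIPM) != 0;
}
```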

    stable: ATA_FLAG_NO_DIPM was added during 2.6.39 cycle but was
    backported to 2.6.37 and 38. This is a fix for that and thus
    also applicable to 2.6.37 and 38.

    Signed-off-by: Tejun Heo
    Reported-by: "Nathan A. Mourey II"
    LKML-Reference:
    Cc: Connor H
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

24 Apr, 2011

1 commit

  • NVIDIA mcp65 family of controllers cause command timeouts when DIPM
    is used. Implement ATA_FLAG_NO_DIPM and apply it.

    This problem was reported by Stefan Bader in the following thread.

    http://thread.gmane.org/gmane.linux.ide/48841

    stable: applicable to 2.6.37 and 38.

    Signed-off-by: Tejun Heo
    Reported-by: Stefan Bader
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

31 Mar, 2011

1 commit


02 Mar, 2011

3 commits

  • Right at the moment, the libata error handler is incredibly
    monolithic. This makes it impossible to use from composite drivers
    like libsas and ipr which have to handle errors themselves in the first
    instance.

    The essence of the change is to split the monolithic error handler
    into two components: one which handles a queue of ata commands for
    processing and the other which handles the back end of readying a
    port. This allows the upper error handler fine grained control in
    calling libsas functions (and making sure they only get called for ATA
    commands whose lower errors have been fixed up).

    Signed-off-by: James Bottomley
    Signed-off-by: Jeff Garzik

    James Bottomley
     
  • The SCSI host eh_cmd_q should be protected by the host lock (not the
    port lock). This probably doesn't matter that much at the moment,
    since we try to serialise the add and eh pieces, but it might matter
    in future for more convenient error handling. Plus this switches
    libata to the standard eh pattern where you lock, remove from the cmd
    queue to a local list and unlock and then operate on the local list.
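
    The "lock, move to a local list, unlock, then work on the local list"
    pattern can be sketched in userspace C. The toy spinlock built on
    atomic_flag and the names struct cmd, struct host and take_eh_cmds()
    are assumptions for illustration; the kernel uses its own spinlock and
    list primitives.

```c
#include <stdatomic.h>
#include <stddef.h>

struct cmd {
    int tag;
    struct cmd *next;
};

struct host {
    atomic_flag lock;      /* toy spinlock standing in for the host lock */
    struct cmd *eh_cmd_q;  /* shared queue, only touched under the lock */
};

static void host_lock(struct host *h)
{
    while (atomic_flag_test_and_set(&h->lock))
        ;  /* spin until the flag was previously clear */
}

static void host_unlock(struct host *h)
{
    atomic_flag_clear(&h->lock);
}

/* Detach the whole shared queue under the lock; the caller then owns it. */
static struct cmd *take_eh_cmds(struct host *h)
{
    struct cmd *local;

    host_lock(h);
    local = h->eh_cmd_q;
    h->eh_cmd_q = NULL;
    host_unlock(h);

    return local;  /* safe to walk without holding the lock */
}

/* Example of lock-free processing on the private list. */
static int count_cmds(const struct cmd *c)
{
    int n = 0;

    for (; c != NULL; c = c->next)
        n++;
    return n;
}
```

    The lock is held only for the pointer swap, so slow per-command work
    never runs with the host lock taken.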

    Signed-off-by: James Bottomley
    Signed-off-by: Jeff Garzik

    James Bottomley
     
  • ata_eh_analyze_serror() suppresses hotplug notifications if LPM is
    being used because LPM generates spurious hotplug events. It compared
    whether link->lpm_policy was different from ATA_LPM_MAX_POWER to
    determine whether LPM is enabled; however, this is incorrect as for
    drivers which don't implement LPM, lpm_policy is always
    ATA_LPM_UNKNOWN. This disabled hotplug detection for all drivers
    which don't implement LPM.

    Fix it by comparing whether lpm_policy is greater than
    ATA_LPM_MAX_POWER.
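
    The difference between the two comparisons is easy to see in
    isolation. The enum below assumes the ordering UNKNOWN < MAX_POWER <
    MED_POWER < MIN_POWER, which is what a "greater than" test relies on;
    the shortened names and helper functions are made up for this sketch.

```c
#include <stdbool.h>

/* Assumed ordering; mirrors the ATA_LPM_* names used in the text. */
enum lpm_policy {
    LPM_UNKNOWN,    /* driver does not implement LPM */
    LPM_MAX_POWER,  /* LPM implemented but not saving power */
    LPM_MED_POWER,
    LPM_MIN_POWER,
};

/* Old test: treats LPM_UNKNOWN as "LPM enabled", which is wrong. */
static bool lpm_enabled_old(enum lpm_policy p)
{
    return p != LPM_MAX_POWER;
}

/* New test: only policies beyond MAX_POWER count as LPM enabled. */
static bool lpm_enabled_new(enum lpm_policy p)
{
    return p > LPM_MAX_POWER;
}
```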

    Signed-off-by: Tejun Heo
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

25 Dec, 2010

1 commit

  • Low level drivers may behave differently depending on the current
    link->lpm_policy. During ata_eh_set_lpm(), DIPM enable commands are
    issued after the successful completion of ap->ops->set_lpm(), which
    means that the controller is already in the target state. This causes
    DIPM enable commands to be processed with mismatching controller power
    state and link->lpm_policy value.

    In ahci, link->lpm_policy is used to ignore certain PHY events if LPM
    is enabled; however, as DIPM commands are issued with stale
    link->lpm_policy, they sometimes end up triggering these conditions
    and get aborted leading to LPM configuration failure.

    Fix it by updating link->lpm_policy before issuing DIPM enable
    commands.

    Signed-off-by: Tejun Heo
    Reported-by: Kyle McMartin
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

22 Oct, 2010

6 commits

  • In libata, the non-EH code paths should always take and release
    ap->lock explicitly when accessing hardware or shared data structures.
    However, once EH is active, it's assumed that the port is owned by EH
    and EH methods don't explicitly take ap->lock unless race from irq
    handler or other code paths are expected. However, libata EH didn't
    guarantee exclusion among EHs for ports of the same host. IOW,
    multiple EHs may execute in parallel on multiple ports of the same
    controller.

    In many cases, especially in SATA, the ports are completely
    independent of each other and this doesn't cause problems; however,
    there are cases where different ports share the same resource, which
    leads to obscure timing related bugs such as the one fixed by commit
    213373cf (ata_piix: fix locking around SIDPR access).

    This patch implements exclusion among EHs of the same host. When EH
    begins, it acquires per-host EH ownership by calling ata_eh_acquire().
    When EH finishes, the ownership is released by calling
    ata_eh_release(). EH ownership is also released whenever the EH
    thread goes to sleep from ata_msleep() or explicitly and reacquired
    after waking up.

    This ensures that while EH is actively accessing the hardware, it has
    exclusive access to it while allowing EHs to interleave and progress
    in parallel as they hit waiting stages, which dominate the time spent
    in EH. This achieves cross-port EH exclusion without pervasive and
    fragile changes while still allowing parallel EH for the most part.
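
    A single-threaded model of the ownership hand-off may help. It is a
    simplification: a real implementation would block until ownership is
    free, whereas this sketch only reports failure, and all names here
    (struct mini_host, eh_try_acquire(), eh_sleep_begin() and so on) are
    hypothetical.

```c
#include <stddef.h>

struct mini_host {
    const void *eh_owner;  /* port currently running EH, NULL if none */
};

/* Take per-host EH ownership; returns 0 on success, -1 if it is held. */
static int eh_try_acquire(struct mini_host *host, const void *port)
{
    if (host->eh_owner != NULL)
        return -1;
    host->eh_owner = port;
    return 0;
}

/* Give up ownership, but only if this port actually holds it. */
static void eh_release(struct mini_host *host, const void *port)
{
    if (host->eh_owner == port)
        host->eh_owner = NULL;
}

/* Going to sleep: drop ownership so another port's EH can make progress. */
static void eh_sleep_begin(struct mini_host *host, const void *port)
{
    eh_release(host, port);
}

/* Waking up: ownership must be taken again before touching hardware. */
static int eh_sleep_end(struct mini_host *host, const void *port)
{
    return eh_try_acquire(host, port);
}
```

    While port A sleeps, port B can acquire ownership; A has to wait for
    B to release it before A continues.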

    This was first reported by yuanding02@gmail.com more than three years
    ago in the following bugzilla. :-)

    https://bugzilla.kernel.org/show_bug.cgi?id=8223

    Signed-off-by: Tejun Heo
    Cc: Alan Cox
    Reported-by: yuanding02@gmail.com
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Add optional @ap argument to ata_wait_register() and replace msleep()
    calls with ata_msleep() which takes optional @ap in addition to the
    duration. These will be used to implement EH exclusion.

    This patch doesn't cause any behavior difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Port multipliers can do DIPM on fan-out links fine. Implement support
    for it. Tested w/ SIMG 57xx and marvell PMPs. Both the host and
    fan-out links enter power save modes nicely.

    SIMG 37xx and 47xx report link offline on SStatus causing EH to detach
    the devices. Blacklisted.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • The current LPM implementation has the following issues.

    * Operation order isn't well thought-out. e.g. HIPM should be
    configured after IPM in SControl is properly configured. Not the
    other way around.

    * Suspend/resume paths call ata_lpm_enable/disable() which must only
    be called from EH context directly. Also, ata_lpm_enable/disable()
    were called whether LPM was in use or not.

    * Implementation is per-port when it should be per-link. As a result,
    it can't be used for controllers with slave links or PMP.

    * LPM state isn't managed consistently. After a link reset for
    whatever reason including suspend/resume the actual LPM state would
    be reset leaving ap->lpm_policy inconsistent.

    * Generic/driver-specific logic boundary isn't clear. Currently,
    libahci has to mangle stuff which libata EH proper should be
    handling. This makes the implementation unnecessarily complex and
    fragile.

    * Tied to ALPM. Doesn't consider DIPM only cases and doesn't check
    whether the device allows HIPM.

    * Error handling isn't implemented.

    Given the extent of mismatch with the rest of libata, I don't think
    trying to fix it piecewise makes much sense. This patch reimplements
    LPM support.

    * The new implementation is per-link. The target policy is still
    port-wide (ap->target_lpm_policy) but all the mechanisms and states
    are per-link and integrate well with the rest of link abstraction
    and can work with slave and PMP links.

    * Core EH has proper control of LPM state. LPM state is reconfigured
    when and only when reconfiguration is necessary. It makes sure that
    LPM state is reset when probing for new device on the link.
    Controller agnostic logic is now implemented in libata EH proper and
    driver implementation only has to deal with controller specifics.

    * Proper error handling. LPM config failure is attributed to the
    device on the link and LPM is disabled for the link if it fails
    repeatedly.

    * ops->enable/disable_pm() are replaced with single ops->set_lpm()
    which takes @policy and @hints. This simplifies driver specific
    implementation.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Link power management related symbols are in confusing state w/ mixed
    usages of lpm, ipm and pm. This patch cleans up lpm related symbols
    and sysfs show/store functions as follows.

    * lpm states - NOT_AVAILABLE, MIN_POWER, MAX_PERFORMANCE and
    MEDIUM_POWER are renamed to ATA_LPM_UNKNOWN and
    ATA_LPM_{MIN|MAX|MED}_POWER.

    * Pre/postfixes are unified to lpm.

    * sysfs show/store functions for link_power_management_policy were
    curiously named get/put and unnecessarily complex. Renamed to
    show/store and simplified.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • This is a skeleton for libata transport class.
    All information is read only, exporting information from libata:
    - ata_port class: one per ATA port
    - ata_link class: one per ATA port or 15 for SATA Port Multiplier
    - ata_device class: up to 2 for PATA link, usually one for SATA.

    Signed-off-by: Gwendal Grignou
    Reviewed-by: Grant Grundler
    Signed-off-by: Jeff Garzik

    Gwendal Grignou
     

10 Sep, 2010

1 commit

  • For some mysterious reason, certain hardware reacts badly to usual EH
    actions while the system is going for suspend. As the devices won't
    be needed until the system is resumed, ask EH to skip usual autopsy
    and recovery and proceed directly to suspend.

    Signed-off-by: Tejun Heo
    Tested-by: Stephan Diestelhorst
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

08 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
    workqueue: mark init_workqueues() as early_initcall()
    workqueue: explain for_each_*cwq_cpu() iterators
    fscache: fix build on !CONFIG_SYSCTL
    slow-work: kill it
    gfs2: use workqueue instead of slow-work
    drm: use workqueue instead of slow-work
    cifs: use workqueue instead of slow-work
    fscache: drop references to slow-work
    fscache: convert operation to use workqueue instead of slow-work
    fscache: convert object to use workqueue instead of slow-work
    workqueue: fix how cpu number is stored in work->data
    workqueue: fix mayday_mask handling on UP
    workqueue: fix build problem on !CONFIG_SMP
    workqueue: fix locking in retry path of maybe_create_worker()
    async: use workqueue for worker pool
    workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
    workqueue: implement unbound workqueue
    workqueue: prepare for WQ_UNBOUND implementation
    libata: take advantage of cmwq and remove concurrency limitations
    workqueue: fix worker management invocation without pending works
    ...

    Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
    include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c

    Linus Torvalds
     

02 Aug, 2010

1 commit


02 Jul, 2010

1 commit

  • libata has two concurrency related limitations.

    a. ata_wq which is used for polling PIO has single thread per CPU. If
    there are multiple devices doing polling PIO on the same CPU, they
    can't be executed simultaneously.

    b. ata_aux_wq which is used for SCSI probing has single thread. In
    cases where SCSI probing is stalled for an extended period of time,
    which is possible for ATAPI devices, this will stall all probing.

    #a is solved by increasing maximum concurrency of ata_wq. Please note
    that polling PIO might be used under allocation path and thus needs to
    be served by a separate wq with a rescuer.

    #b is solved by using the default wq instead and achieving exclusion
    via per-port mutex.

    Signed-off-by: Tejun Heo
    Acked-by: Jeff Garzik

    Tejun Heo
     

20 May, 2010

2 commits

  • Some of error handling logic in ata_sff_error_handler() and all of
    ata_sff_post_internal_cmd() are for BMDMA. Create
    ata_bmdma_error_handler() and ata_bmdma_post_internal_cmd() and move
    BMDMA part into those.

    While at it, change DMA protocol check to ata_is_dma(), fix
    post_internal_cmd to call ap->ops->bmdma_stop instead of directly
    calling ata_bmdma_stop() and open code hardreset selection so that
    ata_std_error_handler() doesn't have to know about sff hardreset.

    As these two functions are BMDMA specific, there's no reason to check
    for bmdma_addr before calling bmdma methods if the protocol of the
    failed command is DMA. sata_mv and pata_mpc52xx now don't need to set
    .post_internal_cmd to ATA_OP_NULL and pata_icside and sata_qstor don't
    need to set it to their bmdma_stop routines.

    ata_sff_post_internal_cmd() becomes noop and is removed.

    This fixes p3 described in clean-up-BMDMA-initialization patch.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • port_task is tightly bound to the standard SFF PIO HSM implementation.
    Using it for any other purpose would be error-prone and there's no
    such user and if some drivers need such feature, it would be much
    better off using its own. Move it inside CONFIG_ATA_SFF and rename it
    to sff_pio_task.

    The only function which is exposed to the core layer is
    ata_sff_flush_pio_task() which is renamed from ata_port_flush_task()
    and now also takes care of resetting hsm_task_state to HSM_ST_IDLE,
    which is possible as it's now specific to PIO HSM.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

23 Apr, 2010

2 commits


21 Jan, 2010

1 commit

  • libata currently doesn't retry if a command fails with AC_ERR_INVALID,
    assuming that retrying won't get it any further.
    However, a failure may be classified as invalid through hardware
    glitch (incorrect reading of the error register or firmware bug) and
    there isn't a whole lot to gain by not retrying as actually invalid
    commands will be failed immediately. Also, commands serving FS IOs
    are extremely unlikely to be invalid. Retry FS IOs even if they're
    marked invalid.
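
    Reduced to a predicate, the rule reads roughly as follows. This is a
    deliberately narrow sketch: it looks only at the "invalid"
    classification and at whether the command serves filesystem IO, while
    the real EH weighs other conditions too. MINI_ERR_INVALID and
    retry_allowed() are stand-in names.

```c
#include <stdbool.h>

#define MINI_ERR_INVALID (1u << 0)  /* stand-in for AC_ERR_INVALID */
#define MINI_ERR_TIMEOUT (1u << 1)  /* some other, retryable error */

/*
 * A command marked invalid is normally not retried, except when it
 * serves filesystem IO, where a truly invalid command is very unlikely.
 */
static bool retry_allowed(unsigned int err_mask, bool is_fs_io)
{
    if (err_mask & MINI_ERR_INVALID)
        return is_fs_io;
    return true;
}
```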

    Transient and incorrect invalid failure was seen while debugging
    firmware related issue on Samsung n130 on bko#14314.

    http://bugzilla.kernel.org/show_bug.cgi?id=14314

    Signed-off-by: Tejun Heo
    Reported-by: Johannes Stezenbach
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

03 Dec, 2009

1 commit

  • If an ATA device fails FLUSH, it means that the device failed to write
    out some amount of data and the error needs to be reported to upper
    layers. As retries can't recover the lost data, FLUSH failures need to
    be reported immediately in general.

    However, if FLUSH fails due to transmission errors, the FLUSH needs to
    be retried; otherwise, filesystems may switch to RO mode and/or raid
    array may drop a drive for a random transmission glitch.

    This condition can be rather easily reproduced on certain ahci
    controllers which go through a PHY event after powersave mode switch +
    ext4 combination. Powersave mode switch is often closely followed by
    flush from the filesystem failing the FLUSH with ATA bus error which
    makes the filesystem code believe that data is lost and drop to RO
    mode. This was reported in the following bugzilla bug.

    http://bugzilla.kernel.org/show_bug.cgi?id=14543

    This patch makes libata EH retry FLUSH if it wasn't failed by the
    device.
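
    The decision can be sketched as a check on who failed the command. The
    error bits below are illustrative stand-ins, and flush_retry_allowed()
    is a hypothetical helper, not the function the patch touches.

```c
#include <stdbool.h>

#define MINI_ERR_DEV     (1u << 0)  /* the device itself reported failure */
#define MINI_ERR_ATA_BUS (1u << 1)  /* transmission error on the link */

/*
 * A FLUSH the device failed means data is already lost, so retrying
 * cannot help. A FLUSH that failed only in transit is worth retrying.
 */
static bool flush_retry_allowed(unsigned int err_mask)
{
    return (err_mask & MINI_ERR_DEV) == 0;
}
```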

    Signed-off-by: Tejun Heo
    Reported-by: Andrey Vihrov
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

16 Oct, 2009

1 commit

  • Commit 842faa6c1a1d6faddf3377948e5cf214812c6c90 fixed error handling
    during attach by not committing detected device class to dev->class
    while attaching a new device. However, this change missed the PMP
    class check in the configuration loop causing a new PMP device to go
    through ata_dev_configure() as if it were an ATA or ATAPI device.

    As a PMP device doesn't have regular IDENTIFY data, this makes
    ata_dev_configure() try to configure a PMP device using invalid
    data. For the most part, it wasn't too harmful and went unnoticed but
    this ends up clearing dev->flags which may have ATA_DFLAG_AN set by
    sata_pmp_attach(). This means that SATA_PMP_FEAT_NOTIFY ends up being
    disabled on PMPs, which breaks hotplug support on PMPs that honor the
    flag.

    This problem was discovered and reported by Ethan Hsiao.

    Signed-off-by: Tejun Heo
    Reported-by: Ethan Hsiao
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

07 Oct, 2009

1 commit

  • While trying to work around spurious detection retries for
    non-existent devices on slave links, commit
    816ab89782ac139a8b65147cca990822bb7e8675 incorrectly added link
    offline check logic before ata_eh_thaw() was called. This means that
    if an occupied link goes down briefly at the time that offline check
    was performed, device class will be cleared to ATA_DEV_NONE and libata
    wouldn't retry thus failing detection of the device.

    The offline check should be done after the port is thawed together
    with online check so that such link glitches can be detected by the
    interrupt handler and handled properly.

    Signed-off-by: Tejun Heo
    Reported-by: Tim Blechmann
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

02 Sep, 2009

3 commits

  • This patch improves libata's output for error/notification messages
    to allow easier comprehension and debugging:

    When ATAPI commands issued through the SCSI layer fail, use SCSI
    functions to print the CDB in human-readable form instead of just
    dumping out the CDB in hex.

    Print out the name of the failed command (as defined by the ATA
    specification) in error handling output along with the raw register
    contents.

    When reporting status of ACPI taskfile commands executed on resume,
    also output the names of the commands being executed (or not) in
    readable form.

    Since the extra data for printing command names increases kernel
    size slightly, a config option has been added to allow disabling
    command name output (as well as some of the error register parsing)
    for those highly sensitive to kernel text size.

    Signed-off-by: Robert Hancock
    Signed-off-by: Jeff Garzik

    Robert Hancock
     
  • Resets are done with port frozen but some controllers still issue
    interrupts during reset and they may end up recording error conditions
    in ehi leading to unnecessary EH retrials.

    This patch makes ata_eh_reset() clear ehi on reset completion. As
    reset is the most severe recovery action, there's nothing to lose by
    clearing ehi on its completion.

    Signed-off-by: Tejun Heo
    Reported-by: Zdenek Kaspar
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Call the ->freeze() hook before aborting qc's, because some hardware
    requires special handling prior to accessing the taskfile registers
    (for diagnosis/analysis/reset). Most notably, hardware may wish to
    disable the DMA engine or interrupts in the ->freeze() hook.

    Signed-off-by: Jeff Garzik

    Jeff Garzik
     

29 Jul, 2009

1 commit

  • drivers/ata/libata-eh.c +2403 ata_eh_reset(80) warning: variable dereferenced before check 'slave'

    Please note that this is _not_ a real bug at the moment since the ata_eh_context
    structure is embedded into the ata_link structure and the code always checks for
    'slave' before accessing 'sehc'.

    Anyway let's add the missing check and always have a valid 'sehc' pointer (which
    makes code easier to understand and prevents introducing some possible bugs
    in the future).

    Reported-by: Dan Carpenter
    Cc: corbet@lwn.net
    Cc: eteo@redhat.com
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Jeff Garzik

    Bartlomiej Zolnierkiewicz
     

15 Jul, 2009

1 commit

  • ata_eh_reset() was missing error return handling after follow-up SRST
    allowing EH to continue the normal probing path after reset failure.
    This was discovered while testing new WD 2TB drives which take longer
    than 10 secs to spin up and cause the first follow-up SRST to time
    out.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     

13 Jun, 2009

1 commit


12 May, 2009

2 commits

  • Error timestamps are in jiffies, which don't run while suspended, and
    PHY events during resume aren't too uncommon. When the two are
    combined, it can lead to unnecessary speed downs if the machine is
    suspended and resumed repeatedly. Clear error history on resume.

    This was reported and verified in bnc#486803 by Vladimir Botka.

    Signed-off-by: Tejun Heo
    Reported-by: Vladimir Botka
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • New device attach path in ata_eh_revalidate_and_attach() is divided
    into two separate loops because ATA requires IDENTIFY to be issued to
    slave first while the user expects to see device probe messages from
    the master device. new_mask is used to track which devices are the
    new ones between the first loop and the second.

    This usually works well but if an error occurs during configuration
    stage, ata_eh_revalidate_and_attach() returns with an error code and
    forgets new_mask. On the retry run, dev->class is set and new_mask
    for the device is clear, so the device just gets revalidated and thus
    ends up skipping post-configuration procedure including scheduling of
    SCSI_HOTPLUG for the device. When this occurs, ATA part of probing
    works fine but SCSI probing usually doesn't happen and makes the
    device unreachable.

    The behavior has been around for a very long time but it has been
    uncovered with the recent addition of 1_5_GBPS horkage which uses
    -EAGAIN return value from ata_dev_configure() to restart the probing
    sequence after forcing cable speed.

    This can be fixed by making sure dev->class is permanently set only
    after all configurations are successfully complete. Fix it.
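
    The "commit only after everything succeeded" idea can be sketched with
    two passes over a small array. struct mini_dev, its configure_rc field
    (a simulated configuration result) and attach_new_devices() are
    invented for this example.

```c
enum dev_class { DEV_NONE, DEV_ATA, DEV_ATAPI };

struct mini_dev {
    enum dev_class cls;  /* committed class; DEV_NONE means not attached */
    int configure_rc;    /* simulated result of configuring this device */
};

/*
 * Configure every detected device first and commit the detected classes
 * only when all of them succeeded. On failure nothing is committed, so
 * a retry still sees the devices as new.
 */
static int attach_new_devices(struct mini_dev *devs,
                              const enum dev_class *detected, int n)
{
    /* Pass 1: configure against the detected class without storing it. */
    for (int i = 0; i < n; i++) {
        if (detected[i] != DEV_NONE && devs[i].configure_rc != 0)
            return devs[i].configure_rc;
    }

    /* Pass 2: all configured, so the classes can be made permanent. */
    for (int i = 0; i < n; i++)
        devs[i].cls = detected[i];

    return 0;
}
```

    If the second device fails with a retry-style error, the first one
    keeps DEV_NONE as well, and the next run repeats the full attach
    sequence for both.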

    Signed-off-by: Tejun Heo
    Reported-by: Tim Connors
    Signed-off-by: Jeff Garzik

    Tejun Heo