10 May, 2016

1 commit


06 May, 2016

2 commits

  • I actually read the error messages in my logs, and successful
    initialization is not an error.

    Arguably these log lines could be deleted entirely.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Hannes Reinecke
    Acked-by: Sumit Saxena
    Signed-off-by: Martin K. Petersen

    Andy Lutomirski
     
  • When a cxlflash adapter goes into EEH recovery and multiple processes
    (each having established its own context) are active, the EEH recovery
    can hang if the processes attempt to recover in parallel. The symptom
    logged after a couple of minutes is:

    INFO: task eehd:48 blocked for more than 120 seconds.
    Not tainted 4.5.0-491-26f710d+ #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    eehd 0 48 2
    Call Trace:
    __switch_to+0x2f0/0x410
    __schedule+0x300/0x980
    schedule+0x48/0xc0
    rwsem_down_write_failed+0x294/0x410
    down_write+0x88/0xb0
    cxlflash_pci_error_detected+0x100/0x1c0 [cxlflash]
    cxl_vphb_error_detected+0x88/0x110 [cxl]
    cxl_pci_error_detected+0xb0/0x1d0 [cxl]
    eeh_report_error+0xbc/0x130
    eeh_pe_dev_traverse+0x94/0x160
    eeh_handle_normal_event+0x17c/0x450
    eeh_handle_event+0x184/0x370
    eeh_event_handler+0x1c8/0x1d0
    kthread+0x110/0x130
    ret_from_kernel_thread+0x5c/0xa4

    INFO: task blockio:33215 blocked for more than 120 seconds.
    Not tainted 4.5.0-491-26f710d+ #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    blockio 0 33215 33213
    Call Trace:
    0x1 (unreliable)
    __switch_to+0x2f0/0x410
    __schedule+0x300/0x980
    schedule+0x48/0xc0
    rwsem_down_read_failed+0x124/0x1d0
    down_read+0x68/0x80
    cxlflash_ioctl+0x70/0x6f0 [cxlflash]
    scsi_ioctl+0x3b0/0x4c0
    sg_ioctl+0x960/0x1010
    do_vfs_ioctl+0xd8/0x8c0
    SyS_ioctl+0xd4/0xf0
    system_call+0x38/0xb4
    INFO: task eehd:48 blocked for more than 120 seconds.

    The hang is caused by a three-way deadlock:

    Process A holds the recovery mutex and waits for eehd to complete.
    Process B holds the semaphore and waits for the recovery mutex.
    eehd waits for the semaphore.

    The fix is to have Process B above release the semaphore before
    attempting to acquire the recovery mutex. This will allow
    eehd to proceed to completion.

    Signed-off-by: Manoj N. Kumar
    Reviewed-by: Matthew R. Ochs
    Signed-off-by: Martin K. Petersen

    Manoj N. Kumar
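
    The wait cycle described in this message can be modelled in a few lines
    of userspace C. This is only an illustrative sketch: the task
    identifiers, the waits_on table, and has_deadlock() are assumptions made
    for the example, not cxlflash code. It shows that the three waits form a
    cycle, and that the cycle disappears once Process B no longer holds the
    semaphore while it waits for the recovery mutex.

```c
#include <stdbool.h>

/* Illustrative task identifiers; not names from the cxlflash driver. */
enum { PROC_A, PROC_B, EEHD, NTASKS };
#define NOT_WAITING (-1)

/*
 * waits_on[t] is the task that holds what task t needs, or NOT_WAITING
 * if t is runnable. Returns true if following the chain from any task
 * leads back to that same task.
 */
bool has_deadlock(const int waits_on[NTASKS])
{
    for (int start = 0; start < NTASKS; start++) {
        int t = start;

        for (int hops = 0; hops < NTASKS && t != NOT_WAITING; hops++) {
            t = waits_on[t];
            if (t == start)
                return true;
        }
    }
    return false;
}

/* Before the fix: A -> eehd -> B -> A. */
const int before_fix[NTASKS] = {
    [PROC_A] = EEHD,        /* holds recovery mutex, waits for eehd */
    [PROC_B] = PROC_A,      /* holds semaphore, waits for recovery mutex */
    [EEHD]   = PROC_B,      /* waits for the semaphore */
};

/* After the fix: B drops the semaphore first, so eehd is runnable. */
const int after_fix[NTASKS] = {
    [PROC_A] = EEHD,
    [PROC_B] = PROC_A,
    [EEHD]   = NOT_WAITING,
};
```

    Calling has_deadlock(before_fix) reports a cycle, while
    has_deadlock(after_fix) does not: once eehd can finish, Process A and
    then Process B can proceed in turn.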
     

04 May, 2016

2 commits

  • Based on the "[PATCH V2] scsi_debug: rework resp_report_luns" patch
    sent by Tomas Winkler on Thursday, 26 Feb 2015. His notes:

    1. Remove duplicated boundary checks, which simplifies the fill-in
       loop
    2. Use more of the SCSI generic API

    Replace the fixed-length response array with a heap allocation,
    allowing up to 256 normal LUNs per target.

    Signed-off-by: Douglas Gilbert
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Tomas Winkler
    Reviewed-by: Bart Van Assche
    Signed-off-by: Martin K. Petersen

    Douglas Gilbert
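
    As a rough illustration of a heap-allocated REPORT LUNS response, the
    sketch below builds the 8-byte header (a big-endian LUN list length
    followed by four reserved bytes) and one 8-byte entry per LUN. The
    function name, the constants, and the 256-LUN guard are assumptions for
    this example; the actual scsi_debug code is not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>

#define RL_HEADER_LEN 8   /* 4-byte list length + 4 reserved bytes */
#define RL_ENTRY_LEN  8   /* each LUN is reported as 8 bytes */
#define RL_MAX_LUNS   256 /* limit taken from the commit message */

/*
 * Hypothetical helper: returns a calloc()ed REPORT LUNS response for
 * LUNs 0..lun_cnt-1 and stores its size in *out_len. Returns NULL if
 * lun_cnt is too large or allocation fails. The caller frees the buffer.
 */
uint8_t *build_report_luns(unsigned int lun_cnt, size_t *out_len)
{
    if (lun_cnt > RL_MAX_LUNS)
        return NULL;

    size_t len = RL_HEADER_LEN + (size_t)lun_cnt * RL_ENTRY_LEN;
    uint8_t *buf = calloc(1, len);
    if (!buf)
        return NULL;

    /* LUN list length in bytes, big-endian, excluding the header. */
    uint32_t list_len = (uint32_t)lun_cnt * RL_ENTRY_LEN;
    buf[0] = (uint8_t)(list_len >> 24);
    buf[1] = (uint8_t)(list_len >> 16);
    buf[2] = (uint8_t)(list_len >> 8);
    buf[3] = (uint8_t)list_len;

    for (unsigned int lun = 0; lun < lun_cnt; lun++) {
        uint8_t *entry = buf + RL_HEADER_LEN + (size_t)lun * RL_ENTRY_LEN;

        /* entry[0] stays 0 (peripheral device addressing); LUN < 256. */
        entry[1] = (uint8_t)lun;
    }

    *out_len = len;
    return buf;
}
```

    Sizing the buffer from the LUN count means the fill-in loop needs no
    per-entry boundary check, which is the simplification the notes above
    describe.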
     
  • Use TYPE_* constants for SCSI peripheral device types instead of
    numbers. Further cleanups requested by checkpatch.pl.

    Signed-off-by: Douglas Gilbert
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Bart Van Assche
    Signed-off-by: Martin K. Petersen

    Douglas Gilbert
     

30 Apr, 2016

25 commits

  • The most common commands in normal use are the READ and WRITE SCSI
    commands. Use likely and unlikely hints along the path taken by these
    commands. Rename check_readiness() to make_ua() and remove associated
    dead code. Rename devInfoReg() to find_build_dev_info().

    Signed-off-by: Douglas Gilbert
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Martin K. Petersen

    Douglas Gilbert
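
    For readers unfamiliar with branch hints, here is a minimal userspace
    sketch. The likely()/unlikely() macros mirror the kernel's definitions
    and rely on the GCC/Clang __builtin_expect builtin; the dispatcher, its
    enum, and its parameters are hypothetical and are not taken from
    scsi_debug.

```c
/* Userspace equivalents of the kernel's branch hints (GCC/Clang). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define OP_READ_10  0x28
#define OP_WRITE_10 0x2a

enum cmd_path { PATH_UNIT_ATTENTION, PATH_FAST_RW, PATH_OTHER };

/*
 * Hypothetical dispatcher: the rare unit-attention case is marked
 * unlikely, and the common READ/WRITE case is marked likely, so the
 * compiler can lay out the hot path as straight-line code.
 */
enum cmd_path classify_command(unsigned char opcode, int ua_pending)
{
    if (unlikely(ua_pending))
        return PATH_UNIT_ATTENTION;
    if (likely(opcode == OP_READ_10 || opcode == OP_WRITE_10))
        return PATH_FAST_RW;
    return PATH_OTHER;
}
```

    The hints never change the result of the function; they only influence
    code layout and static branch prediction.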
     
  • Group most defines together first; followed by struct definitions and
    then table and variable definitions. Normalize all function headers.

    [mkp: Corrected hex value in WP/DPOFUA MODE SENSE comment]

    Signed-off-by: Douglas Gilbert
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Martin K. Petersen

    Douglas Gilbert
     
  • When a negative value was placed in the delay parameter, a tasklet was
    scheduled. Change the tasklet to a work queue. Previously a delay of -1
    scheduled a high priority tasklet; since there are no high priority work
    queues, treat -1 like other negative values in delay and schedule a work
    item.

    Signed-off-by: Douglas Gilbert
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Martin K. Petersen

    Douglas Gilbert
     
  • Add 'j' to delay names to make it clearer that its unit is jiffies and
    to differentiate it from sdebug_ndelay whose unit is nanoseconds.

    Signed-off-by: Douglas Gilbert
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Martin K. Petersen

    Douglas Gilbert
     
  • The driver supports two command delay interfaces, the original one whose
    unit is a jiffy, and a newer one whose unit is a nanosecond. Each had
    different implementations. Keep both interfaces but simplify the
    implementation to use a single delay mechanism based on high resolution
    timers.

    Signed-off-by: Douglas Gilbert
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Martin K. Petersen

    Douglas Gilbert
     
  • Remove logic to optionally hold host_lock while each command is
    queued. Keep module and sysfs host_lock parameters for backward
    compatibility. Note in module parameter description that host_lock is
    ignored.

    Signed-off-by: Douglas Gilbert
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Martin K. Petersen

    Douglas Gilbert
     
  • Shorten file scope static and constant names. Use more
    get/put_unaligned calls to hide bit banging. Introduce
    sdebug_verbose boolean to replace frequent masking of
    option bit flags. Add GPL and bump version.

    [mkp: Use logical instead of bitwise OR for LBP VPD flags]

    Signed-off-by: Douglas Gilbert
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Martin K. Petersen

    Douglas Gilbert
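
    To show what "hiding bit banging" behind accessor calls looks like, the
    sketch below gives userspace stand-ins for big-endian 32-bit accessors
    in the spirit of the kernel's get/put_unaligned helpers. The sketch_
    names are made up for this example and are not the kernel API.

```c
#include <stdint.h>

/* Store val at p in big-endian byte order; p may be unaligned. */
static inline void sketch_put_be32(uint32_t val, uint8_t *p)
{
    p[0] = (uint8_t)(val >> 24);
    p[1] = (uint8_t)(val >> 16);
    p[2] = (uint8_t)(val >> 8);
    p[3] = (uint8_t)val;
}

/* Load a big-endian 32-bit value from p; p may be unaligned. */
static inline uint32_t sketch_get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

    A caller then writes sketch_put_be32(lba, cdb + 2) instead of four
    open-coded shift-and-mask statements, which keeps the byte order in one
    place.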
     
  • Reviewed-by: Gerry Morong
    Signed-off-by: Don Brace
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Don Brace
     
  • Need to report HBA device removal faster than the
    event handler polling interval.

    Stop I/O to the removed disk and wait for all
    I/O operations to flush before removing the device.

    Reviewed-by: Scott Teel
    Reviewed-by: Kevin Barnett
    Signed-off-by: Don Brace
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Don Brace
     
  • Set offload_to_be_enabled to 0 when an ioaccel2 error is processed.

    Before, an ioaccel completion error would turn off ioaccel, but a rescan
    would turn it back on again.

    Reviewed-by: Scott Teel
    Reviewed-by: Kevin Barnett
    Signed-off-by: Don Brace
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Don Brace
     
  • offload_to_be_enabled also needs to be set to 0 during a state
    change.

    Reviewed-by: Scott Teel
    Reviewed-by: Kevin Barnett
    Signed-off-by: Don Brace
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Don Brace
     
  • Faulty drives can cause the driver to hang during a
    scan operation.

    Reviewed-by: Scott Teel
    Reviewed-by: Kevin Barnett
    Signed-off-by: Don Brace
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Don Brace
     
  • Some companies have requested a sysfs entry
    to obtain the SAS address of a device.

    Reviewed-by: Scott Teel
    Reviewed-by: Kevin Barnett
    Signed-off-by: Don Brace
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Joseph T Handzik
     
  • The driver was calling scsi_scan_host before enabling interrupts.

    This has gone unnoticed except for customers running in intx mode.
    Calling scsi_scan_host before interrupts are enabled causes
    "irq XX: nobody cared" messages and the driver to hang.

    This patch enables interrupts before the call to scsi_scan_host.

    Reported-by: Piotr Karbowski
    Reviewed-by: Scott Teel
    Reviewed-by: Kevin Barnett
    Signed-off-by: Don Brace
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Don Brace
     
  • Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
     
  • When KDUMP is triggered, the driver first talks to the firmware in INTX
    mode, but the adapter firmware is still in MSIX mode. The first driver
    command therefore hangs, because the driver is waiting for an INTX
    response while the firmware gives an MSIX response. If the OS is
    installed on a RAID drive created by the adapter, KDUMP will hang because
    the driver does not receive a response in sync mode.

    Fixed by changing the firmware to INTX mode, if it is in MSIX mode, before
    sending the first sync command.

    Cc: stable@vger.kernel.org
    Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
     
  • Currently the driver completes fibs that were already completed or that
    were signalled by a spurious interrupt. This is not necessary and causes
    the SCSI mid layer to issue aborts and resets, since completing a fib
    prematurely might trigger a race condition that results in the driver
    not calling the scsi_done callback.

    Fixed by removing the call to fib complete.

    Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
     
  • Firmware AIF messages about cache loss and data recovery are currently
    missed by the driver because it discards them instead of capturing them.
    This patch captures those messages and logs them for the user.

    Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
     
  • Typically under error conditions, it is possible for aac_command_thread()
    to miss the wakeup from kthread_stop() and go back to sleep, causing
    aac_shutdown to hang.

    In the observed scenario, the adapter is not functioning correctly, so
    aac_fib_send() never completes (or times out, depending on how it was
    called). Shortly after aac_command_thread() starts, it performs
    aac_fib_send(SendHostTime), which hangs. When the send from
    aac_probe_one/aac_get_adapter_info times out, kthread_stop() is called,
    which breaks the command thread out of its hang.

    The code still goes back to sleep in schedule_timeout() without
    checking kthread_should_stop(), so aac_probe_one hangs until
    schedule_timeout() expires, which takes 30 minutes.

    Fixed by adding another kthread_should_stop() check before
    schedule_timeout().
    Cc: stable@vger.kernel.org
    Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
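
    The missed-stop pattern can be reproduced in a small single-threaded
    model. Everything in this sketch is hypothetical: struct worker,
    do_work(), and long_sleep() merely stand in for the command thread,
    its work step, and the long schedule_timeout() sleep; none of it is
    aacraid code. The model assumes the stop request arrives while the work
    step is running.

```c
#include <stdbool.h>

/* Hypothetical model of a worker thread; not aacraid code. */
struct worker {
    bool should_stop;  /* stands in for kthread_should_stop() */
    int  long_sleeps;  /* how many times the worker went to sleep */
};

/* In this model the stop request arrives while the work step runs. */
static void do_work(struct worker *w)
{
    w->should_stop = true;
}

/* Stands in for the long schedule_timeout() sleep. */
static void long_sleep(struct worker *w)
{
    w->long_sleeps++;
}

/*
 * Runs the worker loop and returns how many long sleeps it took.
 * With recheck_before_sleep set, the stop flag is tested again
 * between the work step and the sleep.
 */
int run_worker(struct worker *w, bool recheck_before_sleep)
{
    while (!w->should_stop) {
        do_work(w);
        if (recheck_before_sleep && w->should_stop)
            break;
        long_sleep(w);
    }
    return w->long_sleeps;
}
```

    Without the extra check the worker takes one full sleep before it
    notices the stop request; with the check it exits immediately.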
     
  • As the firmware for series 6, 7, 8 cards does not support MSI, remove it
    from the driver.

    Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
     
  • aac_fib_send() has a special case for initial commands during driver
    initialization, using wait < 0 (pseudo sync mode). In this case, the
    command does not sleep but rather spins, checking for a timeout. This
    loop calls cpu_relax() in an attempt to allow other processes/threads
    to use the CPU, but that function does not relinquish the CPU, so the
    command hogs the processor. This was observed in a KDUMP
    "crashkernel", where it prevented the "command thread" (which is
    responsible for completing the command so that it does not time out)
    from starting because it could not get the CPU.

    Fixed by replacing the "cpu_relax()" call with "schedule()".
    Cc: stable@vger.kernel.org
    Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
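
    A loose userspace analogue of a polling loop that gives up the CPU on
    each pass is sketched below. It uses the POSIX sched_yield() call in
    place of the kernel's schedule(); the function name and its poll budget
    are assumptions for this example and do not reflect how aac_fib_send()
    measures its timeout.

```c
#include <sched.h>
#include <stdbool.h>

/*
 * Hypothetical helper: polls *done up to max_polls times. Returns 0 as
 * soon as the flag is seen set, or -1 if the poll budget runs out.
 * Yielding on every pass lets another runnable thread (for example the
 * one that will set *done) get the CPU, which a pure busy-wait on a
 * single CPU would not.
 */
int poll_until_done(const volatile bool *done, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        if (*done)
            return 0;
        sched_yield();  /* userspace stand-in for schedule() */
    }
    return -1;
}
```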
     
  • The adapter has to be started after updating the number of MSIX vectors.

    Fixes: ecc479e00db8 (aacraid: Set correct MSIX count for EEH recovery)
    Cc: stable@vger.kernel.org
    Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
     
  • Suggested-by: Seymour, Shane M
    Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
     
  • The current driver checks for a NULL return from aac_fib_alloc_tag, but
    it is not possible for it to return NULL.

    Fixed by removing all the checks for NULL returns from aac_fib_alloc_tag.

    Suggested-by: Tomas Henzl
    Signed-off-by: Raghava Aditya Renukunta
    Signed-off-by: Martin K. Petersen

    Raghava Aditya Renukunta
     
  • mptsas_smp_handler() checks for DMA mapping errors by comparing the
    returned address with zero, whereas pci_dma_mapping_error() should be
    used.

    Found by Linux Driver Verification project (linuxtesting.org).

    Signed-off-by: Alexey Khoroshilov
    Acked-by: Sathya Prakash Veerichetty
    Signed-off-by: Martin K. Petersen

    Alexey Khoroshilov
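
    Why a comparison with zero is not a reliable error test can be shown
    with a toy model. The sketch assumes a platform whose mapping routine
    signals failure with an all-ones address; that sentinel, the model_
    names, and the failure condition are assumptions for illustration only
    and are not the kernel's DMA API.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t model_dma_addr_t;

/* Assumed failure sentinel for this model: all bits set. */
#define MODEL_DMA_ERROR (~(model_dma_addr_t)0)

/* Hypothetical mapping routine: fails for zero-length buffers. */
model_dma_addr_t model_map(uint64_t cpu_addr, uint64_t len)
{
    if (len == 0)
        return MODEL_DMA_ERROR;
    return (model_dma_addr_t)cpu_addr;
}

/* The dedicated check knows the platform's sentinel. */
bool model_mapping_error(model_dma_addr_t addr)
{
    return addr == MODEL_DMA_ERROR;
}

/* The open-coded check only recognises address zero as a failure. */
bool zero_compare_sees_error(model_dma_addr_t addr)
{
    return addr == 0;
}
```

    In this model a failed mapping is caught by model_mapping_error() but
    slips past the zero comparison, which is why drivers should leave the
    decision to the helper provided by the mapping API.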
     

26 Apr, 2016

6 commits

  • The file atari_NCR5380.c has been removed from the tree so remove it
    from the MAINTAINERS file as well.

    While we are here, add the file dtc3x80.txt as it is only relevant to
    the dtc driver.

    Signed-off-by: Finn Thain
    Signed-off-by: Martin K. Petersen

    Finn Thain
     
  • Some drives set the ILI flag together with MEDIUM ERROR sense code.
    Clear the ILI flag in this case so that the medium error will be
    handled. The problem was reported by Maurizio Lombardi.

    Signed-off-by: Kai Mäkisara
    Reviewed-by: Laurence Oberman
    Signed-off-by: Martin K. Petersen

    Kai Makisara
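
    The flag handling can be pictured on byte 2 of fixed-format sense data,
    where the low four bits hold the sense key and bit 5 is ILI. The helper
    below is a sketch with made-up names; how the st driver stores and
    clears the flag internally is not shown in this message, so treat it as
    an illustration only.

```c
#include <stdint.h>

#define SENSE_KEY_MASK   0x0f  /* sense key: bits 3..0 of byte 2 */
#define SENSE_FLAG_ILI   0x20  /* incorrect length indicator: bit 5 */
#define KEY_MEDIUM_ERROR 0x03

/*
 * Hypothetical helper: returns the flags/key byte with ILI cleared when
 * the sense key is MEDIUM ERROR, and unchanged otherwise.
 */
uint8_t drop_ili_on_medium_error(uint8_t byte2)
{
    if ((byte2 & SENSE_KEY_MASK) == KEY_MEDIUM_ERROR)
        byte2 &= (uint8_t)~SENSE_FLAG_ILI;
    return byte2;
}
```

    With ILI out of the way, later code that branches on the flags before
    looking at the sense key reaches its medium-error handling.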
     
  • Remove incorrect lockdep assertion from lpfc_sli_hbqbuf_find() which
    acquires the hbalock itself. Fix the comment which resulted in this
    mistake.

    Fixes: 1c2ba475eb0e ("lpfc: Add lockdep assertions")
    Signed-off-by: Sebastian Herbszt
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Sebastian Herbszt
     
  • Presumably it isn't possible to have empty lists here, but my static
    checker doesn't know that and complains that "ep" can be used
    uninitialized.

    Signed-off-by: Dan Carpenter
    Acked-by: Nilesh Javali
    Signed-off-by: Martin K. Petersen

    Dan Carpenter
     
  • This is only called from show_sas_rphy_enclosure_identifier(). The
    caller expects that we set an identifier, otherwise it uses an
    uninitialized variable.

    [mkp: fixed typo]

    Signed-off-by: Dan Carpenter
    Acked-by: Don Brace
    Signed-off-by: Martin K. Petersen

    Dan Carpenter
     
  • Firmware events are queued up using the fw_event_work's struct work, not
    its delayed_work member. The initial driver for SAS2 controllers had
    handled firmware reset using the rescan barrier and was later redesigned
    through "mpt2sas: [Resend] Host Reset code cleanup". The delayed_work
    variables are now unused and may provoke CONFIG_DEBUG_OBJECTS_TIMERS
    "assert_init not available" false warnings in
    _scsih_fw_event_cleanup_queue.

    Clean up fw_event_work's unused entries, update its kerneldoc, and
    update _scsih_fw_event_cleanup_queue accordingly.

    Fixes: 146b16c8071f (mpt3sas: Refcount fw_events and fix unsafe list usage)
    Signed-off-by: Joe Lawrence
    Acked-by: Chaitra P B
    Signed-off-by: Martin K. Petersen

    Joe Lawrence
     

16 Apr, 2016

4 commits