23 Feb, 2017

1 commit

  • errata:
    When a read command returns less data than specified in the PRDs (for
    example, there are two PRDs for this command, but the device returns a
    number of bytes which is less than in the first PRD), the second PRD of
    this command is not read out of the PRD FIFO, causing the next command
    to use this PRD erroneously.

    Workaround:
    - Force sg_tablesize = 1.
    - Modify the sg_io function in block/scsi_ioctl.c to use a 64k buffer
    allocated with dma_alloc_coherent during the probe in ahci_imx.
    - To fix the scsi/sata hang that occurs when the CD-ROM and the HDD are
    accessed simultaneously after the workaround is applied, do not go to
    sleep in scsi_eh_handler when there is a host failure.

    Signed-off-by: Richard Zhu

    Richard Zhu
     

15 Feb, 2017

4 commits

  • commit 2780f3c8f0233de90b6b47a23fc422b7780c5436 upstream.

    Avoid that issuing a LIP as follows:

    find /sys -name 'issue_lip'|while read f; do echo 1 > $f; done

    triggers the following:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    Call Trace:
    qla2x00_abort_all_cmds+0xed/0x140 [qla2xxx]
    qla2x00_abort_isp_cleanup+0x1e3/0x280 [qla2xxx]
    qla2x00_abort_isp+0xef/0x690 [qla2xxx]
    qla2x00_do_dpc+0x36c/0x880 [qla2xxx]
    kthread+0x10c/0x140

    [mkp: consolidated Mauricio's and Bart's fixes]

    Signed-off-by: Mauricio Faria de Oliveira
    Reported-by: Bart Van Assche
    Fixes: 1535aa75a3d8 ("qla2xxx: fix invalid DMA access after command aborts in PCI device remove")
    Cc: Himanshu Madhani
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Mauricio Faria de Oliveira
     
  • commit ffdadd68af5a397b8a52289ab39d62e1acb39e63 upstream.

    MPI2 controllers sometimes get lost (i.e. disappear from
    /sys/bus/pci/devices) if ASPM is enabled.

    Signed-off-by: Slava Kardakov
    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=60644
    Acked-by: Sreekanth Reddy
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    ojab
     
  • commit 8af8e1c22f9994bb1849c01d66c24fe23f9bc9a0 upstream.

    commit 78cbccd3bd68 ("aacraid: Fix for KDUMP driver hang")

    caused a problem on older controllers which do not support MSI-X (namely
    ASR3405 and ASR3805). This patch restricts the previous patch to
    controllers which support MSI-X.

    Fixes: 78cbccd3bd68 ("aacraid: Fix for KDUMP driver hang")
    Reported-by: Arkadiusz Miskiewicz
    Signed-off-by: Dave Carroll
    Reviewed-by: Raghava Aditya Renukunta
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Dave Carroll
     
  • commit b22bc27868e8c11fe3f00937a341b44f80b50364 upstream.

    This patch adds an internal LIO sgl limit, since the driver already
    sets a max transfer limit of 1MB on the transport layer for the client.

    Tested-by: Steven Royer
    Signed-off-by: Bryant G. Ly
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Bryant G. Ly
     

26 Jan, 2017

5 commits

  • commit ffb58456589443ca572221fabbdef3db8483a779 upstream.

    mpt3sas has a firmware failure where it can only handle one pass through
    ATA command at a time. If another comes in, contrary to the SAT
    standard, it will hang until the first one completes (causing long
    commands like secure erase to time out). The original fix was to block
    the device when an ATA command came in, but this caused a regression
    with

    commit 669f044170d8933c3d66d231b69ea97cb8447338
    Author: Bart Van Assche
    Date: Tue Nov 22 16:17:13 2016 -0800

    scsi: srp_transport: Move queuecommand() wait code to SCSI core

    So fix the original fix of the secure erase timeout by properly
    returning SAM_STAT_BUSY like the SAT recommends. The original patch
    also had a concurrency problem since scsih_qcmd is lockless at that
    point (this is fixed by using atomic bitops to set and test the flag).

    [mkp: addressed feedback wrt. test_bit and fixed whitespace]

    Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination)
    Signed-off-by: James Bottomley
    Acked-by: Sreekanth Reddy
    Reviewed-by: Christoph Hellwig
    Reported-by: Ingo Molnar
    Tested-by: Ingo Molnar
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    James Bottomley
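
The "set and test the flag atomically, otherwise report busy" idea described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: it uses C11 atomic_flag in place of the kernel's atomic bitops, and the names ata_gate, QUEUE_OK and QUEUE_BUSY are hypothetical stand-ins (QUEUE_BUSY plays the role that SAM_STAT_BUSY plays in the text). It is not the driver's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; not the kernel's definitions. */
enum queue_status { QUEUE_OK = 0, QUEUE_BUSY = 1 };

/* Hypothetical per-device state: set while one ATA pass-through is in flight. */
struct ata_gate {
    atomic_flag ata_cmd_pending;
};

static void ata_gate_init(struct ata_gate *g)
{
    atomic_flag_clear(&g->ata_cmd_pending);
}

/*
 * Try to start an ATA pass-through command. atomic_flag_test_and_set()
 * returns the previous value, so exactly one caller sees "false" and wins;
 * every other concurrent caller is told the device is busy.
 */
static enum queue_status ata_gate_submit(struct ata_gate *g)
{
    assert(g != NULL);
    if (atomic_flag_test_and_set(&g->ata_cmd_pending))
        return QUEUE_BUSY;
    return QUEUE_OK;
}

/* Called when the in-flight command completes. */
static void ata_gate_complete(struct ata_gate *g)
{
    atomic_flag_clear(&g->ata_cmd_pending);
}
```

The point of the single test-and-set is that checking and setting happen in one step, so a lockless submit path cannot let two commands through between a separate "test" and "set".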
     
  • commit 9373eba6cfae48911b977d14323032cd5d161aae upstream.

    The call to scsi_is_sas_rphy() needs to be made on the SAS end_device,
    not on the SCSI device.

    Fixes: 835831c57e9b ("ses: use scsi_is_sas_rphy instead of is_sas_attached")
    Signed-off-by: Ewan D. Milne
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: James Bottomley
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Ewan D. Milne
     
  • commit 387b978cb0d12cf3720ecb17e652e0a9991a08e2 upstream.

    The current code calculates the max transfer length incorrectly: it
    assumes a 4k page size, but ppc64 systems all run with a 64k page size.

    Reported-by: Steven Royer
    Tested-by: Steven Royer
    Signed-off-by: Bryant G. Ly
    Signed-off-by: Bart Van Assche
    Signed-off-by: Greg Kroah-Hartman

    Bryant G. Ly
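
The general pitfall named here, a transfer limit derived from a hard-coded page size, can be illustrated with a small sketch. The formula below (one page per scatter/gather entry) and the function name max_transfer_bytes are assumptions made for illustration; the log does not show the driver's actual calculation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest transfer that nr_segments scatter/gather entries can describe when
 * each entry covers at most one page. page_size is a parameter on purpose:
 * hard-coding 4096 under-reports the limit on a 64 KiB-page system.
 */
static size_t max_transfer_bytes(size_t nr_segments, size_t page_size)
{
    /* A page size must be a non-zero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return nr_segments * page_size;
}
```

With 256 entries, a 4 KiB assumption yields 1 MiB while 64 KiB pages yield 16 MiB, so the assumed page size changes the result by a factor of 16.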
     
  • commit a5b0e4062fb225155189e593699bbfcd0597f8b5 upstream.

    Currently, dma_alloc_coherent is called with the GFP_KERNEL flag,
    which allows it to sleep in an interrupt context. Change the flag
    to GFP_ATOMIC.

    Tested-by: Steven Royer
    Reviewed-by: Michael Cyr
    Signed-off-by: Bryant G. Ly
    Signed-off-by: Bart Van Assche
    Signed-off-by: Greg Kroah-Hartman

    Bryant G. Ly
     
  • commit fc1ffd6cb38a1c1af625b9833c41928039e733f5 upstream.

    During code inspection, while investigating the following stack trace
    seen on one of the test setups, we found that there was a possibility
    of a memory leak because the driver was not unwinding the stack properly.

    This issue has not been reproduced in a test environment or on a
    customer setup.

    Here's the stack trace that was seen.

    [1469877.797315] Call Trace:
    [1469877.799940] qla2x00_mem_alloc+0xb09/0x10c0 [qla2xxx]
    [1469877.806980] qla2x00_probe_one+0x86a/0x1b50 [qla2xxx]
    [1469877.814013] ? __pm_runtime_resume+0x51/0xa0
    [1469877.820265] ? _raw_spin_lock_irqsave+0x25/0x90
    [1469877.826776] ? _raw_spin_unlock_irqrestore+0x6d/0x80
    [1469877.833720] ? preempt_count_sub+0xb1/0x100
    [1469877.839885] ? _raw_spin_unlock_irqrestore+0x4c/0x80
    [1469877.846830] local_pci_probe+0x4c/0xb0
    [1469877.852562] ? preempt_count_sub+0xb1/0x100
    [1469877.858727] pci_call_probe+0x89/0xb0

    Signed-off-by: Quinn Tran
    Signed-off-by: Himanshu Madhani
    Reviewed-by: Christoph Hellwig
    [ bvanassche: Fixed spelling in patch description ]
    Signed-off-by: Bart Van Assche
    Signed-off-by: Greg Kroah-Hartman

    Quinn Tran
     

20 Jan, 2017

1 commit

  • commit 7c9d8d0c41b3e24473ac7648a7fc2d644ccf08ff upstream.

    If srp_transfer_data fails within ibmvscsis_write_pending, then
    the most likely scenario is that the client timed out the op and
    removed the TCE mapping. Thus it will loop forever retrying the
    op that is pretty much guaranteed to fail forever. A better return
    code would be EIO instead of EAGAIN.

    Reported-by: Steven Royer
    Tested-by: Steven Royer
    Signed-off-by: Bryant G. Ly
    Signed-off-by: Bart Van Assche
    Signed-off-by: Greg Kroah-Hartman

    Bryant G. Ly
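
Why the choice between EAGAIN and EIO matters can be shown with a minimal retry loop. This is a sketch under assumptions: run_with_retry, op_fn and the two stand-in operations are hypothetical, and the bounded loop is an addition for the illustration (the commit text describes an unbounded retry).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation type: returns 0 on success or a negative errno. */
typedef int (*op_fn)(void *ctx);

/*
 * Retry while the operation reports -EAGAIN (a transient condition); stop on
 * success or any other error. max_tries bounds the loop so a misreported
 * permanent failure cannot spin forever.
 */
static int run_with_retry(op_fn op, void *ctx, int max_tries, int *tries_used)
{
    int rc = -EAGAIN;
    int tries = 0;

    assert(op != NULL && max_tries > 0);
    while (tries < max_tries) {
        rc = op(ctx);
        tries++;
        if (rc != -EAGAIN)
            break;
    }
    if (tries_used)
        *tries_used = tries;
    return rc;
}

/* Stand-ins for a transfer whose mapping is gone: it can never succeed. */
static int fail_eagain(void *ctx) { (void)ctx; return -EAGAIN; }
static int fail_eio(void *ctx)    { (void)ctx; return -EIO; }
```

A permanent failure reported as -EAGAIN consumes every allowed attempt, while the same failure reported as -EIO ends after the first call, which is the behaviour the commit asks for.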
     

12 Jan, 2017

2 commits

  • commit af15769ffab13d777e55fdef09d0762bf0c249c4 upstream.

    gcc-7 notices that the condition in mvs_94xx_command_active looks
    suspicious:

    drivers/scsi/mvsas/mv_94xx.c: In function 'mvs_94xx_command_active':
    drivers/scsi/mvsas/mv_94xx.c:671:15: error: '<
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • commit 7b93ca43b7e21fbe6fb1a6f4ecce4a2f70f424a0 upstream.

    When a SW-configurable card is specified but not found, the driver
    releases the wrong region, causing the following message in the kernel log:
    Trying to free nonexistent resource

    Fix it by assigning base earlier.

    Signed-off-by: Ondrej Zary
    Fixes: a8cfbcaec0c1 ("scsi: g_NCR5380: Stop using scsi_module.c")
    Signed-off-by: Finn Thain
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Ondrej Zary
     

09 Jan, 2017

5 commits

  • commit 128394eff343fc6d2f32172f03e24829539c5835 upstream.

    Both damn things interpret userland pointers embedded into the payload;
    worse, they are actually traversing those. Leaving aside the bad
    API design, this is very much _not_ safe to call with KERNEL_DS.
    Bail out early if that happens.

    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit ae2aae2421983f6f68eb7c4692624bc43ea50712 upstream.

    Controllers with this PCI ID never shipped outside of
    PMCS/Microsemi. Remove the ID from the aacraid driver. smartpqi is the
    correct driver for these controllers.

    [mkp: patch description]

    Reviewed-by: Scott Teel
    Signed-off-by: Kevin Barnett
    Signed-off-by: Don Brace
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Kevin Barnett
     
  • commit d2a145252c52792bc59e4767b486b26c430af4bb upstream.

    A race between scanning and fc_remote_port_delete() may result in a
    permanent stop if the device gets blocked before scsi_sysfs_add_sdev()
    and unblocked after. The reason is that blocking a device sets both the
    SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED. However,
    scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the
    device to be ignored by scsi_target_unblock() and thus never have its
    QUEUE_FLAG_STOPPED cleared leading to a device which is apparently
    running but has a stopped queue.

    We actually have two places where SDEV_RUNNING is set: once in
    scsi_add_lun() which respects the blocked flag and once in
    scsi_sysfs_add_sdev() which doesn't. Since the second set is entirely
    spurious, simply remove it to fix the problem.

    Reported-by: Zengxi Chen
    Signed-off-by: Wei Fang
    Reviewed-by: Ewan D. Milne
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Wei Fang
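
The sequence described above (block, unconditional state overwrite, unblock) can be modelled in a few lines. This is a deliberately simplified model, not kernel code: the enum, struct and function names are hypothetical, and only the two pieces of state mentioned in the text are tracked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; names echo the commit text only loosely. */
enum dev_state { DEV_CREATED, DEV_RUNNING, DEV_BLOCKED };

struct model_dev {
    enum dev_state state;
    bool queue_stopped;
};

/* Blocking sets both the state and the stopped-queue flag. */
static void model_block(struct model_dev *d)
{
    d->state = DEV_BLOCKED;
    d->queue_stopped = true;
}

/* Unblocking only acts on devices that are still marked blocked. */
static void model_unblock(struct model_dev *d)
{
    if (d->state != DEV_BLOCKED)
        return;
    d->state = DEV_RUNNING;
    d->queue_stopped = false;
}

/* The problematic step: overwrite the state without looking at it. */
static void model_set_running_unconditionally(struct model_dev *d)
{
    d->state = DEV_RUNNING;
}

/* Returns true if the device ends up "running" with a stopped queue. */
static bool model_is_stuck(bool with_spurious_set)
{
    struct model_dev d = { DEV_CREATED, false };

    model_block(&d);
    if (with_spurious_set)
        model_set_running_unconditionally(&d);
    model_unblock(&d);
    return d.state == DEV_RUNNING && d.queue_stopped;
}
```

Calling model_is_stuck(true) reproduces the stuck combination in the model, and model_is_stuck(false) shows that dropping the spurious state change lets the unblock step clear the stopped flag.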
     
  • … not support JBOD sequence map

    commit d5573584429254a14708cf8375c47092b5edaf2c upstream.

    Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Reviewed-by: Tomas Henzl <thenzl@redhat.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

    Kashyap Desai
     
  • commit 18e1c7f68a5814442abad849abe6eacbf02ffd7c upstream.

    For SRIOV enabled firmware, if there is a possibility of an OCR (online
    controller reset), the driver sets the convert flag to 1. This does not
    happen if there are still outstanding commands after 180 seconds. Because
    the driver does not set the convert flag to 1 but still lets the OCR run,
    the VF (virtual function) driver writes directly to the register instead
    of waiting for 30 seconds. Setting the convert flag to 1 makes the VF
    driver wait for 30 seconds before going for reset.

    Signed-off-by: Kiran Kumar Kasturi
    Signed-off-by: Sumit Saxena
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Tomas Henzl
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Kashyap Desai
     

09 Dec, 2016

1 commit


30 Nov, 2016

2 commits

  • Pull SCSI fixes from James Bottomley:
    "Four small fixes.

    The be2iscsi is a potential device overrun in consistent memory, which
    could have nasty consequences if the consistent allocations are
    packed.

    The hpsa one fixes a regression where older controllers can now get a
    numbering clash between the first internal disk and the controller.

    The libfc one is a regression in timespec conversions which causes a
    user visible issue in a command line tool and the mpt3sas one fixes a
    regression where the controller could remain permanently blocked after
    an ATA pass through command followed by a reset"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: be2iscsi: allocate enough memory in beiscsi_boot_get_sinfo()
    scsi: mpt3sas: Unblock device after controller reset
    scsi: hpsa: use bus '3' for legacy HBA devices
    scsi: libfc: fix seconds_since_last_reset miscalculation

    Linus Torvalds
     
  • James Bottomley
     

29 Nov, 2016

3 commits

  • Pull libata fixes from Tejun Heo:
    "The recent changes in ahci MSI handling need one more fix. Hopefully,
    this restores parity with before.

    The other two are minor fixes with both low impact and risk"

    * 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
    ahci: always fall back to single-MSI mode
    libata-scsi: Fixup ata_gen_passthru_sense()
    mvsas: fix error return code in mvs_task_prep()

    Linus Torvalds
     
  • Pull sparc fixes from David Miller:
    "Two ugly build warning fixes"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    dbri: Fix compiler warning
    qlogicpti: Fix compiler warnings

    Linus Torvalds
     
  • qlogicpti uses '__u32' for dma handle while invoking kernel DMA APIs,
    instead of using dma_addr_t. This hasn't caused any 'incompatible
    pointer type' warning on SPARC because until now dma_addr_t is of
    type u32. However, recent changes in SPARC ATU (iommu) enabled 64bit
    DMA and therefore dma_addr_t became of type u64. This makes
    'incompatible pointer type' warnings inevitable.

    e.g.
    drivers/scsi/qlogicpti.c: In function ‘qpti_map_queues’:
    drivers/scsi/qlogicpti.c:813: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
    ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’
    drivers/scsi/qlogicpti.c:822: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
    ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’

    For the record, qlogicpti never executes on sun4v. Therefore, even
    though 64bit DMA is enabled on SPARC, qlogicpti continues to use the
    legacy iommu, which guarantees that the DMA address is always in the
    32bit range.

    This patch resolves the aforementioned compiler warnings.

    Signed-off-by: Tushar Dave
    Reviewed-by: thomas tai
    Signed-off-by: David S. Miller

    Tushar Dave
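
The warning quoted above comes from passing a pointer to a 32-bit field where a pointer to a wider handle type is expected. A minimal sketch of that mismatch follows; sketch_dma_addr_t, sketch_alloc and struct sketch_queue are hypothetical stand-ins, and the 64-bit typedef is an assumption matching the "dma_addr_t became of type u64" case in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in; the kernel's dma_addr_t width depends on config. */
typedef uint64_t sketch_dma_addr_t;

/* Mimics an allocator that reports the bus address through an out-parameter. */
static void sketch_alloc(sketch_dma_addr_t *handle)
{
    *handle = 0x12345000u;  /* pretend bus address that fits in 32 bits */
}

struct sketch_queue {
    uint32_t dvma_old;            /* too narrow to pass by pointer */
    sketch_dma_addr_t dvma_new;   /* matches the out-parameter type */
};

static void sketch_map(struct sketch_queue *q)
{
    /* sketch_alloc(&q->dvma_old) would draw an incompatible-pointer-type
     * warning, and would store 8 bytes through a pointer to a 4-byte field. */
    sketch_alloc(&q->dvma_new);

    /* Narrowing is done explicitly, once the value is known to fit. */
    assert(q->dvma_new <= UINT32_MAX);
    q->dvma_old = (uint32_t)q->dvma_new;
}
```

Declaring the field with the handle type, as dvma_new does here, removes the warning without relying on the address happening to fit in 32 bits.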
     

25 Nov, 2016

1 commit

  • The BUG_ON() recently introduced in lpfc_sli_ringtxcmpl_put() is hit in
    the lpfc_els_abort() > lpfc_sli_issue_abort_iotag() >
    lpfc_sli_abort_iotag_issue() function path [similar names], due to
    'piocb->vport == NULL':

    BUG_ON(!piocb || !piocb->vport);

    This happens because lpfc_sli_abort_iotag_issue() doesn't set the
    'abtsiocbp->vport' pointer -- but this is not the problem.

    Previously, lpfc_sli_ringtxcmpl_put() accessed 'piocb->vport' only if
    'piocb->iocb.ulpCommand' is neither CMD_ABORT_XRI_CN nor
    CMD_CLOSE_XRI_CN, which are the only possible values for
    lpfc_sli_abort_iotag_issue():

    lpfc_sli_ringtxcmpl_put():

    if ((unlikely(pring->ringno == LPFC_ELS_RING)) &&
    (piocb->iocb.ulpCommand != CMD_ABORT_XRI_CN) &&
    (piocb->iocb.ulpCommand != CMD_CLOSE_XRI_CN) &&
    (!(piocb->vport->load_flag & FC_UNLOADING)))

    lpfc_sli_abort_iotag_issue():

    if (phba->link_state >= LPFC_LINK_UP)
    iabt->ulpCommand = CMD_ABORT_XRI_CN;
    else
    iabt->ulpCommand = CMD_CLOSE_XRI_CN;

    So, this function path would not have hit this possible NULL pointer
    dereference before.

    In order to fix this regression, move the second part of the BUG_ON()
    check prior to the pointer dereference that it does check for.

    For reference, this is the stack trace observed. The problem happened
    because an unsolicited event was received - a PLOGI was received after
    our PLOGI was issued but not yet complete, so the discovery state
    machine goes on to sw-abort our PLOGI.

    kernel BUG at drivers/scsi/lpfc/lpfc_sli.c:1326!
    Oops: Exception in kernel mode, sig: 5 [#1]

    NIP [...] lpfc_sli_ringtxcmpl_put+0x1c/0xf0 [lpfc]
    LR [...] __lpfc_sli_issue_iocb_s4+0x188/0x200 [lpfc]
    Call Trace:
    [...] [...] __lpfc_sli_issue_iocb_s4+0xb0/0x200 [lpfc] (unreliable)
    [...] [...] lpfc_sli_issue_abort_iotag+0x2b4/0x350 [lpfc]
    [...] [...] lpfc_els_abort+0x1a8/0x4a0 [lpfc]
    [...] [...] lpfc_rcv_plogi+0x6d4/0x700 [lpfc]
    [...] [...] lpfc_rcv_plogi_plogi_issue+0xd8/0x1d0 [lpfc]
    [...] [...] lpfc_disc_state_machine+0xc0/0x2b0 [lpfc]
    [...] [...] lpfc_els_unsol_buffer+0xcc0/0x26c0 [lpfc]
    [...] [...] lpfc_els_unsol_event+0xa8/0x220 [lpfc]
    [...] [...] lpfc_complete_unsol_iocb+0xb8/0x138 [lpfc]
    [...] [...] lpfc_sli4_handle_received_buffer+0x6a0/0xec0 [lpfc]
    [...] [...] lpfc_sli_handle_slow_ring_event_s4+0x1c4/0x240 [lpfc]
    [...] [...] lpfc_sli_handle_slow_ring_event+0x24/0x40 [lpfc]
    [...] [...] lpfc_do_work+0xd88/0x1970 [lpfc]
    [...] [...] kthread+0x108/0x130
    [...] [...] ret_from_kernel_thread+0x5c/0xbc

    Cc: stable@vger.kernel.org # v4.8
    Fixes: 22466da5b4b7 ("lpfc: Fix possible NULL pointer dereference")
    Reported-by: Harsha Thyagaraja
    Signed-off-by: Mauricio Faria de Oliveira
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

    Mauricio Faria de Oliveira
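
The fix described in this commit places the NULL check next to the one dereference it protects, instead of at function entry. A reduced sketch of that ordering is below. All types and names are hypothetical simplifications, assert() stands in for BUG_ON(), and the function only models the condition quoted above, not the rest of lpfc_sli_ringtxcmpl_put().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down types; values are illustrative only. */
enum sketch_cmd { SK_CMD_OTHER, SK_CMD_ABORT_XRI_CN, SK_CMD_CLOSE_XRI_CN };

struct sketch_vport { bool unloading; };

struct sketch_iocb {
    enum sketch_cmd cmd;
    struct sketch_vport *vport;   /* may be NULL for abort/close requests */
};

/*
 * Decide whether follow-up work is needed for this request. Abort and close
 * requests return before vport is touched, so the NULL check sits right
 * before the single dereference instead of rejecting them up front.
 */
static bool sketch_needs_followup(const struct sketch_iocb *io, bool els_ring)
{
    assert(io != NULL);
    if (!els_ring ||
        io->cmd == SK_CMD_ABORT_XRI_CN ||
        io->cmd == SK_CMD_CLOSE_XRI_CN)
        return false;

    assert(io->vport != NULL);    /* checked only where it is dereferenced */
    return !io->vport->unloading;
}
```

An abort request with a NULL vport passes through this function untouched, whereas a check for both pointers at entry would have tripped on it.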
     

23 Nov, 2016

5 commits

  • James Bottomley
     
  • We accidentally allocate sizeof(u32) instead of sizeof(struct
    be_cmd_get_session_resp).

    Fixes: 50a4b824be9e ("scsi: be2iscsi: Fix to make boot discovery non-blocking")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Jitendra Bhivare
    Signed-off-by: Martin K. Petersen

    Dan Carpenter
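
One common way to avoid this class of mistake is to size the allocation from the pointer being assigned rather than from a hand-written type name. The sketch below uses userspace calloc() and a hypothetical struct sketch_resp; the real structure layout and the driver's allocator are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical response layout; the real structure is not shown in the log. */
struct sketch_resp {
    uint32_t status;
    uint8_t  payload[60];
};

/*
 * Allocate a zeroed response buffer. sizeof(*resp) follows the pointer's
 * type, so the size cannot drift away from the structure the way a
 * hand-written sizeof(uint32_t) can.
 */
static struct sketch_resp *sketch_resp_alloc(void)
{
    struct sketch_resp *resp = calloc(1, sizeof(*resp));
    return resp;
}
```

The caller owns the returned buffer and releases it with free(); a NULL return means the allocation failed.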
     
  • While issuing any ATA passthrough command to firmware the driver will
    block the device. But it will unblock the device only if the I/O
    completes through the ISR path. If a controller reset occurs before
    command completion the device will remain in blocked state.

    Make sure we unblock the device following a controller reset if an ATA
    passthrough command was queued.

    [mkp: clarified patch description]

    Cc: # v4.4+
    Fixes: ac6c2a93bd07 ("mpt3sas: Fix for SATA drive in blocked state, after diag reset")
    Signed-off-by: Suganath Prabu S
    Signed-off-by: Martin K. Petersen

    Suganath Prabu S
     
    Older controllers use SCSI target id '0' for the first internal disk. As
    the controllers are now placed on the same bus as the internal disks,
    this leads to a clash with the SCSI target id of the controller. This
    patch checks the SCSI revision and moves older controllers to bus '3' to
    be compatible with older releases and avoid this problem.

    [mkp: fixed uninitialized variable]

    Fixes: 09371d623c9 ("hpsa: Change SAS transport devices to bus 0.")
    Cc: # v4.5+
    Signed-off-by: Hannes Reinecke
    Acked-by: Don Brace
    Signed-off-by: Martin K. Petersen

    Hannes Reinecke
     
  • Pull SCSI fixes from James Bottomley:
    "Two small fixes.

    One prevents timeouts on mpt3sas when trying to use the secure erase
    protocol which causes the erase protocol to be aborted. The second is
    a regression in a prior fix which causes all commands to abort during
    PCI extended error recovery, which is incorrect because PCI EEH is
    independent from what's happening on the FC transport"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: qla2xxx: do not abort all commands in the adapter during EEH recovery
    scsi: mpt3sas: Fix secure erase premature termination

    Linus Torvalds
     

18 Nov, 2016

1 commit

  • Commit 540eb1eef0ab ("scsi: libfc: fix seconds_since_last_reset calculation")
    removed the use of 'struct timespec' from fc_get_host_stats(). This broke the
    output of 'fcoeadm -s' after kernel 4.8-rc1.

    Signed-off-by: Johannes Thumshirn
    Cc: # v4.8+
    Fixes: 540eb1eef0ab ("scsi: libfc: fix seconds_since_last_reset calculation")
    Acked-by: Arnd Bergmann
    Reviewed-by: Bart Van Assche
    Signed-off-by: Martin K. Petersen

    Johannes Thumshirn
     

15 Nov, 2016

2 commits

  • James Bottomley
     
  • The previous commit 1535aa75a3d8 ("qla2xxx: fix invalid DMA access after
    command aborts in PCI device remove") introduced a regression during an
    EEH recovery, since the change to the qla2x00_abort_all_cmds() function
    calls qla2xxx_eh_abort(), which verifies the EEH recovery condition but
    handles it heavy-handedly. (commit a465537ad1a4 "qla2xxx: Disable the
    adapter and skip error recovery in case of register disconnect.")

    This problem warrants a more general/optimistic solution right in
    qla2xxx_eh_abort() (e.g. in case a real command abort arrives during EEH
    recovery, or if it takes long enough to trigger command aborts); but
    it's still worth adding a check to ensure the code added by the previous
    commit is correct and contained within its owner function.

    This commit just adds a 'if (!ha->flags.eeh_busy)' check around it.
    (ahem; a trivial fix for this -rc series; sorry for this oversight.)

    With it applied, both PCI device remove and EEH recovery work fine.

    Fixes: 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove")
    Signed-off-by: Mauricio Faria de Oliveira
    Acked-by: Himanshu Madhani
    Signed-off-by: Martin K. Petersen

    Mauricio Faria de Oliveira
     

14 Nov, 2016

1 commit

  • Pull SCSI fixes from James Bottomley:
    "The megaraid_sas patch in here fixes a major regression in the last
    fix set that made all megaraid_sas cards unusable. It turns out no-one
    had actually tested such an "obvious" fix, sigh. The fix for the fix
    has been tested ...

    The next most serious is the vmw_pvscsi abort problem which basically
    means that aborts don't work on the vmware paravirt devices and error
    handling always escalates to reset.

    The rest are an assortment of missed reference counting in certain
    paths and corner case bugs that show up on some architectures"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
    scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove
    scsi: qla2xxx: do not queue commands when unloading
    scsi: libcxgbi: fix incorrect DDP resource cleanup
    scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
    scsi: scsi_dh_alua: Fix a reference counting bug
    scsi: vmw_pvscsi: return SUCCESS for successful command aborts
    scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk
    scsi: scsi_dh_alua: fix missing kref_put() in alua_rtpg_work()

    Linus Torvalds
     

12 Nov, 2016

1 commit

    This is a workaround for a bug with LSI Fusion MPT SAS2 when performing
    secure erase. Due to the very long time the operation takes, commands
    issued during the erase will time out and will trigger execution of the
    abort hook. Even though the abort hook is called for the specific
    command which timed out, this leads to a halt of the entire device
    (scsi_state terminated) and premature termination of the secure erase.

    Set device state to busy while ATA passthrough commands are in progress.

    [mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]

    Signed-off-by: Andrey Grodzovsky
    Acked-by: Sreekanth Reddy
    Cc: Sathya Prakash
    Cc: Chaitra P B
    Cc: Suganath Prabu Subramani
    Cc: Sreekanth Reddy
    Cc: Hannes Reinecke
    Signed-off-by: Martin K. Petersen

    Andrey Grodzovsky
     

11 Nov, 2016

1 commit


10 Nov, 2016

1 commit

    This patch fixes a regression caused by commit 1e793f6fc0db ("scsi:
    megaraid_sas: Fix data integrity failure for JBOD (passthrough)
    devices").

    The problem was that the MEGASAS_IS_LOGICAL macro was not wrapped in
    parentheses, and as a result the driver ended up exposing a lot of
    non-existent SCSI devices (all SCSI commands to channels 1, 2 and 3 were
    returned as SUCCESS-DID_OK by the driver).

    [mkp: clarified patch description]

    Fixes: 1e793f6fc0db920400574211c48f9157a37e3945
    Reported-by: Jens Axboe
    CC: stable@vger.kernel.org
    Signed-off-by: Kashyap Desai
    Signed-off-by: Sumit Saxena
    Tested-by: Sumit Saxena
    Reviewed-by: Tomas Henzl
    Tested-by: Jens Axboe
    Signed-off-by: Martin K. Petersen

    Sumit Saxena
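
How missing grouping around a conditional macro body changes the meaning of a surrounding expression can be shown with a small sketch. The macros below are hypothetical (SKETCH_ prefixed); they are not the driver's MEGASAS_IS_LOGICAL definition, and the calling expression is an assumed example rather than the driver's actual use.

```c
#include <assert.h>

#define SKETCH_MAX_PD_CHANNELS 2

/* Ungrouped conditional: expands textually into the caller's expression. */
#define SKETCH_IS_LOGICAL_BAD(ch)  (ch) < SKETCH_MAX_PD_CHANNELS ? 0 : 1

/* Grouped: the whole conditional is a single operand. */
#define SKETCH_IS_LOGICAL_GOOD(ch) (((ch) < SKETCH_MAX_PD_CHANNELS) ? 0 : 1)

static int sketch_enabled_and_logical_bad(int enabled, int ch)
{
    /* Parses as (enabled && ch < MAX) ? 0 : 1, because ?: binds loosest. */
    return enabled && SKETCH_IS_LOGICAL_BAD(ch);
}

static int sketch_enabled_and_logical_good(int enabled, int ch)
{
    /* Parses as enabled && (the macro's result), as the author intended. */
    return enabled && SKETCH_IS_LOGICAL_GOOD(ch);
}
```

With enabled set to 0, the ungrouped version still returns 1, because the && is absorbed into the condition of the ?: operator; the grouped version returns 0 as expected.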
     

09 Nov, 2016

3 commits

  • If a command is aborted in the kernel but not in the adapter, it might be
    considered complete and its DMA memory released, but it is still alive in
    the adapter, which will trigger an invalid DMA access upon its completion
    (in the DMA operations to deliver the command response to the driver).

    On powerpc platforms with IOMMU/EEH capabilities, the problem is observed
    during PCI device removal with ongoing IO requests -- which might trigger
    an EEH event very often, pointing to a 'TCE Request Page Access Error'.

    In that path, which is qla2x00_remove_one(), the commands are aborted in
    qla2x00_abort_all_cmds(), which does not perform an abort in the adapter
    as is done in qla2xxx_eh_abort() for example.

    So, this patch changes qla2x00_abort_all_cmds() to abort commands in the
    adapter too, with a call to qla2xxx_eh_abort(), which already implements
    all the logic to submit abort requests and handle responses.

    Reported-by: Naresh Bannoth
    Signed-off-by: Mauricio Faria de Oliveira
    Acked-by: Himanshu Madhani
    Signed-off-by: Martin K. Petersen

    Mauricio Faria de Oliveira
     
  • When the driver is unloading, in qla2x00_remove_one(), there is a single
    call/point in time to abort ongoing commands, qla2x00_abort_all_cmds(),
    which is still several steps away from the call to scsi_remove_host().

    If more commands continue to arrive and be processed during that
    interval, when the driver is tearing down and releasing its structures,
    it might potentially hit an oops due to invalid memory access:

    Unable to handle kernel paging request for data at address 0x00000138

    NIP [d000000004700a40] qla2xxx_queuecommand+0x80/0x3f0 [qla2xxx]
    LR [d000000004700a10] qla2xxx_queuecommand+0x50/0x3f0 [qla2xxx]

    So, fail commands in qla2xxx_queuecommand() if the UNLOADING bit is set.

    Signed-off-by: Mauricio Faria de Oliveira
    Acked-by: Himanshu Madhani
    Signed-off-by: Martin K. Petersen

    Mauricio Faria de Oliveira
     
    Task data is memset to zero before task_release_itt() is called, so the
    DDP context information is lost and DDP resources are cleaned up
    incorrectly. To fix this, call task_release_itt() before the memset.

    Signed-off-by: Varun Prakash
    Signed-off-by: Martin K. Petersen

    Varun Prakash
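
The ordering problem in this commit (clearing a record before the release routine has read it) can be reproduced in a few lines. This is a sketch with hypothetical names: struct sketch_task, ddp_handle and sketch_release() stand in for the task data and task_release_itt() mentioned above, and the counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical task record; ddp_handle stands for context needed at release. */
struct sketch_task {
    int ddp_handle;       /* 0 means "no resource held" in this sketch */
    char scratch[32];
};

/* Counts releases so the effect of the call order is observable. */
static int sketch_released;

static void sketch_release(struct sketch_task *t)
{
    if (t->ddp_handle != 0)
        sketch_released++;
}

/* Wrong order: the handle is wiped before release can look at it. */
static void sketch_cleanup_wrong(struct sketch_task *t)
{
    memset(t, 0, sizeof(*t));
    sketch_release(t);
}

/* Fixed order: release first, then clear the record. */
static void sketch_cleanup_fixed(struct sketch_task *t)
{
    sketch_release(t);
    memset(t, 0, sizeof(*t));
}
```

In the wrong order the release routine sees a zeroed handle and skips the resource, which is the kind of incomplete cleanup the commit describes.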