27 Mar, 2020

34 commits

  • Kernel is crashing with the following stacktrace:

    BUG: unable to handle kernel NULL pointer dereference at
    00000000000005bc
    IP: lpfc_nvme_register_port+0x1a8/0x3a0 [lpfc]
    ...
    Call Trace:
    lpfc_nlp_state_cleanup+0x2b2/0x500 [lpfc]
    lpfc_nlp_set_state+0xd7/0x1a0 [lpfc]
    lpfc_cmpl_prli_prli_issue+0x1f7/0x450 [lpfc]
    lpfc_disc_state_machine+0x7a/0x1e0 [lpfc]
    lpfc_cmpl_els_prli+0x16f/0x1e0 [lpfc]
    lpfc_sli_sp_handle_rspiocb+0x5b2/0x690 [lpfc]
    lpfc_sli_handle_slow_ring_event_s4+0x182/0x230 [lpfc]
    lpfc_do_work+0x87f/0x1570 [lpfc]
    kthread+0x10d/0x130
    ret_from_fork+0x35/0x40

    During target side fault injections, it is possible to hit the
    NLP_WAIT_FOR_UNREG case in lpfc_nvme_remoteport_delete. A prior commit
    fixed a rebind and delete race condition, but called lpfc_nlp_put
    unconditionally. This triggered a deletion and the crash.

    Fix by moving nlp_put to inside the NLP_WAIT_FOR_UNREG case, where the nlp
    is being unregistered/removed. Leave the reference if the flag isn't
    set.

    Link: https://lore.kernel.org/r/20200322181304.37655-8-jsmart2021@gmail.com
    Fixes: b15bd3e6212e ("scsi: lpfc: Fix nvme remoteport registration race conditions")
    Signed-off-by: James Smart
    Signed-off-by: Dick Kennedy
    Signed-off-by: Martin K. Petersen

    James Smart
     
  • The lpfc_sli4_wq_release() routine iterates for each interim value when
    updating the wq consumer index. This wastes cycles and possibly confuses
    things as the value iterates (and the modulo logic is being applied).

    There's no reason for this. Just set it to the value from the hw.

    Link: https://lore.kernel.org/r/20200322181304.37655-7-jsmart2021@gmail.com
    Signed-off-by: James Smart
    Signed-off-by: Dick Kennedy
    Signed-off-by: Martin K. Petersen

    James Smart
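
    The consumer-index change in the lpfc_sli4_wq_release() entry above can be
    sketched as follows. This is a minimal illustration, not the driver code:
    struct wq_sketch, its fields, and both function names are hypothetical
    stand-ins. It only shows that stepping through every interim value ends at
    the same index as one assignment from the hardware value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a work queue; not the lpfc structure. */
struct wq_sketch {
	uint32_t entry_count;	/* number of slots in the ring */
	uint32_t hba_index;	/* consumer index as tracked by the driver */
};

/* Old style: visit every interim value, applying the modulo each step. */
static uint32_t wq_release_iterative(struct wq_sketch *q, uint32_t hw_index)
{
	uint32_t steps = 0;

	assert(hw_index < q->entry_count);
	while (q->hba_index != hw_index) {
		q->hba_index = (q->hba_index + 1) % q->entry_count;
		steps++;
	}
	return steps;
}

/* New style: take the index reported by the hardware in one assignment. */
static void wq_release_direct(struct wq_sketch *q, uint32_t hw_index)
{
	assert(hw_index < q->entry_count);
	q->hba_index = hw_index;
}
```

    Both variants leave hba_index equal to the hardware value; the direct form
    does so without the loop and without exposing intermediate indices.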
     
  • Injecting EEH on a 32GB card causes a kernel oops.

    The pci error handler is doing an IO flush and the offline code is also
    doing an IO flush. When the 1st flush is complete the hdwq is destroyed
    (freed), yet the second flush accesses the hdwq and crashes.

    Added a check in lpfc_sli4_fush_io_rings of both the HBA_IOQ_FLUSH flag
    and the hdwq pointer, to see whether the flag is already set and whether
    the hdwq has already been freed.

    Link: https://lore.kernel.org/r/20200322181304.37655-6-jsmart2021@gmail.com
    Signed-off-by: James Smart
    Signed-off-by: Dick Kennedy
    Signed-off-by: Martin K. Petersen

    James Smart
     
  • SCSI layer sends driver IOs with more s/g segments than driver can handle.
    This results in "Too many sg segments from dma_map_sg. Config 64, seg_cnt
    219" error messages from the lpfc_scsi_prep_dma_buf_s3() routine.

    This was due to the driver using individual templates for pport and
    vport, host reset enabled or not, nvme vs scsi, etc. In the end, there was
    a combination for a vport that didn't match the pport.

    Rather than enumerating more templates and more discretionary assignments,
    revert to a base template that is copied to a template specific to the
    pport/vport. Then, based on role, attributes and sli type, modify the
    fields that are different for that port. Added a log message to
    lpfc_create_port to validate values.

    Link: https://lore.kernel.org/r/20200322181304.37655-5-jsmart2021@gmail.com
    Signed-off-by: James Smart
    Signed-off-by: Dick Kennedy
    Signed-off-by: Martin K. Petersen

    James Smart
     
  • In lpfc_nvmet_prep_fcp_wqe() the line "rsp->sg_cnt = 0" is modifying the
    transport's data structure. This may result in the transport believing the
    s/g list was already freed, and thus may not unmap/free it properly. The
    lpfc driver should not modify the transport data structure.

    The zeroing of the sg_cnt is to avoid use of the transport's sgl in a
    subsequent loop where the driver builds the necessary requests for the
    adapter firmware to complete the IO.

    Change LLDD to use a local copy of the transport sg_cnt when building
    requests to be passed to the adapter fw.

    Link: https://lore.kernel.org/r/20200322181304.37655-4-jsmart2021@gmail.com
    Signed-off-by: James Smart
    Signed-off-by: Dick Kennedy
    Signed-off-by: Martin K. Petersen

    James Smart
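
    The "local copy" approach from the lpfc_nvmet_prep_fcp_wqe() entry above
    can be sketched like this. The struct, its fields, the skip_sgl parameter,
    and the function name are hypothetical and do not mirror the real
    transport types; the point is only that the driver loop reads a private
    count and never writes to the transport-owned field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a transport-owned request. */
struct transport_req_sketch {
	int sg_cnt;		/* owned by the transport */
	const int *sg_len;	/* per-segment lengths in bytes */
};

/*
 * Build adapter requests from a local copy of sg_cnt.
 * Returns the number of bytes described by the walked segments.
 */
static size_t build_requests(const struct transport_req_sketch *rsp,
			     int skip_sgl)
{
	/* Local copy: zeroed here instead of zeroing rsp->sg_cnt. */
	int sg_cnt = skip_sgl ? 0 : rsp->sg_cnt;
	size_t total = 0;

	assert(rsp->sg_cnt >= 0);
	for (int i = 0; i < sg_cnt; i++)
		total += (size_t)rsp->sg_len[i];
	return total;
}
```

    Taking the request through a const pointer lets the compiler reject any
    accidental write to the transport's structure.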
     
  • The following lockdep error was reported when unloading the lpfc driver:

    INFO: trying to register non-static key.
    the code is fine but needs lockdep annotation.
    turning off the locking correctness validator.
    ...
    Call Trace:
    dump_stack+0x96/0xe0
    register_lock_class+0x8b8/0x8c0
    ? lockdep_hardirqs_on+0x190/0x280
    ? is_dynamic_key+0x150/0x150
    ? wait_for_completion_interruptible+0x2a0/0x2a0
    ? wake_up_q+0xd0/0xd0
    __lock_acquire+0xda/0x21a0
    ? register_lock_class+0x8c0/0x8c0
    ? synchronize_rcu_expedited+0x500/0x500
    ? __call_rcu+0x850/0x850
    lock_acquire+0xf3/0x1f0
    ? del_timer_sync+0x5/0xb0
    del_timer_sync+0x3c/0xb0
    ? del_timer_sync+0x5/0xb0
    lpfc_pci_remove_one.cold.102+0x8b7/0x935 [lpfc]
    ...

    Unloading the driver resulted in a call to del_timer_sync for the
    cpuhp_poll_timer. However the call to setup the timer had never been made,
    so the timer structures used by lockdep checking were not initialized.

    Unconditionally call setup_timer for the cpuhp_poll_timer during driver
    initialization. Calls to start the timer remain "as needed".

    Link: https://lore.kernel.org/r/20200322181304.37655-3-jsmart2021@gmail.com
    Signed-off-by: James Smart
    Signed-off-by: Dick Kennedy
    Signed-off-by: Martin K. Petersen

    James Smart
     
  • The following kasan bug was called out:

    BUG: KASAN: slab-out-of-bounds in lpfc_unreg_login+0x7c/0xc0 [lpfc]
    Read of size 2 at addr ffff889fc7c50a22 by task lpfc_worker_3/6676
    ...
    Call Trace:
    dump_stack+0x96/0xe0
    ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
    print_address_description.constprop.6+0x1b/0x220
    ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
    ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
    __kasan_report.cold.9+0x37/0x7c
    ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
    kasan_report+0xe/0x20
    lpfc_unreg_login+0x7c/0xc0 [lpfc]
    lpfc_sli_def_mbox_cmpl+0x334/0x430 [lpfc]
    ...

    When processing the completion of a "Reg Rpi" login mailbox command in
    lpfc_sli_def_mbox_cmpl, a call may be made to lpfc_unreg_login. The vpi is
    extracted from the completing mailbox context and passed as an input for
    the next. However, the vpi stored in the mailbox command context is an
    absolute vpi, which for SLI4 represents both base + offset. When used with
    a non-zero base component (function id > 0), this results in an
    out-of-range access beyond the allocated phba->vpi_ids array.

    Fix by subtracting the function's base value to get an accurate vpi number.

    Link: https://lore.kernel.org/r/20200322181304.37655-2-jsmart2021@gmail.com
    Signed-off-by: James Smart
    Signed-off-by: Dick Kennedy
    Signed-off-by: Martin K. Petersen

    James Smart
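
    The base subtraction described in the KASAN entry above amounts to the
    following arithmetic. This is a hedged sketch: MAX_VPI_SKETCH, the function
    name, and the range check are assumptions added for illustration and are
    not taken from the lpfc source.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VPI_SKETCH 4	/* hypothetical size of the per-function table */

/*
 * Translate an absolute vpi (base + offset) into an index that is safe to
 * use with a per-function table. Returns -1 if the value is out of range.
 */
static int vpi_to_index(uint16_t abs_vpi, uint16_t vpi_base)
{
	if (abs_vpi < vpi_base || abs_vpi - vpi_base >= MAX_VPI_SKETCH)
		return -1;
	return abs_vpi - vpi_base;
}
```

    With a base of 64, absolute vpi 66 maps to index 2; using 66 directly
    would index far beyond a four-entry table.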
     
  • The file aic79xx_core.c still contains some FreeBSD-specific code/macro
    guards, although cross-compatibility was in theory removed with commit
    cca6cb8ad7a8 ("scsi: aic7xxx: Fix build using bare-metal toolchain").
    Remove it.

    Link: https://lore.kernel.org/r/20200326193817.12568-1-alex.dewar@gmx.co.uk
    Signed-off-by: Alex Dewar
    Signed-off-by: Martin K. Petersen

    Alex Dewar
     
  • We were setting bActiveICCLevel attribute for UFS device only once but the
    type of this attribute has changed from persistent to volatile since UFS
    device specification v2.1. This attribute is set to the default value after
    power cycle or hardware reset event. It isn't safe to rely on prefetched
    data (only used for bActiveICCLevel attribute now). Hence this change
    removes the code related to data prefetching and sets this parameter on
    every attempt to probe the UFS device.

    Tested-by: Stanley Chu
    Reviewed-by: Stanley Chu
    Reviewed-by: Avri Altman
    Signed-off-by: Can Guo
    Signed-off-by: Martin K. Petersen

    Can Guo
     
  • dc395x_bios_param was only different from the default when the
    CONFIG_SCSI_DC395x_TRMS1040_TRADMAP symbol is true, but that symbol doesn't
    exist in the Kconfig system and thus can't be set.

    Link: https://lore.kernel.org/r/20200325105505.1028582-1-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Martin K. Petersen

    Christoph Hellwig
     
  • Fix the active session count when total_cmds is invalid in the function
    iscsi_session_setup(). Decrement the number of active sessions before the
    function returns.

    Link: https://lore.kernel.org/r/EDBAAA0BBBA2AC4E9C8B6B81DEEE1D6916A28542@DGGEML525-MBS.china.huawei.com
    Reviewed-by: Lee Duncan
    Signed-off-by: Wu Bo
    Signed-off-by: Martin K. Petersen

    Wu Bo
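
    The counting fix in the iscsi_session_setup() entry above follows a common
    pattern: undo the increment on every early return. The sketch below uses a
    hypothetical global counter and a placeholder validity rule (non-zero
    total_cmds); neither is taken from the iSCSI library.

```c
#include <assert.h>

static int active_sessions;	/* hypothetical counter of live sessions */

/* Placeholder validity rule, assumed for this sketch only. */
static int total_cmds_valid(unsigned int total_cmds)
{
	return total_cmds != 0;
}

/* Returns 0 on success, -1 on failure; the counter stays balanced. */
static int session_setup(unsigned int total_cmds)
{
	active_sessions++;
	if (!total_cmds_valid(total_cmds)) {
		active_sessions--;	/* undo before the early return */
		return -1;
	}
	return 0;
}
```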
     
  • Correct race condition where ioaccel is re-enabled before the raid_map is
    updated. For RAID_1, RAID_1ADM, and RAID 5/6 this race triggers a
    BUG_ON.

    - Change event thread to disable ioaccel only. Send all requests down the
    RAID path instead.

    - Have rescan thread handle offload_enable.

    - Since there is only one rescan allowed at a time, turning
    offload_enabled on/off should not be racy. Each handler queues up a
    rescan if one is already in progress.

    - For timing diagram, offload_enabled is initially off due to a change
    (transformation: splitmirror/remirror), ...

    otbe = offload_to_be_enabled
    oe = offload_enabled

    Time  Event         Rescan                 Completion  Request
          Worker        Worker                 Thread      Thread
    ----  ------        ------                 ----------  -------
    T0    |             |                      + UA        |
    T1    |             + rescan started       | 0x3f      |
    T2    + Event       |                      | 0x0e      |
    T3    + Ack msg     |                      |           |
    T4    |             + if (!dev[i]->oe &&   |           |
    T5    |             |     dev[i]->otbe)    |           |
    T6    |             |      get_raid_map    |           |
    T7    + otbe = 1    |                      |           |
    T8    |             |                      |           |
    T9    |             + oe = otbe            |           |
    T10   |             |                      |           + ioaccel request
    T11                                                    * BUG_ON

    T0 - I/O completion with UA 0x3f 0x0e sets rescan flag.
    T1 - rescan worker thread starts a rescan.
    T2 - event comes in
    T3 - event thread starts and issues "Acknowledge" message
    ...
    T6 - rescan thread has bypassed code to reload new raid map.
    ...
    T7 - event thread runs and sets offload_to_be_enabled
    ...
    T9 - rescan thread turns on offload_enabled.
    T10- request comes in and goes down ioaccel path.
    T11- BUG_ON.

    - After the patch is applied, ioaccel_enabled can only be re-enabled in
    the re-scan thread.

    Link: https://lore.kernel.org/r/158472877894.14200.7077843399036368335.stgit@brunhilda
    Reviewed-by: Scott Teel
    Reviewed-by: Matt Perricone
    Reviewed-by: Scott Benesh
    Signed-off-by: Don Brace
    Signed-off-by: Martin K. Petersen

    Don Brace
     
  • The current codebase makes use of the zero-length array language extension
    to the C90 standard, but the preferred mechanism to declare variable-length
    types such as these ones is a flexible array member[1][2], introduced in
    C99:

    struct foo {
    int stuff;
    struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning in
    case the flexible array does not occur last in the structure, which will
    help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by this
    change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Link: https://lore.kernel.org/r/20200319222533.GA20577@embeddedor.com
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Martin K. Petersen

    Gustavo A. R. Silva
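
    For the flexible array member entry above, a userspace sketch of how such
    a structure is typically allocated. struct foo and struct boo follow the
    example in the commit message; the value field, foo_alloc(), and the use
    of malloc() are assumptions for illustration. The size calculation here
    does not check for overflow; in-kernel code would normally use a helper
    such as struct_size() for that.

```c
#include <assert.h>
#include <stdlib.h>

struct boo {
	int value;	/* hypothetical payload */
};

struct foo {
	int stuff;
	struct boo array[];	/* flexible array member, must be last */
};

/* Allocate a struct foo with room for n trailing elements. */
static struct foo *foo_alloc(size_t n)
{
	/* sizeof(*f) covers the header only; the array adds no size. */
	struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

	if (f)
		f->stuff = (int)n;
	return f;
}
```

    The same allocation expression works for both the zero-length and the
    flexible-array declaration, which is why the conversion does not affect
    existing allocations.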
     
  • Add a PCI shutdown handler to support the wake-on-lan feature.

    Link: https://lore.kernel.org/r/20200319083811.19499-3-mrangankar@marvell.com
    Signed-off-by: Manish Rangankar
    Signed-off-by: Nilesh Javali
    Signed-off-by: Martin K. Petersen

    Manish Rangankar
     
  • This patch adds the mfw error recovery process in the qedi driver. The
    process includes a partial/customized driver unload and load to reset
    context by preserving active iSCSI session kernel state.

    Link: https://lore.kernel.org/r/20200319083811.19499-2-mrangankar@marvell.com
    Signed-off-by: Manish Rangankar
    Signed-off-by: Martin K. Petersen

    Manish Rangankar
     
  • Block layer RPM is enabled for the general UFS SCSI devices when they are
    probed by their driver. However block layer RPM is not enabled for UFS
    well-known SCSI devices.

    As UFS SCSI devices have their corresponding BSG char devices, accessing a
    BSG char device via IOCTL may send requests to its corresponding SCSI
    device through its request queue. If BSG IOCTL sends a request to a
    well-known SCSI device when HBA is not runtime active, due to block layer
    RPM not being enabled for the well-known SCSI devices, the HBA, which is at
    the top of a SCSI device's parent chain, will not be resumed.

    This change enables block layer RPM for the well-known SCSI devices so that
    block layer can handle RPM for the well-known SCSI devices just like for
    the general SCSI devices.

    Reviewed-by: Avri Altman
    Reviewed-by: Stanley Chu
    Signed-off-by: Can Guo
    Signed-off-by: Martin K. Petersen

    Can Guo
     
  • Override devfreq parameters for power-performance trade-off.

    Link: https://lore.kernel.org/r/b6875729b6072134985c9113a820cf60a2af22e7.1585160616.git.asutoshd@codeaurora.org
    Acked-by: Avri Altman
    Signed-off-by: Asutosh Das
    Signed-off-by: Martin K. Petersen

    Asutosh Das
     
  • Vendor drivers may have a need to update the polling interval and
    thresholds. Provide a vops for vendor drivers to use.

    Link: https://lore.kernel.org/r/acd79e00396cff855256adad47f615ccdbde85ac.1585160616.git.asutoshd@codeaurora.org
    Acked-by: Avri Altman
    Signed-off-by: Asutosh Das
    Signed-off-by: Martin K. Petersen

    Asutosh Das
     
  • Currently, the frequency that devfreq provides the driver always leads the
    clocks to be scaled up. Hence, round the clock-rate to the nearest
    frequency before deciding to scale.

    Also update the devfreq statistics of current frequency.

    Link: https://lore.kernel.org/r/d0c6c22455811e9f0eda01f9bc70d1398b51b2bd.1585160616.git.asutoshd@codeaurora.org
    Acked-by: Avri Altman
    Signed-off-by: Asutosh Das
    Signed-off-by: Martin K. Petersen

    Asutosh Das
     
  • As a part of sysfs reading of descriptors/attributes/flags, query commands
    should only be executed when hba's power runtime status is active. To
    guarantee this, add pm_runtime_get/put_sync() to those paths where query
    commands are sent.

    Link: https://lore.kernel.org/r/f712a4f7bdb0ae32e0d83634731e7aaa1b3a6cdd.1585009663.git.asutoshd@codeaurora.org
    Reviewed-by: Avri Altman
    Signed-off-by: Nitin Rawat
    Signed-off-by: Asutosh Das
    Signed-off-by: Martin K. Petersen

    Nitin Rawat
     
  • MediaTek platform and UFS controller can dynamically customize the delay
    for host enabling according to different scenarios.

    For example, if UniPro enters lower-power mode, such delay can be
    minimized, otherwise longer delay shall be expected.

    Link: https://lore.kernel.org/r/20200318104016.28049-8-stanley.chu@mediatek.com
    Reviewed-by: Avri Altman
    Signed-off-by: Stanley Chu
    Signed-off-by: Martin K. Petersen

    Stanley Chu
     
  • Reduce the waiting period between each HCE (Host Controller Enable) poll
    from 5 ms to 1 ms. Also increase the maximum number of polls to keep the
    "total polling time" roughly the same.

    This change could make HCE initialization faster to improve latency of
    ufshcd initialization, error recovery, and resume behaviors.

    Link: https://lore.kernel.org/r/20200318104016.28049-7-stanley.chu@mediatek.com
    Reviewed-by: Avri Altman
    Reviewed-by: Can Guo
    Signed-off-by: Stanley Chu
    Signed-off-by: Martin K. Petersen

    Stanley Chu
     
  • Currently a 1 ms delay is applied before polling the CONTROLLER_ENABLE
    bit. This delay may not be required or may need a different value on
    different controllers. Make the delay a configurable value in struct
    ufs_hba so that vendors can customize it.

    Link: https://lore.kernel.org/r/20200318104016.28049-6-stanley.chu@mediatek.com
    Reviewed-by: Avri Altman
    Reviewed-by: Can Guo
    Signed-off-by: Stanley Chu
    Signed-off-by: Martin K. Petersen

    Stanley Chu
     
  • A common delay function is introduced in UFS core driver, thus ufs-mediatek
    can use it instead of the private delay function.

    Link: https://lore.kernel.org/r/20200318104016.28049-5-stanley.chu@mediatek.com
    Reviewed-by: Avri Altman
    Signed-off-by: Stanley Chu
    Signed-off-by: Martin K. Petersen

    Stanley Chu
     
  • Introduce a common delay function that gives users a flexible way to
    choose between udelay and usleep_range according to the required delay
    time.

    Link: https://lore.kernel.org/r/20200318104016.28049-4-stanley.chu@mediatek.com
    Reviewed-by: Avri Altman
    Reviewed-by: Can Guo
    Signed-off-by: Stanley Chu
    Signed-off-by: Martin K. Petersen

    Stanley Chu
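
    The selection logic behind such a common delay function can be sketched
    as below. This is not the UFS core implementation: the enum, the function
    name, and in particular the 10 us cut-over value are assumptions chosen
    for illustration. The sketch only returns which mechanism would be used
    instead of actually delaying.

```c
#include <assert.h>

enum delay_kind {
	DELAY_NONE,		/* nothing to wait for */
	DELAY_BUSY_WAIT,	/* udelay() territory in the kernel */
	DELAY_SLEEP		/* usleep_range() territory in the kernel */
};

#define BUSY_WAIT_LIMIT_US 10	/* assumed cut-over, not from the patch */

/* Decide how a delay of 'us' microseconds should be carried out. */
static enum delay_kind pick_delay_kind(unsigned long us)
{
	if (us == 0)
		return DELAY_NONE;
	if (us < BUSY_WAIT_LIMIT_US)
		return DELAY_BUSY_WAIT;
	return DELAY_SLEEP;
}
```

    Short waits favor busy-waiting because the cost of scheduling would
    exceed the delay itself; longer waits favor sleeping so the CPU can do
    other work.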
     
  • Use an enum to specify the host capabilities instead of #defines inside the
    structure definition.

    Link: https://lore.kernel.org/r/20200318104016.28049-3-stanley.chu@mediatek.com
    Reviewed-by: Avri Altman
    Reviewed-by: Can Guo
    Reviewed-by: Bean Huo
    Reviewed-by: Asutosh Das
    Signed-off-by: Stanley Chu
    Signed-off-by: Martin K. Petersen

    Stanley Chu
     
  • In ufshcd_disable_tx_lcc(), if ufshcd_dme_get() or ufshcd_dme_peer_get()
    fails, the uninitialized variable "tx_lanes" may be used as an unexpected
    lane ID for DME configuration.

    Fix this issue by initializing "tx_lanes".

    Link: https://lore.kernel.org/r/20200318104016.28049-2-stanley.chu@mediatek.com
    Reviewed-by: Avri Altman
    Reviewed-by: Can Guo
    Reviewed-by: Asutosh Das
    Signed-off-by: Stanley Chu
    Signed-off-by: Martin K. Petersen

    Stanley Chu
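
    The hazard in the ufshcd_disable_tx_lcc() entry above is the usual
    "output parameter untouched on failure" pattern. In this sketch,
    attr_get() is a hypothetical stand-in for the DME getters and the lane
    count of 2 is made up; only the initialization of tx_lanes reflects the
    fix described.

```c
#include <assert.h>

/* Hypothetical attribute read: on failure the output is left untouched. */
static int attr_get(int fail, unsigned int *out)
{
	if (fail)
		return -1;
	*out = 2;	/* made-up lane count for the sketch */
	return 0;
}

/* Returns how many lanes a caller would go on to configure. */
static unsigned int lanes_to_configure(int fail)
{
	/* Initialized, so a failed read yields zero lanes, not garbage. */
	unsigned int tx_lanes = 0;

	(void)attr_get(fail, &tx_lanes);
	return tx_lanes;
}
```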
     
  • If an iSCSI connection happens to fail while the daemon isn't running (due
    to a crash or for another reason), the kernel failure report is not
    received. When the daemon restarts, there is insufficient kernel state in
    sysfs for it to know that this happened. open-iscsi tries to reopen every
    connection, but on different initiators, we'd like to know which
    connections have failed.

    There is session->state, but that has a different lifetime than an iSCSI
    connection, so it doesn't directly reflect the connection state.

    [mkp: typos]

    Link: https://lore.kernel.org/r/20200317233422.532961-1-krisman@collabora.com
    Cc: Khazhismel Kumykov
    Suggested-by: Junho Ryu
    Reviewed-by: Lee Duncan
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Martin K. Petersen

    Gabriel Krisman Bertazi
     
  • Trace events target_sequencer_start and target_cmd_complete
    (include/trace/events/target.h) are ready to show NAA identifier, LUN ID,
    and many other important command details in the system log:

    TP_printk("%s -> LUN %03u %s data_length %6u CDB %s (TA:%s C:%02x)",

    However, it's still hard to match a command on the initiator with the
    same command on the target in real-life system log output. For that
    purpose SCSI provides a command identifier, or task tag (the term used in
    previous standards). This patch adds the tag ID to the system log output:

    TP_printk("%s -> LUN %03u tag %#llx %s data_length %6u CDB %s (TA:%s C:%02x)",

    kworker/1:1-35 [001] .... 1392.989452: target_sequencer_start:
    naa.5001405ec1ba6364 -> LUN 001 tag 0x1
    SERVICE_ACTION_IN_16 data_length 32
    CDB 9e 10 00 00 00 00 00 00 00 00 00 00 00 20 00 00 (TA:SIMPLE C:00)

    kworker/1:1-35 [001] .... 1392.989456: target_cmd_complete:
    naa.5001405ec1ba6364
    Reviewed-by: Konstantin Shelekhin
    Reviewed-by: Bart van Assche
    Signed-off-by: Viacheslav Dubeyko
    Signed-off-by: Martin K. Petersen

    Viacheslav Dubeyko
     
  • iscsit_close_session() can only be called when nconn is zero (otherwise a
    kernel panic is triggered). If nconn is zero then iscsit_stop_session()
    does nothing and exits, so calling it makes no sense.

    We still need to call iscsit_check_session_usage_count() because this
    function will sleep if the session's refcount is not zero and we don't want
    to destroy the session structure if it's still being referenced.

    Link: https://lore.kernel.org/r/20200313170656.9716-4-mlombard@redhat.com
    Tested-by: Rahul Kundu
    Signed-off-by: Maurizio Lombardi
    Signed-off-by: Martin K. Petersen

    Maurizio Lombardi
     
  • A number of hangs have been reported against the target driver; they are
    due to the fact that multiple threads may try to destroy the iscsi session
    at the same time. This may be reproduced for example when a "targetcli
    iscsi/iqn.../tpg1 disable" command is executed while a logout operation is
    underway.

    When this happens, two or more threads may end up sleeping and waiting for
    iscsit_close_connection() to execute "complete(session_wait_comp)". Only
    one of the threads will wake up and proceed to destroy the session
    structure, the remaining threads will hang forever.

    Note that if the blocked threads are somehow forced to wake up with
    complete_all(), they will try to free the same iscsi session structure
    destroyed by the first thread, causing double frees, memory corruptions
    etc...

    With this patch, the threads that want to destroy the iscsi session will
    increase the session refcount and will set the "session_close" flag to 1;
    then they wait for the driver to close the remaining active connections.
    When the last connection is closed, iscsit_close_connection() will wake up
    all the threads and will wait for the session's refcount to reach zero;
    when this happens, iscsit_close_connection() will destroy the session
    structure because no one is referencing it anymore.

    INFO: task targetcli:5971 blocked for more than 120 seconds.
    Tainted: P OE 4.15.0-72-generic #81~16.04.1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    targetcli D 0 5971 1 0x00000080
    Call Trace:
    __schedule+0x3d6/0x8b0
    ? vprintk_func+0x44/0xe0
    schedule+0x36/0x80
    schedule_timeout+0x1db/0x370
    ? __dynamic_pr_debug+0x8a/0xb0
    wait_for_completion+0xb4/0x140
    ? wake_up_q+0x70/0x70
    iscsit_free_session+0x13d/0x1a0 [iscsi_target_mod]
    iscsit_release_sessions_for_tpg+0x16b/0x1e0 [iscsi_target_mod]
    iscsit_tpg_disable_portal_group+0xca/0x1c0 [iscsi_target_mod]
    lio_target_tpg_enable_store+0x66/0xe0 [iscsi_target_mod]
    configfs_write_file+0xb9/0x120
    __vfs_write+0x1b/0x40
    vfs_write+0xb8/0x1b0
    SyS_write+0x5c/0xe0
    do_syscall_64+0x73/0x130
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    Link: https://lore.kernel.org/r/20200313170656.9716-3-mlombard@redhat.com
    Reported-by: Matt Coleman
    Tested-by: Matt Coleman
    Tested-by: Rahul Kundu
    Signed-off-by: Maurizio Lombardi
    Signed-off-by: Martin K. Petersen

    Maurizio Lombardi
     
  • iscsit_free_session() is equivalent to iscsit_stop_session() followed by a
    call to iscsit_close_session().

    Link: https://lore.kernel.org/r/20200313170656.9716-2-mlombard@redhat.com
    Tested-by: Rahul Kundu
    Signed-off-by: Maurizio Lombardi
    Signed-off-by: Martin K. Petersen

    Maurizio Lombardi
     
  • If 'dma_map_single()' fails, the ref counted 'shpnt' will be decremented
    twice because 'scsi_host_put()' is called in the if block, and in the error
    handling path.

    Axe one of these calls.

    Link: https://lore.kernel.org/r/20200228215948.7473-1-christophe.jaillet@wanadoo.fr
    Fixes: 1dc09e120c83 ("scsi: aha1740: stop using scsi_unregister")
    Signed-off-by: Christophe JAILLET
    Signed-off-by: Martin K. Petersen

    Christophe JAILLET
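
    The double-put problem in the aha1740 entry above can be illustrated with
    a toy reference count. Everything here is hypothetical (struct
    host_sketch, host_put(), probe_sketch()), and the choice to keep the put
    on the shared error label rather than in the if block is an assumption;
    the commit message only says that one of the two calls is removed.

```c
#include <assert.h>

/* Toy reference-counted object. */
struct host_sketch {
	int refcount;
};

static void host_put(struct host_sketch *h)
{
	assert(h->refcount > 0);	/* a second put would trip this */
	h->refcount--;
}

/* Probe-like function with exactly one put on the error path. */
static int probe_sketch(struct host_sketch *h, int map_fails)
{
	h->refcount = 1;		/* reference taken at allocation */
	if (map_fails)
		goto err_put;		/* no host_put() here */
	return 0;

err_put:
	host_put(h);			/* the only put on failure */
	return -1;
}
```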
     
  • Remove code which has no functional use anymore since commit 3c75ad1d87c7
    ("scsi: qla2xxx: Remove defer flag to indicate immeadiate port loss").

    While at it, also remove the stale function documentation.

    Link: https://lore.kernel.org/r/20200206135443.110701-1-dwagner@suse.de
    Reviewed-by: Arun Easi
    Reviewed-by: Lee Duncan
    Signed-off-by: Daniel Wagner
    Signed-off-by: Martin K. Petersen

    Daniel Wagner
     

18 Mar, 2020

6 commits

  • Removed the common length and introduced separate read and write lengths
    for the IOCTL payload structure.

    [mkp: fixed SoB ordering]

    Link: https://lore.kernel.org/r/20200316074906.9119-7-deepak.ukey@microchip.com
    Acked-by: Jack Wang
    Signed-off-by: Viswas G
    Signed-off-by: Deepak Ukey
    Signed-off-by: Radha Ramachandran
    Signed-off-by: Martin K. Petersen

    Viswas G
     
  • Added a sysfs attribute for the non-fatal log so that a management utility
    can get the non-fatal dump from the driver. A non-fatal error is an error
    condition or abnormal behavior detected by the host, or detected and
    reported by the controller to the host. A non-fatal error does not stop
    the controller firmware and enables it to still respond to host requests.
    A typical example of a non-fatal error is an I/O timeout or an unusual
    error notification from the controller. Since the firmware is operational,
    the error dump information is pushed to host memory (by firmware) upon
    request from the host.

    Link: https://lore.kernel.org/r/20200316074906.9119-6-deepak.ukey@microchip.com
    Acked-by: Jack Wang
    Signed-off-by: Deepak Ukey
    Signed-off-by: Viswas G
    Signed-off-by: Radha Ramachandran
    Signed-off-by: Martin K. Petersen

    Deepak Ukey
     
  • 1) Move the instance tracking down after we think the instance is good to
    go. This avoids a use-after-free.

    2) There are goto targets for trying to cleanup if the hw fails to
    initialize, but there's some overlap depending on who thinks they own
    the sub-structures.

    Link: https://lore.kernel.org/r/20200316074906.9119-5-deepak.ukey@microchip.com
    Acked-by: Jack Wang
    Signed-off-by: Peter Chang
    Signed-off-by: Deepak Ukey
    Signed-off-by: Viswas G
    Signed-off-by: Radha Ramachandran
    Signed-off-by: Martin K. Petersen

    Peter Chang
     
  • In pm80xx driver, the command mpi_set_phy_profile_req is sent by host
    during boot to configure the phy profile such as analog setting page, rate
    control page. However, the tag is not freed when its response is
    received. As a result, 16 tags are missing for each HBA after boot. When
    NCQ is enabled with queue depth 16, it needs at least 15 * 16 = 240 tags
    for each HBA to achieve the best performance. In current pm80xx driver with
    setting CCB_MAX = 256, the total number of tags in each HBA is 255 for data
    IO. Hence, without returning those tags to the pool after boot, some devices
    will eventually be forced to non-ncq mode by ATA layer due to excessive
    errors (i.e. LLDD cannot allocate tag for queued task).

    Link: https://lore.kernel.org/r/20200316074906.9119-4-deepak.ukey@microchip.com
    Acked-by: Jack Wang
    Signed-off-by: yuuzheng
    Signed-off-by: Deepak Ukey
    Signed-off-by: Viswas G
    Signed-off-by: Radha Ramachandran
    Signed-off-by: Martin K. Petersen

    yuuzheng
     
  • A kexec reboot causes the controller fw to assert. This assertion shows up
    in two ways: the controller doesn't show up as ready, and an interrupt is
    waiting as soon as the handler is registered. To resolve this, the
    following fixes were added:

    - Split the interrupt handling setup into two parts, setup and request.

    - If the controller ready register indicates not-ready, but only the IOC
    units are not ready, we can still try a reset to bring the system back to
    the pre-reboot state.

    Link: https://lore.kernel.org/r/20200316074906.9119-3-deepak.ukey@microchip.com
    Acked-by: Jack Wang
    Signed-off-by: Vikram Auradkar
    Signed-off-by: Deepak Ukey
    Signed-off-by: Viswas G
    Signed-off-by: Radha Ramachandran
    Signed-off-by: Martin K. Petersen

    Vikram Auradkar
     
  • Increasing the per-request size maximum (max_sectors_kb) runs into the
    per-device DMA scatter gather list limit (max_segments) for users of the io
    vector system calls (eg, readv and writev). This is because the kernel
    combines io vectors into DMA segments when possible, but it doesn't work
    for our user because the vectors in the buffer cache get scrambled. This
    change bumps the advertised max scatter gather length to 528 to cover 2M w/
    x86's 4k pages and some extra for the user checksum. It trims the size of
    some of the tables we don't care about and exposes all of the command slots
    upstream to the SCSI layer. Also reduced PM8001_MAX_CCB to 256, as the
    pm8001 driver has a memory limit that depends on machine capability. If we
    increase the sg length, we need to trade it off by decreasing
    PM8001_MAX_CCB. PM8001_MAX_CCB = 256 does not have any influence on normal
    use.

    Link: https://lore.kernel.org/r/20200316074906.9119-2-deepak.ukey@microchip.com
    Reported-by: kbuild test robot
    Acked-by: Jack Wang
    Signed-off-by: Peter Chang
    Signed-off-by: Deepak Ukey
    Signed-off-by: Viswas G
    Signed-off-by: Radha Ramachandran
    Signed-off-by: Martin K. Petersen

    Peter Chang
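
    The scatter-gather figure in the entry above can be checked with simple
    arithmetic. The macro and function names are hypothetical, and the sketch
    assumes the worst case of one DMA segment per 4k page; the values 2M, 4k,
    and 528 come from the commit message.

```c
#include <assert.h>

#define REQUEST_BYTES	(2u * 1024u * 1024u)	/* 2M request */
#define PAGE_BYTES	4096u			/* x86 4k page */
#define ADVERTISED_SG	528u			/* value from the commit */

/* Segments needed if every page ends up in its own DMA segment. */
static unsigned int segments_for(unsigned int bytes)
{
	return (bytes + PAGE_BYTES - 1u) / PAGE_BYTES;
}

/* Segments left over after the data pages are accounted for. */
static unsigned int spare_segments(unsigned int bytes)
{
	return ADVERTISED_SG - segments_for(bytes);
}
```

    Under that assumption a 2M request needs 512 segments, leaving 16 of the
    528 advertised entries for the extra data mentioned in the message.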