Eric Lee / smarc-fsl-linux-kernel

27 Mar, 2020

34 commits

807e7353d scsi: lpfc: Fix crash in target side cable pulls hitting WAIT_FOR_UNREG

Kernel is crashing with the following stacktrace:

BUG: unable to handle kernel NULL pointer dereference at
00000000000005bc
IP: lpfc_nvme_register_port+0x1a8/0x3a0 [lpfc]
...
Call Trace:
lpfc_nlp_state_cleanup+0x2b2/0x500 [lpfc]
lpfc_nlp_set_state+0xd7/0x1a0 [lpfc]
lpfc_cmpl_prli_prli_issue+0x1f7/0x450 [lpfc]
lpfc_disc_state_machine+0x7a/0x1e0 [lpfc]
lpfc_cmpl_els_prli+0x16f/0x1e0 [lpfc]
lpfc_sli_sp_handle_rspiocb+0x5b2/0x690 [lpfc]
lpfc_sli_handle_slow_ring_event_s4+0x182/0x230 [lpfc]
lpfc_do_work+0x87f/0x1570 [lpfc]
kthread+0x10d/0x130
ret_from_fork+0x35/0x40

During target side fault injections, it is possible to hit the
NLP_WAIT_FOR_UNREG case in lpfc_nvme_remoteport_delete. A prior commit
fixed a rebind and delete race condition, but called lpfc_nlp_put
unconditionally. This triggered a deletion and the crash.

Fix by moving the lpfc_nlp_put call inside the NLP_WAIT_FOR_UNREG case,
where the nlp is being unregistered/removed. Leave the reference in place
if the flag isn't set.
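A minimal userspace sketch of the reference rule described above, assuming a simplified node; `struct node`, `NODE_WAIT_FOR_UNREG`, `node_put()` and `remoteport_delete()` are hypothetical stand-ins for the lpfc structures and `lpfc_nlp_put`, not the driver's code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag standing in for NLP_WAIT_FOR_UNREG. */
#define NODE_WAIT_FOR_UNREG 0x1u

struct node {
	int refcount;        /* references currently held on the node */
	unsigned int flags;
	bool freed;          /* set when the last reference is dropped */
};

/* Drop one reference; "free" the node when the count reaches zero. */
static void node_put(struct node *n)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0)
		n->freed = true;
}

/*
 * Remote-port delete callback (sketch). The reference is dropped only
 * when the node is waiting to be unregistered; otherwise it is left for
 * the code path that still owns the node.
 */
static void remoteport_delete(struct node *n)
{
	if (n->flags & NODE_WAIT_FOR_UNREG) {
		n->flags &= ~NODE_WAIT_FOR_UNREG;
		node_put(n);
	}
}
```

With the put moved inside the flag check, a delete callback for a node that is not waiting to be unregistered no longer drops the last reference.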

Link: https://lore.kernel.org/r/20200322181304.37655-8-jsmart2021@gmail.com
Fixes: b15bd3e6212e ("scsi: lpfc: Fix nvme remoteport registration race conditions")
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Martin K. Petersen

James Smart
2020-03-27 11:15:10 +0800
1543af381 scsi: lpfc: Fix update of wq consumer index in lpfc_sli4_wq_release

The lpfc_sli4_wq_release() routine iterates over each interim value when
updating the wq consumer index. This wastes cycles and can confuse things,
since the value iterates (and the modulo logic is applied along the way).

There's no reason for this. Just set it to the value from the hw.
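The difference can be sketched in userspace C, assuming a ring with `entry_count` slots; `struct wq`, `wq_release_iterate()` and `wq_release_direct()` are hypothetical names, not the lpfc code:

```c
#include <assert.h>

struct wq {
	unsigned int hba_index;   /* driver's copy of the consumer index */
	unsigned int entry_count; /* number of slots in the ring */
};

/*
 * Old approach (sketch): step one slot at a time, wrapping with modulo,
 * until the driver index reaches the index reported by the hardware.
 * Returns how many steps were taken.
 */
static unsigned int wq_release_iterate(struct wq *q, unsigned int hw_index)
{
	unsigned int steps = 0;

	while (q->hba_index != hw_index) {
		q->hba_index = (q->hba_index + 1) % q->entry_count;
		steps++;
	}
	return steps;
}

/* New approach (sketch): the hardware already reports the index, so take it. */
static void wq_release_direct(struct wq *q, unsigned int hw_index)
{
	assert(hw_index < q->entry_count);
	q->hba_index = hw_index;
}
```

Both end in the same state; the direct assignment simply skips the ten intermediate values a wrap from 250 to 4 would otherwise walk through.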

Link: https://lore.kernel.org/r/20200322181304.37655-7-jsmart2021@gmail.com
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Martin K. Petersen

James Smart
2020-03-27 11:15:10 +0800
4cd708913 scsi: lpfc: Fix crash after handling a pci error

Injecting EEH on a 32GB card is causing a kernel oops.

The pci error handler is doing an IO flush and the offline code is also
doing an IO flush. When the 1st flush is complete the hdwq is destroyed
(freed), yet the second flush accesses the hdwq and crashes.

Added a check in lpfc_sli4_flush_io_rings that tests both the HBA_IOQ_FLUSH
flag and the hdwq pointer, so that the flush is skipped if it has already
been done or the hdwq has already been freed.
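A userspace sketch of the double-flush guard, assuming a simplified HBA struct; only the `HBA_IOQ_FLUSH` name comes from the commit (its value here is assumed), and `struct hba`, `flush_io_rings()` and `destroy_hdwq()` are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

#define HBA_IOQ_FLUSH 0x1u /* flag name from the commit; value assumed */

struct hba {
	unsigned int hba_flag;
	int *hdwq;        /* stand-in for the hardware queue array */
	int flush_count;  /* how many flushes actually ran */
};

/* Returns 1 if a flush was performed, 0 if it was skipped. */
static int flush_io_rings(struct hba *phba)
{
	/* Skip if a flush already ran or the queues were already freed. */
	if ((phba->hba_flag & HBA_IOQ_FLUSH) || phba->hdwq == NULL)
		return 0;

	phba->hba_flag |= HBA_IOQ_FLUSH;
	phba->flush_count++;
	/* ... flush every queue in phba->hdwq here ... */
	return 1;
}

/* Teardown that frees the queues once the first flush completes. */
static void destroy_hdwq(struct hba *phba)
{
	free(phba->hdwq);
	phba->hdwq = NULL;
}
```

The second call models the offline path running after the PCI error handler has already flushed and freed the queues; it returns early instead of touching freed memory.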

Link: https://lore.kernel.org/r/20200322181304.37655-6-jsmart2021@gmail.com
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Martin K. Petersen

James Smart
2020-03-27 11:15:09 +0800
c90b44802 scsi: lpfc: Fix scsi host template for SLI3 vports

SCSI layer sends driver IOs with more s/g segments than driver can handle.
This results in "Too many sg segments from dma_map_sg. Config 64, seg_cnt
219" error messages from the lpfc_scsi_prep_dma_buf_s3() routine.

This was due to the driver using individual templates for pport and vport,
host reset enabled or not, nvme vs scsi, etc. In the end, there was a
combination for a vport that didn't match the pport.

Rather than enumerating more templates and more discretionary assignments,
revert to a base template that is copied to a template specific to the
pport/vport. Then, based on role, attributes and sli type, modify the
fields that are different for that port. Added a log message to
lpfc_create_port to validate values.
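The copy-then-modify approach might look like this, assuming a heavily simplified template; `struct host_template`, `struct port`, their fields and `port_init_template()` are hypothetical and far smaller than the real `scsi_host_template`:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a SCSI host template. */
struct host_template {
	const char *name;
	unsigned short sg_tablesize;
	bool host_reset_enabled;
};

/* One shared base template; never modified after initialization. */
static const struct host_template base_template = {
	.name = "base",
	.sg_tablesize = 64,
	.host_reset_enabled = true,
};

struct port {
	bool is_vport;
	bool cfg_host_reset;           /* configured host reset behavior */
	unsigned short cfg_sg_seg_cnt; /* configured s/g segment limit */
	struct host_template tmpl;     /* per-port copy of the template */
};

/* Copy the base template, then adjust only the fields that differ. */
static void port_init_template(struct port *p)
{
	p->tmpl = base_template; /* struct copy */
	p->tmpl.name = p->is_vport ? "vport" : "pport";
	p->tmpl.sg_tablesize = p->cfg_sg_seg_cnt;
	p->tmpl.host_reset_enabled = p->cfg_host_reset;
}
```

Because every port starts from the same base and derives `sg_tablesize` from the same configured value, a vport cannot end up advertising a different s/g limit than the pport.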

Link: https://lore.kernel.org/r/20200322181304.37655-5-jsmart2021@gmail.com
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Martin K. Petersen

James Smart
2020-03-27 11:15:08 +0800
e7f403491 scsi: lpfc: Fix lpfc overwrite of sg_cnt field in nvmefc_tgt_fcp_req

In lpfc_nvmet_prep_fcp_wqe() the line "rsp->sg_cnt = 0" is modifying the
transport's data structure. This may result in the transport believing the
s/g list was already freed, and so it may not unmap/free it properly. The
lpfc driver should not modify the transport data structure.

The zeroing of the sg_cnt is to avoid use of the transport's sgl in a
subsequent loop where the driver builds the necessary requests for the
adapter firmware to complete the IO.

Change the LLDD to use a local copy of the transport sg_cnt when building
requests to be passed to the adapter fw.
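A sketch of the local-copy pattern, assuming a simplified request struct; `struct tgt_fcp_req`, `prep_wqe()` and the `skip_data` condition are hypothetical (the latter stands in for whatever case made the original code zero the count):

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the transport's request. */
struct tgt_fcp_req {
	int sg_cnt;    /* owned by the transport; the driver must not modify it */
	int sg_len[8]; /* length of each s/g entry */
};

/*
 * Build firmware request entries from the transport's s/g list.
 * Returns the number of entries written to 'out'.
 */
static int prep_wqe(const struct tgt_fcp_req *rsp, int *out, int skip_data)
{
	int sg_cnt = rsp->sg_cnt; /* local copy; rsp->sg_cnt is never written */
	int n = 0;

	if (skip_data)
		sg_cnt = 0;       /* only the local copy is zeroed */

	for (int i = 0; i < sg_cnt; i++)
		out[n++] = rsp->sg_len[i];
	return n;
}
```

Taking the transport's request as `const` lets the compiler reject any accidental write to `sg_cnt`.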

Link: https://lore.kernel.org/r/20200322181304.37655-4-jsmart2021@gmail.com
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Martin K. Petersen

James Smart
2020-03-27 11:15:07 +0800
f861f5967 scsi: lpfc: Fix lockdep error - register non-static key

The following lockdep error was reported when unloading the lpfc driver:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
...
Call Trace:
dump_stack+0x96/0xe0
register_lock_class+0x8b8/0x8c0
? lockdep_hardirqs_on+0x190/0x280
? is_dynamic_key+0x150/0x150
? wait_for_completion_interruptible+0x2a0/0x2a0
? wake_up_q+0xd0/0xd0
__lock_acquire+0xda/0x21a0
? register_lock_class+0x8c0/0x8c0
? synchronize_rcu_expedited+0x500/0x500
? __call_rcu+0x850/0x850
lock_acquire+0xf3/0x1f0
? del_timer_sync+0x5/0xb0
del_timer_sync+0x3c/0xb0
? del_timer_sync+0x5/0xb0
lpfc_pci_remove_one.cold.102+0x8b7/0x935 [lpfc]
...

Unloading the driver resulted in a call to del_timer_sync for the
cpuhp_poll_timer. However the call to setup the timer had never been made,
so the timer structures used by lockdep checking were not initialized.

Unconditionally call setup_timer for the cpuhp_poll_timer during driver
initialization. Calls to start the timer remain "as needed".

Link: https://lore.kernel.org/r/20200322181304.37655-3-jsmart2021@gmail.com
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Martin K. Petersen

James Smart
2020-03-27 11:15:06 +0800
38503943c scsi: lpfc: Fix kasan slab-out-of-bounds error in lpfc_unreg_login

KASAN reported the following bug:

BUG: KASAN: slab-out-of-bounds in lpfc_unreg_login+0x7c/0xc0 [lpfc]
Read of size 2 at addr ffff889fc7c50a22 by task lpfc_worker_3/6676
...
Call Trace:
dump_stack+0x96/0xe0
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
print_address_description.constprop.6+0x1b/0x220
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
__kasan_report.cold.9+0x37/0x7c
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
kasan_report+0xe/0x20
lpfc_unreg_login+0x7c/0xc0 [lpfc]
lpfc_sli_def_mbox_cmpl+0x334/0x430 [lpfc]
...

When processing the completion of a "Reg Rpi" login mailbox command in
lpfc_sli_def_mbox_cmpl, a call may be made to lpfc_unreg_login. The vpi is
extracted from the completing mailbox context and passed as an input to
lpfc_unreg_login. However, the vpi stored in the mailbox command context is
an absolute vpi, which for SLI4 represents both base + offset. When used
with a non-zero base component (function id > 0), this results in an
out-of-range access beyond the allocated phba->vpi_ids array.

Fix by subtracting the function's base value to get an accurate vpi number.
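A sketch of the index conversion, assuming a per-function vpi array of four entries; `struct hba`, `NUM_VPIS`, `vpi_base` and `vpi_to_index()` are hypothetical, and the bounds check is an extra safeguard not described in the commit:

```c
#include <assert.h>

#define NUM_VPIS 4 /* hypothetical size of the per-function vpi_ids array */

struct hba {
	unsigned short vpi_base;          /* first absolute vpi of this function */
	unsigned short vpi_ids[NUM_VPIS];
};

/* Convert an absolute vpi (base + offset) into an index into vpi_ids. */
static int vpi_to_index(const struct hba *phba, unsigned short abs_vpi)
{
	int idx = (int)abs_vpi - (int)phba->vpi_base;

	if (idx < 0 || idx >= NUM_VPIS)
		return -1; /* not a vpi owned by this function */
	return idx;
}
```

With `vpi_base = 8`, an absolute vpi of 9 maps to index 1; using 9 directly would read past the end of the four-entry array.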

Link: https://lore.kernel.org/r/20200322181304.37655-2-jsmart2021@gmail.com
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Martin K. Petersen

James Smart
2020-03-27 11:15:05 +0800
ff275db92 scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code

The file aic79xx_core.c still contains some FreeBSD-specific code/macro
guards, although cross-compatibility was in theory removed with commit
cca6cb8ad7a8 ("scsi: aic7xxx: Fix build using bare-metal toolchain").
Remove it.

Link: https://lore.kernel.org/r/20200326193817.12568-1-alex.dewar@gmx.co.uk
Signed-off-by: Alex Dewar
Signed-off-by: Martin K. Petersen

Alex Dewar
2020-03-27 11:00:14 +0800
e89860f19 scsi: ufs: Do not rely on prefetched data

We were setting the bActiveICCLevel attribute for the UFS device only once,
but the type of this attribute has changed from persistent to volatile
since UFS device specification v2.1: it is reset to its default value after
a power cycle or hardware reset event. It isn't safe to rely on prefetched
data (only used for the bActiveICCLevel attribute now). Hence this change
removes the code related to data prefetching and sets this parameter on
every attempt to probe the UFS device.

Tested-by: Stanley Chu
Reviewed-by: Stanley Chu
Reviewed-by: Avri Altman
Signed-off-by: Can Guo
Signed-off-by: Martin K. Petersen

Can Guo
2020-03-27 10:56:33 +0800
ccfa00a86 scsi: dc395x: remove dc395x_bios_param

dc395x_bios_param was only different from the default when the
CONFIG_SCSI_DC395x_TRMS1040_TRADMAP symbol is true, but that symbol doesn't
exist in the Kconfig system and thus can't be set.

Link: https://lore.kernel.org/r/20200325105505.1028582-1-hch@lst.de
Signed-off-by: Christoph Hellwig
Signed-off-by: Martin K. Petersen

Christoph Hellwig
2020-03-27 10:51:18 +0800
1d99702f9 scsi: libiscsi: Fix error count for active session

Fix the active session count when total_cmds is invalid in
iscsi_session_setup(): decrement the number of active sessions before the
function returns.
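A sketch of the balanced counter, assuming a global count that is incremented on entry; `active_sessions`, `struct session` and `session_setup()` are hypothetical stand-ins for the libiscsi code:

```c
#include <assert.h>
#include <stdlib.h>

static int active_sessions; /* hypothetical count of active sessions */

struct session {
	int cmds_max;
};

/* Returns a new session, or NULL on invalid parameters or allocation failure. */
static struct session *session_setup(int total_cmds)
{
	struct session *s;

	active_sessions++;          /* counted on entry */
	if (total_cmds <= 0)
		goto dec_count;     /* invalid total_cmds: undo the increment */

	s = malloc(sizeof(*s));
	if (!s)
		goto dec_count;
	s->cmds_max = total_cmds;
	return s;

dec_count:
	active_sessions--;
	return NULL;
}
```

Every early return goes through the same label, so the counter is restored on each error path.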

Link: https://lore.kernel.org/r/EDBAAA0BBBA2AC4E9C8B6B81DEEE1D6916A28542@DGGEML525-MBS.china.huawei.com
Reviewed-by: Lee Duncan
Signed-off-by: Wu Bo
Signed-off-by: Martin K. Petersen

Wu Bo
2020-03-27 10:48:58 +0800
3e16e83a6 scsi: hpsa: correct race condition in offload enabled

Correct a race condition where ioaccel is re-enabled before the raid_map is
updated. For RAID_1, RAID_1ADM, and RAID 5/6, this hits a BUG_ON.

- Change event thread to disable ioaccel only. Send all requests down the
RAID path instead.

- Have rescan thread handle offload_enable.

- Since there is only one rescan allowed at a time, turning
offload_enabled on/off should not be racy. Each handler queues up a
rescan if one is already in progress.

- In the timing diagram below, offload_enabled is initially off due to a
change (transformation: splitmirror/remirror), ...

otbe = offload_to_be_enabled
oe = offload_enabled

Time  Event        Rescan                    Completion  Request
      Worker       Worker                    Thread      Thread
----  ------       ------                    ----------  -------
T0    |            |                         + UA        |
T1    |            + rescan started          | 0x3f      |
T2    + Event      |                         | 0x0e      |
T3    + Ack msg    |                         |           |
T4    |            + if (!dev[i]->oe &&      |           |
T5    |            |      dev[i]->otbe)      |           |
T6    |            |      get_raid_map       |           |
T7    + otbe = 1   |                         |           |
T8    |            |                         |           |
T9    |            + oe = otbe               |           |
T10   |            |                         |           + ioaccel request
T11                                                      * BUG_ON

T0 - I/O completion with UA 0x3f 0x0e sets rescan flag.
T1 - rescan worker thread starts a rescan.
T2 - event comes in
T3 - event thread starts and issues "Acknowledge" message
...
T6 - rescan thread has bypassed code to reload new raid map.
...
T7 - event thread runs and sets offload_to_be_enabled
...
T9 - rescan thread turns on offload_enabled.
T10- request comes in and goes down ioaccel path.
T11- BUG_ON.

- After the patch is applied, ioaccel_enabled can only be re-enabled in
the re-scan thread.

Link: https://lore.kernel.org/r/158472877894.14200.7077843399036368335.stgit@brunhilda
Reviewed-by: Scott Teel
Reviewed-by: Matt Perricone
Reviewed-by: Scott Benesh
Signed-off-by: Don Brace
Signed-off-by: Martin K. Petersen

Don Brace
2020-03-27 10:44:41 +0800
fd6282af8 scsi: message: fusion: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare variable-length
types such as these ones is a flexible array member[1][2], introduced in
C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, note that dynamic memory allocations won't be affected by this
change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
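A userspace sketch of allocating a struct with a flexible array member, reusing the `struct foo` / `struct boo` example above (the `value` field and `foo_alloc()` are illustrative additions). In kernel code, the `struct_size()` helper from `<linux/overflow.h>` is the usual way to compute this size with overflow checking:

```c
#include <assert.h>
#include <stdlib.h>

struct boo {
	int value;
};

/* The flexible array member must be the last member of the struct. */
struct foo {
	int stuff;
	struct boo array[];
};

/* Allocate a foo with room for n trailing elements. */
static struct foo *foo_alloc(size_t n)
{
	/* sizeof(*f) covers only the fixed part; the array is added explicitly. */
	struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

	if (f)
		f->stuff = (int)n;
	return f;
}
```

`sizeof(struct foo)` covers only `stuff`, which is why the trailing elements have to be added to the allocation size by hand.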

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Link: https://lore.kernel.org/r/20200319222533.GA20577@embeddedor.com
Signed-off-by: Gustavo A. R. Silva
Signed-off-by: Martin K. Petersen

Gustavo A. R. Silva
2020-03-27 10:40:47 +0800
4f93c4bf0 scsi: qedi: Add PCI shutdown handler support

Add a PCI shutdown handler to support the wake-on-lan feature.

Link: https://lore.kernel.org/r/20200319083811.19499-3-mrangankar@marvell.com
Signed-off-by: Manish Rangankar
Signed-off-by: Nilesh Javali
Signed-off-by: Martin K. Petersen

Manish Rangankar
2020-03-27 10:38:54 +0800
4b1068f5d scsi: qedi: Add MFW error recovery process

This patch adds the MFW error recovery process to the qedi driver. The
process includes a partial/customized driver unload and load that resets
the context while preserving the kernel state of active iSCSI sessions.

Link: https://lore.kernel.org/r/20200319083811.19499-2-mrangankar@marvell.com
Signed-off-by: Manish Rangankar
Signed-off-by: Martin K. Petersen

Manish Rangankar
2020-03-27 10:38:52 +0800
fb276f770 scsi: ufs: Enable block layer runtime PM for well-known logical units

Block layer RPM is enabled for the general UFS SCSI devices when they are
probed by their driver. However, block layer RPM is not enabled for UFS
well-known SCSI devices.

As UFS SCSI devices have their corresponding BSG char devices, accessing a
BSG char device via IOCTL may send requests to its corresponding SCSI
device through its request queue. If BSG IOCTL sends a request to a
well-known SCSI device when HBA is not runtime active, due to block layer
RPM not being enabled for the well-known SCSI devices, the HBA, which is at
the top of a SCSI device's parent chain, will not be resumed.

This change enables block layer RPM for the well-known SCSI devices so that
block layer can handle RPM for the well-known SCSI devices just like for
the general SCSI devices.

Reviewed-by: Avri Altman
Reviewed-by: Stanley Chu
Signed-off-by: Can Guo
Signed-off-by: Martin K. Petersen

Can Guo
2020-03-27 10:30:44 +0800
80b21006c scsi: ufs-qcom: Override devfreq parameters

Override devfreq parameters for power-performance trade-off.

Link: https://lore.kernel.org/r/b6875729b6072134985c9113a820cf60a2af22e7.1585160616.git.asutoshd@codeaurora.org
Acked-by: Avri Altman
Signed-off-by: Asutosh Das
Signed-off-by: Martin K. Petersen

Asutosh Das
2020-03-27 10:18:14 +0800
2c75f9a5b scsi: ufshcd: Let vendor override devfreq parameters

Vendor drivers may need to update the polling interval and thresholds.
Add a vops callback for vendor drivers to use.

Link: https://lore.kernel.org/r/acd79e00396cff855256adad47f615ccdbde85ac.1585160616.git.asutoshd@codeaurora.org
Acked-by: Avri Altman
Signed-off-by: Asutosh Das
Signed-off-by: Martin K. Petersen

Asutosh Das
2020-03-27 10:18:13 +0800
91831d333 scsi: ufshcd: Update the set frequency to devfreq

Currently, the frequency that devfreq provides to the driver always causes
the clocks to be scaled up. Hence, round the clock rate to the nearest
supported frequency before deciding whether to scale.

Also update the devfreq statistics with the current frequency.
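A sketch of rounding before deciding, assuming a clock that supports only a minimum and a maximum rate; `struct clk_info`, `round_rate()` (a stand-in for `clk_round_rate()`, whose real behavior depends on the clock provider), `should_scale_up()` and the example frequencies are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

struct clk_info {
	unsigned long min_freq;
	unsigned long max_freq;
};

/* Hypothetical rounding: pick whichever supported rate is closer. */
static unsigned long round_rate(const struct clk_info *c, unsigned long freq)
{
	unsigned long d_min = freq > c->min_freq ? freq - c->min_freq
						 : c->min_freq - freq;
	unsigned long d_max = freq > c->max_freq ? freq - c->max_freq
						 : c->max_freq - freq;

	return d_max <= d_min ? c->max_freq : c->min_freq;
}

/* Round the requested frequency first, then scale up only if it is the max. */
static bool should_scale_up(const struct clk_info *c, unsigned long *freq)
{
	*freq = round_rate(c, *freq);
	return *freq == c->max_freq;
}
```

With example rates of 37.5 MHz and 300 MHz, a 50 MHz request rounds down to the minimum and does not trigger a scale-up, while a 250 MHz request rounds up and does.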

Link: https://lore.kernel.org/r/d0c6c22455811e9f0eda01f9bc70d1398b51b2bd.1585160616.git.asutoshd@codeaurora.org
Acked-by: Avri Altman
Signed-off-by: Asutosh Das
Signed-off-by: Martin K. Petersen

Asutosh Das
2020-03-27 10:18:12 +0800
0c2039dc1 scsi: ufs: Resume ufs host before accessing ufs device

As part of sysfs reads of descriptors/attributes/flags, query commands
should only be executed when the hba's runtime PM status is active. To
guarantee this, add pm_runtime_get/put_sync() to the paths where query
commands are sent.

Link: https://lore.kernel.org/r/f712a4f7bdb0ae32e0d83634731e7aaa1b3a6cdd.1585009663.git.asutoshd@codeaurora.org
Reviewed-by: Avri Altman
Signed-off-by: Nitin Rawat
Signed-off-by: Asutosh Das
Signed-off-by: Martin K. Petersen

Nitin Rawat
2020-03-27 10:11:01 +0800
73e990b42 scsi: ufs-mediatek: customize the delay for enabling host

The MediaTek platform and UFS controller can dynamically customize the delay
for host enabling according to different scenarios.

For example, if UniPro enters lower-power mode, the delay can be minimized;
otherwise, a longer delay is expected.

Link: https://lore.kernel.org/r/20200318104016.28049-8-stanley.chu@mediatek.com
Reviewed-by: Avri Altman
Signed-off-by: Stanley Chu
Signed-off-by: Martin K. Petersen

Stanley Chu
2020-03-27 10:07:16 +0800
9fc305ef8 scsi: ufs: make HCE polling more compact to improve initialization latency

Reduce the waiting period between each HCE (Host Controller Enable) poll
from 5 ms to 1 ms. Also increase the maximum number of polls to keep the
total polling time roughly the same.

This change can make HCE initialization faster, improving the latency of
ufshcd initialization, error recovery, and resume.
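The trade-off can be sketched with a simulated controller, assuming it becomes enabled 2 ms after the enable bit is set; `struct sim_hc`, `poll_hce()` and the retry counts are hypothetical and not taken from ufshcd:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated controller: enabled once now_ms reaches ready_ms. */
struct sim_hc {
	unsigned int ready_ms;
	unsigned int now_ms;
};

static bool hce_enabled(const struct sim_hc *hc)
{
	return hc->now_ms >= hc->ready_ms;
}

/*
 * Poll until enabled, advancing simulated time by interval_ms between
 * reads, at most max_polls times. Returns the elapsed simulated ms at
 * which readiness was seen, or -1 on timeout.
 */
static int poll_hce(struct sim_hc *hc, unsigned int interval_ms,
		    unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (hce_enabled(hc))
			return (int)hc->now_ms;
		hc->now_ms += interval_ms; /* stands in for a sleep */
	}
	return hce_enabled(hc) ? (int)hc->now_ms : -1;
}
```

Both configurations have a 50 ms budget (5 ms * 10 and 1 ms * 50), but the finer interval notices readiness at 2 ms instead of 5 ms.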

Link: https://lore.kernel.org/r/20200318104016.28049-7-stanley.chu@mediatek.com
Reviewed-by: Avri Altman
Reviewed-by: Can Guo
Signed-off-by: Stanley Chu
Signed-off-by: Martin K. Petersen

Stanley Chu
2020-03-27 10:07:15 +0800
b9dc8aca2 scsi: ufs: allow custom delay prior to host enabling

Currently a 1 ms delay is applied before polling the CONTROLLER_ENABLE bit.
This delay may not be required, or may need to differ, on different
controllers. Make the delay a configurable value in struct ufs_hba so that
vendors can customize it.

Link: https://lore.kernel.org/r/20200318104016.28049-6-stanley.chu@mediatek.com
Reviewed-by: Avri Altman
Reviewed-by: Can Guo
Signed-off-by: Stanley Chu
Signed-off-by: Martin K. Petersen

Stanley Chu
2020-03-27 10:07:15 +0800
c2f755d2c scsi: ufs-mediatek: use common delay function

A common delay function has been introduced in the UFS core driver, so
ufs-mediatek can use it instead of its private delay function.

Link: https://lore.kernel.org/r/20200318104016.28049-5-stanley.chu@mediatek.com
Reviewed-by: Avri Altman
Signed-off-by: Stanley Chu
Signed-off-by: Martin K. Petersen

Stanley Chu
2020-03-27 10:07:14 +0800
5c955c10d scsi: ufs: introduce common and flexible delay function

Introduce a common delay function that provides a flexible way to choose
between udelay and usleep_range according to the required delay time.
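A sketch of the selection logic, assuming a 10 us threshold (the kernel's timers-howto documentation suggests `udelay` for delays of roughly under 10 us and `usleep_range` above that); `choose_delay()`, `enum delay_method` and `BUSY_WAIT_MAX_US` are hypothetical, and the sketch only reports which primitive would be used rather than sleeping:

```c
#include <assert.h>

enum delay_method { DELAY_NONE, DELAY_BUSY_WAIT, DELAY_SLEEP };

/* Hypothetical threshold below which busy-waiting is preferred. */
#define BUSY_WAIT_MAX_US 10UL

/*
 * Decide how a delay of 'us' microseconds would be implemented:
 * short delays busy-wait (udelay-style); longer ones sleep within the
 * window [us, us + tolerance] (usleep_range-style). The upper bound of
 * the chosen wait is stored in *max_us.
 */
static enum delay_method choose_delay(unsigned long us, unsigned long tolerance,
				      unsigned long *max_us)
{
	if (us == 0)
		return DELAY_NONE;
	if (us < BUSY_WAIT_MAX_US) {
		*max_us = us;
		return DELAY_BUSY_WAIT;
	}
	*max_us = us + tolerance;
	return DELAY_SLEEP;
}
```

Busy-waiting avoids scheduler overhead for very short waits, while sleeping with a tolerance window lets the kernel coalesce wakeups for longer ones.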

Link: https://lore.kernel.org/r/20200318104016.28049-4-stanley.chu@mediatek.com
Reviewed-by: Avri Altman
Reviewed-by: Can Guo
Signed-off-by: Stanley Chu
Signed-off-by: Martin K. Petersen

Stanley Chu
2020-03-27 10:07:13 +0800
c2014682d scsi: ufs: use an enum for host capabilities

Use an enum to specify the host capabilities instead of #defines inside the
structure definition.
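A sketch of the pattern, with hypothetical `HOST_CAP_*` names modeled on the kind of capability bits a UFS host struct carries; `struct hba` and `hba_has_cap()` are also illustrative:

```c
#include <assert.h>

/* Capability bits declared as an enum rather than #defines inside the struct. */
enum host_caps {
	HOST_CAP_CLK_GATING               = 1 << 0,
	HOST_CAP_HIBERN8_WITH_CLK_GATING  = 1 << 1,
	HOST_CAP_CLK_SCALING              = 1 << 2,
};

struct hba {
	unsigned int caps; /* bitmask of enum host_caps values */
};

static int hba_has_cap(const struct hba *hba, enum host_caps cap)
{
	return (hba->caps & cap) != 0;
}
```

Keeping the bits in a named enum gives them a type that helpers like `hba_has_cap()` can use in their signatures, and keeps the struct definition free of macro clutter.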

Link: https://lore.kernel.org/r/20200318104016.28049-3-stanley.chu@mediatek.com
Reviewed-by: Avri Altman
Reviewed-by: Can Guo
Reviewed-by: Bean Huo
Reviewed-by: Asutosh Das
Signed-off-by: Stanley Chu
Signed-off-by: Martin K. Petersen

Stanley Chu
2020-03-27 10:07:12 +0800
ba0320fbb scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc()

In ufshcd_disable_tx_lcc(), if ufshcd_dme_get() or ufshcd_dme_peer_get()
fails, the uninitialized variable "tx_lanes" may be used as an unexpected
lane ID for DME configuration.

Fix this issue by initializing "tx_lanes".
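A sketch of why the initialization matters, assuming a getter that leaves its output untouched on failure; `dme_get()` and `disable_tx_lcc()` are hypothetical stand-ins for the ufshcd functions:

```c
#include <assert.h>

/* Hypothetical getter: returns 0 and sets *val on success, -1 on failure. */
static int dme_get(int fail, unsigned int *val)
{
	if (fail)
		return -1; /* *val is left untouched */
	*val = 2;
	return 0;
}

/* Returns the number of lanes that would be configured. */
static unsigned int disable_tx_lcc(int fail)
{
	unsigned int tx_lanes = 0; /* a failed read now configures nothing */
	unsigned int configured = 0;

	(void)dme_get(fail, &tx_lanes); /* error not checked here */
	for (unsigned int i = 0; i < tx_lanes; i++)
		configured++; /* a per-lane DME set would go here */
	return configured;
}
```

Without the `= 0`, a failed read would leave `tx_lanes` holding whatever happened to be on the stack, and the loop would use it as a lane count.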

Link: https://lore.kernel.org/r/20200318104016.28049-2-stanley.chu@mediatek.com
Reviewed-by: Avri Altman
Reviewed-by: Can Guo
Reviewed-by: Asutosh Das
Signed-off-by: Stanley Chu
Signed-off-by: Martin K. Petersen

Stanley Chu
2020-03-27 10:07:12 +0800
82b8cf40b scsi: iscsi: Report connection state in sysfs

If an iSCSI connection happens to fail while the daemon isn't running (due
to a crash or for another reason), the kernel failure report is not
received. When the daemon restarts, there is insufficient kernel state in
sysfs for it to know that this happened. open-iscsi tries to reopen every
connection, but on different initiators, we'd like to know which
connections have failed.

There is session->state, but that has a different lifetime than an iSCSI
connection, so it doesn't directly reflect the connection state.

[mkp: typos]

Link: https://lore.kernel.org/r/20200317233422.532961-1-krisman@collabora.com
Cc: Khazhismel Kumykov
Suggested-by: Junho Ryu
Reviewed-by: Lee Duncan
Signed-off-by: Gabriel Krisman Bertazi
Signed-off-by: Martin K. Petersen

Gabriel Krisman Bertazi
2020-03-27 09:59:20 +0800
1a0275239 scsi: target: core: add task tag to trace events

Trace events target_sequencer_start and target_cmd_complete
(include/trace/events/target.h) are ready to show NAA identifier, LUN ID,
and many other important command details in the system log:

TP_printk("%s -> LUN %03u %s data_length %6u CDB %s (TA:%s C:%02x)",

However, it's still hard to match a command on the initiator with the same
command on the target in real-life system log output. For that purpose SCSI
provides a command identifier, or task tag (the term used in previous
standards). This patch adds the tag ID to the system log output:

TP_printk("%s -> LUN %03u tag %#llx %s data_length %6u CDB %s (TA:%s C:%02x)",

kworker/1:1-35 [001] .... 1392.989452: target_sequencer_start:
naa.5001405ec1ba6364 -> LUN 001 tag 0x1
SERVICE_ACTION_IN_16 data_length 32
CDB 9e 10 00 00 00 00 00 00 00 00 00 00 00 20 00 00 (TA:SIMPLE C:00)

kworker/1:1-35 [001] .... 1392.989456: target_cmd_complete:
naa.5001405ec1ba6364
Reviewed-by: Konstantin Shelekhin
Reviewed-by: Bart van Assche
Signed-off-by: Viacheslav Dubeyko
Signed-off-by: Martin K. Petersen

Viacheslav Dubeyko
2020-03-27 09:56:04 +0800
626bac733 scsi: target: iscsi: calling iscsit_stop_session() inside iscsit_close_session() has no effect

iscsit_close_session() can only be called when nconn is zero (otherwise a
kernel panic is triggered). If nconn is zero then iscsit_stop_session()
does nothing and exits, so calling it makes no sense.

We still need to call iscsit_check_session_usage_count() because this
function will sleep if the session's refcount is not zero and we don't want
to destroy the session structure if it's still being referenced.

Link: https://lore.kernel.org/r/20200313170656.9716-4-mlombard@redhat.com
Tested-by: Rahul Kundu
Signed-off-by: Maurizio Lombardi
Signed-off-by: Martin K. Petersen

Maurizio Lombardi
2020-03-27 09:47:47 +0800
57c46e9f3 scsi: target: fix hang when multiple threads try to destroy the same iscsi session

A number of hangs have been reported against the target driver; they are
due to the fact that multiple threads may try to destroy the iscsi session
at the same time. This may be reproduced for example when a "targetcli
iscsi/iqn.../tpg1 disable" command is executed while a logout operation is
underway.

When this happens, two or more threads may end up sleeping and waiting for
iscsit_close_connection() to execute "complete(session_wait_comp)". Only
one of the threads will wake up and proceed to destroy the session
structure; the remaining threads will hang forever.

Note that if the blocked threads are somehow forced to wake up with
complete_all(), they will try to free the same iscsi session structure
destroyed by the first thread, causing double frees, memory corruptions
etc...

With this patch, the threads that want to destroy the iscsi session will
increase the session refcount and will set the "session_close" flag to 1;
then they wait for the driver to close the remaining active connections.
When the last connection is closed, iscsit_close_connection() will wake up
all the threads and will wait for the session's refcount to reach zero;
when this happens, iscsit_close_connection() will destroy the session
structure because no one is referencing it anymore.

INFO: task targetcli:5971 blocked for more than 120 seconds.
Tainted: P OE 4.15.0-72-generic #81~16.04.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
targetcli D 0 5971 1 0x00000080
Call Trace:
__schedule+0x3d6/0x8b0
? vprintk_func+0x44/0xe0
schedule+0x36/0x80
schedule_timeout+0x1db/0x370
? __dynamic_pr_debug+0x8a/0xb0
wait_for_completion+0xb4/0x140
? wake_up_q+0x70/0x70
iscsit_free_session+0x13d/0x1a0 [iscsi_target_mod]
iscsit_release_sessions_for_tpg+0x16b/0x1e0 [iscsi_target_mod]
iscsit_tpg_disable_portal_group+0xca/0x1c0 [iscsi_target_mod]
lio_target_tpg_enable_store+0x66/0xe0 [iscsi_target_mod]
configfs_write_file+0xb9/0x120
__vfs_write+0x1b/0x40
vfs_write+0xb8/0x1b0
SyS_write+0x5c/0xe0
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Link: https://lore.kernel.org/r/20200313170656.9716-3-mlombard@redhat.com
Reported-by: Matt Coleman
Tested-by: Matt Coleman
Tested-by: Rahul Kundu
Signed-off-by: Maurizio Lombardi
Signed-off-by: Martin K. Petersen

Maurizio Lombardi
2020-03-27 09:47:47 +0800
e49a7d994 scsi: target: remove boilerplate code

iscsit_free_session() is equivalent to iscsit_stop_session() followed by a
call to iscsit_close_session().

Link: https://lore.kernel.org/r/20200313170656.9716-2-mlombard@redhat.com
Tested-by: Rahul Kundu
Signed-off-by: Maurizio Lombardi
Signed-off-by: Martin K. Petersen

Maurizio Lombardi
2020-03-27 09:47:46 +0800
0f3d67915 scsi: aha1740: Fix an errro handling path in aha1740_probe()

If 'dma_map_single()' fails, the ref counted 'shpnt' will be decremented
twice because 'scsi_host_put()' is called in the if block, and in the error
handling path.

Axe one of these calls.
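A sketch of the single-release error path, assuming a simple counter in place of the SCSI host reference; `struct host`, `host_put()` and `probe()` are hypothetical:

```c
#include <assert.h>

struct host {
	int refcount;
};

static void host_put(struct host *h)
{
	h->refcount--;
}

/* Hypothetical probe: on a mapping failure, release the host exactly once. */
static int probe(struct host *h, int map_fails)
{
	h->refcount++;             /* host allocated and referenced */
	if (map_fails)
		goto err_host_put; /* no extra put inside the if block */
	return 0;

err_host_put:
	host_put(h);               /* the only put on the error path */
	return -1;
}
```

If the `if` block also called `host_put()`, the failing probe would leave the count at -1, which is the double decrement the commit removes.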

Link: https://lore.kernel.org/r/20200228215948.7473-1-christophe.jaillet@wanadoo.fr
Fixes: 1dc09e120c83 ("scsi: aha1740: stop using scsi_unregister")
Signed-off-by: Christophe JAILLET
Signed-off-by: Martin K. Petersen

Christophe JAILLET
2020-03-27 09:10:53 +0800
1b72e86dd scsi: qla2xxx: Remove non functional code

Remove code which has no functional use anymore since commit 3c75ad1d87c7
("scsi: qla2xxx: Remove defer flag to indicate immeadiate port loss").

While at it, also remove the stale function documentation.

Link: https://lore.kernel.org/r/20200206135443.110701-1-dwagner@suse.de
Reviewed-by: Arun Easi
Reviewed-by: Lee Duncan
Signed-off-by: Daniel Wagner
Signed-off-by: Martin K. Petersen

Daniel Wagner
2020-03-27 09:07:29 +0800

18 Mar, 2020

6 commits

9b8898465 scsi: pm80xx: Introduce read and write length for IOCTL payload structure

Remove the common length and introduce read and write lengths for the
IOCTL payload structure.

[mkp: fixed SoB ordering]

Link: https://lore.kernel.org/r/20200316074906.9119-7-deepak.ukey@microchip.com
Acked-by: Jack Wang
Signed-off-by: Viswas G
Signed-off-by: Deepak Ukey
Signed-off-by: Radha Ramachandran
Signed-off-by: Martin K. Petersen

Viswas G
2020-03-18 01:57:19 +0800
dba2cc03b scsi: pm80xx: sysfs attribute for non fatal dump

Added a sysfs attribute for the non-fatal log so that the management
utility can get the non-fatal dump from the driver. A non-fatal error is an
error condition or abnormal behavior detected by the host, or detected and
reported by the controller to the host. A non-fatal error does not stop the
controller firmware, which can still respond to host requests. A typical
example of a non-fatal error is an I/O timeout or an unusual error
notification from the controller. Since the firmware is operational, the
error dump information is pushed to host memory (by firmware) upon request
from the host.

Link: https://lore.kernel.org/r/20200316074906.9119-6-deepak.ukey@microchip.com
Acked-by: Jack Wang
Signed-off-by: Deepak Ukey
Signed-off-by: Viswas G
Signed-off-by: Radha Ramachandran
Signed-off-by: Martin K. Petersen

Deepak Ukey
2020-03-18 01:57:18 +0800
b40f28820 scsi: pm80xx: Cleanup initialization loading fail path

1) Move the instance tracking down after we think the instance is good to
go. This avoids a use-after-free.

2) There are goto targets for trying to cleanup if the hw fails to
initialize, but there's some overlap depending on who thinks they own
the sub-structures.

Link: https://lore.kernel.org/r/20200316074906.9119-5-deepak.ukey@microchip.com
Acked-by: Jack Wang
Signed-off-by: Peter Chang
Signed-off-by: Deepak Ukey
Signed-off-by: Viswas G
Signed-off-by: Radha Ramachandran
Signed-off-by: Martin K. Petersen

Peter Chang
2020-03-18 01:57:16 +0800
9d9c7c20f scsi: pm80xx: Free the tag when mpi_set_phy_profile_resp is received

In the pm80xx driver, the command mpi_set_phy_profile_req is sent by the
host during boot to configure the phy profile, such as the analog setting
page and the rate control page. However, the tag is not freed when its
response is received. As a result, 16 tags are missing for each HBA after
boot. When NCQ is enabled with queue depth 16, at least 15 * 16 = 240 tags
are needed for each HBA to achieve the best performance. In the current
pm80xx driver, with CCB_MAX = 256, the total number of tags in each HBA is
255 for data IO. Hence, without returning those tags to the pool after
boot, some devices will eventually be forced to non-NCQ mode by the ATA
layer due to excessive errors (i.e. the LLDD cannot allocate a tag for a
queued task).
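The tag arithmetic from the paragraph above, written out; the constants are the figures quoted in the commit, while the macro and function names are hypothetical:

```c
#include <assert.h>

#define CCB_MAX         256
#define DATA_TAGS       (CCB_MAX - 1) /* 255 tags for data IO, per the commit */
#define LEAKED_TAGS     16            /* phy-profile tags never returned */
#define NCQ_MULTIPLIER  15            /* the "15" in the commit's 15 * 16 */
#define NCQ_QUEUE_DEPTH 16

static int tags_needed(void)
{
	return NCQ_MULTIPLIER * NCQ_QUEUE_DEPTH;
}

static int tags_available(int leaked)
{
	return DATA_TAGS - leaked;
}
```

With the leak, 255 - 16 = 239 tags remain, one short of the 240 needed; returning the 16 tags restores enough headroom.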

Link: https://lore.kernel.org/r/20200316074906.9119-4-deepak.ukey@microchip.com
Acked-by: Jack Wang
Signed-off-by: yuuzheng
Signed-off-by: Deepak Ukey
Signed-off-by: Viswas G
Signed-off-by: Radha Ramachandran
Signed-off-by: Martin K. Petersen

yuuzheng
2020-03-18 01:57:16 +0800
d384be6ed scsi: pm80xx: Deal with kexec reboots

A kexec reboot causes the controller fw to assert. This assertion shows up
in two ways: the controller doesn't show up as ready, and an interrupt is
pending as soon as the handler is registered. To resolve this:

- Split the interrupt handling setup into two parts, setup and request.

- If the controller ready register indicates not-ready, but the not-ready
state is limited to the IOC units, we can still try a reset to bring the
system back to the pre-reboot state.

Link: https://lore.kernel.org/r/20200316074906.9119-3-deepak.ukey@microchip.com
Acked-by: Jack Wang
Signed-off-by: Vikram Auradkar
Signed-off-by: Deepak Ukey
Signed-off-by: Viswas G
Signed-off-by: Radha Ramachandran
Signed-off-by: Martin K. Petersen

Vikram Auradkar
2020-03-18 01:57:15 +0800
58bf14c17 scsi: pm80xx: Increase request sg length

Increasing the per-request size maximum (max_sectors_kb) runs into the
per-device DMA scatter gather list limit (max_segments) for users of the io
vector system calls (e.g., readv and writev). This is because the kernel
combines io vectors into DMA segments when possible, but that doesn't work
for our user because the vectors in the buffer cache get scrambled. This
change bumps the advertised max scatter gather length to 528 to cover 2M
with x86's 4k pages, plus some extra for the user checksum. It trims the
size of some of the tables we don't care about and exposes all of the
command slots upstream to the SCSI layer. It also reduces PM8001_MAX_CCB to
256, since the pm8001 driver's memory limit depends on machine capability:
increasing the sg length has to be traded off by decreasing PM8001_MAX_CCB.
PM8001_MAX_CCB = 256 does not have any influence on normal use.
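The 528 figure can be checked with a little arithmetic, assuming 4 KiB pages and a 2 MiB request as stated above; the macro and function names are hypothetical:

```c
#include <assert.h>

#define PAGE_SIZE_BYTES   (4u * 1024u)        /* x86 4k pages */
#define MAX_IO_BYTES      (2u * 1024u * 1024u) /* 2M request */
#define ADVERTISED_SG_LEN 528u

/* Number of page-sized s/g entries needed for a page-aligned buffer. */
static unsigned int pages_for(unsigned int bytes)
{
	return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}
```

A page-aligned 2 MiB buffer spans 512 pages, so 528 leaves 16 extra entries for the extra data the commit mentions, such as the user checksum.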

Link: https://lore.kernel.org/r/20200316074906.9119-2-deepak.ukey@microchip.com
Reported-by: kbuild test robot
Acked-by: Jack Wang
Signed-off-by: Peter Chang
Signed-off-by: Deepak Ukey
Signed-off-by: Viswas G
Signed-off-by: Radha Ramachandran
Signed-off-by: Martin K. Petersen

Peter Chang
2020-03-18 01:57:14 +0800