12 Feb, 2019
1 commit
-
errata:
When a read command returns less data than specified in the PRDs (for
example, there are two PRDs for this command, but the device returns a
number of bytes which is less than in the first PRD), the second PRD of
this command is not read out of the PRD FIFO, causing the next command
to use this PRD erroneously.workaround
- forces sg_tablesize = 1
- modified the sg_io function in block/scsi_ioctl.c to use a 64k buffer
allocated with dma_alloc_coherent during the probe in ahci_imx
- In order to fix the scsi/sata hang, when CD_ROM and HDD are
accessed simultaneously after the workaround is applied.
Do not go to sleep in scsi_eh_handler, when there is host failed.Signed-off-by: Richard Zhu
26 Jan, 2019
3 commits
-
[ Upstream commit c7a082e4242fd8cd21a441071e622f87c16bdacc ]
UBSAN reported those with MegaRAID SAS-3 3108,
[ 77.467308] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
[ 77.475402] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
[ 77.481677] CPU: 16 PID: 333 Comm: kworker/16:1 Not tainted 4.20.0-rc5+ #1
[ 77.488556] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
[ 77.495791] Workqueue: events work_for_cpu_fn
[ 77.500154] Call trace:
[ 77.502610] dump_backtrace+0x0/0x2c8
[ 77.506279] show_stack+0x24/0x30
[ 77.509604] dump_stack+0x118/0x19c
[ 77.513098] ubsan_epilogue+0x14/0x60
[ 77.516765] __ubsan_handle_out_of_bounds+0xfc/0x13c
[ 77.521767] mr_update_load_balance_params+0x150/0x158 [megaraid_sas]
[ 77.528230] MR_ValidateMapInfo+0x2cc/0x10d0 [megaraid_sas]
[ 77.533825] megasas_get_map_info+0x244/0x2f0 [megaraid_sas]
[ 77.539505] megasas_init_adapter_fusion+0x9b0/0xf48 [megaraid_sas]
[ 77.545794] megasas_init_fw+0x1ab4/0x3518 [megaraid_sas]
[ 77.551212] megasas_probe_one+0x2c4/0xbe0 [megaraid_sas]
[ 77.556614] local_pci_probe+0x7c/0xf0
[ 77.560365] work_for_cpu_fn+0x34/0x50
[ 77.564118] process_one_work+0x61c/0xf08
[ 77.568129] worker_thread+0x534/0xa70
[ 77.571882] kthread+0x1c8/0x1d0
[ 77.575114] ret_from_fork+0x10/0x1c[ 89.240332] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
[ 89.248426] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
[ 89.254700] CPU: 16 PID: 95 Comm: kworker/u130:0 Not tainted 4.20.0-rc5+ #1
[ 89.261665] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
[ 89.268903] Workqueue: events_unbound async_run_entry_fn
[ 89.274222] Call trace:
[ 89.276680] dump_backtrace+0x0/0x2c8
[ 89.280348] show_stack+0x24/0x30
[ 89.283671] dump_stack+0x118/0x19c
[ 89.287167] ubsan_epilogue+0x14/0x60
[ 89.290835] __ubsan_handle_out_of_bounds+0xfc/0x13c
[ 89.295828] MR_LdRaidGet+0x50/0x58 [megaraid_sas]
[ 89.300638] megasas_build_io_fusion+0xbb8/0xd90 [megaraid_sas]
[ 89.306576] megasas_build_and_issue_cmd_fusion+0x138/0x460 [megaraid_sas]
[ 89.313468] megasas_queue_command+0x398/0x3d0 [megaraid_sas]
[ 89.319222] scsi_dispatch_cmd+0x1dc/0x8a8
[ 89.323321] scsi_request_fn+0x8e8/0xdd0
[ 89.327249] __blk_run_queue+0xc4/0x158
[ 89.331090] blk_execute_rq_nowait+0xf4/0x158
[ 89.335449] blk_execute_rq+0xdc/0x158
[ 89.339202] __scsi_execute+0x130/0x258
[ 89.343041] scsi_probe_and_add_lun+0x2fc/0x1488
[ 89.347661] __scsi_scan_target+0x1cc/0x8c8
[ 89.351848] scsi_scan_channel.part.3+0x8c/0xc0
[ 89.356382] scsi_scan_host_selected+0x130/0x1f0
[ 89.361002] do_scsi_scan_host+0xd8/0xf0
[ 89.364927] do_scan_async+0x9c/0x320
[ 89.368594] async_run_entry_fn+0x138/0x420
[ 89.372780] process_one_work+0x61c/0xf08
[ 89.376793] worker_thread+0x13c/0xa70
[ 89.380546] kthread+0x1c8/0x1d0
[ 89.383778] ret_from_fork+0x10/0x1cThis is because when populating Driver Map using firmware raid map, all
non-existing VDs set their ldTgtIdToLd to 0xff, so it can be skipped later.From drivers/scsi/megaraid/megaraid_sas_base.c ,
memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS);From drivers/scsi/megaraid/megaraid_sas_fp.c ,
/* For non existing VDs, iterate to next VD*/
if (ld >= (MAX_LOGICAL_DRIVES_EXT - 1))
continue;However, there are a few places that failed to skip those non-existing VDs
due to off-by-one errors. Then, those 0xff leaked into MR_LdRaidGet(0xff,
map) and triggered the out-of-bound accesses.Fixes: 51087a8617fe ("megaraid_sas : Extended VD support")
Signed-off-by: Qian Cai
Acked-by: Sumit Saxena
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin -
[ Upstream commit e57b2945aa654e48f85a41e8917793c64ecb9de8 ]
We must free all irqs during shutdown, else kexec's 2nd kernel would hang
in pqi_wait_for_completion_io() as below:Call trace:
pqi_wait_for_completion_io
pqi_submit_raid_request_synchronous.constprop.78+0x23c/0x310 [smartpqi]
pqi_configure_events+0xec/0x1f8 [smartpqi]
pqi_ctrl_init+0x814/0xca0 [smartpqi]
pqi_pci_probe+0x400/0x46c [smartpqi]
local_pci_probe+0x48/0xb0
pci_device_probe+0x14c/0x1b0
really_probe+0x218/0x3fc
driver_probe_device+0x70/0x140
__driver_attach+0x11c/0x134
bus_for_each_dev+0x70/0xc8
driver_attach+0x30/0x38
bus_add_driver+0x1f0/0x294
driver_register+0x74/0x12c
__pci_register_driver+0x64/0x70
pqi_init+0xd0/0x10000 [smartpqi]
do_one_initcall+0x60/0x1d8
do_init_module+0x64/0x1f8
load_module+0x10ec/0x1350
__se_sys_finit_module+0xd4/0x100
__arm64_sys_finit_module+0x28/0x34
el0_svc_handler+0x104/0x160
el0_svc+0x8/0xcThis happens only in the following combinations:
1. smartpqi is built as module, not built-in;
2. We have a disk connected to smartpqi card;
3. Both kexec's 1st and 2nd kernels use this disk as Rootfs' mount point.Signed-off-by: Yanjiang Jin
Acked-by: Don Brace
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin -
[ Upstream commit 2ba55c9851d74eb015a554ef69ddf2ef061d5780 ]
Problem:
The Linux kernel takes a logical volume offline after a LUN reset. This is
generally accompanied by this message in the dmesg output:Device offlined - not ready after error recovery
Root Cause:
The root cause is a "quirk" in the timeout handling in the Linux SCSI
layer. The Linux kernel places a 30-second timeout on most media access
commands (reads and writes) that it send to device drivers. When a media
access command times out, the Linux kernel goes into error recovery mode
for the LUN that was the target of the command that timed out. Every
command that timed out is kept on a list inside of the Linux kernel to be
retried later. The kernel attempts to recover the command(s) that timed out
by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and
TEST UNIT READY commands are successful, the kernel retries the command(s)
that timed out.Each SCSI command issued by the kernel has a result field associated with
it. This field indicates the final result of the command (success or
error). When a command times out, the kernel places a value in this result
field indicating that the command timed out.The "quirk" is that after the LUN reset and TEST UNIT READY commands are
completed, the kernel checks each command on the timed-out command list
before retrying it. If the result field is still "timed out", the kernel
treats that command as not having been successfully recovered for a
retry. If the number of commands that are in this state are greater than
two, the kernel takes the LUN offline.Fix:
When our RAIDStack receives a LUN reset, it simply waits until all
outstanding commands complete. Generally, all of these outstanding commands
complete successfully. Therefore, the fix in the smartpqi driver is to
always set the command result field to indicate success when a request
completes successfully. This normally isn’t necessary because the result
field is always initialized to success when the command is submitted to the
driver. So when the command completes successfully, the result field is
left untouched. But in this case, the kernel changes the result field
behind the driver’s back and then expects the field to be changed by the
driver as the commands that timed-out complete.Reviewed-by: Dave Carroll
Reviewed-by: Scott Teel
Signed-off-by: Kevin Barnett
Signed-off-by: Don Brace
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
23 Jan, 2019
2 commits
-
commit 44759979a49bfd2d20d789add7fa81a21eb1a4ab upstream.
Changing of caching mode via /sys/devices/.../scsi_disk/.../cache_type may
fail if device responds to MODE SENSE command with DPOFUA flag set, and
then checks this flag to be not set on MODE SELECT command.In this scenario, when trying to change cache_type, write always fails:
# echo "none" >cache_type
bash: echo: write error: Invalid argumentAnd following appears in dmesg:
[13007.865745] sd 1:0:1:0: [sda] Sense Key : Illegal Request [current]
[13007.865753] sd 1:0:1:0: [sda] Add. Sense: Invalid field in parameter listFrom SBC-4 r15, 6.5.1 "Mode pages overview", description of DEVICE-SPECIFIC
PARAMETER field in the mode parameter header:
...
The write protect (WP) bit for mode data sent with a MODE SELECT
command shall be ignored by the device server.
...
The DPOFUA bit is reserved for mode data sent with a MODE SELECT
command.
...The remaining bits in the DEVICE-SPECIFIC PARAMETER byte are also reserved
and shall be set to zero.[mkp: shuffled commentary to commit description]
Cc: stable@vger.kernel.org
Signed-off-by: Ivan Mironov
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman -
commit 3f7e62bba0003f9c68f599f5997c4647ef5b4f4e upstream.
The commit 356fd2663cff ("scsi: Set request queue runtime PM status back to
active on resume") fixed up the inconsistent RPM status between request
queue and device. However changing request queue RPM status shall be done
only on successful resume, otherwise status may be still inconsistent as
below,Request queue: RPM_ACTIVE
Device: RPM_SUSPENDEDThis ends up soft lockup because requests can be submitted to underlying
devices but those devices and their required resource are not resumed.For example,
After above inconsistent status happens, IO request can be submitted to UFS
device driver but required resource (like clock) is not resumed yet thus
lead to warning as below call stack,WARN_ON(hba->clk_gating.state != CLKS_ON);
ufshcd_queuecommand
scsi_dispatch_cmd
scsi_request_fn
__blk_run_queue
cfq_insert_request
__elv_add_request
blk_flush_plug_list
blk_finish_plug
jbd2_journal_commit_transaction
kjournald2We may see all behind IO requests hang because of no response from storage
host or device and then soft lockup happens in system. In the end, system
may crash in many ways.Fixes: 356fd2663cff (scsi: Set request queue runtime PM status back to active on resume)
Cc: stable@vger.kernel.org
Signed-off-by: Stanley Chu
Reviewed-by: Bart Van Assche
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
13 Jan, 2019
2 commits
-
commit 4e87eb2f46ea547d12a276b2e696ab934d16cfb6 upstream.
Certain older adapters such as the OneConnect OCe10100 may not have a valid
wqpcnt value. In this case, do not set queue->page_count to 0 in
lpfc_sli4_queue_alloc() as this will prevent the driver from initializing.Fixes: 895427bd01 ("scsi: lpfc: NVME Initiator: Base modifications")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Ewan D. Milne
Reviewed-by: Laurence Oberman
Tested-by: Laurence Oberman
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 9ae4f8420ed7be4b13c96600e3568c144d101a23 ]
If "interface" is NULL then we can't release it and trying to will only
lead to an Oops.Fixes: aea71a024914 ("[SCSI] bnx2fc: Introduce interface structure for each vlan interface")
Signed-off-by: Dan Carpenter
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
29 Dec, 2018
1 commit
-
commit 61cce6f6eeced5ddd9cac55e807fe28b4f18c1ba upstream.
When boxes are run near (or to) OOM, we have a problem with the discard
page allocation in sd. If we fail allocating the special page, we return
busy, and it'll get retried. But since ordering is honored for dispatch
requests, we can keep retrying this same IO and failing. Behind that IO
could be requests that want to free memory, but they never get the
chance. This means you get repeated spews of traces like this:[1201401.625972] Call Trace:
[1201401.631748] dump_stack+0x4d/0x65
[1201401.639445] warn_alloc+0xec/0x190
[1201401.647335] __alloc_pages_slowpath+0xe84/0xf30
[1201401.657722] ? get_page_from_freelist+0x11b/0xb10
[1201401.668475] ? __alloc_pages_slowpath+0x2e/0xf30
[1201401.679054] __alloc_pages_nodemask+0x1f9/0x210
[1201401.689424] alloc_pages_current+0x8c/0x110
[1201401.699025] sd_setup_write_same16_cmnd+0x51/0x150
[1201401.709987] sd_init_command+0x49c/0xb70
[1201401.719029] scsi_setup_cmnd+0x9c/0x160
[1201401.727877] scsi_queue_rq+0x4d9/0x610
[1201401.736535] blk_mq_dispatch_rq_list+0x19a/0x360
[1201401.747113] blk_mq_sched_dispatch_requests+0xff/0x190
[1201401.758844] __blk_mq_run_hw_queue+0x95/0xa0
[1201401.768653] blk_mq_run_work_fn+0x2c/0x30
[1201401.777886] process_one_work+0x14b/0x400
[1201401.787119] worker_thread+0x4b/0x470
[1201401.795586] kthread+0x110/0x150
[1201401.803089] ? rescuer_thread+0x320/0x320
[1201401.812322] ? kthread_park+0x90/0x90
[1201401.820787] ? do_syscall_64+0x53/0x150
[1201401.829635] ret_from_fork+0x29/0x40Ensure that the discard page allocation has a mempool backing, so we
know we can make progress.Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
21 Dec, 2018
2 commits
-
[ Upstream commit 02f425f811cefcc4d325d7a72272651e622dc97e ]
Currently pvscsi_remove calls free_irq more than once as
pvscsi_release_resources and __pvscsi_shutdown both call
pvscsi_shutdown_intr. This results in a 'Trying to free already-free IRQ'
warning and stack trace. To solve the problem pvscsi_shutdown_intr has been
moved out of pvscsi_release_resources.Signed-off-by: Cathy Avery
Reviewed-by: Ewan D. Milne
Reviewed-by: Dan Carpenter
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin -
[ Upstream commit 5db6dd14b31397e8cccaaddab2ff44ebec1acf25 ]
This commit addresses NULL pointer dereference in iscsi_eh_session_reset.
Reference should not be made to session->leadconn when session->state is
set to ISCSI_STATE_TERMINATE.Signed-off-by: Fred Herard
Reviewed-by: Konrad Rzeszutek Wilk
Reviewed-by: Lee Duncan
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
08 Dec, 2018
2 commits
-
commit 81df022b688d43d2a3667518b2f755d384397910 upstream.
Cleanly fill memory for "vendor" and "model" with 0-bytes for the
"compatible" case rather than adding only a single 0 byte. This
simplifies the devinfo code a a bit, and avoids mistakes in other places
of the code (not in current upstream, but we had one such mistake in the
SUSE kernel).[mkp: applied by hand and added braces]
Signed-off-by: Martin Wilck
Reviewed-by: Bart Van Assche
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman -
commit 8c5a50e8e7ad812a62f7ccf28d9a5e74fddf3000 upstream.
The bfa driver has a number of real issues with string termination
that gcc-8 now points out:drivers/scsi/bfa/bfad_bsg.c: In function 'bfad_iocmd_port_get_attr':
drivers/scsi/bfa/bfad_bsg.c:320:9: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_psymb_init':
drivers/scsi/bfa/bfa_fcs.c:775:9: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:781:9: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:788:9: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:801:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:808:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_nsymb_init':
drivers/scsi/bfa/bfa_fcs.c:837:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:844:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:852:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_psymb_init':
drivers/scsi/bfa/bfa_fcs.c:778:2: error: 'strncat' output may be truncated copying 10 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:784:2: error: 'strncat' output may be truncated copying 30 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:803:3: error: 'strncat' output may be truncated copying 44 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:811:3: error: 'strncat' output may be truncated copying 16 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_nsymb_init':
drivers/scsi/bfa/bfa_fcs.c:840:2: error: 'strncat' output may be truncated copying 10 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:847:2: error: 'strncat' output may be truncated copying 30 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_fdmi_get_hbaattr':
drivers/scsi/bfa/bfa_fcs_lport.c:2657:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs_lport.c:2659:11: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ms_gmal_response':
drivers/scsi/bfa/bfa_fcs_lport.c:3232:5: error: 'strncpy' output may be truncated copying 16 bytes from a string of length 247 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ns_send_rspn_id':
drivers/scsi/bfa/bfa_fcs_lport.c:4670:3: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c:4682:3: error: 'strncat' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ns_util_send_rspn_id':
drivers/scsi/bfa/bfa_fcs_lport.c:5206:3: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c:5215:3: error: 'strncat' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_fdmi_get_portattr':
drivers/scsi/bfa/bfa_fcs_lport.c:2751:2: error: 'strncpy' specified bound 128 equals destination size [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcbuild.c: In function 'fc_rspnid_build':
drivers/scsi/bfa/bfa_fcbuild.c:1254:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcbuild.c:1253:25: note: length computed here
drivers/scsi/bfa/bfa_fcbuild.c: In function 'fc_rsnn_nn_build':
drivers/scsi/bfa/bfa_fcbuild.c:1275:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]In most cases, this can be addressed by correctly calling strlcpy and
strlcat instead of strncpy/strncat, with the size of the destination
buffer as the last argument.For consistency, I'm changing the other callers of strncpy() in this
driver the same way.Signed-off-by: Arnd Bergmann
Reviewed-by: Johannes Thumshirn
Acked-by: Sudarsana Kalluru
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
21 Nov, 2018
7 commits
-
commit 8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a upstream.
c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has
already fixed this race, however the implied synchronize_rcu()
in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused
performance regression.Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
tried to quiesce queue for avoiding unnecessary synchronize_rcu()
only when queue initialization is done, because it is usual to see
lots of inexistent LUNs which need to be probed.However, turns out it isn't safe to quiesce queue only when queue
initialization is done. Because when one SCSI command is completed,
the user of sending command can be waken up immediately, then the
scsi device may be removed, meantime the run queue in scsi_end_request()
is still in-progress, so kernel panic can be caused.In Red Hat QE lab, there are several reports about this kind of kernel
panic triggered during kernel booting.This patch tries to address the issue by grabing one queue usage
counter during freeing one request and the following run queue.Fixes: 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
Cc: Andrew Jones
Cc: Bart Van Assche
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: James E.J. Bottomley
Cc: stable
Cc: jianchao.wang
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit f635e48e866ee1a47d2d42ce012fdcc07bf55853 upstream.
This patch initializes port speed so that firmware does not set lower
operating speed. Setting lower speed in firmware impacts WRITE perfomance.Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery")
Cc:
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Tested-by: Laurence Oberman
Reviewed-by: Ewan D. Milne
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman -
commit 5c6400536481d9ef44ef94e7bf2c7b8e81534db7 upstream.
This patch fixes issue where driver clears NPort ID map instead of marking
handle in use. Once driver clears NPort ID from the database, it can reuse
the same NPort ID resulting in a PLOGI failure.[mkp: fixed Himanshu's SoB]
Fixes: a084fd68e1d2 ("scsi: qla2xxx: Fix re-login for Nport Handle in use")
Cc:
Signed-of-by: Quinn Tran
Reviewed-by: Ewan D. Milne
Signed-off-by: Himanshu Madhani
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman -
commit 1e4ac5d6fe0a4af17e4b6251b884485832bf75a3 upstream.
If chip unable to fully initialize, use full shutdown sequence to clear out
any stale FW state.Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman -
commit 7c388f91ec1a59b0ed815b07b90536e2d57e1e1f upstream.
Remove stale debug trace.
Fixes: 1eb42f965ced ("qla2xxx: Make trace flags more readable")
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman -
commit b86ac8fd4b2f6ec2f9ca9194c56eac12d620096f upstream.
This patch improves performance for 16G and above adapter by removing
additional call to process_response_queue().[mkp: typo]
Cc:
Signed-off-by: Quinn Tran
Signed-off-by: Himanshu Madhani
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman -
commit 4c1458df9635c7e3ced155f594d2e7dfd7254e21 upstream.
Fixes: 6246b8a1d26c7c ("[SCSI] qla2xxx: Enhancements to support ISP83xx.")
Fixes: 1bb395485160d2 ("qla2xxx: Correct iiDMA-update calling conventions.")
Cc:
Signed-off-by: Himanshu Madhani
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
14 Nov, 2018
4 commits
-
[ Upstream commit ca7fb76e091f889cfda1287c07a9358f73832b39 ]
On io completion, the driver is taking an adapter wide lock and nulling the
scsi command back pointer. The nulling of the back pointer is to signify the
io was completed and the scsi_done() routine was called. However, the routine
makes no check to see if the abort routine had done the same thing and
possibly nulled the pointer. Thus it may doubly-complete the io.Make the following mods:
- Check to make sure forward progress (call scsi_done()) only happens if the
command pointer was non-null.- As the taking of the lock, which is adapter wide, is very costly on a system
under load, null the pointer using an xchg operation rather than under lock.Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 0ef01a2d95fd62bb4f536e7ce4d5e8e74b97a244 ]
When running an mds diagnostic that passes frames with the switch, soft
lockups are detected. The driver is in a CQE processing loop and has
sufficient amount of traffic that it never exits the ring processing routine,
thus the "lockup".Cap the number of elements in the work processing routine to 64 elements. This
ensures that the cpu will be given up and the handler reschedule to process
additional items.Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 47db7873136a9c57c45390a53b57019cf73c8259 ]
In megasas_mgmt_compat_ioctl_fw(), to handle the structure
compat_megasas_iocpacket 'cioc', a user-space structure megasas_iocpacket
'ioc' is allocated before megasas_mgmt_ioctl_fw() is invoked to handle
the packet. Since the two data structures have different fields, the data
is copied from 'cioc' to 'ioc' field by field. In the copy process,
'sense_ptr' is prepared if the field 'sense_len' is not null, because it
will be used in megasas_mgmt_ioctl_fw(). To prepare 'sense_ptr', the
user-space data 'ioc->sense_off' and 'cioc->sense_off' are copied and
saved to kernel-space variables 'local_sense_off' and 'user_sense_off'
respectively. Given that 'ioc->sense_off' is also copied from
'cioc->sense_off', 'local_sense_off' and 'user_sense_off' should have the
same value. However, 'cioc' is in the user space and a malicious user can
race to change the value of 'cioc->sense_off' after it is copied to
'ioc->sense_off' but before it is copied to 'user_sense_off'. By doing
so, the attacker can inject different values into 'local_sense_off' and
'user_sense_off'. This can cause undefined behavior in the following
execution, because the two variables are supposed to be same.This patch enforces a check on the two kernel variables 'local_sense_off'
and 'user_sense_off' to make sure they are the same after the copy. In
case they are not, an error code EINVAL will be returned.Signed-off-by: Wenwen Wang
Acked-by: Sumit Saxena
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit fd47d919d0c336e7c22862b51ee94927ffea227a ]
If a target disconnects during a PIO data transfer the command may fail
when the target reconnects:scsi host1: DMA length is zero!
scsi host1: cur adr[04380000] len[00000000]The scsi bus is then reset. This happens because the residual reached
zero before the transfer was completed.The usual residual calculation relies on the Transfer Count registers.
That works for DMA transfers but not for PIO transfers. Fix the problem
by storing the PIO transfer residual and using that to correctly
calculate bytes_sent.Fixes: 6fe07aaffbf0 ("[SCSI] m68k: new mac_esp scsi driver")
Tested-by: Stan Johnson
Signed-off-by: Finn Thain
Tested-by: Michael Schmitz
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
04 Nov, 2018
4 commits
-
[ Upstream commit 597d74005ba85e87c256cd732128ebf7faf54247 ]
The USB storage glue sets the try_rc_10_first flag in an attempt to
avoid wedging poorly implemented legacy USB devices.If the device capacity is too large to be expressed in the provided
response buffer field of READ CAPACITY(10), a well-behaved device will
set the reported capacity to 0xFFFFFFFF. We will then attempt to issue a
READ CAPACITY(16) to obtain the real capacity.Since this part of the discovery logic is not covered by the first_scan
flag, a warning will be printed a couple of times times per revalidate
attempt if we upgrade from READ CAPACITY(10) to READ CAPACITY(16).Remember that we have successfully issued READ CAPACITY(16) so we can
take the fast path on subsequent revalidate attempts.Reported-by: Menion
Reviewed-by: Laurence Oberman
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin -
[ Upstream commit 09dd15e0d9547ca424de4043bcd429bab6f285c8 ]
Following an RSCN, ibmvfc will issue an ADISC to determine if the
underlying target has changed, comparing the SCSI ID, WWPN, and WWNN to
determine how to handle the rport in discovery. However, the comparison
of the WWPN and WWNN was performing a memcmp between a big endian field
against a CPU endian field, which resulted in the wrong answer on LE
systems. This was observed as unexpected errors getting logged at boot
time as targets were getting relogins when not needed.Signed-off-by: Brian King
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin -
[ Upstream commit 3a9910d7b686546dcc9986e790af17e148f1c888 ]
qla2x00_tmf_sp_done() now deletes the timer that will run
qla2x00_tmf_iocb_timeout(), but doesn't check whether the timer already
expired. Check the return value from del_timer() to avoid calling
complete() a second time.Fixes: 4440e46d5db7 ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous ...")
Fixes: 1514839b3664 ("scsi: qla2xxx: Fix NULL pointer crash due to active ...")
Signed-off-by: Ben Hutchings
Acked-by: Himanshu Madhani
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin -
[ Upstream commit d18539754d97876503275efc7d00a1901bb0cfad ]
As reported by Meelis Roos, my previous patch causes an incorrect
calculation of the timeout, through an undefined signed integer
overflow:[ 12.228155] UBSAN: Undefined behaviour in drivers/scsi/aacraid/commsup.c:2514:49
[ 12.228229] signed integer overflow:
[ 12.228283] 964297611 * 250 cannot be represented in type 'long int'The problem is that doing a multiplication with HZ first and then
dividing by USEC_PER_SEC worked correctly for 32-bit microseconds,
but not for 32-bit nanoseconds, which would require up to 41 bits.This reworks the calculation to first convert the nanoseconds into
jiffies, which should give us the same result as before and not overflow.Unfortunately I did not understand the exact intention of the algorithm,
in particular the part where we add half a second, so it's possible that
there is still a preexisting problem in this function. I added a comment
that this would be handled more nicely using usleep_range(), which
generally works better for waking up at a particular time than the
current schedule_timeout() based implementation. I did not feel
comfortable trying to implement that without being sure what the
intent is here though.Fixes: 820f18865912 ("scsi: aacraid: use timespec64 instead of timeval")
Tested-by: Meelis Roos
Signed-off-by: Arnd Bergmann
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
20 Oct, 2018
4 commits
-
[ Upstream commit f1f1fadacaf08b7cf11714c0c29f8fa4d4ef68a9 ]
When sd_init_command() get's a command with a unknown req_op() it crashes the
system via BUG().This makes debugging the actual reason for the broken request cmd_flags pretty
hard as the system is down before it's able to write out debugging data on the
serial console or the trace buffer.Change the BUG() to a WARN_ON() and return BLKPREP_KILL to fail gracefully and
return an I/O error to the producer of the request.Signed-off-by: Johannes Thumshirn
Cc: Hannes Reinecke
Cc: Bart Van Assche
Cc: Christoph Hellwig
Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 318ddb34b2052f838aa243d07173e2badf3e630e ]
While dlpar adding primary ipr adapter back, driver goes through adapter
initialization then schedule ipr_worker_thread to start te disk scan by
dropping the host lock, calling scsi_add_device. Then get the adapter reset
request again, so driver does scsi_block_requests, this will cause the
scsi_add_device get hung until we unblock. But we can't run ipr_worker_thread
to do the unblock because its stuck in scsi_add_device.This patch fixes the issue.
[mkp: typo and whitespace fixes]
Signed-off-by: Wen Xiong
Acked-by: Brian King
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit adad633af7b970bfa5dd1b624a4afc83cac9b235 ]
While reviewing another part of the code, Kees noticed that the strncpy of the
partition name might not always be NUL terminated. Switch to using strscpy
which does this safely.Reported-by: Kees Cook
Signed-off-by: Laura Abbott
Reviewed-by: Kees Cook
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit d792d4c4fc866ae224b0b0ca2aabd87d23b4d6cc ]
There's currently a warning about string overflow with strncat:
drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c: In function 'ibmvscsis_probe':
drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c:3479:2: error: 'strncat' specified
bound 64 equals destination size [-Werror=stringop-overflow=]
strncat(vscsi->eye, vdev->name, MAX_EYE);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Switch to a single snprintf instead of a strcpy + strcat to handle this
cleanly.Signed-off-by: Laura Abbott
Suggested-by: Kees Cook
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
18 Oct, 2018
1 commit
-
[ Upstream commit cbe3fd39d223f14b1c60c80fe9347a3dd08c2edb ]
We should first do the le16_to_cpu endian conversion and then apply the
FCP_CMD_LENGTH_MASK mask.Fixes: 5f35509db179 ("qla2xxx: Terminate exchange if corrupted")
Signed-off-by: Dan Carpenter
Acked-by: Quinn Tran
Acked-by: Himanshu Madhani
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
10 Oct, 2018
2 commits
-
[ Upstream commit c77a2fa3ff8f73d1a485e67e6f81c64823739d59 ]
The QED driver commit, 1ac4329a1cff ("qed: Add configuration information
to register dump and debug data"), removes the CRC length validation
causing nvm_get_image failure while loading qedi driver:[qed_mcp_get_nvm_image:2700(host_10-0)]Image [0] is too big - 00006008 bytes
where only 00006004 are available
[qedi_get_boot_info:2253]:10: Could not get NVM image. ret = -12Hence add and adjust the CRC size to iSCSI NVM image to read boot info at
qedi load time.Signed-off-by: Nilesh Javali
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 89809b028b6f54187b7d81a0c69b35d394c52e62 ]
Reported-by: Colin Ian King
Signed-off-by: Varun Prakash
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
04 Oct, 2018
3 commits
-
[ Upstream commit c3b10a55abc943a526aaecd7e860b15671beb906 ]
There is a possibility that firmware on the controller was upgraded before
system was suspended. During resume, driver needs to read updated
controller properties.Signed-off-by: Shivasharan S
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit aa154ea885eb0c2407457ce9c1538d78c95456fa ]
When ioremap_nocache fails, the lack of error-handling code may cause
unexpected results.This patch adds error-handling code after calling ioremap_nocache.
Signed-off-by: Zhouyang Jia
Reviewed-by: Johannes Thumshirn
Acked-by: Manish Rangankar
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 1262dc09dc9ae7bf4ad00b6a2c5ed6a6936bcd10 ]
Currently an open firmware property is copied into partition_name variable
without keeping a room for \0.Later one, this variable (partition_name), which is 97 bytes long, is
strncpyed into ibmvcsci_host_data->madapter_info->partition_name, which is
96 bytes long, possibly truncating it 'again' and removing the \0.This patch simply decreases the partition name to 96 and just copy using
strlcpy() which guarantees that the string is \0 terminated. I think there
is no issue if this there is a truncation in this very first copy, i.e,
when the open firmware property is read and copied into the driver for the
very first time;This issue also causes the following warning on GCC 8:
drivers/scsi/ibmvscsi/ibmvscsi.c:281:2: warning: strncpy output may be truncated copying 96 bytes from a string of length 96 [-Wstringop-truncation]
...
inlined from ibmvscsi_probe at drivers/scsi/ibmvscsi/ibmvscsi.c:2221:7:
drivers/scsi/ibmvscsi/ibmvscsi.c:265:3: warning: strncpy specified bound 97 equals destination size [-Wstringop-truncation]CC: Bart Van Assche
CC: Tyrel Datwyler
Signed-off-by: Breno Leitao
Acked-by: Tyrel Datwyler
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
26 Sep, 2018
1 commit
-
[ Upstream commit fa519f701d27198a2858bb108fc18ea9d8c106a7 ]
fc_rport_login() will be calling mutex_lock() while running inside an
RCU-protected section, triggering the warning 'sleeping function called
from invalid context'. To fix this we can drop the rcu functions here
altogether as the disc mutex protecting the list itself is already held,
preventing any list manipulation.Fixes: a407c593398c ("scsi: libfc: Fixup disc_mutex handling")
Signed-off-by: Hannes Reinecke
Acked-by: Johannes Thumshirn
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
20 Sep, 2018
1 commit
-
[ Upstream commit 4dc98c1995482262e70e83ef029135247fafe0f2 ]
tw_probe() returns 0 in case of fail of tw_initialize_device_extension(),
pci_resource_start() or tw_reset_sequence() and releases resources.
twl_probe() returns 0 in case of fail of twl_initialize_device_extension(),
pci_iomap() and twl_reset_sequence(). twa_probe() returns 0 in case of
fail of tw_initialize_device_extension(), ioremap() and
twa_reset_sequence().The patch adds retval initialization for these cases.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev
Acked-by: Adam Radford
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman