18 Oct, 2018
1 commit
-
commit 24abf2901b18bf941b9f21ea2ce5791f61097ae4 upstream.
We have two nested loops to check the entries within the pfn_array_table
arrays. But we mistakenly use the outer array as an index in our check,
and completely ignore the indexing performed by the inner loop.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Farman
Message-Id:
Signed-off-by: Cornelia Huck
Signed-off-by: Greg Kroah-Hartman
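
The indexing mistake described in this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the vfio-ccw code: the structures `pfn_array_sketch` and `table_sketch` and the function `table_contains_pfn()` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct pfn_array_sketch {
	const unsigned long *pfns;	/* entries of this array */
	size_t nr;			/* number of entries */
};

struct table_sketch {
	const struct pfn_array_sketch *arrays;
	size_t nr;			/* number of arrays in the table */
};

/* Return true if pfn occurs in any array of the table. */
static bool table_contains_pfn(const struct table_sketch *t, unsigned long pfn)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->nr; i++) {
		const struct pfn_array_sketch *pa = &t->arrays[i];

		for (size_t j = 0; j < pa->nr; j++) {
			/* Index with the inner j; using the outer i here is the bug. */
			if (pa->pfns[j] == pfn)
				return true;
		}
	}
	return false;
}
```

With the outer index used by mistake, only one element per array would ever be compared, so most entries would be silently skipped.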
10 Oct, 2018
2 commits
-
[ Upstream commit 0ac1487c4b2de383b91ecad1be561b8f7a2c15f4 ]
For inbound data with an unsupported HW header format, only dump the
actual HW header. We have no idea how much payload follows it, and what
it contains. Worst case, we dump past the end of the Inbound Buffer and
access whatever is located next in memory.

Signed-off-by: Julian Wiedmann
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit aec45e857c5538664edb76a60dd452e3265f37d1 ]
qeth_query_oat_command() currently allocates the kernel buffer for
the SIOC_QETH_QUERY_OAT ioctl with kzalloc. So on systems with
fragmented memory, large allocations may fail (e.g. the qethqoat tool by
default uses 132KB).

Solve this issue by using vzalloc, backing the allocation with
non-contiguous memory.

Signed-off-by: Wenjia Zhang
Reviewed-by: Julian Wiedmann
Signed-off-by: Julian Wiedmann
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
04 Oct, 2018
2 commits
-
[ Upstream commit d642d6262f4fcfa5d200ec6e218c17f0c15b3390 ]
The numa_node field of the tag_set struct has to be explicitly
initialized, otherwise it stays as 0, which is a valid numa node id and
causes memory allocation failure if node 0 is offline.

Acked-by: Sebastian Ott
Signed-off-by: Vasily Gorbik
Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit b17e3abb0af404cb62ad4ef1a5962f58b06e2b78 ]
The numa_node field of the tag_set struct has to be explicitly
initialized, otherwise it stays as 0, which is a valid numa node id and
causes memory allocation failure if node 0 is offline.

Acked-by: Stefan Haberland
Signed-off-by: Vasily Gorbik
Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
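
Why a zeroed numa_node field is a problem can be shown with a small userspace model. Everything here is an assumption made for illustration: `tag_set_sketch`, `node_alloc_ok()` and `tag_set_init()` are hypothetical, and `SKETCH_NO_NODE` merely mirrors the value of the kernel's NUMA_NO_NODE (-1). The allocation policy is a simplified stand-in, not the block layer's real behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NO_NODE (-1)	/* mirrors the value of the kernel's NUMA_NO_NODE */

/* Hypothetical, heavily reduced version of a tag set. */
struct tag_set_sketch {
	int numa_node;
	unsigned int queue_depth;
};

/*
 * Simplified policy: a specific node must be online for the allocation
 * to succeed, while "no node" is satisfied by any online node.
 */
static bool node_alloc_ok(const struct tag_set_sketch *set,
			  const bool *online, int nr_nodes)
{
	assert(set != NULL && online != NULL);
	if (set->numa_node == SKETCH_NO_NODE) {
		for (int n = 0; n < nr_nodes; n++)
			if (online[n])
				return true;
		return false;
	}
	return set->numa_node >= 0 && set->numa_node < nr_nodes &&
	       online[set->numa_node];
}

/* Zero the structure, then state explicitly that no node is preferred. */
static void tag_set_init(struct tag_set_sketch *set)
{
	memset(set, 0, sizeof(*set));
	set->numa_node = SKETCH_NO_NODE;
}
```

In this model a merely zeroed structure asks for node 0 and fails when that node is offline, whereas the explicitly initialized one succeeds on any online node.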
26 Sep, 2018
2 commits
-
[ Upstream commit 70551dc46ffa3555a0b5f3545b0cd87ab67fd002 ]
After the subdriver's remove() routine has completed, the card's layer
mode is undetermined again. Reflect this in the layer2 field.

If qeth_dev_layer2_store() hits an error after remove() was called, the
card _always_ requires a setup(), even if the previous layer mode is
requested again.
But qeth_dev_layer2_store() bails out early if the requested layer mode
still matches the current one. So unless we reset the layer2 field,
re-probing the card back to its previous mode is currently not possible.

Signed-off-by: Julian Wiedmann
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit a702349a4099cd5a7bab0904689d8e0bf8dcd622 ]
By updating q->used_buffers only _after_ do_QDIO() has completed, there
is a potential race against the buffer's TX completion. In the unlikely
case that the TX completion path wins, qeth_qdio_output_handler() would
decrement the counter before qeth_flush_buffers() even incremented it.

Signed-off-by: Julian Wiedmann
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
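
The ordering requirement from this commit (account for the buffers before handing them to the hardware) can be sketched as follows. This is a single-threaded userspace model under stated assumptions: `queue_sketch`, `complete_tx()`, `submit()` and `flush_buffers()` are hypothetical names, and the race is modeled by letting the completion run synchronously inside the submit step, which is the worst case the commit describes.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical reduced queue: only the counter of buffers in flight. */
struct queue_sketch {
	atomic_int used_buffers;
	int min_seen;	/* lowest counter value observed by the completion path */
};

/* Stand-in for the TX completion handler: releases count buffers. */
static void complete_tx(struct queue_sketch *q, int count)
{
	int now = atomic_fetch_sub(&q->used_buffers, count) - count;

	if (now < q->min_seen)
		q->min_seen = now;
}

/* Stand-in for the submit call; worst case: completion wins the race. */
static void submit(struct queue_sketch *q, int count)
{
	complete_tx(q, count);
}

static void flush_buffers(struct queue_sketch *q, int count)
{
	assert(count > 0);
	/* Account first, so a fast completion never sees the counter too low. */
	atomic_fetch_add(&q->used_buffers, count);
	submit(q, count);
}
```

If the increment were moved after `submit()`, the completion path in this model would drive the counter negative before the flush path caught up.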
15 Sep, 2018
2 commits
-
[ Upstream commit 7c6553d4db03350dad0110c3224194c19df76a8f ]
Fix a panic that occurs for a device that got an error in
dasd_eckd_check_characteristics() during online processing.
For example the read configuration data command may have failed.

If this error occurs the device is not being set online and the earlier
invoked steps during online processing are rolled back. Therefore
dasd_eckd_uncheck_device() is called which needs a valid private
structure. But this pointer is not valid if
dasd_eckd_check_characteristics() has failed.

Check for a valid device->private pointer to prevent a panic.
Reviewed-by: Jan Hoeppner
Signed-off-by: Stefan Haberland
Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 669f3765b755fd8739ab46ce3a9c6292ce8b3d2a ]
During offline processing two worker threads are canceled without
freeing the device reference which leads to a hanging offline process.

Reviewed-by: Jan Hoeppner
Signed-off-by: Stefan Haberland
Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
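
The reference leak described in this commit follows a common pattern: scheduling a worker takes a device reference that the worker drops when it runs, so whoever cancels a still-pending worker has to drop that reference instead. The sketch below models this in userspace; `device_sketch`, `schedule_work_sketch()`, `cancel_work_sketch()` and `set_offline_sketch()` are hypothetical names and do not reproduce the DASD driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced device: a reference count and one pending worker. */
struct device_sketch {
	int refcount;
	bool work_pending;
};

/* Scheduling takes a reference that the worker normally drops when it runs. */
static void schedule_work_sketch(struct device_sketch *dev)
{
	if (!dev->work_pending) {
		dev->refcount++;
		dev->work_pending = true;
	}
}

/* Returns true if pending work was removed before it could run. */
static bool cancel_work_sketch(struct device_sketch *dev)
{
	bool was_pending = dev->work_pending;

	dev->work_pending = false;
	return was_pending;
}

static void set_offline_sketch(struct device_sketch *dev)
{
	assert(dev != NULL);
	/* The canceled worker will never run, so drop its reference here. */
	if (cancel_work_sketch(dev))
		dev->refcount--;
}
```

Without the conditional decrement, the count in this model never returns to its base value, which corresponds to the hanging offline process.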
05 Sep, 2018
1 commit
-
commit 64e03ff72623b8c2ea89ca3cb660094e019ed4ae upstream.
When allocating a new AOB fails, handle_outbound() is still capable of
transmitting the selected buffer (just without async completion).

But if a previous transfer on this queue slot used async completion, its
sbal_state flags field is still set to QDIO_OUTBUF_STATE_FLAG_PENDING.
So when the upper layer driver sees this stale flag, it expects an async
completion that never happens.

Fix this by unconditionally clearing the flags field.
Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks")
Cc: #v3.2+
Signed-off-by: Julian Wiedmann
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
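
The stale-flag problem and its fix can be sketched in a few lines of userspace C. The names are assumptions for illustration only: `slot_state_sketch`, `prepare_slot()` and `expects_async_completion()` are hypothetical, and `FLAG_PENDING_SKETCH` merely stands in for QDIO_OUTBUF_STATE_FLAG_PENDING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_PENDING_SKETCH 0x1u	/* stands in for QDIO_OUTBUF_STATE_FLAG_PENDING */

/* Hypothetical per-slot state: flags plus an optional async operation block. */
struct slot_state_sketch {
	unsigned int flags;
	void *aob;	/* NULL if no AOB could be allocated */
};

/* Prepare a queue slot for a new transfer; aob may be NULL. */
static void prepare_slot(struct slot_state_sketch *slot, void *aob)
{
	assert(slot != NULL);
	/* Clear unconditionally: a previous transfer may have left PENDING set. */
	slot->flags = 0;
	slot->aob = aob;
}

/* What the upper layer would conclude from the slot state. */
static bool expects_async_completion(const struct slot_state_sketch *slot)
{
	return (slot->flags & FLAG_PENDING_SKETCH) != 0;
}
```

Clearing the field only when an AOB was obtained would leave the old PENDING bit in place exactly in the failure case the commit describes.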
24 Aug, 2018
1 commit
-
[ Upstream commit 2c861d89ccda2fbcea9358eff9cc5f8fae548be5 ]
If the device has not been registered, or there is work pending,
we should reschedule a sch_event call again.

Signed-off-by: Dong Jia Shi
Message-Id:
Reviewed-by: Cornelia Huck
Signed-off-by: Cornelia Huck
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
03 Aug, 2018
1 commit
-
[ Upstream commit 9e156c54ace310ce7fb1cd960e62416947f3d47c ]
Otherwise iterating with list_for_each() over the adapter->erp_ready_head
and adapter->erp_running_head lists can lead to an infinite loop. See commit
"zfcp: fix infinite iteration on erp_ready_head list".

The run-time check is only performed for debug kernels which have the kernel
lock validator enabled. Following is an example of the warning that is
reported, if the ERP lock is not held when calling zfcp_dbf_rec_trig():

WARNING: CPU: 0 PID: 604 at drivers/s390/scsi/zfcp_dbf.c:288 zfcp_dbf_rec_trig+0x172/0x188
Modules linked in: ...
CPU: 0 PID: 604 Comm: kworker/u128:3 Not tainted 4.16.0-... #1
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Workqueue: zfcp_q_0.0.1906 zfcp_scsi_rport_work
Krnl PSW : 00000000330fdbf9 00000000367e9728 (zfcp_dbf_rec_trig+0x172/0x188)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
Krnl GPRS: 00000000c57a5d99 3288200000000000 0000000000000000 000000006cc82740
00000000009d09d6 0000000000000000 00000000000000ff 0000000000000000
0000000000000000 0000000000e1b5fe 000000006de01d38 0000000076130958
000000006cc82548 000000006de01a98 00000000009d09d6 000000006a6d3c80
Krnl Code: 00000000009d0ad2: eb7ff0b80004 lmg %r7,%r15,184(%r15)
00000000009d0ad8: c0f4000d7dd0 brcl 15,b80678
#00000000009d0ade: a7f40001 brc 15,9d0ae0
>00000000009d0ae2: a7f4ff7d brc 15,9d09dc
00000000009d0ae6: e340f0f00004 lg %r4,240(%r15)
00000000009d0aec: eb7ff0b80004 lmg %r7,%r15,184(%r15)
00000000009d0af2: 07f4 bcr 15,%r4
00000000009d0af4: 0707 bcr 0,%r7
Call Trace:
([] zfcp_dbf_rec_trig+0x66/0x188)
[] zfcp_scsi_rport_work+0x98/0x190
[] process_one_work+0x3d4/0x6f8
[] worker_thread+0x232/0x418
[] kthread+0x166/0x178
[] kernel_thread_starter+0x6/0xc
[] kernel_thread_starter+0x0/0xc
2 locks held by kworker/u128:3/604:
#0: ((wq_completion)name){+.+.}, at: [] process_one_work+0x1dc/0x6f8
#1: ((work_completion)(&port->rport_work)){+.+.}, at: [] process_one_work+0x1dc/0x6f8
Last Breaking-Event-Address:
[] zfcp_dbf_rec_trig+0x16e/0x188
---[ end trace b2f4020572e2c124 ]---

Suggested-by: Steffen Maier
Signed-off-by: Jens Remus
Reviewed-by: Benjamin Block
Reviewed-by: Steffen Maier
Signed-off-by: Steffen Maier
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
08 Jul, 2018
1 commit
-
[ Upstream commit f0f59a2fab8e52b9d582b39da39f22230ca80aee ]
Dasd uses completion_data from struct request to store per request
private data - this is problematic since this member is part of a
union which is also used by IO schedulers.
Let the block layer maintain space for per request data behind each
struct request.

Fixes crashes on block layer timeouts like this one:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000000001308007 R3:00000000fffc8007 S:00000000fffcc000 P:000000000000013d
Oops: 0004 ilc:2 [#1] PREEMPT SMP
Modules linked in: [...]
CPU: 0 PID: 1480 Comm: kworker/0:2H Not tainted 4.17.0-rc4-00046-gaa3bcd43b5af #203
Hardware name: IBM 3906 M02 702 (LPAR)
Workqueue: kblockd blk_mq_timeout_work
Krnl PSW : 0000000067ac406b 00000000b6960308 (do_raw_spin_trylock+0x30/0x78)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000c00 0000000000000000 0000000000000000 0000000000000001
0000000000b9d3c8 0000000000000000 0000000000000001 00000000cf9639d8
0000000000000000 0700000000000000 0000000000000000 000000000099f09e
0000000000000000 000000000076e9d0 000000006247bb08 000000006247bae0
Krnl Code: 00000000001c159c: b90400c2 lgr %r12,%r2
00000000001c15a0: a7180000 lhi %r1,0
#00000000001c15a4: 583003a4 l %r3,932
>00000000001c15a8: ba132000 cs %r1,%r3,0(%r2)
00000000001c15ac: a7180001 lhi %r1,1
00000000001c15b0: a784000b brc 8,1c15c6
00000000001c15b4: c0e5004e72aa brasl %r14,b8fb08
00000000001c15ba: 1812 lr %r1,%r2
Call Trace:
([] 0x700000000000000)
[] _raw_spin_lock_irqsave+0x7a/0xb8
[] dasd_times_out+0x46/0x278
[] blk_mq_terminate_expired+0x9e/0x108
[] bt_for_each+0x102/0x130
[] blk_mq_queue_tag_busy_iter+0x74/0xd8
[] blk_mq_timeout_work+0x260/0x320
[] process_one_work+0x3bc/0x708
[] worker_thread+0x262/0x408
[] kthread+0x160/0x178
[] kernel_thread_starter+0x6/0xc
[] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[] _raw_spin_lock_irqsave+0x74/0xb8

Kernel panic - not syncing: Fatal exception: panic_on_oops
Signed-off-by: Sebastian Ott
Reviewed-by: Stefan Haberland
Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
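
The idea of keeping driver-private data behind each request, rather than in a union member shared with the I/O scheduler, can be sketched in userspace C. The types `request_sketch` and `driver_pdu_sketch` and the helpers `alloc_request()` and `request_to_pdu()` are hypothetical; the pointer arithmetic is similar in spirit to the block layer's blk_mq_rq_to_pdu(), but this is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request with a union shared between driver and scheduler. */
struct request_sketch {
	union {
		void *completion_data;	/* what the driver used to borrow */
		void *sched_data;	/* used by the I/O scheduler */
	} u;
};

/* Hypothetical driver-private data kept per request. */
struct driver_pdu_sketch {
	int retries;
};

/* Allocate a request with pdu_size bytes of driver data directly behind it. */
static struct request_sketch *alloc_request(size_t pdu_size)
{
	return calloc(1, sizeof(struct request_sketch) + pdu_size);
}

/* Driver data lives right after the request, outside the shared union. */
static void *request_to_pdu(struct request_sketch *rq)
{
	assert(rq != NULL);
	return rq + 1;
}
```

Because the private area sits behind the request, a scheduler writing to its union member in this model cannot overwrite the driver's data.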
03 Jul, 2018
7 commits
-
commit 6a76550841d412330bd86aed3238d1888ba70f0e upstream.
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : .......
LUN : 0x...
WWPN : 0x...
D_ID : 0x...
Adapter status : 0x...
Port status : 0x...
LUN status : 0x...
Ready count : 0x...
Running count : 0x...
ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_...
ERP need : 0xc0 ZFCP_ERP_ACTION_NONE

Signed-off-by: Steffen Maier
Cc: #2.6.38+
Reviewed-by: Benjamin Block
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
-
commit 8c3d20aada70042a39c6a6625be037c1472ca610 upstream.
That other commit introduced an inconsistency because it would trace on
ERP_FAILED for all callers of port forced reopen triggers (not just
terminate_rport_io), but it would not trace on ERP_FAILED for all callers of
other ERP triggers such as adapter, port regular, LUN.

Therefore, generalize that other commit. zfcp_erp_action_enqueue() already
had two early outs which re-used the one zfcp_dbf_rec_trig() call. All ERP
trigger functions finally run through zfcp_erp_action_enqueue(). So move
the special handling for ZFCP_STATUS_COMMON_ERP_FAILED into
zfcp_erp_action_enqueue() and add another early out with new trace marker
for pseudo ERP need in this case. This removes all early returns from all
ERP trigger functions so we always end up at zfcp_dbf_rec_trig().

Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : .......
LUN : 0x...
WWPN : 0x...
D_ID : 0x...
Adapter status : 0x...
Port status : 0x...
LUN status : 0x...
Ready count : 0x...
Running count : 0x...
ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_...
ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED

Signed-off-by: Steffen Maier
Cc: #2.6.38+
Reviewed-by: Benjamin Block
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
-
commit d70aab55924b44f213fec2b900b095430b33eec6 upstream.
For problem determination we always want to see when we were invoked on the
terminate_rport_io callback whether we perform something or not.

Temporal event sequence of interest with a long fast_io_fail_tmo of 27 sec:
lose remote port
t workqueue
[s] zfcp_q_ IRQ zfcperp
=== ================== =================== ============================
0 recv RSCN
q p.test_link_work
block rport
start fast_io_fail_tmo
send ADISC ELS
4 recv ADISC fail
block zfcp_port
port forced reopen
send open port
12 recv open port fail
q p.gid_pn_work
zfcp_erp_wakeup
(zfcp_erp_wait would return)
GID_PN fail

Before this point, we got a SCSI trace with tag "sctrpi1" on fast_io_fail,
e.g. with the typical 5 sec setting.

port.status |= ERP_FAILED

If fast_io_fail_tmo triggers after this point, we missed a SCSI trace.
workqueue
fc_dl_
==================
27 fc_timeout_fail_rport_io
fc_terminate_rport_io
zfcp_scsi_terminate_rport_io
zfcp_erp_port_forced_reopen
_zfcp_erp_port_forced_reopen
if (port.status & ERP_FAILED)
return;

Therefore, write a trace before above early return.
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : sctrpi1 SCSI terminate rport I/O
LUN : 0xffffffffffffffff none (invalid)
WWPN : 0x
D_ID : 0x
Adapter status : 0x...
Port status : 0x...
LUN status : 0x00000000 none (invalid)
Ready count : 0x...
Running count : 0x...
ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED

Signed-off-by: Steffen Maier
Cc: #2.6.38+
Reviewed-by: Benjamin Block
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
-
commit 96d9270499471545048ed8a6d7f425a49762283d upstream.
get_device() and its internally used kobject_get() only return NULL if they
get passed NULL as argument. zfcp_get_port_by_wwpn() loops over
adapter->port_list so the iteration variable port is always non-NULL.
Struct device is embedded in struct zfcp_port so &port->dev is always
non-NULL. This is the argument to get_device(). However, if we get an
fc_rport in terminate_rport_io() for which we cannot find a match within
zfcp_get_port_by_wwpn(), the latter can return NULL. v2.6.30 commit
70932935b61e ("[SCSI] zfcp: Fix oops when port disappears") introduced an
early return without adding a trace record for this case. Even if we don't
need recovery in this case, for debugging we should still see that our
callback was invoked originally by scsi_transport_fc.

Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : sctrpin SCSI terminate rport I/O, no zfcp port
LUN : 0xffffffffffffffff none (invalid)
WWPN : 0x WWPN
D_ID : 0x N_Port-ID
Adapter status : 0x...
Port status : 0xffffffff unknown (-1)
LUN status : 0x00000000 none (invalid)
Ready count : 0x...
Running count : 0x...
ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need : 0xc0 ZFCP_ERP_ACTION_NONE

Signed-off-by: Steffen Maier
Fixes: 70932935b61e ("[SCSI] zfcp: Fix oops when port disappears")
Cc: #2.6.38+
Reviewed-by: Benjamin Block
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
-
commit 512857a795cbbda5980efa4cdb3c0b6602330408 upstream.
If a SCSI device is deleted during scsi_eh host reset, we cannot get a
reference to the SCSI device anymore since scsi_device_get returns !=0 by
design. Assuming the recovery of adapter and port(s) was successful,
zfcp_erp_strategy_followup_success() attempts to trigger a LUN reset for the
half-gone SCSI device. Unfortunately, it causes the following confusing
trace record which states that zfcp will do a LUN recovery as "ERP need" is
ZFCP_ERP_ACTION_REOPEN_LUN == 1 and equals "ERP want".

Old example trace record formatted with zfcpdbf from s390-tools:
Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN : 0x
WWPN : 0x
D_ID : 0x
Adapter status : 0x5400050b
Port status : 0x54000001
LUN status : 0x40000000 ZFCP_STATUS_COMMON_RUNNING
but not ZFCP_STATUS_COMMON_UNBLOCKED as it
was closed on close part of adapter reopen
ERP want : 0x01
ERP need : 0x01 misleading

However, zfcp_erp_setup_act() returns NULL as it cannot get the reference.
Hence, zfcp_erp_action_enqueue() takes an early goto out and _NO_ recovery
actually happens.

We always do want the recovery trigger trace record even if no erp_action
could be enqueued as in this case. For other cases where we did not enqueue
an erp_action, 'need' has always been zero to indicate this. In order to
indicate above goto out, introduce an eyecatcher "flag" to mark the "ERP
need" as 'not needed' but still keep the information which erp_action type,
that zfcp_erp_required_act() had decided upon, is needed. 0xc_ is chosen to
be visibly different from 0x0_ in "ERP want".

New example trace record formatted with zfcpdbf from s390-tools:
Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN : 0x
WWPN : 0x
D_ID : 0x
Adapter status : 0x5400050b
Port status : 0x54000001
LUN status : 0x40000000
ERP want : 0x01
ERP need : 0xc1 would need LUN ERP, but no action set up
             ^

Before v2.6.38 commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.") we could detect this case because the
"erp_action" field in the trace was NULL. The rework removed erp_action as
argument and field from the trace.

This patch here is for tracing. A fix to allow LUN recovery in the case at
hand is a topic for a separate patch.

See also commit fdbd1c5e27da ("[SCSI] zfcp: Allow running unit/LUN shutdown
without acquiring reference") for a similar case and background info.

Signed-off-by: Steffen Maier
Fixes: ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")
Cc: #2.6.38+
Reviewed-by: Benjamin Block
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
-
commit 81979ae63e872ef650a7197f6ce6590059d37172 upstream.
We already have a SCSI trace for the end of abort and scsi_eh TMF. Due to
zfcp_erp_wait() and fc_block_scsi_eh() time can pass between the start of
our eh callback and an actual send/recv of an abort / TMF request. In order
to see the temporal sequence including any abort / TMF send retries, add a
trace before the above two blocking functions. This supports problem
determination with scsi_eh and parallel zfcp ERP.

No need to explicitly trace the beginning of our eh callback, since we
typically can send an abort / TMF and see its HBA response (in the worst
case, it's a pseudo response on dismiss all of adapter recovery, e.g. due to
an FSF request timeout [fsrth_1] of the abort / TMF). If we cannot send, we
now get a trace record for the first "abrt_wt" or "[lt]r_wait" which denotes
almost the beginning of the callback.

No need to explicitly trace the wakeup after the above two blocking
functions because the next retry loop causes another trace in any case and
that is sufficient.

Example trace records formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : abrt_wt abort, before zfcp_erp_wait()
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0x
SCSI LUN : 0x
SCSI LUN high : 0x
SCSI result : 0x
SCSI retries : 0x
SCSI allowed : 0x
SCSI scribble : 0x
SCSI opcode :
FCP rsp inf cod: 0x.. none (invalid)
FCP rsp IU : ... none (invalid)

Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : lr_wait LUN reset, before zfcp_erp_wait()
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0x
SCSI LUN : 0x
SCSI LUN high : 0x
SCSI result : 0x... unrelated
SCSI retries : 0x.. unrelated
SCSI allowed : 0x.. unrelated
SCSI scribble : 0x... unrelated
SCSI opcode : ... unrelated
FCP rsp inf cod: 0x.. none (invalid)
FCP rsp IU : ... none (invalid)

Signed-off-by: Steffen Maier
Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
Fixes: af4de36d911a ("[SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED")
Cc: #2.6.38+
Reviewed-by: Benjamin Block
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
-
commit df30781699f53e4fd4c494c6f7dd16e3d5c21d30 upstream.
For problem determination we need to see whether and why we were successful
or not. This allows deduction of scsi_eh escalation.

Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : schrh_r SCSI host reset handler result
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0xffffffff none (invalid)
SCSI LUN : 0xffffffff none (invalid)
SCSI LUN high : 0xffffffff none (invalid)
SCSI result : 0x00002002 field re-used for midlayer value: SUCCESS
or in other cases: 0x2009 == FAST_IO_FAIL
SCSI retries : 0xff none (invalid)
SCSI allowed : 0xff none (invalid)
SCSI scribble : 0xffffffffffffffff none (invalid)
SCSI opcode : ffffffff ffffffff ffffffff ffffffff none (invalid)
FCP rsp inf cod: 0xff none (invalid)
FCP rsp IU : 00000000 00000000 00000000 00000000 none (invalid)
00000000 00000000

v2.6.35 commit a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from
fc_block_scsi_eh to scsi eh") introduced the first return with something
other than the previously hardcoded single SUCCESS return path.

Signed-off-by: Steffen Maier
Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
Cc: #2.6.38+
Reviewed-by: Jens Remus
Reviewed-by: Benjamin Block
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
21 Jun, 2018
1 commit
-
[ Upstream commit 760dd0eeaec1689430243ead14e5a429613d8c52 ]
The module exit function of the smsgiucv module uses the incorrect CP
command to disable SMSG messages. The correct command is "SET SMSG OFF".
Use it.

Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
30 May, 2018
5 commits
-
[ Upstream commit 9851bc77e62499957567e7c39a5beba7d6de6296 ]
vfio-ccw only supports command mode for channel programs, not transport
mode. User space is supposed to already take care of that and pass us
command-mode ORBs only, but better make sure and return an error to
the caller instead of trying to process tcws as ccws.

Reviewed-by: Dong Jia Shi
Acked-by: Halil Pasic
Signed-off-by: Cornelia Huck
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 410d5e13e7638bc146321671e223d56495fbf3c7 ]
When we terminate driver I/O (because we need to stop using a certain
channel path) we also need to ensure that a timer (which may have been
set up using ccw_device_start_timeout) is cleared.

Signed-off-by: Sebastian Ott
Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 770b55c995d171f026a9efb85e71e3b1ea47b93d ]
When a timeout occurs for users of ccw_device_start_timeout
we will stop the IO and call the driver's int handler with
the irb pointer set to ERR_PTR(-ETIMEDOUT). Sometimes
however we'd set the irb pointer to ERR_PTR(-EIO) which is
not intended. Just set the correct value in all codepaths.

Reported-by: Julian Wiedmann
Signed-off-by: Sebastian Ott
Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit f97a6b6c47d2f329a24f92cc0ca3c6df5727ba73 ]
There are cases where a device driver can't start IO because the device is
currently in use by cio. In this case the device driver is notified
when the device is usable again.

Using ccw_device_start_timeout we would set the timeout (and change
an existing timeout) before we test for internal usage. Worst case
this could lead to an unexpected timer deletion.

Fix this by setting the timeout after we test for internal usage.
Signed-off-by: Sebastian Ott
Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 9487cfd3430d07366801886bdf185799a2b6f066 ]
Internal DASD device driver I/O such as query host access count or
path verification is started using the _sleep_on() function.
To mark a request as started or ended the callback_data is set to either
DASD_SLEEPON_START_TAG or DASD_SLEEPON_END_TAG.

In cases where the request has to be stopped unconditionally the status is
set to DASD_SLEEPON_END_TAG as well which leads to immediate clearing of
the request.
But the request might still be on a device request queue for normal
operation which might lead to a panic because of a BUG() statement in
__dasd_device_process_final_queue() or a list corruption of the device
request queue.

Fix by removing the setting of DASD_SLEEPON_END_TAG in the
dasd_cancel_req() and dasd_generic_requeue_all_requests() functions and
ensure that the request is not deleted in the requeue function.
Trigger the device tasklet in the requeue function and let the normal
processing cleanup the request.

Signed-off-by: Stefan Haberland
Reviewed-by: Jan Hoeppner
Signed-off-by: Martin Schwidefsky
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
25 May, 2018
1 commit
-
commit fa89adba1941e4f3b213399b81732a5c12fd9131 upstream.
zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
processed asynchronously and concurrently.

Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
zfcp_dbf_rec_trig(). zfcp_dbf_rec_trig() acquires the DBF REC spin lock
and then iterates with list_for_each() over the adapter's ERP ready list
without holding the ERP lock. This opens a race window in which the
current list entry can be moved to another list, causing list_for_each()
to iterate forever on the wrong list, as the erp_ready_head is never
encountered as terminal condition.

Meanwhile the ERP action can be processed in the ERP thread by
zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
lock and then calls zfcp_erp_action_to_running() to move the ERP action
from the ready to the running list. zfcp_erp_action_to_running() can
move the ERP action using list_move() just during the aforementioned
race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
held by the infinitely looping kworker, it effectively spins forever.

Example Sequence Diagram:
Process                ERP Thread             rport_work
-------------------    -------------------    -------------------
zfcp_erp_adapter_reopen()
zfcp_erp_adapter_block()
zfcp_scsi_schedule_rports_block()
lock ERP                                      zfcp_scsi_rport_work()
zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
list_add_tail() on ready                      !(rport_task==RPORT_ADD)
wake_up() ERP thread                          zfcp_scsi_rport_block()
zfcp_dbf_rec_trig()    zfcp_erp_strategy()    zfcp_dbf_rec_trig()
unlock ERP                                    lock DBF REC
zfcp_erp_wait()        lock ERP
|                      zfcp_erp_action_to_running()
|                                             list_for_each() ready
|                      list_move()            current entry
|                      ready to running
|                      zfcp_dbf_rec_run()     endless loop over running
|                      zfcp_dbf_rec_run_lvl()
|                      lock DBF REC spins forever

Any adapter recovery can trigger this, such as setting the device offline
or reboot.

V4.9 commit 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport
during rport gone") introduced additional tracing of (un)blocking of
rports. It missed that the adapter->erp_lock must be held when calling
zfcp_dbf_rec_trig().

This fix uses the approach formerly introduced by commit aa0fec62391c
("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got
later removed by commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.").

Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
acquires and releases the adapter->erp_lock for read.

Reported-by: Sebastian Ott
Signed-off-by: Jens Remus
Fixes: 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone")
Cc: # 2.6.32+
Reviewed-by: Benjamin Block
Signed-off-by: Steffen Maier
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
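
The failure mode described in this commit (an iteration that never sees its own list head again because the current entry was moved to another list) can be reproduced deterministically in userspace. The list helpers below are a hypothetical, reduced re-implementation of a circular doubly linked list, not the kernel's list.h, and the concurrent list_move() is modeled by moving the entry in the middle of the walk. A step cap replaces the endless loop so the sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list; a head is a node that points to itself. */
struct node {
	struct node *next, *prev;
};

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

static void list_del_sketch(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_add_tail_sketch(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n from its current list and append it to the list at head. */
static void list_move_sketch(struct node *n, struct node *head)
{
	list_del_sketch(n);
	list_add_tail_sketch(n, head);
}

/*
 * Walk the list at head. If move_to is non-NULL, the first visited entry
 * is moved there mid-walk, modelling the unlocked race. Returns true if
 * the walk reached head again within max_steps.
 */
static bool walk_terminates(struct node *head, struct node *move_to, int max_steps)
{
	int steps = 0;

	assert(head != NULL && max_steps > 0);
	for (struct node *pos = head->next; pos != head; pos = pos->next) {
		if (++steps > max_steps)
			return false;	/* would spin forever without the cap */
		if (move_to && steps == 1)
			list_move_sketch(pos, move_to);
	}
	return true;
}
```

Holding the lock that protects both lists for the whole walk, as the wrapper introduced by this commit does for the ERP lock, corresponds to the case where no move can happen during the iteration.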
23 May, 2018
3 commits
-
commit 2e68adcd2fb21b7188ba449f0fab3bee2910e500 upstream.
Calling qdio_release_memory() on error is just plain wrong. It frees
the main qdio_irq struct, when following code still uses it.

Also, no other error path in qdio_establish() does this. So trust
callers to clean up via qdio_free() if some step of the QDIO
initialization fails.

Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.")
Cc: #v2.6.27+
Signed-off-by: Julian Wiedmann
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
-
commit e521813468f786271a87e78e8644243bead48fad upstream.
Ever since CQ/QAOB support was added, calling qdio_free() straight after
qdio_alloc() results in qdio_release_memory() accessing uninitialized
memory (i.e. q->u.out.use_cq and q->u.out.aobs). Followed by a
kmem_cache_free() on the random AOB addresses.

For older kernels that don't have 6e30c549f6ca, the same applies if
qdio_establish() fails in the DEV_STATE_ONLINE check.

While initializing q->u.out.use_cq would be enough to fix this
particular bug, the more future-proof change is to just zero-alloc the
whole struct.

Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks")
Cc: #v3.2+
Signed-off-by: Julian Wiedmann
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
-
commit d66a7355717ec903d455277a550d930ba13df4a8 upstream.
If the translation of a channel program fails, we may end up attempting
to clean up (free, unpin) stuff that never got translated (and allocated,
pinned) in the first place.

By adjusting the lengths of the chains accordingly (so the element that
failed, and all subsequent elements are excluded) cleanup activities
based on false assumptions can be avoided.

Let's make sure cp_free works properly after cp_prefetch returns with an
error by setting ch_len of a ccw chain to the number of the translated
CCWs on that chain.

Cc: stable@vger.kernel.org #v4.12+
Acked-by: Pierre Morel
Reviewed-by: Dong Jia Shi
Signed-off-by: Halil Pasic
Signed-off-by: Dong Jia Shi
Message-Id:
[CH: fixed typos]
Signed-off-by: Cornelia Huck
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
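
The approach of this commit (shrink the chain length to the number of elements that were actually translated, so that cleanup never touches the rest) can be sketched in userspace C. The names `chain_sketch`, `chain_prefetch()` and `chain_free()` are hypothetical, the fixed capacity of 8 is arbitrary, and a failing element is simulated through the `fail_at` parameter; none of this reproduces the vfio-ccw code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chain: res[i] is only valid for i in [0, len). */
struct chain_sketch {
	int len;	/* number of elements cleanup may touch */
	void *res[8];	/* per-element resource */
};

/* Translate want elements; element fail_at (if in range) fails. Returns 0 or -1. */
static int chain_prefetch(struct chain_sketch *c, int want, int fail_at)
{
	assert(c != NULL && want >= 0 && want <= 8);
	c->len = want;
	for (int i = 0; i < want; i++) {
		void *r = (i == fail_at) ? NULL : malloc(16);

		if (!r) {
			/* Exclude the failed element and everything after it. */
			c->len = i;
			return -1;
		}
		c->res[i] = r;
	}
	return 0;
}

/* Free only what was actually translated; returns how many were freed. */
static int chain_free(struct chain_sketch *c)
{
	int freed = c->len;

	for (int i = 0; i < c->len; i++)
		free(c->res[i]);
	c->len = 0;
	return freed;
}
```

Without the length adjustment, the cleanup loop in this model would read uninitialized slots for the failed element and all elements after it.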
02 May, 2018
1 commit
-
commit 3368e547c52b96586f0edf9657ca12b94d8e61a7 upstream.
When we call ssch, an interrupt might already be pending once we
return from the START SUBCHANNEL instruction. Therefore we need to
make sure interrupts are disabled while holding the subchannel lock
until after we're done with our processing.

Cc: stable@vger.kernel.org #v4.12+
Reviewed-by: Dong Jia Shi
Acked-by: Halil Pasic
Acked-by: Pierre Morel
Signed-off-by: Cornelia Huck
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
29 Apr, 2018
3 commits
-
commit 5d27a2bf6e14f5c7d1033ad1e993fcd0eba43e83 upstream.
When a new CKD storage volume is defined at the storage server, Linux
may be relying on outdated information about that volume, which leads to
the following errors:

1. Command Reject Errors for minidisk on z/VM:
dasd-eckd.b3193d: 0.0.XXXX: An error occurred in the DASD device driver,
reason=09
dasd(eckd): I/O status report for device 0.0.XXXX:
dasd(eckd): in req: 00000000XXXXXXXX CC:00 FC:04 AC:00 SC:17 DS:02 CS:00
RC:0
dasd(eckd): device 0.0.2046: Failing CCW: 00000000XXXXXXXX
dasd(eckd): Sense(hex) 0- 7: 80 00 00 00 00 00 00 00
dasd(eckd): Sense(hex) 8-15: 00 00 00 00 00 00 00 00
dasd(eckd): Sense(hex) 16-23: 00 00 00 00 e1 00 0f 00
dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 00 00 00
dasd(eckd): 24 Byte: 0 MSG 0, no MSGb to SYSOP

2. Equipment Check errors on LPAR or for dedicated devices on z/VM:
dasd(eckd): I/O status report for device 0.0.XXXX:
dasd(eckd): in req: 00000000XXXXXXXX CC:00 FC:04 AC:00 SC:17 DS:0E CS:40
fcxs:01 schxs:00 RC:0
dasd(eckd): device 0.0.9713: Failing TCW: 00000000XXXXXXXX
dasd(eckd): Sense(hex) 0- 7: 10 00 00 00 13 58 4d 0f
dasd(eckd): Sense(hex) 8-15: 67 00 00 00 00 00 00 04
dasd(eckd): Sense(hex) 16-23: e5 18 05 33 97 01 0f 0f
dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 04 58 0d
dasd(eckd): 24 Byte: 0 MSG f, no MSGb to SYSOP

Fix this problem by using the up-to-date information provided during
online processing via the device specific SNEQ to detect the case of
outdated LCU data. If there is a difference, perform a re-read of that
data.

Cc: stable@vger.kernel.org
Reviewed-by: Jan Hoeppner
Signed-off-by: Stefan Haberland
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
-
commit af2e460ade0b0180d0f3812ca4f4f59cc9597f3e upstream.
Channel path descriptors have been seen as something stable (as
long as the chpid is configured). Recent tests have shown that the
descriptor can also be altered when the link state of a channel path
changes. Thus it is necessary to update the descriptor during
handling of resource accessibility events.

Cc:
Signed-off-by: Sebastian Ott
Reviewed-by: Peter Oberparleiter
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit f19fbd5ed642dc31c809596412dab1ed56f2f156 ]
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction-return= compiler options to create a kernel fortified against
the spectre v2 attack.

With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call

    basr %r14,%r1

is replaced with a PC relative call to a new thunk

    brasl %r14,__s390x_indirect_jump_r1

The thunk contains the EXRL/EX instruction to the indirect branch

    __s390x_indirect_jump_r1:
        exrl 0,0f
        j .
    0:  br %r1

The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameters "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.

Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
19 Apr, 2018
2 commits
-
commit 0cf1e05157b9e5530dcc3ca9fec9bf617fc93375 upstream.
On an Output queue, both EMPTY and PENDING buffer states imply that the
buffer is ready for completion-processing by the upper-layer drivers.

So for a non-QEBSM Output queue, get_buf_states() merges mixed
batches of PENDING and EMPTY buffers into one large batch of EMPTY
buffers. The upper-layer driver (i.e. qeth) later distinguishes PENDING
from EMPTY by inspecting the slsb_state for
QDIO_OUTBUF_STATE_FLAG_PENDING.

But the merge logic in get_buf_states() contains a bug that causes us to
erroneously also merge ERROR buffers into such a batch of EMPTY buffers
(ERROR is 0xaf, EMPTY is 0xa1; so ERROR & EMPTY == EMPTY).
Effectively, most outbound ERROR buffers are currently discarded
silently and processed as if they had succeeded.

Note that this affects _all_ non-QEBSM device types, not just IQD with CQ.

Fix it by explicitly spelling out the exact conditions for merging.
For extracting the "get initial state" part out of the loop, this relies
on the fact that get_buf_states() is never called with a count of 0. The
QEBSM path already strictly requires this, and the two callers with
variable 'count' make sure of it.

Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks")
Cc: #v3.2+
Signed-off-by: Julian Wiedmann
Reviewed-by: Ursula Braun
Reviewed-by: Benjamin Block
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
-
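The bit arithmetic behind commit 0cf1e05157b9 (ERROR & EMPTY == EMPTY) can be checked in isolation. The sketch below is a user-space illustration, not the qdio code: the function names are hypothetical, the EMPTY and ERROR values are the ones quoted in that commit message, and the PENDING value 0xa3 is an assumption made for the example. It contrasts a mask-style merge check with one that spells out the accepted states.

```c
#include <assert.h>
#include <stddef.h>

/* State values quoted in the commit message. */
#define OUT_EMPTY   0xa1
#define OUT_ERROR   0xaf
/* Assumed value for PENDING, used only for illustration. */
#define OUT_PENDING 0xa3

/* Mask-style check: accepts every state that has all EMPTY bits set. */
int merges_masked(unsigned char state)
{
	return (state & OUT_EMPTY) == OUT_EMPTY;
}

/* Explicit check: accepts exactly EMPTY and PENDING. */
int merges_exact(unsigned char state)
{
	return state == OUT_EMPTY || state == OUT_PENDING;
}

/* Length of the leading run of buffers that the given check merges. */
size_t batch_len(const unsigned char *states, size_t count,
		 int (*merges)(unsigned char))
{
	size_t n = 0;

	assert(count > 0);  /* mirrors the "never called with a count of 0" rule */
	while (n < count && merges(states[n]))
		n++;
	return n;
}
```

For the state sequence EMPTY, PENDING, ERROR, EMPTY, the mask-style check reports one batch of four buffers, while the explicit check stops after two and leaves the ERROR buffer to be handled on its own.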
commit dae55b6fef58530c13df074bcc182c096609339e upstream.
Immediate retry of EQBS after CCQ 96 means that we potentially misreport
the state of buffers inspected during the first EQBS call.

This occurs when
1. the first EQBS finds all inspected buffers still in the initial state
   set by the driver (i.e. INPUT EMPTY or OUTPUT PRIMED),
2. the EQBS terminates early with CCQ 96, and
3. by the time that the second EQBS comes around, the state of those
   previously inspected buffers has changed.

If the state reported by the second EQBS is 'driver-owned', all we know
is that the previous buffers are driver-owned now as well. But we can't
tell if they all have the same state. So for instance
- the second EQBS reports OUTPUT EMPTY, but any number of the previous
  buffers could be OUTPUT ERROR by now,
- the second EQBS reports OUTPUT ERROR, but any number of the previous
  buffers could be OUTPUT EMPTY by now.

Effectively, this can result in both over- and underreporting of errors.

If the state reported by the second EQBS is 'HW-owned', that doesn't
guarantee that the previous buffers have not been switched to
driver-owned in the meantime. So for instance
- the second EQBS reports INPUT EMPTY, but any number of the previous
  buffers could be INPUT PRIMED (or INPUT ERROR) by now.

This would result in failure to process pending work on the queue. If
it's the final check before yielding initiative, this can cause
a (temporary) queue stall due to IRQ avoidance.

Fixes: 25f269f17316 ("[S390] qdio: EQBS retry after CCQ 96")
Cc: #v3.2+
Signed-off-by: Julian Wiedmann
Reviewed-by: Benjamin Block
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
01 Apr, 2018
4 commits
-
[ Upstream commit a6c3d93963e4b333c764fde69802c3ea9eaa9d5c ]
When the IRQ handler determines that one of the cmd IO channels has
failed and schedules recovery, block any further cmd requests from
being submitted. The request would inevitably stall, and prevent the
recovery from making progress until the request times out.

This sort of error was observed after Live Guest Relocation, where
the pending IO on the READ channel intentionally gets terminated to
kick-start recovery. Simultaneously the guest executed SIOCETHTOOL,
triggering qeth to issue a QUERY CARD INFO command. The command
then stalled in the inoperable WRITE channel.

Signed-off-by: Julian Wiedmann
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 17bf8c9b3d499d5168537c98b61eb7a1fcbca6c2 ]
For calling ccw_device_start(), issue_next_read() needs to hold the
device's ccwlock.
This is satisfied for the IRQ handler path (where qeth_irq() gets called
under the ccwlock), but we need explicit locking for the initial call by
the MPC initialization.

Signed-off-by: Julian Wiedmann
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 1063e432bb45be209427ed3f1ca3908e4aa3c7d7 ]
qeth_wait_for_threads() is potentially called by multiple users, make
sure to notify all of them after qeth_clear_thread_running_bit()
adjusted the thread_running_mask. With no timeout, callers would
otherwise stall.

Signed-off-by: Julian Wiedmann
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 6be687395b3124f002a653c1a50b3260222b3cd7 ]
On removal, a qeth card's netdevice is currently not properly freed
because the call chain looks as follows:

qeth_core_remove_device(card)
    lx_remove_device(card)
        unregister_netdev(card->dev)
        card->dev = NULL                !!!
    qeth_core_free_card(card)
        if (card->dev)                  !!!
            free_netdev(card->dev)

Fix it by freeing the netdev straight after unregistering. This also
fixes the sysfs-driven layer switch case (qeth_dev_layer2_store()),
where the need to free the current netdevice was not considered at all.

Note that free_netdev() takes care of the netif_napi_del() for us too.
Fixes: 4a71df50047f ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann
Reviewed-by: Ursula Braun
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
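The ordering problem in the call chain above can be reproduced with a small user-space model. This is a sketch under stated assumptions, not the qeth code: `struct netdev_model`, `struct card_model` and the helper functions are hypothetical, and "freeing" is reduced to setting a flag so that the effect can be observed. It contrasts the old ordering, where the pointer is cleared before the NULL check, with freeing straight after unregistering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's net_device or qeth card types. */
struct netdev_model {
	int registered;
	int freed;
};

struct card_model {
	struct netdev_model *dev;
};

void unregister_model(struct netdev_model *dev) { dev->registered = 0; }
void free_model(struct netdev_model *dev) { dev->freed = 1; }

/* Old ordering: the pointer is cleared before the later NULL check. */
void remove_old(struct card_model *card)
{
	assert(card->dev != NULL);
	unregister_model(card->dev);
	card->dev = NULL;
	if (card->dev)          /* always false at this point */
		free_model(card->dev);
}

/* Fixed ordering: free straight after unregistering, then clear the pointer. */
void remove_fixed(struct card_model *card)
{
	assert(card->dev != NULL);
	unregister_model(card->dev);
	free_model(card->dev);
	card->dev = NULL;
}
```

Running both variants on a registered device leaves the device unregistered in each case, but only the fixed ordering marks it as freed.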