20 Jan, 2021

6 commits

  • commit ada831772188192243f9ea437c46e37e97a5975d upstream.

    We shouldn't call smp_processor_id() in a preemptible
    context; since the value is advisory at best here, call
    __smp_processor_id() instead.

    Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
    Reported-by: Or Gerlitz
    Reported-by: Yi Zhang
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
     
  • commit ca1ff67d0fb14f39cf0cc5102b1fbcc3b14f6fb9 upstream.

    When bios merge, we can get a request that spans multiple
    bios, and the overall request payload size is the sum of
    all of them. When we calculate how much we need to send
    from the existing bio (and bvec), we did not take into
    account the iov_iter byte count cap.

    Since the introduction of multipage bvec support, bvecs can be split
    in the middle, which means that when we account for the last bvec
    send we should also apply the iov_iter byte count cap, as it might
    be lower than the last bvec size.

    Reported-by: Hao Wang
    Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
    Tested-by: Hao Wang
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
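The capping arithmetic can be modeled with a small sketch (Python used for illustration; `nvme_tcp_send_len`, `bvec_len`, `bvec_offset` and `iter_count` are illustrative names, not the driver's actual identifiers):

```python
def nvme_tcp_send_len(bvec_len, bvec_offset, iter_count):
    """Bytes to send from the current bvec.

    The naive answer is whatever is left of the bvec, but with merged
    bios the iov_iter byte count may cap the request below the last
    bvec's size, so the cap must be applied as well.
    """
    remaining = bvec_len - bvec_offset
    # Take the iov_iter byte count cap into account: it can be lower
    # than what is left in the (possibly split, multipage) bvec.
    return min(remaining, iter_count)

# Without the cap, a merged request could over-send from the last bvec.
assert nvme_tcp_send_len(bvec_len=8192, bvec_offset=0, iter_count=4096) == 4096
assert nvme_tcp_send_len(bvec_len=8192, bvec_offset=4096, iter_count=8192) == 4096
```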
     
  • commit 5ab25a32cd90ce561ac28b9302766e565d61304c upstream.

    Discovery controllers usually don't support the smart log page command,
    so when we connect to a discovery controller we see this warning:
    nvme nvme0: Failed to read smart log (error 24577)
    nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.123.1:8009
    nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"

    Introduce a new helper to determine whether the controller is a
    discovery controller, and use it to skip nvme_init_hwmon (also use it
    in other places where we check if the controller is a discovery
    controller).

    Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support")
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
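The helper amounts to a comparison against the well-known discovery NQN (the one in the log above); a minimal Python model, where `nvme_init_ctrl_finish` and `configure_apst` are illustrative stand-ins for the surrounding init path:

```python
NVME_DISC_SUBSYS_NAME = "nqn.2014-08.org.nvmexpress.discovery"

def nvme_discovery_ctrl(subsysnqn: str) -> bool:
    """Is this controller a discovery controller?"""
    return subsysnqn == NVME_DISC_SUBSYS_NAME

def nvme_init_ctrl_finish(subsysnqn: str) -> list:
    steps = ["configure_apst"]
    # Discovery controllers usually don't implement the smart log page,
    # so don't even try to set up hardware monitoring for them.
    if not nvme_discovery_ctrl(subsysnqn):
        steps.append("init_hwmon")
    return steps

assert nvme_init_ctrl_finish(NVME_DISC_SUBSYS_NAME) == ["configure_apst"]
assert nvme_init_ctrl_finish("nqn.2019-01.example:subsys1") == [
    "configure_apst", "init_hwmon"]
```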
     
  • [ Upstream commit 19fce0470f05031e6af36e49ce222d0f0050d432 ]

    Recent patches changed calling sequences. nvme_fc_abort_outstanding_ios
    used to be called from a timeout or work context. Now it is being called
    in an io completion context, which can be an interrupt handler.
    Unfortunately, the abort outstanding ios routine attempts to stop nvme
    queues and calls nested routines that may try to sleep, which conflicts
    with running in an interrupt handler.

    Correct this by replacing the direct call with scheduling of a work
    element; the abort outstanding ios routine is then invoked from the
    work element.

    Fixes: 95ced8a2c72d ("nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery")
    Signed-off-by: James Smart
    Reported-by: Daniel Wagner
    Tested-by: Daniel Wagner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    James Smart
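The shape of the fix, deferring the sleeping abort routine from a completion (interrupt-like) context to a work element, can be sketched in Python (all names illustrative):

```python
import queue
import threading

work_queue = queue.Queue()
aborted = []

def abort_outstanding_ios():
    # Stops nvme queues and may sleep, so it must never run from an
    # interrupt-style completion context.
    aborted.append("done")

def io_completion_handler():
    # Interrupt-like context: don't call the abort routine directly;
    # schedule a work element instead.
    work_queue.put(abort_outstanding_ios)

def worker():
    # Work-queue context: allowed to sleep.
    while True:
        fn = work_queue.get()
        fn()
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
io_completion_handler()
work_queue.join()     # wait for the deferred abort to run
assert aborted == ["done"]
```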
     
  • [ Upstream commit 62df80165d7f197c9c0652e7416164f294a96661 ]

    While handling the completion queue, keep a local copy of the command id
    from the DMA-accessible completion entry. This silences a time-of-check
    to time-of-use (TOCTOU) warning from KF/x[1], with respect to a
    Thunderclap[2] vulnerability analysis. The double-read impact appears
    benign.

    There may be a theoretical window for @command_id to be used as an
    adversary-controlled array-index-value for mounting a speculative
    execution attack, but that mitigation is saved for a potential follow-on.
    A man-in-the-middle attack on the data payload is out of scope for this
    analysis and is hopefully mitigated by filesystem integrity mechanisms.

    [1] https://github.com/intel/kernel-fuzzer-for-xen-project
    [2] http://thunderclap.io/thunderclap-paper-ndss2019.pdf
    Signed-off-by: Lalithambika Krishna Kumar
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Lalithambika Krishnakumar
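The pattern, reading the DMA-visible field once into a local and then both validating and using that local copy, can be modeled as follows (illustrative Python, not the driver's code):

```python
def nvme_handle_cqe(cqe, tags):
    """Process one completion entry that lives in DMA-visible memory.

    Read command_id exactly once into a local copy, so a device write
    between the bounds check and the array lookup cannot redirect us
    (the TOCTOU double-read the commit silences).
    """
    command_id = cqe["command_id"]    # single read, kept local
    if command_id >= len(tags):       # validate the local copy...
        return None
    return tags[command_id]           # ...and index with the same value

tags = ["req0", "req1"]
assert nvme_handle_cqe({"command_id": 1}, tags) == "req1"
assert nvme_handle_cqe({"command_id": 7}, tags) is None
```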
     
  • [ Upstream commit 7ee5c78ca3895d44e918c38332921983ed678be0 ]

    A system with more than one of these SSDs will only have one usable:
    the kernel rejects the other nvme devices due to duplicate cntlids.

    [ 6.274554] nvme nvme1: Duplicate cntlid 33 with nvme0, rejecting
    [ 6.274566] nvme nvme1: Removing after probe failure status: -22

    Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk resolves the issue.

    Signed-off-by: Gopal Tiwari
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Gopal Tiwari
     

17 Jan, 2021

1 commit

  • commit 5c11f7d9f843bdd24cd29b95401938bc3f168070 upstream.

    We may send a request (with or without its data) from two paths:

    1. From our I/O context nvme_tcp_io_work which is triggered from:
    - queue_rq
    - r2t reception
    - socket data_ready and write_space callbacks
    2. Directly from queue_rq if the send_list is empty (because we want to
    save the context switch associated with scheduling our io_work).

    However, given that now we have the send_mutex, we may run into a race
    condition where none of these contexts will send the pending payload to
    the controller. Both io_work send path and queue_rq send path
    opportunistically attempt to acquire the send_mutex however queue_rq only
    attempts to send a single request, and if io_work context fails to
    acquire the send_mutex it will complete without rescheduling itself.

    The race can trigger with the following sequence:

    1. queue_rq sends a request (no in-capsule data) and blocks
    2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU
    to the send_list and schedules io_work
    3. io_work triggers and cannot acquire the send_mutex - because of (1),
    it ends without rescheduling itself
    4. queue_rq finishes the send, and completes

    ==> no context will send the h2cdata - timeout.

    Fix this by having queue_rq send as much as it can from the send_list,
    so that if anything is left over, it's because the socket buffer is
    full and the socket write_space callback will trigger, thus guaranteeing
    that a context will be scheduled to send the h2cdata PDU.

    Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
    Reported-by: Potnuri Bharat Teja
    Reported-by: Samuel Jones
    Signed-off-by: Sagi Grimberg
    Tested-by: Potnuri Bharat Teja
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
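The fixed queue_rq send path can be modeled like this (Python sketch; the trylock and list names are illustrative):

```python
from collections import deque
import threading

send_mutex = threading.Lock()
send_list = deque()

def nvme_tcp_queue_rq_send():
    """Model of the fix: queue_rq drains the whole send_list, not just
    one request, while it opportunistically holds the send_mutex."""
    sent = []
    if send_mutex.acquire(blocking=False):    # opportunistic trylock
        try:
            while send_list:                  # send as much as we can
                sent.append(send_list.popleft())
        finally:
            send_mutex.release()
    # Anything still queued here means the socket buffer was full; the
    # write_space callback will schedule io_work, so some context is
    # guaranteed to send the pending h2cdata PDU.
    return sent

send_list.extend(["request-pdu", "h2cdata-pdu"])
assert nvme_tcp_queue_rq_send() == ["request-pdu", "h2cdata-pdu"]
assert nvme_tcp_queue_rq_send() == []
```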
     

14 Nov, 2020

3 commits

    xa_destroy() frees only internal data. The caller is responsible for
    freeing the external objects referenced by an xarray.

    Fixes: 1cf7a12e09aa4 ("nvme: use an xarray to lookup the Commands Supported and Effects log")
    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig

    Keith Busch
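The required pattern, freeing the referenced objects before destroying the container, looks roughly like this (Python stands in for the kernel xarray API; the log names are illustrative):

```python
class XArray(dict):
    """Toy stand-in for the kernel xarray (index -> object)."""
    def destroy(self):
        # Like xa_destroy(): releases only the container's internal
        # state; it knows nothing about the objects the entries point to.
        self.clear()

freed = []
effects_logs = XArray({0x101: "csi0-log", 0x102: "csi1-log"})

# Caller's responsibility: free every external object *before* destroy.
for index in list(effects_logs):
    freed.append(effects_logs.pop(index))
effects_logs.destroy()

assert freed == ["csi0-log", "csi1-log"]
assert len(effects_logs) == 0
```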
     
    Remove the struct used for tracking known command effects logs in a
    list. This is now saved in an xarray that doesn't use these elements.
    Store the log directly rather than the wrapper struct.

    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig

    Keith Busch
     
    If the Doorbell Buffer Config command fails, 'dev->dbbuf_dbs != NULL'
    still holds (meaning OACS indicates that NVME_CTRL_OACS_DBBUF_SUPP is
    set), so nvme_dbbuf_update_and_check_event() will check the event even
    though the doorbell buffer was never successfully configured.

    This patch fixes the mismatch among the dbbuf entries for the sq/cqs
    in case the dbbuf command fails.

    Signed-off-by: Minwoo Im
    Signed-off-by: Christoph Hellwig

    Minwoo Im
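A minimal model of the fix: don't leave dbbuf state behind when the Doorbell Buffer Config command fails, so later event checks see the buffer as unconfigured (Python sketch; the real fix frees the DMA buffers, and all names here are illustrative):

```python
class NvmeDev:
    def __init__(self):
        self.dbbuf_dbs = None   # shadow doorbell buffer

def nvme_dbbuf_set(dev, cmd_ok):
    """Issue the Doorbell Buffer Config command (modeled by cmd_ok)."""
    dev.dbbuf_dbs = "allocated"
    if not cmd_ok:
        # On failure, tear the buffers down again so that later checks
        # do not treat an unconfigured doorbell buffer as active.
        dev.dbbuf_dbs = None

def dbbuf_update_and_check_event(dev):
    # Only meaningful if the doorbell buffer was successfully configured.
    return dev.dbbuf_dbs is not None

dev = NvmeDev()
nvme_dbbuf_set(dev, cmd_ok=False)
assert dbbuf_update_and_check_event(dev) is False

dev = NvmeDev()
nvme_dbbuf_set(dev, cmd_ok=True)
assert dbbuf_update_and_check_event(dev) is True
```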
     

05 Nov, 2020

1 commit

  • Pull NVMe fixes from Christoph:

    "nvme fixes for 5.10:

    - revert a nvme_queue size optimization (Keith Busch)
    - fabrics timeout races fixes (Chao Leng and Sagi Grimberg)"

    * tag 'nvme-5.10-2020-11-05' of git://git.infradead.org/nvme:
    nvme-tcp: avoid repeated request completion
    nvme-rdma: avoid repeated request completion
    nvme-tcp: avoid race between time out and tear down
    nvme-rdma: avoid race between time out and tear down
    nvme: introduce nvme_sync_io_queues
    Revert "nvme-pci: remove last_sq_tail"

    Jens Axboe
     

03 Nov, 2020

6 commits

    The request may be executed asynchronously, and rq->state may be
    changed to IDLE. To avoid repeated request completion,
    nvme_tcp_complete_timed_out checked rq->state only for MQ_RQ_COMPLETE.
    That is not safe, so a check for IDLE must be added as well.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Chao Leng
    Signed-off-by: Christoph Hellwig

    Sagi Grimberg
     
    The request may be executed asynchronously, and rq->state may be
    changed to IDLE. To avoid repeated request completion,
    nvme_rdma_complete_timed_out checked rq->state only for MQ_RQ_COMPLETE.
    That is not safe, so a check for IDLE must be added as well.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Chao Leng
    Signed-off-by: Christoph Hellwig

    Sagi Grimberg
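The added state check, applied in both the TCP and RDMA variants, can be modeled as follows (the constant names mirror the blk-mq request states; everything else is illustrative):

```python
# blk-mq request states (values illustrative, names mirror the kernel's)
MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE = range(3)

def may_complete_timed_out(rq_state):
    """Should the timeout handler complete the request itself?

    Checking only MQ_RQ_COMPLETE is not enough: an asynchronously
    executed request may already have moved back to IDLE, and completing
    it again would be a double completion.
    """
    return rq_state not in (MQ_RQ_COMPLETE, MQ_RQ_IDLE)

assert may_complete_timed_out(MQ_RQ_IN_FLIGHT)
assert not may_complete_timed_out(MQ_RQ_COMPLETE)
assert not may_complete_timed_out(MQ_RQ_IDLE)   # the added check
```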
     
    Currently teardown_lock is used to serialize timeout and teardown.
    This can misbehave: teardown first cancels all requests, but a timeout
    may then complete a request again, even though the request may already
    have been freed or restarted.

    To avoid the race between timeout and teardown, in the teardown
    process we first quiesce the queue, and then delete the timer and
    cancel the timeout work for the queue. With that, teardown_lock can
    be removed.

    Signed-off-by: Chao Leng
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Chao Leng
     
    Currently teardown_lock is used to serialize timeout and teardown.
    This can misbehave: teardown first cancels all requests, but a timeout
    may then complete a request again, even though the request may already
    have been freed or restarted.

    To avoid the race between timeout and teardown, in the teardown
    process we first quiesce the queue, and then delete the timer and
    cancel the timeout work for the queue. With that, teardown_lock can
    be removed.

    Signed-off-by: Chao Leng
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Chao Leng
     
    Introduce nvme_sync_io_queues for scenarios that only need to sync
    the I/O queues rather than all queues.

    Signed-off-by: Chao Leng
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Chao Leng
     
    Multiple CPUs may be mapped to the same hctx, allowing multiple
    submission contexts to attempt commit_rqs(). We need to verify we're
    not writing the same doorbell value multiple times, since that's a
    spec violation.

    Revert commit 54b2fcee1db041a83b52b51752dade6090cf952f.

    Link: https://bugzilla.redhat.com/show_bug.cgi?id=1878596
    Reported-by: "B.L. Jones"
    Signed-off-by: Keith Busch

    Keith Busch
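The reverted last_sq_tail bookkeeping exists precisely to suppress duplicate doorbell writes; a sketch of the idea (Python, illustrative class and field names):

```python
class NvmeSQ:
    """Submission queue with doorbell write deduplication."""
    def __init__(self):
        self.tail = 0          # next free slot, advanced on submission
        self.last_sq_tail = 0  # last value actually written to the doorbell
        self.doorbell_writes = 0

    def submit(self):
        self.tail += 1

    def write_sq_db(self):
        # Multiple CPUs mapped to one hctx may call commit_rqs() back to
        # back; only ring the doorbell when the tail actually moved, since
        # writing the same value twice violates the NVMe spec.
        if self.tail != self.last_sq_tail:
            self.doorbell_writes += 1
            self.last_sq_tail = self.tail

sq = NvmeSQ()
sq.submit(); sq.submit()
sq.write_sq_db()   # rings once for both submissions
sq.write_sq_db()   # concurrent commit_rqs with nothing new: no write
assert sq.doorbell_writes == 1 and sq.last_sq_tail == 2
```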
     

31 Oct, 2020

1 commit

  • Pull block fixes from Jens Axboe:

    - null_blk zone fixes (Damien, Kanchan)

    - NVMe pull request from Christoph:
    - improve zone revalidation (Keith Busch)
    - gracefully handle zero length messages in nvme-rdma (zhenwei pi)
    - nvme-fc error handling fixes (James Smart)
    - nvmet tracing NULL pointer dereference fix (Chaitanya Kulkarni)

    - xsysace platform fixes (Andy)

    - scatterlist type cleanup (David)

    - blk-cgroup memory fixes (Gabriel)

    - nbd block size update fix (Ming)

    - Flush completion state fix (Ming)

    - bio_add_hw_page() iteration fix (Naohiro)

    * tag 'block-5.10-2020-10-30' of git://git.kernel.dk/linux-block:
    blk-mq: mark flush request as IDLE in flush_end_io()
    lib/scatterlist: use consistent sg_copy_buffer() return type
    xsysace: use platform_get_resource() and platform_get_irq_optional()
    null_blk: Fix locking in zoned mode
    null_blk: Fix zone reset all tracing
    nbd: don't update block size after device is started
    block: advance iov_iter on bio_add_hw_page failure
    null_blk: synchronization fix for zoned device
    nvmet: fix a NULL pointer dereference when tracing the flush command
    nvme-fc: remove nvme_fc_terminate_io()
    nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery
    nvme-fc: remove err_work work item
    nvme-fc: track error_recovery while connecting
    nvme-rdma: handle unexpected nvme completion data length
    nvme: ignore zone validate errors on subsequent scans
    blk-cgroup: Pre-allocate tree node on blkg_conf_prep
    blk-cgroup: Fix memleak on error path

    Linus Torvalds
     

28 Oct, 2020

1 commit

  • There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the
    handler triggers a completion and another thread does rdma_connect() or
    the handler directly calls rdma_connect().

    In all cases rdma_connect() needs to hold the handler_mutex, but when
    handlers are invoked this is already held by the core code. This causes
    ULPs using the 2nd method to deadlock.

    Provide a rdma_connect_locked() and have all ULPs call it from their
    handlers.

    Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
    Reported-and-tested-by: Guoqing Jiang
    Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
    Acked-by: Santosh Shilimkar
    Acked-by: Jack Wang
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
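With a non-reentrant lock, a handler calling the plain variant would deadlock; the _locked variant assumes the caller already holds the mutex. A Python model (the two function names follow the commit, the rest is illustrative):

```python
import threading

handler_mutex = threading.Lock()   # non-reentrant, like a kernel mutex

def rdma_connect():
    # Fine from a separate thread woken by a completion, but would
    # deadlock if called from inside a cm event handler.
    with handler_mutex:
        return "connected"

def rdma_connect_locked():
    # Variant for callers that already hold handler_mutex.
    assert handler_mutex.locked()
    return "connected"

def route_resolved_handler():
    # The core invokes handlers with handler_mutex already held, so the
    # handler must use the _locked variant.
    with handler_mutex:                 # models the core code's acquire
        return rdma_connect_locked()

assert route_resolved_handler() == "connected"
assert rdma_connect() == "connected"    # the completion-thread path
```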
     

27 Oct, 2020

6 commits

    __nvme_fc_terminate_io() is now called from only one place, reset_work.
    Consolidate and move the functionality of terminate_io into reset_work.

    In reset_work, rather than calling the create_association directly,
    schedule the connect work element to do its thing. After scheduling,
    flush the connect work element to continue with semantic of not
    returning until connect has been attempted at least once.

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • nvme_fc_error_recovery() special cases handling when in CONNECTING state
    and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself
    special cases CONNECTING state and calls the routine to abort outstanding
    ios.

    Simplify the sequence by putting the call to abort outstanding I/Os
    directly in nvme_fc_error_recovery.

    Move the location of __nvme_fc_abort_outstanding_ios(), and
    nvme_fc_terminate_exchange() which is called by it, to avoid adding
    function prototypes for nvme_fc_error_recovery().

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • err_work was created to handle errors (mainly I/O timeouts) while in
    CONNECTING state. The flag for err_work_active is also unneeded.

    Remove err_work_active and err_work. The actions to abort I/Os are moved
    inline to nvme_error_recovery().

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
    Whenever there are errors during CONNECTING, the driver recovers by
    aborting all outstanding ios and counting on the io completions to fail
    them, and thus the connection/association they are on. However, the
    connection failure depends on a failure state from the core routines.
    Not all commands issued by the core routines are guaranteed to fail the
    core routine itself: a command may complete with a failure status that
    is then ignored.

    As such, whenever the transport enters error_recovery while CONNECTING,
    it will set a new flag indicating an association failed. The
    create_association routine which creates and initializes the controller,
    will monitor the state of the flag as well as the core routine error
    status and ensure the association fails if there was an error.

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • Receiving a zero length message leads to the following warnings because
    the CQE is processed twice:

    refcount_t: underflow; use-after-free.
    WARNING: CPU: 0 PID: 0 at lib/refcount.c:28

    RIP: 0010:refcount_warn_saturate+0xd9/0xe0
    Call Trace:

    nvme_rdma_recv_done+0xf3/0x280 [nvme_rdma]
    __ib_process_cq+0x76/0x150 [ib_core]
    ...

    Sanity check the received data length to avoid this.

    Thanks to Chao Leng & Sagi for suggestions.

    Signed-off-by: zhenwei pi
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    zhenwei pi
     
  • Revalidating nvme zoned namespaces requires IO commands, and there are
    controller states that prevent IO. For example, a sanitize in progress
    is required to fail all IO, but we don't want to remove a namespace
    we've previously added just because the controller is in such a state.
    Suppress the error in this case.

    Reported-by: Michael Nguyen
    Signed-off-by: Keith Busch
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Keith Busch
     

23 Oct, 2020

4 commits

  • We've had several complaints about a 10s reconnect delay (the default)
    when there was an error while there is connectivity to a subsystem.
    The max_reconnects and reconnect_delay are set in common code prior to
    calling the transport to create the controller.

    This change checks if the default reconnect delay is being used, and if
    so, it adjusts it to a shorter period (2s) for the nvme-fc transport.
    It does so by calculating the controller loss tmo window, changing the
    value of the reconnect delay, and then recalculating the maximum number
    of reconnect attempts allowed.

    Signed-off-by: James Smart
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
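The recalculation keeps the controller-loss window constant while shortening the delay; a sketch under assumed defaults (the 10s default and the constant names here are assumptions for illustration):

```python
NVMF_DEF_RECONNECT_DELAY = 10      # assumed common default, in seconds
NVME_FC_DEFAULT_RECONNECT_TMO = 2  # the shorter FC delay (name illustrative)

def nvme_fc_adjust_reconnect(reconnect_delay, max_reconnects):
    """Shrink the delay while preserving the controller loss tmo window."""
    if reconnect_delay == NVMF_DEF_RECONNECT_DELAY:
        ctrl_loss_tmo = reconnect_delay * max_reconnects   # current window
        reconnect_delay = NVME_FC_DEFAULT_RECONNECT_TMO
        max_reconnects = ctrl_loss_tmo // reconnect_delay  # keep the window
    return reconnect_delay, max_reconnects

# A 600s loss window: 60 tries at 10s becomes 300 tries at 2s.
assert nvme_fc_adjust_reconnect(10, 60) == (2, 300)
# A user-specified (non-default) delay is left untouched.
assert nvme_fc_adjust_reconnect(5, 120) == (5, 120)
```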
     
    On reconnect, the code currently does not freeze the controller before
    possibly updating the number of hw queues for the controller.

    Add the freeze before updating the number of hw queues. Note: the queues
    are already started and remain started through the reconnect.

    Signed-off-by: James Smart
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • The loop that backs out of hw io queue creation continues through index
    0, which corresponds to the admin queue as well.

    Fix the loop so it only proceeds through indexes 1..n which correspond to
    I/O queues.

    Signed-off-by: James Smart
    Reviewed-by: Himanshu Madhani
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
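The corrected backout loop walks only the I/O queue indexes; sketched in Python (function name illustrative):

```python
def delete_io_queues(nr_queues):
    """Back out of hw I/O queue creation.

    Index 0 is the admin queue, so the backout loop must only walk the
    I/O queue indexes 1..n-1, never down through 0.
    """
    return [f"free_queue({i})" for i in range(nr_queues - 1, 0, -1)]

# With 4 queues total (admin + 3 I/O), only queues 3, 2, 1 are freed.
assert delete_io_queues(4) == ["free_queue(3)", "free_queue(2)", "free_queue(1)"]
```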
     
  • Currently, an I/O timeout unconditionally invokes
    nvme_fc_error_recovery() which checks for LIVE or CONNECTING state. If
    live, the routine resets the controller which initiates a reconnect -
    which is valid. If CONNECTING, err_work is scheduled. Err_work then
    calls the terminate_io routine, which also checks for CONNECTING and
    noops any further action on outstanding I/O. The result is that nothing
    happens to the timed-out io. As such, if the command was dropped on
    the wire, it will never timeout / complete, and the connect process
    will hang.

    Change the behavior of the io timeout routine to unconditionally abort
    the I/O. I/O completion handling will note that an io failed due to an
    abort and will terminate the connection / association as needed. If the
    abort was unable to happen, continue with a call to
    nvme_fc_error_recovery(). To ensure something different happens in
    nvme_fc_error_recovery(), rework it so that it will abort all I/Os on
    the association to force a failure.

    As I/O aborts now may occur outside of delete_association, counting for
    completion must be wary and only count those aborted during
    delete_association when TERMIO is set on the controller.

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     

22 Oct, 2020

4 commits

  • Like commit 5611ec2b9814 ("nvme-pci: prevent SK hynix PC400 from using
    Write Zeroes command"), Sandisk Skyhawk has the same issue:
    [ 6305.633887] blk_update_request: operation not supported error, dev nvme0n1, sector 340812032 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0

    So also disable Write Zeroes command on Sandisk Skyhawk.

    BugLink: https://bugs.launchpad.net/bugs/1899503
    Signed-off-by: Kai-Heng Feng
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Kai-Heng Feng
     
  • The request's rq_disk isn't set for passthrough IO commands, so tracing
    uses qid 0 for these which incorrectly decodes as an admin command. Use
    the request_queue's queuedata instead since that value is always set for
    the IO queues, and never set for the admin queue.

    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig

    Keith Busch
     
    A crash happened during error injection testing: when a CQE carries an
    incorrect command id due to the injected error, the host may look up a
    request which has already been freed. Dereferencing req->mr->rkey then
    crashes in nvme_rdma_process_nvme_rsp because the mr is already freed.

    Add a check for the mr to fix it.

    Signed-off-by: Chao Leng
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Chao Leng
     
    A crash can happen when a connect is rejected. The host establishes
    the connection after receiving the ConnectReply, and then continues to
    send the fabrics Connect command. If the controller does not receive
    the ReadyToUse capsule, the host may receive a ConnectReject reply.

    nvme_rdma_destroy_queue_ib is called after the host receives the
    RDMA_CM_EVENT_REJECTED event. Then, when the fabrics Connect command
    times out, nvme_rdma_timeout calls nvme_rdma_complete_rq to fail the
    request, and a crash happens due to a use after free in
    nvme_rdma_complete_rq.

    nvme_rdma_destroy_queue_ib is redundant when handling the
    RDMA_CM_EVENT_REJECTED event as nvme_rdma_destroy_queue_ib is already
    called in connection failure handler.

    Signed-off-by: Chao Leng
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Chao Leng
     

14 Oct, 2020

3 commits

  • Translate zoned resource errors to the appropriate blk_status_t.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • Pull block driver updates from Jens Axboe:
    "Here are the driver updates for 5.10.

    A few SCSI updates in here too, in coordination with Martin as they
    depend on core block changes for the shared tag bitmap.

    This contains:

    - NVMe pull requests via Christoph:
    - fix keep alive timer modification (Amit Engel)
    - order the PCI ID list more sensibly (Andy Shevchenko)
    - cleanup the open by controller helper (Chaitanya Kulkarni)
    - use an xarray for the CSE log lookup (Chaitanya Kulkarni)
    - support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
    - fix nvme_ns_report_zones (Christoph Hellwig)
    - add a sanity check to nvmet-fc (James Smart)
    - fix interrupt allocation when too many polled queues are
    specified (Jeffle Xu)
    - small nvmet-tcp optimization (Mark Wunderlich)
    - fix a controller refcount leak on init failure (Chaitanya
    Kulkarni)
    - misc cleanups (Chaitanya Kulkarni)
    - major refactoring of the scanning code (Christoph Hellwig)

    - MD updates via Song:
    - Bug fixes in bitmap code, from Zhao Heming
    - Fix a work queue check, from Guoqing Jiang
    - Fix raid5 oops with reshape, from Song Liu
    - Clean up unused code, from Jason Yan
    - Discard improvements, from Xiao Ni
    - raid5/6 page offset support, from Yufen Yu

    - Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap,
    Hannes)

    - null_blk open/active zone limit support (Niklas)

    - Set of bcache updates (Coly, Dongsheng, Qinglang)"

    * tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits)
    md/raid5: fix oops during stripe resizing
    md/bitmap: fix memory leak of temporary bitmap
    md: fix the checking of wrong work queue
    md/bitmap: md_bitmap_get_counter returns wrong blocks
    md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
    md/raid0: remove unused function is_io_in_chunk_boundary()
    nvme-core: remove extra condition for vwc
    nvme-core: remove extra variable
    nvme: remove nvme_identify_ns_list
    nvme: refactor nvme_validate_ns
    nvme: move nvme_validate_ns
    nvme: query namespace identifiers before adding the namespace
    nvme: revalidate zone bitmaps in nvme_update_ns_info
    nvme: remove nvme_update_formats
    nvme: update the known admin effects
    nvme: set the queue limits in nvme_update_ns_info
    nvme: remove the 0 lba_shift check in nvme_update_ns_info
    nvme: clean up the check for too large logic block sizes
    nvme: freeze the queue over ->lba_shift updates
    nvme: factor out a nvme_configure_metadata helper
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

09 Oct, 2020

1 commit

  • Pull block fixes from Jens Axboe:
    "A few fixes that should go into this release:

    - NVMe controller error path reference fix (Chaitanya)

    - Fix regression with IBM partitions on non-dasd devices (Christoph)

    - Fix a missing clear in the compat CDROM packet structure (Peilin)"

    * tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block:
    partitions/ibm: fix non-DASD devices
    nvme-core: put ctrl ref when module ref get fail
    block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg()

    Linus Torvalds
     

07 Oct, 2020

2 commits

    In nvme_set_queue_limits() we initialize vwc to false and later add
    a condition to set vwc to true. The value of vwc can instead be
    declared and initialized in one statement, which makes all the
    blk_queue_XXX() calls uniform.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Keith Busch
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
     
    In nvme_validate_ns() the extra variable ctrl is used only twice.
    Using ns->ctrl directly still maintains the readability and original
    length of the lines in the code. Get rid of the extra variable ctrl
    and use ns->ctrl directly.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni