05 Mar, 2020

4 commits

  • commit 9515743bfb39c61aaf3d4f3219a645c8d1fe9a0e upstream.

    Completions need to be consumed in the same order the controller
    submitted them, otherwise future completion entries may overwrite ones
    we haven't handled yet. Hold the nvme queue's poll lock while completing
    new CQEs to prevent another thread from freeing command tags for reuse
    out of order (a short sketch follows this entry).

    Fixes: dabcefab45d3 ("nvme: provide optimized poll function for separate poll queues")
    Signed-off-by: Bijan Mottahedeh
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Jens Axboe
    Signed-off-by: Keith Busch
    Signed-off-by: Greg Kroah-Hartman

    Bijan Mottahedeh
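
    A minimal sketch of the idea in the entry above, assuming a per-queue
    spinlock named cq_poll_lock and illustrative helpers nvme_cqe_pending(),
    nvme_process_cq() and nvme_complete_cqes(); this is schematic, not the
    exact upstream diff:

    static int nvme_poll_sketch(struct nvme_queue *nvmeq)
    {
            u16 start, end;
            bool found;

            if (!nvme_cqe_pending(nvmeq))
                    return 0;

            spin_lock(&nvmeq->cq_poll_lock);
            found = nvme_process_cq(nvmeq, &start, &end, -1);
            /* Complete the CQEs while still holding the poll lock so another
             * poller cannot free and reuse command tags out of order. */
            nvme_complete_cqes(nvmeq, start, end);
            spin_unlock(&nvmeq->cq_poll_lock);

            return found;
    }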
     
  • [ Upstream commit fa46c6fb5d61b1f17b06d7c6ef75478b576304c7 ]

    Many users have reported nvme triggered irq_startup() warnings during
    shutdown. The driver uses the nvme queue's irq to synchronize scanning
    for completions, and enabling an interrupt affined to only offline CPUs
    triggers the alarming warning.

    Move the final CQE check to after the device has been disabled and all
    registered interrupts have been torn down, so that there is no IRQ left
    to synchronize against.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206509
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Keith Busch
     
  • [ Upstream commit 97b2512ad000a409b4073dd1a71e4157d76675cb ]

    Delayed keep alive work is queued on the system workqueue and may be
    cancelled via nvme_stop_keep_alive() from nvme_reset_wq, nvme_fc_wq or
    nvme_wq.

    check_flush_dependency() detects mismatched attributes between the
    workqueue context used to cancel the keep alive work and the system
    workqueue. Specifically, the system workqueue does not have the
    WQ_MEM_RECLAIM flag, whereas the contexts used to cancel the keep alive
    work do.

    Example warning:

    workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
    is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]

    To avoid the flags mismatch, delayed keep alive work is queued on nvme_wq.

    However, this creates a secondary concern: a work item and the work
    that cancels it may now live on the same workqueue. Specifically,
    err_work in the rdma and tcp transports flushes/cancels the keep alive
    work, which will now be on nvme_wq.

    After reviewing the transports, err_work can be moved to nvme_reset_wq.
    In fact, that aligns better with the transition into RESETTING and the
    related reset work already performed on nvme_reset_wq.

    Change nvme-rdma and nvme-tcp to perform err_work in nvme_reset_wq.

    Signed-off-by: Nigel Kirkland
    Signed-off-by: James Smart
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Nigel Kirkland
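
    A rough sketch of the change described above; the field and workqueue
    names follow nvme core and the rdma/tcp transports of that era, but
    treat the call sites as illustrative rather than the exact diff. Keep
    alive work goes on nvme_wq and error recovery goes on nvme_reset_wq, so
    nothing with WQ_MEM_RECLAIM ever flushes a non-reclaim queue:

    /* nvme core: queue the delayed keep alive work on nvme_wq
     * (WQ_MEM_RECLAIM) instead of the system workqueue. */
    queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);

    /* nvme-rdma / nvme-tcp: schedule error recovery from nvme_reset_wq,
     * which is where the related RESETTING work already runs. */
    queue_work(nvme_reset_wq, &ctrl->err_work);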
     
  • [ Upstream commit 2d570a7c0251c594489a2c16b82b14ae30345c03 ]

    When nvme_tcp_io_work() fails to send to socket due to
    connection close/reset, error_recovery work is triggered
    from nvme_tcp_state_change() socket callback.
    This cancels all the active requests in the tagset,
    which requeues them.

    The failed request, however, was also ended, and thus requeued
    individually as well, unless send returned -EPIPE. Another return code
    that should be treated the same way is -ECONNRESET.

    The double requeue caused BUG_ON(blk_queued_rq(rq)) in
    blk_mq_requeue_request(), hit from either the individual requeue of the
    failed request or the bulk requeue from
    blk_mq_tagset_busy_iter(, nvme_cancel_request, );

    Signed-off-by: Anton Eidelman
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Anton Eidelman
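
    A schematic sketch of the send-error handling described above; the
    helper names (nvme_tcp_try_send, nvme_tcp_fail_request) are
    illustrative, not the exact upstream code. For -EPIPE and -ECONNRESET
    the socket state-change callback already triggers error recovery, which
    cancels and requeues every active request, so the request must not be
    failed here a second time:

    ret = nvme_tcp_try_send(queue);
    if (ret < 0 && ret != -EAGAIN) {
            dev_err(queue->ctrl->ctrl.device,
                    "failed to send request %d\n", ret);
            /* Only fail the request individually for errors that do not
             * also trigger error recovery via the state_change callback. */
            if (ret != -EPIPE && ret != -ECONNRESET)
                    nvme_tcp_fail_request(queue->request);
    }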
     

29 Feb, 2020

1 commit

  • commit 3b7830904e17202524bad1974505a9bfc718d31f upstream.

    kmemleak reports a memory leak with the ana_log_buf allocated by
    nvme_mpath_init():

    unreferenced object 0xffff888120e94000 (size 8208):
    comm "nvme", pid 6884, jiffies 4295020435 (age 78786.312s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
    01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmalloc_order+0x97/0xc0
    [] kmalloc_order_trace+0x24/0x100
    [] __kmalloc+0x24c/0x2d0
    [] nvme_mpath_init+0x23c/0x2b0
    [] nvme_init_identify+0x75f/0x1600
    [] nvme_loop_configure_admin_queue+0x26d/0x280
    [] nvme_loop_create_ctrl+0x2a7/0x710
    [] nvmf_dev_write+0xc66/0x10b9
    [] __vfs_write+0x50/0xa0
    [] vfs_write+0xf3/0x280
    [] ksys_write+0xc6/0x160
    [] __x64_sys_write+0x43/0x50
    [] do_syscall_64+0x77/0x2f0
    [] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    nvme_mpath_init() is called by nvme_init_identify() which is called in
    multiple places (nvme_reset_work(), nvme_passthru_end(), etc). This
    means nvme_mpath_init() may be called multiple times before
    nvme_mpath_uninit() (which is only called on nvme_free_ctrl()).

    When nvme_mpath_init() is called multiple times, it overwrites the
    ana_log_buf pointer with a new allocation, thus leaking the previous
    allocation.

    To fix this, free ana_log_buf before allocating a new one.

    Fixes: 0d0b660f214dc490 ("nvme: add ANA support")
    Cc:
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Keith Busch
    Signed-off-by: Greg Kroah-Hartman

    Logan Gunthorpe
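
    A minimal sketch of the fix, with buffer and size field names as used
    by the nvme multipath code and error handling trimmed: since
    nvme_mpath_init() can run more than once per controller, release any
    previous ANA log buffer before allocating the new one.

    kfree(ctrl->ana_log_buf);
    ctrl->ana_log_buf = NULL;

    ctrl->ana_log_buf = kmalloc(ctrl->ana_log_size, GFP_KERNEL);
    if (!ctrl->ana_log_buf)
            return -ENOMEM;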
     

24 Feb, 2020

2 commits

  • [ Upstream commit cfa27356f835dc7755192e7b941d4f4851acbcc7 ]

    There is no real need to have a pointer to the tagset in
    struct nvme_queue, as we only need it in a single place, and that place
    can derive the used tagset from the device and qid trivially. This
    fixes a problem with stale pointer exposure when tagsets are reset,
    and also shrinks the nvme_queue structure. It also matches what most
    other transports have done since day 1.

    Reported-by: Edmund Nadolski
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Keith Busch
    Signed-off-by: Sasha Levin

    Christoph Hellwig
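
    The change can be pictured with a small helper along these lines (a
    sketch of the approach, not necessarily the exact upstream helper): the
    tagset is derived from the device and the queue id, so struct
    nvme_queue no longer needs to cache a pointer that can go stale across
    tagset resets.

    static struct blk_mq_tags *nvme_queue_tagset(struct nvme_queue *nvmeq)
    {
            if (!nvmeq->qid)
                    return nvmeq->dev->admin_tagset.tags[0];
            return nvmeq->dev->tagset.tags[nvmeq->qid - 1];
    }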
     
  • [ Upstream commit 4ac76436a6d07dec1c3c766f234aa787a16e8f65 ]

    ctrl->subsys->namespaces and subsys->namespaces are traversed with
    list_for_each_entry_rcu outside an RCU read-side critical section but
    under the protection of ctrl->subsys->lock and subsys->lock respectively.

    Hence, add the corresponding lockdep expression to the list traversal
    primitive to silence false-positive lockdep warnings, and harden RCU
    lists.

    Reported-by: kbuild test robot
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Amol Grover
    Signed-off-by: Keith Busch
    Signed-off-by: Sasha Levin

    Amol Grover
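
    A short sketch of such a lockdep expression, with the list and lock
    names taken from the entry above and the iterator 'ns' assumed to be
    declared in the surrounding function: the traversal is annotated with
    the lock that actually protects it, so lockdep stops warning while
    still checking the real locking rule.

    list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, list,
                            lockdep_is_held(&ctrl->subsys->lock)) {
            /* safe: either inside rcu_read_lock() or holding subsys->lock */
    }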
     

20 Feb, 2020

1 commit

  • commit f25372ffc3f6c2684b57fb718219137e6ee2b64c upstream.

    The nvme fw-activate operation will produce the warning log below; fix
    it by updating the parameter order.

    [ 113.231513] nvme nvme0: Get FW SLOT INFO log error

    Fixes: 0e98719b0e4b ("nvme: simplify the API for getting log pages")
    Reported-by: Sujith Pandel
    Reviewed-by: David Milburn
    Signed-off-by: Yi Zhang
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Yi Zhang
     

11 Feb, 2020

2 commits

  • commit 1a3f540d63152b8db0a12de508bfa03776217d83 upstream.

    After nvmet_install_queue() sets sq->ctrl, a call to nvmet_sq_destroy()
    reduces the controller refcount. In case nvmet_install_queue() fails,
    nvmet_ctrl_put() is called twice (from nvmet_sq_destroy() and from
    nvmet_execute_io_connect()/nvmet_execute_admin_connect()) instead of
    once for the queue, which leads to a use after free of the controller.
    Fix this by setting sq->ctrl to NULL when nvmet_install_queue() fails
    (see the sketch after this entry).

    The bug leads to the following Call Trace:

    [65857.994862] refcount_t: underflow; use-after-free.
    [65858.108304] Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
    [65858.115557] RIP: 0010:refcount_warn_saturate+0xe5/0xf0
    [65858.208141] Call Trace:
    [65858.211203] nvmet_sq_destroy+0xe1/0xf0 [nvmet]
    [65858.216383] nvmet_rdma_release_queue_work+0x37/0xf0 [nvmet_rdma]
    [65858.223117] process_one_work+0x167/0x370
    [65858.227776] worker_thread+0x49/0x3e0
    [65858.232089] kthread+0xf5/0x130
    [65858.235895] ? max_active_store+0x80/0x80
    [65858.240504] ? kthread_bind+0x10/0x10
    [65858.244832] ret_from_fork+0x1f/0x30
    [65858.249074] ---[ end trace f82d59250b54beb7 ]---

    Fixes: bb1cc74790eb ("nvmet: implement valid sqhd values in completions")
    Fixes: 1672ddb8d691 ("nvmet: Add install_queue callout")
    Signed-off-by: Israel Rukshin
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch
    Signed-off-by: Greg Kroah-Hartman

    Israel Rukshin
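
    A sketch of the error path in nvmet_install_queue(), with the
    surrounding code abbreviated: clearing sq->ctrl ensures the later
    nvmet_sq_destroy() does not drop the controller reference a second
    time.

    if (ctrl->ops->install_queue) {
            ret = ctrl->ops->install_queue(req->sq);
            if (ret) {
                    pr_err("failed to install queue %d cntlid %d ret %x\n",
                           qid, ctrl->cntlid, ret);
                    req->sq->ctrl = NULL; /* avoid double nvmet_ctrl_put() */
                    return NVME_SC_INTERNAL | NVME_SC_DNR;
            }
    }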
     
  • commit 0b87a2b795d66be7b54779848ef0f3901c5e46fc upstream.

    Place the arguments in the correct order.

    Fixes: 1672ddb8d691 ("nvmet: Add install_queue callout")
    Signed-off-by: Israel Rukshin
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch
    Signed-off-by: Greg Kroah-Hartman

    Israel Rukshin
     

09 Jan, 2020

4 commits

  • [ Upstream commit 7e4c6b9a5d22485acf009b3c3510a370f096dd54 ]

    If nvme.write_queues equals the number of CPUs, the driver had decreased
    the number of interrupts available such that there could only be one read
    queue even if the controller could support more. Remove the interrupt
    count reduction in this case. The driver wouldn't request more IRQs than
    it wants queues anyway.

    Reviewed-by: Jens Axboe
    Signed-off-by: Keith Busch
    Signed-off-by: Sasha Levin

    Keith Busch
     
  • [ Upstream commit 3f68baf706ec68c4120867c25bc439c845fe3e17 ]

    The number of poll or write queues should never be negative. Use
    unsigned types so that it's not possible to have the driver not
    allocate any queues.

    Reviewed-by: Jens Axboe
    Signed-off-by: Keith Busch
    Signed-off-by: Sasha Levin

    Keith Busch
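
    A sketch of what that looks like for the nvme-pci module parameters
    (the parameter descriptions are illustrative): declaring both the
    variables and the module_param() type as unsigned keeps negative values
    out of the queue-count math entirely.

    static unsigned int write_queues;
    module_param(write_queues, uint, 0644);
    MODULE_PARM_DESC(write_queues,
            "Number of queues to use for writes. If not set, reads and writes "
            "will share a queue set.");

    static unsigned int poll_queues;
    module_param(poll_queues, uint, 0644);
    MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO.");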
     
  • [ Upstream commit c869e494ef8b5846d9ba91f1e922c23cd444f0c1 ]

    If an error occurs on one of the ios used for creating an
    association, the creating routine has error paths that are
    invoked by the command failure and the error paths will free
    up the controller resources created to that point.

    But the io failure was ultimately detected by an asynchronous
    completion routine, which unconditionally invokes the error_recovery
    path, and that path calls delete_association. Delete association
    deletes all outstanding io and then tears down the controller
    resources, so the create_association thread can be running in parallel
    with the error_recovery thread. What was seen was that the LLDD
    received a call to delete a queue, causing the LLDD to free a resource,
    then the transport called delete queue again, causing the driver to
    repeat the free call. The second free corrupted the allocator. The
    transport shouldn't be making the duplicate call, and the delete queue
    is just one of the resources being freed.

    To fix this, observe that the create_association path is completely
    serialized, with one command at a time. So the failed io completion
    will always be seen by the create_association path, and as of the
    failure there are no ios to terminate and no reason to be manipulating
    queue freeze states, etc. The serialized condition stays true until the
    controller is transitioned to the LIVE state. Thus the fix is to change
    the error recovery path to check the controller state and only invoke
    the teardown path if not already in the CONNECTING state.

    Reviewed-by: Himanshu Madhani
    Reviewed-by: Ewan D. Milne
    Signed-off-by: James Smart
    Signed-off-by: Keith Busch
    Signed-off-by: Sasha Levin

    James Smart
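
    A schematic sketch of that state check in the error recovery path; the
    function and message are illustrative, not the literal upstream diff.
    While the controller is still CONNECTING, the serialized
    create_association path sees the failed io itself, so the heavyweight
    teardown is skipped:

    static void nvme_fc_error_recovery_sketch(struct nvme_fc_ctrl *ctrl,
                                              char *errmsg)
    {
            /* Associations in CONNECTING are created one command at a time;
             * the failed io is handled by the create path, nothing to tear
             * down here. */
            if (ctrl->ctrl.state == NVME_CTRL_CONNECTING)
                    return;

            dev_warn(ctrl->ctrl.device,
                     "NVME-FC{%d}: error_recovery: %s\n", ctrl->cnum, errmsg);
            nvme_reset_ctrl(&ctrl->ctrl);
    }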
     
  • [ Upstream commit 863fbae929c7a5b64e96b8a3ffb34a29eefb9f8f ]

    In nvme-fc it's possible to have connected, active controllers while no
    references are taken on the LLDD, so the LLDD can be unloaded. The
    controller would enter a reconnect state and, as long as the LLDD
    resumed within the reconnect timeout, the controller would resume. But
    if a namespace on the controller is the root device, allowing the
    driver to unload can be problematic: reloading the driver may require
    new io to the boot device, and as it's no longer connected we get into
    a catch-22 that eventually fails, and the system locks up.

    Fix this issue by taking a module reference for every connected
    controller (which is what the core layer did to the transport
    module). Reference is cleared when the controller is removed.

    Acked-by: Himanshu Madhani
    Reviewed-by: Christoph Hellwig
    Signed-off-by: James Smart
    Signed-off-by: Keith Busch
    Signed-off-by: Sasha Levin

    James Smart
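
    Roughly, the fix pins the LLDD module for each connected controller,
    along these lines (a sketch under the assumption that the port template
    gained a 'module' owner field; error handling trimmed):

    /* on controller creation */
    if (!try_module_get(lport->ops->module)) {
            ret = -EUNATCH;
            goto out_free_ctrl;
    }

    /* on controller removal / final put */
    module_put(lport->ops->module);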
     

31 Dec, 2019

2 commits

  • [ Upstream commit 530436c45ef2e446c12538a400e465929a0b3ade ]

    Users observe IOMMU-related errors when performing discard on nvme,
    caused by non-compliant nvme devices reading beyond the end of the
    DMA-mapped ranges for the discard.

    Two different variants of this behavior have been observed: SM22XX
    controllers round up the read size to a multiple of 512 bytes, and Phison
    E12 unconditionally reads the maximum discard size allowed by the spec
    (256 segments or 4kB).

    Make nvme_setup_discard unconditionally allocate the maximum DSM buffer
    so the driver DMA maps a memory range that will always succeed.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=202665
    Signed-off-by: Eduard Hasenleithner
    [changelog, use existing define, kernel coding style]
    Signed-off-by: Keith Busch
    Signed-off-by: Sasha Levin

    Eduard Hasenleithner
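
    A minimal sketch of the allocation change in nvme_setup_discard(), with
    status handling abbreviated: the buffer is always sized for the spec
    maximum of NVME_DSM_MAX_RANGES ranges, so even a controller that reads
    the full 4kB stays inside the DMA mapping.

    static const size_t alloc_size = sizeof(struct nvme_dsm_range) *
                                     NVME_DSM_MAX_RANGES;
    struct nvme_dsm_range *range;

    range = kzalloc(alloc_size, GFP_ATOMIC | __GFP_NOWARN);
    if (!range)
            return BLK_STS_RESOURCE;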
     
  • [ Upstream commit 2dc3947b53f573e8a75ea9cbec5588df88ca502e ]

    Fix the status code of canceled requests initiated by the host according
    to TP4028 (Status Code 0x371):
    "Command Aborted By host: The command was aborted as a result of host
    action (e.g., the host disconnected the Fabric connection)."

    Also in a multipath environment, unless otherwise specified, errors of
    this type (path related) should be retried using a different path, if
    one is available.

    Signed-off-by: Max Gurtovoy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Max Gurtovoy
     

18 Dec, 2019

2 commits

  • commit 655e7aee1f0398602627a485f7dca6c29cc96cae upstream.

    Since e045fa29e893 ("PCI/MSI: Fix incorrect MSI-X masking on resume") is
    merged, we can revert the previous quirk now.

    This reverts commit 19ea025e1d28c629b369c3532a85b3df478cc5c6.

    Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887
    Fixes: 19ea025e1d28 ("nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T")
    Link: https://lore.kernel.org/r/20191031093408.9322-1-jian-hong@endlessm.com
    Signed-off-by: Jian-Hong Pan
    Signed-off-by: Bjorn Helgaas
    Acked-by: Christoph Hellwig
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Jian-Hong Pan
     
  • commit 22802bf742c25b1e2473c70b3b99da98af65ef4d upstream.

    Although the NVM Express specification 1.3 requires a controller
    claiming to be 1.3 or higher to implement Identify CNS 03h (Namespace
    Identification Descriptor list), the driver doesn't really need this
    identification in order to use a namespace. The code had already
    documented in comments that a failure of this command is not to be
    considered an error.

    Return success if the controller provided any response to a namespace
    identification descriptors command.

    Fixes: 538af88ea7d9de24 ("nvme: make nvme_report_ns_ids propagate error back")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=205679
    Reported-by: Ingo Brunberg
    Cc: Sagi Grimberg
    Cc: stable@vger.kernel.org # 5.4+
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch
    Signed-off-by: Greg Kroah-Hartman

    Keith Busch
     

09 Nov, 2019

1 commit

  • Pull block fixes from Jens Axboe:

    - Two NVMe device removal crash fixes, and a compat fixup for an ioctl
    that was introduced in this release (Anton, Charles, Max - via Keith)

    - Missing error path mutex unlock for drbd (Dan)

    - cgroup writeback fixup on dead memcg (Tejun)

    - blkcg online stats print fix (Tejun)

    * tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block:
    cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
    block: drbd: remove a stray unlock in __drbd_send_protocol()
    blkcg: make blkcg_print_stat() print stats only for online blkgs
    nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
    nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
    nvme-rdma: fix a segmentation fault during module unload

    Linus Torvalds
     

05 Nov, 2019

2 commits

  • nvme_mpath_clear_ctrl_paths() iterates through
    the ctrl->namespaces list while holding ctrl->scan_lock.
    This does not seem to be the correct way of protecting
    from concurrent list modification.

    Specifically, nvme_scan_work() sorts ctrl->namespaces
    AFTER unlocking scan_lock.

    This may result in the following (rare) crash in ctrl disconnect
    during scan_work:

    BUG: kernel NULL pointer dereference, address: 0000000000000050
    Oops: 0000 [#1] SMP PTI
    CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic
    RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core]
    ...
    Call Trace:
    nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core]
    nvme_remove_namespaces+0x35/0xe0 [nvme_core]
    nvme_do_delete_ctrl+0x47/0x90 [nvme_core]
    nvme_sysfs_delete+0x49/0x60 [nvme_core]
    dev_attr_store+0x17/0x30
    sysfs_kf_write+0x3e/0x50
    kernfs_fop_write+0x11e/0x1a0
    __vfs_write+0x1b/0x40
    vfs_write+0xb9/0x1a0
    ksys_write+0x67/0xe0
    __x64_sys_write+0x1a/0x20
    do_syscall_64+0x5a/0x130
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f8d02bfb154

    Fix:
    After taking scan_lock in nvme_mpath_clear_ctrl_paths(), also take
    down_read(&ctrl->namespaces_rwsem) to make the list traversal safe (a
    sketch follows this entry).
    This will not cause deadlocks because scan_lock is never taken while
    holding namespaces_rwsem. Moreover, scan work takes namespaces_rwsem in
    the same order.

    Alternative: sort ctrl->namespaces in nvme_scan_work()
    while still holding the scan_lock.
    This would leave nvme_mpath_clear_ctrl_paths() without correct protection
    against ctrl->namespaces modification by anyone other than scan_work.

    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Anton Eidelman
    Signed-off-by: Keith Busch

    Anton Eidelman
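
    A sketch of the resulting locking in nvme_mpath_clear_ctrl_paths(),
    simplified from the description above: scan_lock is taken first and
    namespaces_rwsem second, the same order scan work uses, so no deadlock
    is possible.

    void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
    {
            struct nvme_ns *ns;

            mutex_lock(&ctrl->scan_lock);
            down_read(&ctrl->namespaces_rwsem);
            list_for_each_entry(ns, &ctrl->namespaces, list)
                    if (nvme_mpath_clear_current_path(ns))
                            kblockd_schedule_work(&ns->head->requeue_work);
            up_read(&ctrl->namespaces_rwsem);
            mutex_unlock(&ctrl->scan_lock);
    }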
     
  • In case there are controllers that are not associated with any RDMA
    device (e.g. during unsuccessful reconnection) and the user will unload
    the module, these controllers will not be freed and will access already
    freed memory. The same logic appears in other fabric drivers as well.

    Fixes: 87fd125344d6 ("nvme-rdma: remove redundant reference between ib_device and tagset")
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Max Gurtovoy
    Signed-off-by: Keith Busch

    Max Gurtovoy
     

02 Nov, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) Fix free/alloc races in batmanadv, from Sven Eckelmann.

    2) Several leaks and other fixes in kTLS support of mlx5 driver, from
    Tariq Toukan.

    3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
    Høiland-Jørgensen.

    4) Add an r8152 device ID, from Kazutoshi Noguchi.

    5) Missing include in ipv6's addrconf.c, from Ben Dooks.

    6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
    easily infer the 32-bit secret otherwise etc.

    7) Several netdevice nesting depth fixes from Taehee Yoo.

    8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
    when doing lockless skb_queue_empty() checks, and accessing
    sk_napi_id/sk_incoming_cpu lockless as well.

    9) Fix jumbo packet handling in RXRPC, from David Howells.

    10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.

    11) Fix DMA synchronization in gve driver, from Yangchun Fu.

    12) Several bpf offload fixes, from Jakub Kicinski.

    13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.

    14) Fix ping latency during high traffic rates in hisilicon driver, from
    Jiangfeng Xiao.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
    net: fix installing orphaned programs
    net: cls_bpf: fix NULL deref on offload filter removal
    selftests: bpf: Skip write only files in debugfs
    selftests: net: reuseport_dualstack: fix uninitalized parameter
    r8169: fix wrong PHY ID issue with RTL8168dp
    net: dsa: bcm_sf2: Fix IMP setup for port different than 8
    net: phylink: Fix phylink_dbg() macro
    gve: Fixes DMA synchronization.
    inet: stop leaking jiffies on the wire
    ixgbe: Remove duplicate clear_bit() call
    Documentation: networking: device drivers: Remove stray asterisks
    e1000: fix memory leaks
    i40e: Fix receive buffer starvation for AF_XDP
    igb: Fix constant media auto sense switching when no cable is connected
    net: ethernet: arc: add the missed clk_disable_unprepare
    igb: Enable media autosense for the i350.
    igb/igc: Don't warn on fatal read failures when the device is removed
    tcp: increase tcp_max_syn_backlog max value
    net: increase SOMAXCONN to 4096
    netdevsim: Fix use-after-free during device dismantle
    ...

    Linus Torvalds
     

29 Oct, 2019

3 commits

  • groups_only mode in nvme_read_ana_log() is no longer used: remove it.

    Reviewed-by: Sagi Grimberg
    Signed-off-by: Anton Eidelman
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Anton Eidelman
     
  • The following scenario results in an IO hang:
    1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
    NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
    2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
    This fails because ctrl disconnects.
    Therefore nvme_update_ns_ana_state() is not called
    and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
    3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
    nvme_read_ana_log(ctrl, groups_only=true).
    However, nvme_update_ana_state() does not update namespaces
    because nr_nsids = 0 (due to groups_only mode).
    4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.

    Result:
    The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
    Consequently ctrl will never be considered a viable path by __nvme_find_path().
    IO will hang if ctrl is the only or the last path to the namespace.

    More generally, while ctrl is reconnecting, its ANA state may change.
    And because nvme_mpath_init() requests ANA log in groups_only mode,
    these changes are not propagated to the existing ctrl namespaces.
    This may result in a mal-function or an IO hang.

    Solution:
    nvme_mpath_init() will call nvme_read_ana_log() with groups_only set to false.
    This will not harm the new ctrl case (no namespaces present),
    and will make sure the ANA state of namespaces gets updated after reconnect.

    Note: Another option would be for nvme_mpath_init() to invoke
    nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.

    Reviewed-by: Sagi Grimberg
    Signed-off-by: Anton Eidelman
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Anton Eidelman
     
  • Busy polling usually runs without locks.
    Let's use skb_queue_empty_lockless() instead of skb_queue_empty()

    Also uses READ_ONCE() in __skb_try_recv_datagram() to address
    a similar potential problem.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
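
    A small sketch of the lockless form described above (the wrapper helper
    is illustrative): busy polling peeks at the receive queue without
    holding its lock, so the check uses the _lockless variant, which reads
    the queue head with READ_ONCE().

    static bool sk_has_rx_data(const struct sock *sk)
    {
            /* safe without taking sk_receive_queue.lock */
            return !skb_queue_empty_lockless(&sk->sk_receive_queue);
    }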
     

18 Oct, 2019

1 commit

    In the current code, the nvme driver uses a fixed 4k PRP entry size,
    but if the kernel uses a page size larger than 4k, we have to consider
    the case where bv_offset may be larger than dev->ctrl.page_size.
    Otherwise we may miss setting prp2 and the command cannot be executed
    correctly.

    Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
    Cc: stable@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Kevin Hao
    Signed-off-by: Keith Busch

    Kevin Hao
     

14 Oct, 2019

6 commits

  • The access to sk->sk_ll_usec should be hidden behind
    CONFIG_NET_RX_BUSY_POLL like the definition of sk_ll_usec.

    Put access to ->sk_ll_usec behind CONFIG_NET_RX_BUSY_POLL.

    Fixes: 1a9460cef5711 ("nvme-tcp: support simple polling")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Keith Busch

    Sebastian Andrzej Siewior
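
    A sketch of the guard in the nvme-tcp queue setup path; the surrounding
    code is omitted and the exact value assigned is illustrative, the point
    is that the field only exists when busy polling is configured.

    #ifdef CONFIG_NET_RX_BUSY_POLL
            queue->sock->sk->sk_ll_usec = 1;
    #endif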
     
    Prevent simultaneous controller disabling/enabling tasks from
    interfering with each other by introducing a function that waits until
    the task has successfully transitioned the controller to the RESETTING
    state. This ensures that disabling the controller will not be
    interrupted by another reset path; otherwise a concurrent reset may
    leave the controller in the wrong state.

    Tested-by: Edmund Nadolski
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch

    Keith Busch
     
  • A paused controller is doing critical internal activation work in the
    background. Prevent subsequent controller resets from occurring during
    this period by setting the controller state to RESETTING first. A helper
    function, nvme_try_sched_reset_work(), is introduced for these paths so
    they may continue with scheduling the reset_work after they've completed
    their uninterruptible critical section.

    Tested-by: Edmund Nadolski
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch

    Keith Busch
     
  • A controller in the resetting state has not yet completed its recovery
    actions. The pci and fc transports were already handling this, so update
    the remaining transports to not attempt additional recovery in this
    state. Instead, just restart the request timer.

    Tested-by: Edmund Nadolski
    Reviewed-by: James Smart
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch

    Keith Busch
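
    In timeout-handler terms, the behaviour described above amounts to
    something like this sketch (schematic, not any specific transport's
    handler):

    static enum blk_eh_timer_return nvme_timeout_sketch(struct nvme_ctrl *ctrl)
    {
            if (ctrl->state == NVME_CTRL_RESETTING)
                    return BLK_EH_RESET_TIMER; /* recovery already in flight */

            /* otherwise: start error recovery as before */
            return BLK_EH_DONE;
    }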
     
  • The admin only state was intended to fence off actions that don't
    apply to a non-IO capable controller. The only actual user of this is
    the scan_work, and pci was the only transport to ever set this state.
    The consequence of having this state is placing an additional burden on
    every other action that applies to both live and admin only controllers.

    Remove the admin only state and place the admin only burden on the only
    place that actually cares: scan_work.

    This also prepares to make it easier to temporarily pause a LIVE state
    so that we don't need to remember which state the controller had been in
    prior to the pause.

    Tested-by: Edmund Nadolski
    Reviewed-by: James Smart
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch

    Keith Busch
     
    If a controller becomes degraded after a reset, we will not be able to
    perform any IO. We currently tear down previously created request
    queues and namespaces, but we had kept the unusable tagset. Free
    it after all queues using it have been released.

    Tested-by: Edmund Nadolski
    Reviewed-by: James Smart
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch

    Keith Busch
     

05 Oct, 2019

2 commits

  • Commit 7fd8930f26be4

    "nvme: add a common helper to read Identify Controller data"

    has re-introduced an issue that we have attempted to work around in the
    past, in commit a310acd7a7ea ("NVMe: use split lo_hi_{read,write}q").

    The problem is that some PCIe NVMe controllers do not implement 64-bit
    outbound accesses correctly, which is why the commit above switched
    to using lo_hi_[read|write]q for all 64-bit BAR accesses occurring in
    the code.

    In the meantime, the NVMe subsystem has been refactored, and now calls
    into the PCIe support layer for NVMe via a .reg_read64() method, which
    fails to use lo_hi_readq(), and thus reintroduces the problem that the
    workaround above aimed to address.

    Given that, at the moment, .reg_read64() is only used to read the
    capability register [which is known to tolerate split reads], let's
    switch .reg_read64() to lo_hi_readq() as well.

    This fixes a boot issue on some ARM boxes with NVMe behind a Synopsys
    DesignWare PCIe host controller.

    Fixes: 7fd8930f26be4 ("nvme: add a common helper to read Identify Controller data")
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Sagi Grimberg

    Ard Biesheuvel
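
    The resulting .reg_read64() is essentially a one-liner; a sketch,
    following the nvme-pci naming of that era:

    #include <linux/io-64-nonatomic-lo-hi.h>

    static int nvme_pci_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
    {
            /* two 32-bit reads, low half first, for controllers that cannot
             * handle a single 64-bit outbound access */
            *val = lo_hi_readq(to_nvme_dev(ctrl)->bar + off);
            return 0;
    }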
     
  • nvme_update_formats may fail to revalidate the namespace and
    attempt to remove the namespace. This may lead to a deadlock
    as nvme_ns_remove will attempt to acquire the subsystem lock
    which is already acquired by the passthru command with effects.

    Move the invalid namespace removal to after the passthru command
    releases the subsystem lock.

    Reported-by: Judy Brock
    Signed-off-by: Sagi Grimberg

    Sagi Grimberg
     

28 Sep, 2019

2 commits

  • Pull NVMe changes from Sagi:

    "This set consists of various fixes and cleanups:
    - controller removal race fix from Balbir
    - quirk additions from Gabriel and Jian-Hong
    - nvme-pci power state save fix from Mario
    - Add 64bit user commands (for 64bit registers) from Marta
    - nvme-rdma/nvme-tcp fixes from Max, Mark and Me
    - Minor cleanups and nits from James, Dan and John"

    * 'nvme-5.4' of git://git.infradead.org/nvme:
    nvme-rdma: fix possible use-after-free in connect timeout
    nvme: Move ctrl sqsize to generic space
    nvme: Add ctrl attributes for queue_count and sqsize
    nvme: allow 64-bit results in passthru commands
    nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T
    nvmet-tcp: remove superflous check on request sgl
    Added QUIRKs for ADATA XPG SX8200 Pro 512GB
    nvme-rdma: Fix max_hw_sectors calculation
    nvme: fix an error code in nvme_init_subsystem()
    nvme-pci: Save PCI state before putting drive into deepest state
    nvme-tcp: fix wrong stop condition in io_work
    nvme-pci: Fix a race in controller removal
    nvmet: change ppl to lpp

    Jens Axboe
     
  • If the connect times out, we may have already destroyed the
    queue in the timeout handler, so test if the queue is still
    allocated in the connect error handler.

    Reported-by: Yi Zhang
    Signed-off-by: Sagi Grimberg

    Sagi Grimberg
     

26 Sep, 2019

1 commit

  • Current controller interrogation requires a lot of guesswork
    on how many io queues were created and what the io sq size is.
    The numbers are dependent upon core/fabric defaults, connect
    arguments, and target responses.

    Add sysfs attributes for queue_count and sqsize.

    Signed-off-by: James Smart
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Sagi Grimberg

    James Smart
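
    A sketch of one such read-only attribute; the attribute group wiring is
    omitted and the exact formatting is illustrative:

    static ssize_t sqsize_show(struct device *dev,
                               struct device_attribute *attr, char *buf)
    {
            struct nvme_ctrl *ctrl = dev_get_drvdata(dev);

            /* ctrl->sqsize holds the value negotiated at connect time */
            return snprintf(buf, PAGE_SIZE, "%u\n", ctrl->sqsize);
    }
    static DEVICE_ATTR_RO(sqsize);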