Eric Lee / smarc-fsl-linux-kernel

11 Aug, 2016

1 commit

c21377f83 nvme: Suspend all queues before deletion ... Browse Code »

When nvme_delete_queue fails in the first pass of the
nvme_disable_io_queues() loop, we return early, failing to suspend all
of the IO queues. Later, on the nvme_pci_disable path, this causes us
to disable MSI without actually having freed all the IRQs, which
triggers the BUG_ON in free_msi_irqs(), as show below.

This patch refactors nvme_disable_io_queues to suspend all queues before
start submitting delete queue commands. This way, we ensure that we
have at least returned every IRQ before continuing with the removal
path.

[ 487.529200] kernel BUG at ../drivers/pci/msi.c:368!
cpu 0x46: Vector: 700 (Program Check) at [c0000078c5b83650]
pc: c000000000627a50: free_msi_irqs+0x90/0x200
lr: c000000000627a40: free_msi_irqs+0x80/0x200
sp: c0000078c5b838d0
msr: 9000000100029033
current = 0xc0000078c5b40000
paca = 0xc000000002bd7600 softe: 0 irq_happened: 0x01
pid = 1376, comm = kworker/70:1H
kernel BUG at ../drivers/pci/msi.c:368!
Linux version 4.7.0.mainline+ (root@iod76) (gcc version 5.3.1 20160413
(Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #104 SMP Fri Jul 29 09:20:17 CDT 2016
enter ? for help
[c0000078c5b83920] d0000000363b0cd8 nvme_dev_disable+0x208/0x4f0 [nvme]
[c0000078c5b83a10] d0000000363b12a4 nvme_timeout+0xe4/0x250 [nvme]
[c0000078c5b83ad0] c0000000005690e4 blk_mq_rq_timed_out+0x64/0x110
[c0000078c5b83b40] c00000000056c930 bt_for_each+0x160/0x170
[c0000078c5b83bb0] c00000000056d928 blk_mq_queue_tag_busy_iter+0x78/0x110
[c0000078c5b83c00] c0000000005675d8 blk_mq_timeout_work+0xd8/0x1b0
[c0000078c5b83c50] c0000000000e8cf0 process_one_work+0x1e0/0x590
[c0000078c5b83ce0] c0000000000e9148 worker_thread+0xa8/0x660
[c0000078c5b83d80] c0000000000f2090 kthread+0x110/0x130
[c0000078c5b83e30] c0000000000095f0 ret_from_kernel_thread+0x5c/0x6c

Signed-off-by: Gabriel Krisman Bertazi
Cc: Brian King
Cc: Keith Busch
Cc: linux-nvme@lists.infradead.org
Signed-off-by: Jens Axboe

Gabriel Krisman Bertazi
2016-08-11 23:35:57 +0800

08 Aug, 2016

1 commit

d3f422c8d Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into for-linus ... Browse Code »

Sagi writes:

Mostly stability fixes for nvmet, rdma:
- fix uninitialized rdma_cm private data from Roland.
- rdma device removal handling (host and target).
- fix controller disconnect during active mounts.
- fix namespaces lost after fabric reconnects.
- remove redundant calls to namespace removal (rdma, loop).
- actually send controller shutdown when disconnecting.
- reconnect fixes (ns rescan and aen requeue)
- nvmet controller serial number inconsistency fix.

Jens Axboe
2016-08-08 21:42:42 +0800

04 Aug, 2016

5 commits

e3266378b nvme-rdma: Remove unused includes ... Browse Code »

Signed-off-by: Sagi Grimberg
Reviewed-by: Steve Wise

Sagi Grimberg
2016-08-04 22:47:54 +0800
3ef1b4b29 nvme-rdma: start async event handler after reconnecting to a controller ... Browse Code »

When we reset or reconnect to a controller, we are cancelling the
async event handler so we can safely re-establish resources, but we
need to remember to start it again when we successfully reconnect.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-04 22:45:35 +0800
28b891185 nvmet: Fix controller serial number inconsistency ... Browse Code »

The host is allowed to issue identify as many times
as it wants, we need to stay consistent when reporting
the serial number for a given controller.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-04 22:45:10 +0800
40e64e072 nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads ... Browse Code »

Under extreme conditions this might cause data corruptions. By doing that
we we repost the buffer and then post this buffer for the device to send.
If we happen to use shared receive queues the device might write to the
buffer before it sends it (there is no ordering between send and recv
queues). Without SRQs we probably won't get that if the host doesn't
mis-behave and send more than we allowed it, but relying on that is not
really a good idea.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-04 22:44:40 +0800
d8f7750a0 nvmet-rdma: Correctly handle RDMA device hot removal ... Browse Code »

When configuring a device attached listener, we may
see device removal events. In this case we return a
non-zero return code from the cm event handler which
implicitly destroys the cm_id. It is possible that in
the future the user will remove this listener and by
that trigger a second call to rdma_destroy_id on an
already destroyed cm_id -> BUG.

In addition, when a queue bound (active session) cm_id
generates a DEVICE_REMOVAL event we must guarantee all
resources are cleaned up by the time we return from the
event handler.

Introduce nvmet_rdma_device_removal which addresses
(or at least attempts to) both scenarios.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-04 22:43:06 +0800

03 Aug, 2016

8 commits

45862ebcc nvme-rdma: Make sure to shutdown the controller if we can ... Browse Code »

Relying on ctrl state in nvme_rdma_shutdown_ctrl is wrong because
it will never be NVME_CTRL_LIVE (delete_ctrl or reset_ctrl invoked it).

Instead, check that the admin queue is connected. Note that it is safe
because we can never see a copmeting thread trying to destroy the admin
queue (reset or delete controller).

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-03 21:25:23 +0800
a159c64d9 nvme-loop: Remove duplicate call to nvme_remove_namespaces ... Browse Code »

nvme_uninit_ctrl already does that for us. Note that we
reordered nvme_loop_shutdown_ctrl with nvme_uninit_ctrl
but its safe because we want controller uninit to happen
before we shutdown the transport resources.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-03 21:25:19 +0800
a34ca17a9 nvme-rdma: Free the I/O tags when we delete the controller ... Browse Code »

If we wait until we free the controller (free_ctrl) we might
lose our rdma device without any notification while we still
have open resources (tags mrs and dma mappings).

Instead, destroy the tags with their rdma resources once we
delete the device and not when freeing it.

Note that we don't do that in nvme_rdma_shutdown_ctrl because
controller reset uses it as well and we want to give active I/O
a chance to complete successfully.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-03 21:25:16 +0800
2461a8dd3 nvme-rdma: Remove duplicate call to nvme_remove_namespaces ... Browse Code »

nvme_uninit_ctrl already does that for us. Note that we reordered
nvme_rdma_shutdown_ctrl and nvme_uninit_ctrl, this is perfectly
fine because we actually want ctrl uninit (aen, scan cancellation
and namespaces removal) to happen before we shutdown the rdma
resources.

Also, centralize the deletion work and the dead controller removal
work code duplication into __nvme_rdma_shutdown_ctrl that accepts
a shutdown boolean.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-03 21:25:11 +0800
57de5a0a4 nvme-rdma: Fix device removal handling ... Browse Code »

Device removal sequence may have crashed because the
controller (and admin queue space) was freed before
we destroyed the admin queue resources. Thus we
want to destroy the admin queue and only then queue
controller deletion and wait for it to complete.

More specifically we:
1. own the controller deletion (make sure we are not
competing with another deletion).
2. get rid of inflight reconnects if exists (which
also destroy and create queues).
3. destroy the queue.
4. safely queue controller deletion (and wait for it
to complete).

Reported-by: Steve Wise
Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-03 21:25:07 +0800
5f372eb3e nvme-rdma: Queue ns scanning after a sucessful reconnection ... Browse Code »

On an ordered target shutdown, the target can send a AEN on a namespace
removal, this will trigger the host to queue ns-list query. The shutdown
will trigger error recovery which will attepmt periodic reconnect.

We can hit a race where the ns rescanning fails (error recovery kicked
in and we're not connected) causing removing all the namespaces and when
we reconnect we won't see any namespaces for this controller.

So, queue a namespace rescan after we successfully reconnected to the target.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig

Sagi Grimberg
2016-08-03 21:25:03 +0800
0b857b44b nvme-rdma: Don't leak uninitialized memory in connect request private data ... Browse Code »

Zero out the full nvme_rdma_cm_req structure before sending it.
Otherwise we end up leaking kernel memory in the reserved field, which
might break forward compatibility in the future.

Signed-off-by: Roland Dreier
Reviewed-by: Christoph Hellwig
Signed-off-by: Sagi Grimberg

Roland Dreier
2016-08-03 21:24:57 +0800
c8d0267ef Merge tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci ... Browse Code »

Pull PCI updates from Bjorn Helgaas:
"Highlights:

- ARM64 support for ACPI host bridges

- new drivers for Axis ARTPEC-6 and Marvell Aardvark

- new pci_alloc_irq_vectors() interface for MSI-X, MSI, legacy INTx

- pci_resource_to_user() cleanup (more to come)

Detailed summary:

Enumeration:
- Move ecam.h to linux/include/pci-ecam.h (Jayachandran C)
- Add parent device field to ECAM struct pci_config_window (Jayachandran C)
- Add generic MCFG table handling (Tomasz Nowicki)
- Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki)
- Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki)

Resource management:
- Add devm_request_pci_bus_resources() (Bjorn Helgaas)
- Unify pci_resource_to_user() declarations (Bjorn Helgaas)
- Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas)
- Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas)
- Make PCI I/O space optional on ARM32 (Bjorn Helgaas)
- Ignore write combining when mapping I/O port space (Bjorn Helgaas)
- Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas)
- Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas)
- Support I/O resources when parsing host bridge resources (Jayachandran C)
- Add helpers to request/release memory and I/O regions (Johannes Thumshirn)
- Use pci_(request|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn)
- Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5))
- Add generic pci_bus_claim_resources() (Lorenzo Pieralisi)
- Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)
- Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi)
- Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya)
- Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu)

PCI device hotplug:
- Allow additional bus numbers for hotplug bridges (Keith Busch)
- Ignore interrupts during D3cold (Lukas Wunner)

Power management:
- Enforce type casting for pci_power_t (Andy Shevchenko)
- Don't clear d3cold_allowed for PCIe ports (Mika Westerberg)
- Put PCIe ports into D3 during suspend (Mika Westerberg)
- Power on bridges before scanning new devices (Mika Westerberg)
- Runtime resume bridge before rescan (Mika Westerberg)
- Add runtime PM support for PCIe ports (Mika Westerberg)
- Remove redundant check of pcie_set_clkpm (Shawn Lin)

Virtualization:
- Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra)
- Add DMA alias quirk for Adaptec 3805 (Alex Williamson)
- Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake)
- Add ACS quirk for Solarflare SFC9220 (Edward Cree)

MSI:
- Fix PCI_MSI dependencies (Arnd Bergmann)
- Add pci_msix_desc_addr() helper (Christoph Hellwig)
- Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig)
- Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig)
- Provide sensible IRQ vector alloc/free routines (Christoph Hellwig)
- Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig)

Error Handling:
- Bind DPC to Root Ports as well as Downstream Ports (Keith Busch)
- Remove DPC tristate module option (Keith Busch)
- Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg)

Generic host bridge driver:
- Select IRQ_DOMAIN (Arnd Bergmann)
- Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)

ACPI host bridge driver:
- Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki)
- Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki)
- Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki)
- Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki)

Altera host bridge driver:
- Check link status before retrain link (Ley Foon Tan)
- Poll for link up status after retraining the link (Ley Foon Tan)

Axis ARTPEC-6 host bridge driver:
- Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann)
- Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel)
- Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel)

Intel VMD host bridge driver:
- Use lock save/restore in interrupt enable path (Jon Derrick)
- Select device dma ops to override (Keith Busch)
- Initialize list item in IRQ disable (Keith Busch)
- Use x86_vector_domain as parent domain (Keith Busch)
- Separate MSI and MSI-X vector sharing (Keith Busch)

Marvell Aardvark host bridge driver:
- Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni)
- Add Aardvark PCI host controller driver (Thomas Petazzoni)
- Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni)

Microsoft Hyper-V host bridge driver:
- Fix interrupt cleanup path (Cathy Avery)
- Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov)
- Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov)

NVIDIA Tegra host bridge driver:
- Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren)
- Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren)
- Use lower-case hex consistently for register definitions (Thierry Reding)
- Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding)
- Stop setting pcibios_min_mem (Thierry Reding)

Renesas R-Car host bridge driver:
- Drop gen2 dummy I/O port region (Bjorn Helgaas)

TI DRA7xx host bridge driver:
- Fix return value in case of error (Christophe JAILLET)

Xilinx AXI host bridge driver:
- Fix return value in case of error (Christophe JAILLET)

Miscellaneous:
- Make bus_attr_resource_alignment static (Ben Dooks)
- Include for isa_dma_bridge_buggy (Ben Dooks)
- MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven)
- Make host bridge drivers explicitly non-modular (Paul Gortmaker)"

* tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (125 commits)
PCI: xgene: Make explicitly non-modular
PCI: thunder-pem: Make explicitly non-modular
PCI: thunder-ecam: Make explicitly non-modular
PCI: tegra: Make explicitly non-modular
PCI: rcar-gen2: Make explicitly non-modular
PCI: rcar: Make explicitly non-modular
PCI: mvebu: Make explicitly non-modular
PCI: layerscape: Make explicitly non-modular
PCI: keystone: Make explicitly non-modular
PCI: hisi: Make explicitly non-modular
PCI: generic: Make explicitly non-modular
PCI: designware-plat: Make it explicitly non-modular
PCI: artpec6: Make explicitly non-modular
PCI: armada8k: Make explicitly non-modular
PCI: artpec: Add PCI_MSI_IRQ_DOMAIN dependency
PCI: Add ACS quirk for Solarflare SFC9220
arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700
PCI: aardvark: Add Aardvark PCI host controller driver
dt-bindings: add DT binding for the Aardvark PCIe controller
PCI: tegra: Program PADS_REFCLK_CFG* registers with per-SoC values
...

Linus Torvalds
2016-08-03 05:12:29 +0800

27 Jul, 2016

2 commits

3fc9d6909 Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block driver updates from Jens Axboe:
"This branch also contains core changes. I've come to the conclusion
that from 4.9 and forward, I'll be doing just a single branch. We
often have dependencies between core and drivers, and it's hard to
always split them up appropriately without pulling core into drivers
when that happens.

That said, this contains:

- separate secure erase type for the core block layer, from
Christoph.

- set of discard fixes, from Christoph.

- bio shrinking fixes from Christoph, as a followup up to the
op/flags change in the core branch.

- map and append request fixes from Christoph.

- NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
exciting!

- nvme-loop fixes from Arnd.

- removal of ->driverfs_dev from Dan, after providing a
device_add_disk() helper.

- bcache fixes from Bhaktipriya and Yijing.

- cdrom subchannel read fix from Vchannaiah.

- set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

- set of drbd updates and fixes from Fabian, Lars, and Philipp.

- mg_disk error path fix from Bart.

- user notification for failed device add for loop, from Minfei.

- NVMe in general:
+ NVMe delay quirk from Guilherme.
+ SR-IOV support and command retry limits from Keith.
+ fix for memory-less NUMA node from Masayoshi.
+ use UINT_MAX for discard sectors, from Minfei.
+ cancel IO fixes from Ming.
+ don't allocate unused major, from Neil.
+ error code fixup from Dan.
+ use constants for PSDT/FUSE from James.
+ variable init fix from Jay.
+ fabrics fixes from Ming, Sagi, and Wei.
+ various fixes"

* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
nvme/pci: Provide SR-IOV support
nvme: initialize variable before logical OR'ing it
block: unexport various bio mapping helpers
scsi/osd: open code blk_make_request
target: stop using blk_make_request
block: simplify and export blk_rq_append_bio
block: ensure bios return from blk_get_request are properly initialized
virtio_blk: use blk_rq_map_kern
memstick: don't allow REQ_TYPE_BLOCK_PC requests
block: shrink bio size again
block: simplify and cleanup bvec pool handling
block: get rid of bio_rw and READA
block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
NVMe: don't allocate unused nvme_major
nvme: avoid crashes when node 0 is memoryless node.
nvme: Limit command retries
loop: Make user notify for adding loop device failed
nvme-loop: fix nvme-loop Kconfig dependencies
nvmet: fix return value check in nvmet_subsys_alloc()
...

Linus Torvalds
2016-07-27 06:37:51 +0800
d05d7f407 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull core block updates from Jens Axboe:

- the big change is the cleanup from Mike Christie, cleaning up our
uses of command types and modified flags. This is what will throw
some merge conflicts

- regression fix for the above for btrfs, from Vincent

- following up to the above, better packing of struct request from
Christoph

- a 2038 fix for blktrace from Arnd

- a few trivial/spelling fixes from Bart Van Assche

- a front merge check fix from Damien, which could cause issues on
SMR drives

- Atari partition fix from Gabriel

- convert cfq to highres timers, since jiffies isn't granular enough
for some devices these days. From Jan and Jeff

- CFQ priority boost fix idle classes, from me

- cleanup series from Ming, improving our bio/bvec iteration

- a direct issue fix for blk-mq from Omar

- fix for plug merging not involving the IO scheduler, like we do for
other types of merges. From Tahsin

- expose DAX type internally and through sysfs. From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
block: Fix front merge check
block: do not merge requests without consulting with io scheduler
block: Fix spelling in a source code comment
block: expose QUEUE_FLAG_DAX in sysfs
block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
Btrfs: fix comparison in __btrfs_map_block()
block: atari: Return early for unsupported sector size
Doc: block: Fix a typo in queue-sysfs.txt
cfq-iosched: Charge at least 1 jiffie instead of 1 ns
cfq-iosched: Fix regression in bonnie++ rewrite performance
cfq-iosched: Convert slice_resid from u64 to s64
block: Convert fifo_time from ulong to u64
blktrace: avoid using timespec
block/blk-cgroup.c: Declare local symbols static
block/bio-integrity.c: Add #include "blk.h"
block/partition-generic.c: Remove a set-but-not-used variable
block: bio: kill BIO_MAX_SIZE
cfq-iosched: temporarily boost queue priority for idle classes
block: drbd: avoid to use BIO_MAX_SIZE
block: bio: remove BIO_MAX_SECTORS
...

Linus Torvalds
2016-07-27 06:03:07 +0800

21 Jul, 2016

4 commits

13880f5b5 nvme/pci: Provide SR-IOV support ... Browse Code »

This registers an sr-iov callback for nvme.

Signed-off-by: Keith Busch
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Keith Busch
2016-07-21 11:29:59 +0800
fa9a89fc6 nvme: initialize variable before logical OR'ing it ... Browse Code »

It is typically not good coding or secure coding practice
to logical OR a variable without an initialization value first.
Here on this line:

integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;

BLK_INTEGRITY_DEVICE_CAPABLE is being OR'ed to a member variable
never set to an initial value. This patch fixes that.

Signed-off-by: Jay Freyensee
Reviewed-by: Ming Lin
Reviewed-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Jay Freyensee
2016-07-21 11:26:16 +0800
0c4de0f33 block: ensure bios return from blk_get_request are properly initialized ... Browse Code »

blk_get_request is used for BLOCK_PC and similar passthrough requests.
Currently we always need to call blk_rq_set_block_pc or an open coded
version of it to allow appending bios using the request mapping helpers
later on, which is a somewhat awkward API. Instead move the
initialization part of blk_rq_set_block_pc into blk_get_request, so that
we always have a safe to use request.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-21 07:38:30 +0800
70246286e block: get rid of bio_rw and READA ... Browse Code »

These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces. For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense. Any check for READA is replaced with an
explicit check for REQ_RAHEAD. Also remove the READA alias for
REQ_RAHEAD.

Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Reviewed-by: Mike Christie
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-21 07:37:01 +0800

14 Jul, 2016

3 commits

b09dcf585 NVMe: don't allocate unused nvme_major ... Browse Code »

When alloc_disk(0) is used, the ->major number is ignored. All device
numbers are allocated with a major of BLOCK_EXT_MAJOR.

So remove all references to nvme_major.

[akpm@linux-foundation.org: one unregister_blkdev() was missed]
Link: http://lkml.kernel.org/r/20160602064318.4403.93301.stgit@noble
Signed-off-by: NeilBrown
Reviewed-by: Keith Busch
Cc: Jens Axboe
Cc: Maxim Levitsky
Cc: Greg Kroah-Hartman
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

NeilBrown
2016-07-14 23:52:56 +0800
32f0c4afb nvme: Remove RCU namespace protection ... Browse Code »

We can't sleep with RCU read lock held, but we need to do potentially
blocking stuff to namespace queues when iterating the list. This patch
removes the RCU locking and holds a mutex instead.

To prevent deadlocks, this patch removes holding the mutex during
namespace scanning and removal. The unlocked namespace scanning is made
safe by holding a reference to the namespace being scanned.

List iteration that does IO has to be unlocked to allow error recovery.
The caller must ensure the list can not be manipulated during such an
event, so this patch adds a comment explaining this requirement to the
only function that iterates an unlocked list. All callers currently
meet this requirement, so no further changes required.

List iterations that do not do IO can safely use the lock since it couldn't
block recovery from missing forced IO completions.

Reported-by: Ming Lin
[fixes 0bf77e9 nvme: switch to RCU freeing the namespace]
Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe

Keith Busch
2016-07-14 23:48:08 +0800
2fa843512 nvme: avoid crashes when node 0 is memoryless node. ... Browse Code »

When CONFIG_NUMA is enabled and node 0 is memoryless, the system
crashes because nvme_probe() sets the device->numa_node to 0 by
set_dev_node(&pdev->dev, 0), so it tries to allocate memory from node 0.
To avoid the crash, we should change the 0 to first_memory_node.

Signed-off-by: Masayoshi Mizuma
Reviewed-by: Johannes Thumshirn
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe

Masayoshi Mizuma
2016-07-14 00:17:55 +0800

13 Jul, 2016

1 commit

f80ec966c nvme: Limit command retries ... Browse Code »

Many controller implementations will return errors to commands that will
not succeed, but without the DNR bit set. The driver previously retried
these commands an unlimited number of times until the command timeout
has exceeded, which takes an unnecessarilly long period of time.

This patch limits the number of retries a command can have, defaulting
to 5, but is user tunable at load or runtime.

The struct request's 'retries' field is used to track the number of
retries attempted. This is in contrast with scsi's use of this field,
which indicates how many retries are allowed.

Signed-off-by: Keith Busch
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Keith Busch
2016-07-13 07:20:31 +0800

12 Jul, 2016

7 commits

6eae8c452 nvme-loop: fix nvme-loop Kconfig dependencies ... Browse Code »

I ran into the same problem on NVME_TARGET_RDMA now,
which otherwise needs dependencies on both CONFIG_BLOCK and
CONFIGFS_FS:

warning: (NVME_TARGET_LOOP && NVME_TARGET_RDMA) selects NVME_TARGET which has unmet direct dependencies (BLOCK && CONFIGFS_FS)
0xA002B368 Mon Jul 11 18:00:45 CEST 2016 failed
In file included from ../drivers/nvme/target/core.c:16:0:
drivers/nvme/target/nvmet.h:222:14: error: field 'inline_bio' has incomplete type
struct bio inline_bio;
^~~~~~~~~~
drivers/nvme/target/core.c: In function 'nvmet_async_event_work':
drivers/nvme/target/core.c:98:3: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
kfree(aen);
^~~~~
../drivers/nvme/target/core.c: In function 'nvmet_ns_enable':
../drivers/nvme/target/core.c:269:13: error: implicit declaration of function 'blkdev_get_by_path' [-Werror=implicit-function-declaration]
ns->bdev = blkdev_get_by_path(ns->device_path, FMODE_READ | FMODE_WRITE,

Folding in my patch below should address that too.

Signed-off-by: Arnd Bergmann
Reported-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Arnd Bergmann
2016-07-12 23:36:40 +0800
69555af2c nvmet: fix return value check in nvmet_subsys_alloc() ... Browse Code »

In case of error, the function kstrndup() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.

Signed-off-by: Wei Yongjun
Reviewed-by: Jay Freyensee
Signed-off-by: Jens Axboe

Wei Yongjun
2016-07-12 23:33:43 +0800
e76debd99 nvme-fabrics: add-remove ctrl repeat fix ... Browse Code »

Repeatedly adding then removing the same NVMe-over-Fabrics controller
over and over again (shown below) can cause a kernel crash (also shown
below). This patch fixes that.

[nvmf]# ./setup_nvme_connections.sh
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside
-nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics
[nvmf]# ./remove_nvme_connections.sh 2
echo 1 > /sys/class/nvme/nvme0/delete_controller
echo 1 > /sys/class/nvme/nvme1/delete_controller
[nvmf]# ./setup_nvme_connections.sh
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
Killed

[nvmf]# dmesg
[ 313.416908] nvme nvme0: creating 16 I/O queues.
[ 313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr
192.168.1.100:4420
[ 313.524857] BUG: unable to handle kernel NULL pointer dereference at
0000000000000010
[ 313.525262] IP: [] strcmp+0xe/0x30
[ 313.525490] PGD 0
[ 313.525726] Oops: 0000 [#1] SMP
[ 313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core
ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_en
mlx4_ib ib_core mlx4_core
[ 313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted
4.7.0-rc2+ #2
[ 313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT
-F/IBQF/IBFF, BIOS 1.0a 10/09/2012
[ 313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti:
ffff88025b980000
[ 313.527879] RIP: 0010:[] []
strcmp+0xe/0x30
[ 313.528232] RSP: 0018:ffff88025b983db0 EFLAGS: 00010206
[ 313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX:
fffffffffffffff1
[ 313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI:
0000000000000011
[ 313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09:
ffff880471879058
[ 313.528956] R10: 000000000000002c R11: ffff88047f415000 R12:
ffff880471879800
[ 313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15:
fffffffffffffff8
[ 313.529303] FS: 00007f778f510700(0000) GS:ffff88047fbc0000(0000)
knlGS:0000000000000000
[ 313.529629] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4:
00000000000406e0
[ 313.529989] Stack:
[ 313.530154] ffff88025b983e48 ffffffffa0171c74 0000000000000001
0000000000000059
[ 313.530621] ffff880476f32400 ffff88047e8add80 0000010074b33aa0
ffff880471879059
[ 313.531162] ffff88047187904b ffff880471879058 0000000000000000
ffff88047736e000
[ 313.531629] Call Trace:
[ 313.531797] [] nvmf_dev_write+0x674/0x840
[nvme_fabrics]
[ 313.531974] [] __vfs_write+0x23/0x120
[ 313.532146] [] ? __fd_install+0x1f/0xc0
[ 313.532316] [] ? __alloc_fd+0x3a/0x170
[ 313.532487] [] vfs_write+0xb3/0x1b0
[ 313.532658] [] ? filp_close+0x51/0x70
[ 313.532845] [] SyS_write+0x41/0xa0
[ 313.533016] []
entry_SYSCALL_64_fastpath+0x13/0x8f
[ 313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01
84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 83
c7 01 b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31
[ 313.536563] RIP [] strcmp+0xe/0x30
[ 313.536815] RSP
[ 313.536981] CR2: 0000000000000010
[ 313.537151] ---[ end trace 3d952e590e7bc2d5 ]---

Reported-and-tested-by: Jay Freyensee
Signed-off-by: Ming Lin
Signed-off-by: Jay Freyensee
Signed-off-by: Jens Axboe

Ming Lin
2016-07-12 23:32:19 +0800
6a92967cc nvme-fabrics: Remove tl_retry_count ... Browse Code »

The timeout before error recovery logic kicks in is
dictated by the nvme keep-alive, so we don't really need
a transport layer retry count. transports can retry for
as much as they like.

Signed-off-by: Sagi Grimberg
Signed-off-by: Jens Axboe

Sagi Grimberg
2016-07-12 23:31:11 +0800
2ac17c283 nvme-rdma: Don't use tl_retry_count ... Browse Code »

Always use the maximum qp retry count as the
error recovery timeout is dictated from the nvme
keep-alive.

Signed-off-by: Sagi Grimberg
Signed-off-by: Jens Axboe

Sagi Grimberg
2016-07-12 23:31:10 +0800
458a9632a nvme-rdma: fix the return value of nvme_rdma_reinit_request() ... Browse Code »

PTR_ERR should be applied before its argument is reassigned, otherwise the
return value will be set to 0, not error code.

Signed-off-by: Wei Yongjun
Reviewed-by: Jay Freyensee
Signed-off-by: Jens Axboe

Wei Yongjun
2016-07-12 23:27:03 +0800
54adc0105 nvme/quirk: Add a delay before checking for adapter readiness ... Browse Code »

When disabling the controller, the specification says the register
NVME_REG_CC should be written and then driver needs to wait the
adapter to be ready, which is checked by reading another register
bit (NVME_CSTS_RDY). There's a timeout validation in this checking,
so in case this timeout is reached the driver gives up and removes
the adapter from the system.

After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003)
(HGST adapter) end up being removed if we issue a reset_controller,
because driver keeps verifying the NVME_REG_CSTS until the timeout is
reached. This patch adds a necessary quirk for this adapter, by
introducing a delay before nvme_wait_ready(), so the reset procedure
is able to be completed. This quirk is needed because just increasing
the timeout is not enough in case of this adapter - the driver must
wait before start reading NVME_REG_CSTS register on this specific
device.

Signed-off-by: Guilherme G. Piccoli
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Guilherme G. Piccoli
2016-07-12 23:23:00 +0800

09 Jul, 2016

1 commit

41d512e51 Merge branch 'for-4.8/block' of git://git.kernel.org/pub/scm/linux/kernel/git/nv… ... Browse Code »

…dimm/nvdimm into for-4.8/drivers

Dan writes:

"The removal of ->driverfs_dev in favor of just passing the parent
device in as a parameter to add_disk(). See below, it has received a
"Reviewed-by" from Christoph, Bart, and Johannes.

It is also a pre-requisite for Fam Zheng's work to cleanup gendisk
uevents vs attribute visibility [1]. We would extend device_add_disk()
to take an attribute_group list.

This is based off a branch of block.git/for-4.8/drivers and has
received a positive build success notification from the kbuild robot
across several configs.

[1]: "gendisk: Generate uevent after attribute available"
http://marc.info/?l=linux-virtualization&m=146725201522201&w=2"

Jens Axboe
2016-07-09 06:04:11 +0800

08 Jul, 2016

3 commits

711023071 nvme-rdma: add a NVMe over Fabrics RDMA host driver ... Browse Code »

This patch implements the RDMA host (initiator in SCSI speak) driver. It
can be used to connect to remote NVMe over Fabrics controllers over
Infiniband, RoCE or iWarp, and uses the existing NVMe core driver as well
a the new fabrics library.

To connect to all NVMe over Fabrics controller reachable on a given taget
port using RDMA/CM use the following command:

nvme connect-all -t rdma -a $IPADDR

This requires the latest version of nvme-cli with Fabrics support.

Signed-off-by: Jay Freyensee
Signed-off-by: Ming Lin
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Reviewed-by: Steve Wise
Tested-by: Steve Wise
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-08 22:38:49 +0800
8f000cac6 nvmet-rdma: add a NVMe over Fabrics RDMA target driver ... Browse Code »

This patch implements the RDMA transport for the NVMe over Fabrics target,
which allows exporting NVMe over Fabrics functionality over RDMA fabrics
(Infiniband, RoCE, iWARP).

All NVMe logic is in the generic target and this module just provides a
small glue between it and the generic code in the RDMA subsystem.

Signed-off-by: Armen Baloyan ,
Signed-off-by: Jay Freyensee
Signed-off-by: Ming Lin
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Reviewed-by: Steve Wise
Tested-by: Steve Wise
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-08 22:38:49 +0800
def61eca9 nvme: add new reconnecting controller state ... Browse Code »

The nvme fabric (RDMA, FC, etc...) can introduce port, link or node
failures that may require a reconnect to re-establish the connection.

Add a new reconnecting state that will initially be used by the RDMA
driver.

Reviewed-by: Jay Freyensee
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Reviewed-by: Steve Wise
Tested-by: Steve Wise
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-08 22:38:49 +0800

07 Jul, 2016

3 commits

6f9297021 nvme: lightnvm: make MLC num_pairs little endian ... Browse Code »

According to the OpenChannel SSD interface specification the NAND flash
MLC page pairing information's number of page page pairings field is the
first two bytes in the MLC Page Pairing data structure. The hardware's
data structure itself is little endian so annotate it as such, like the
rest of lighnvm's data structures.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Johannes Thumshirn
2016-07-07 22:51:52 +0800
f98d9ca17 nvmet: fix an error code ... Browse Code »

We accidentally return zero here when ERR_PTR(-ENOMEM) is intended.

Fixes: a07b4970f464 ('nvmet: add a generic NVMe target')
Signed-off-by: Dan Carpenter
Signed-off-by: Jens Axboe

Dan Carpenter
2016-07-07 22:37:36 +0800
1fb470408 nvme-loop: add configfs dependency ... Browse Code »

CONFIG_NVME_TARGET has a correct CONFIG_CONFIGFS_FS dependency, but the
newly added NVME_TARGET_LOOP is missing this, resulting in a link
failure:

drivers/nvme/built-in.o: In function `nvmet_init_configfs':
loop.c:(.init.text+0x2a0): undefined reference to `config_group_init'
loop.c:(.init.text+0x2c0): undefined reference to `config_group_init_type_name'
loop.c:(.init.text+0x318): undefined reference to `configfs_register_subsystem'
drivers/nvme/built-in.o: In function `nvmet_exit_configfs':
loop.c:(.exit.text+0x9c): undefined reference to `configfs_unregister_subsystem'

This adds the same dependency here.

Signed-off-by: Arnd Bergmann
Fixes: 3a85a5de29ea ("nvme-loop: add a NVMe loopback host driver")
Signed-off-by: Jens Axboe

Arnd Bergmann
2016-07-07 22:34:45 +0800

06 Jul, 2016

1 commit

3a85a5de2 nvme-loop: add a NVMe loopback host driver ... Browse Code »

This patch implements adds nvme-loop which allows to access local devices
exported as NVMe over Fabrics namespaces. This module can be useful for
easy evaluation, testing and also feature experimentation.

To createa nvme-loop device you need to configure the NVMe target to
export a loop port (see the nvmetcli documentaton for that) and then
connect to it using

nvme connect-all -t loop

which requires the very latest nvme-cli version with Fabrics support.

Signed-off-by: Jay Freyensee
Signed-off-by: Ming Lin
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Reviewed-by: Steve Wise
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-06 01:30:36 +0800