
04 Mar, 2017

1 commit

  • Pull block layer fixes from Jens Axboe:
    "A collection of fixes for this merge window, either fixes for existing
    issues, or parts that were waiting for acks to come in. This pull
    request contains:

    - Allocation of nvme queues on the right node from Shaohua.

    This was ready long before the merge window, but waiting on an ack
    from Bjorn on the PCI bit. Now that we have that, the three patches
    can go in.

    - Two fixes for blk-mq-sched with nvmeof, which uses hctx specific
    request allocations. This caused an oops. One part from Sagi, one
    part from Omar.

    - A loop partition scan deadlock fix from Omar, fixing a regression
    in this merge window.

    - A three-patch series from Keith, closing up a hole on clearing out
    requests on shutdown/resume.

    - A stable fix for nbd from Josef, fixing a leak of sockets.

    - Two fixes for a regression in this window from Jan, fixing a
    problem with one of his earlier patches dealing with queue vs bdi
    life times.

    - A fix for a regression with virtio-blk, causing an IO stall if
    scheduling is used. From me.

    - A fix for an io context lock ordering problem. From me"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: Move bdi_unregister() to del_gendisk()
    blk-mq: ensure that bd->last is always set correctly
    block: don't call ioc_exit_icq() with the queue lock held for blk-mq
    block: Initialize bd_bdi on inode initialization
    loop: fix LO_FLAGS_PARTSCAN hang
    nvme: Complete all stuck requests
    blk-mq: Provide freeze queue timeout
    blk-mq: Export blk_mq_freeze_queue_wait
    nbd: stop leaking sockets
    blk-mq: move update of tags->rqs to __blk_mq_alloc_request()
    blk-mq: kill blk_mq_set_alloc_data()
    blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request
    blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset
    nvme: allocate nvme_queue in correct node
    PCI: add an API to get node from vector
    blk-mq: allocate blk_mq_tags and requests in correct node

    Linus Torvalds
     

02 Mar, 2017

3 commits

  • If the nvme driver is shutting down its controller, the driver will not
    start the queues up again, preventing blk-mq's hot CPU notifier from
    making forward progress.

    To fix that, this patch starts a request_queue freeze when the driver
    resets a controller so that no new requests may enter. The driver will
    wait for the freeze to complete after the IO queues are restarted, to
    ensure the queue reference can be reinitialized when nvme unfreezes
    the queues.

    If the driver is doing a safe shutdown, the driver will wait for the
    controller to successfully complete all inflight requests so that we
    don't unnecessarily fail them. Once the controller has been disabled,
    the queues will be restarted to force remaining entered requests to end
    in failure so that blk-mq's hot cpu notifier may progress.
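    The freeze/drain idea described above can be modeled in a few lines of
    user-space C. This is a toy, single-threaded sketch of the semantics
    only; the names (toy_queue_enter, toy_freeze_start, and so on) are
    illustrative and are not the kernel's blk-mq API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a queue with a usage count of entered requests and a
 * frozen flag.  Once the freeze starts, no new request may enter;
 * the freeze completes when every entered request has exited. */
struct toy_queue {
	int usage;
	bool frozen;
};

static bool toy_queue_enter(struct toy_queue *q)
{
	if (q->frozen)
		return false;	/* new requests are rejected */
	q->usage++;
	return true;
}

static void toy_queue_exit(struct toy_queue *q)
{
	q->usage--;
}

static void toy_freeze_start(struct toy_queue *q)
{
	q->frozen = true;
}

/* What the driver's "wait for frozen" would poll for. */
static bool toy_freeze_done(const struct toy_queue *q)
{
	return q->frozen && q->usage == 0;
}
```

    On a safe shutdown the driver lets the controller finish the entered
    requests; on a forced one it restarts the queues so they end in
    failure. Either way the usage count reaches zero and the freeze
    completes.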

    Signed-off-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • nvme_queue is (mostly) a per-cpu queue. Allocate it on the node where
    blk-mq will use it.

    Signed-off-by: Shaohua Li
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • We don't actually need the full rculist.h header in sched.h anymore;
    we will be able to include the smaller rcupdate.h header instead.

    But first update code that relied on the implicit header inclusion.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     


26 Feb, 2017

1 commit

  • Pull rdma DMA mapping updates from Doug Ledford:
    "Drop IB DMA mapping code and use core DMA code instead.

    Bart Van Assche noted that the ib DMA mapping code was significantly
    similar enough to the core DMA mapping code that with a few changes it
    was possible to remove the IB DMA mapping code entirely and switch the
    RDMA stack to use the core DMA mapping code.

    This resulted in a nice set of cleanups, but touched the entire tree
    and has been kept separate for that reason."

    * tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
    IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
    IB/core: Remove ib_device.dma_device
    nvme-rdma: Switch from dma_device to dev.parent
    RDS: net: Switch from dma_device to dev.parent
    IB/srpt: Modify a debug statement
    IB/srp: Switch from dma_device to dev.parent
    IB/iser: Switch from dma_device to dev.parent
    IB/IPoIB: Switch from dma_device to dev.parent
    IB/rxe: Switch from dma_device to dev.parent
    IB/vmw_pvrdma: Switch from dma_device to dev.parent
    IB/usnic: Switch from dma_device to dev.parent
    IB/qib: Switch from dma_device to dev.parent
    IB/qedr: Switch from dma_device to dev.parent
    IB/ocrdma: Switch from dma_device to dev.parent
    IB/nes: Remove a superfluous assignment statement
    IB/mthca: Switch from dma_device to dev.parent
    IB/mlx5: Switch from dma_device to dev.parent
    IB/mlx4: Switch from dma_device to dev.parent
    IB/i40iw: Remove a superfluous assignment statement
    IB/hns: Switch from dma_device to dev.parent
    ...

    Linus Torvalds
     


23 Feb, 2017

19 commits

  • Add support for detecting the NVMe controller found in the
    following recent MacBooks:
    - Retina MacBook 2016 (MacBook9,1)
    - 13" MacBook Pro 2016 without Touch Bar (MacBook13,1)
    - 13" MacBook Pro 2016 with Touch Bar (MacBook13,2)

    Signed-off-by: Daniel Roschka
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Daniel Roschka
     
  • This will enable the user to control the specific interface used for
    connection establishment in case the host has more than one interface
    on the same subnet.
    E.g.:
    Host interfaces configured as:
    - ib0 1.1.1.1/16
    - ib1 1.1.1.2/16

    Target interfaces configured as:
    - ib0 1.1.1.3/16 (listener interface)
    - ib1 1.1.1.4/16

    The following connect command will go through host iface ib0 (default):
    nvme connect -t rdma -n testsubsystem -a 1.1.1.3 -s 1023

    but the following command will go through host iface ib1:
    nvme connect -t rdma -n testsubsystem -a 1.1.1.3 -s 1023 -w 1.1.1.2

    Signed-off-by: Max Gurtovoy
    Reviewed-by: Parav Pandit
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Max Gurtovoy
     
  • According to the preceding goto, it is likely that 'out_destroy_sq'
    was expected here.

    Signed-off-by: Christophe JAILLET
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Christophe JAILLET
     
  • Also remove redundant debug prints.

    Signed-off-by: Max Gurtovoy
    Reviewed-by: Parav Pandit
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Max Gurtovoy
     
  • This will enable its usage for the nvme rdma target.
    Also move from a lookup array to a switch statement.

    Signed-off-by: Max Gurtovoy
    Reviewed-by: Parav Pandit
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Max Gurtovoy
     
  • Discovery controllers don't set these values; they are in reserved
    areas of the Identify Controller data structure.

    Given that the command completed, the minimal capsule sizes are
    supported, so there is no need to check the nqn to detect discovery
    controllers and special-case the validations.

    Signed-off-by: James Smart
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    James Smart
     
  • This driver previously required a special check for IO submitted to
    nvme IO queues that are temporarily suspended. That is no longer
    necessary since blk-mq provides a quiesce, so any IO that actually gets
    submitted to such a queue must be ended, since the queue isn't going to
    start back up.

    This is fixing a condition where we have fewer IO queues after a
    controller reset. This may happen if the number of CPUs has changed,
    or a controller firmware update changed the queue count, for example.

    While it may be possible to complete the IO on a different queue, the
    block layer does not provide a way to resubmit a request on a different
    hardware context once the request has entered the queue. We don't want
    these requests to be stuck indefinitely either, so ending them in error
    is our only option at the moment.

    Signed-off-by: Keith Busch
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • If a namespace has already been marked dead, we don't want to kick the
    request_queue again since we may have just freed it from another thread.

    Signed-off-by: Keith Busch
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • If the device is not present, the driver should disable the queues
    immediately. Prior to this, the driver was relying on the watchdog timer
    to kill the queues if requests were outstanding to the device, and that
    just delays removal up to one second.

    Signed-off-by: Keith Busch
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • NVMe devices can advertise multiple power states. These states can
    be either "operational" (the device is fully functional but possibly
    slow) or "non-operational" (the device is asleep until woken up).
    Some devices can automatically enter a non-operational state when
    idle for a specified amount of time and then automatically wake back
    up when needed.

    The hardware configuration is a table. For each state, an entry in
    the table indicates the next deeper non-operational state, if any,
    to autonomously transition to and the idle time required before
    transitioning.

    This patch teaches the driver to program APST so that each successive
    non-operational state will be entered after an idle time equal to 100%
    of the total latency (entry plus exit) associated with that state.
    The maximum acceptable latency is controlled using dev_pm_qos
    (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational
    states with total latency greater than this value will not be used.
    As a special case, setting the latency tolerance to 0 will disable
    APST entirely. On hardware without APST support, the sysfs file will
    not be exposed.

    The latency tolerance for newly-probed devices is set by the module
    parameter nvme_core.default_ps_max_latency_us.

    In theory, the device can expose a "default" APST table, but this
    doesn't seem to function correctly on my device (Samsung 950), nor
    does it seem particularly useful. There is also an optional
    mechanism by which a configuration can be "saved" so it will be
    automatically loaded on reset. This can be configured from
    userspace, but it doesn't seem useful to support in the driver.

    On my laptop, enabling APST seems to save nearly 1W.

    The hardware tables can be decoded in userspace with nvme-cli.
    'nvme id-ctrl /dev/nvmeN' will show the power state table and
    'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
    configuration.

    This feature is quirked off on a known-buggy Samsung device.
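    The selection rule above can be sketched in a few lines of C: a
    non-operational state is usable only if its total (entry plus exit)
    latency fits under the configured tolerance, and its idle timeout is
    100% of that total. The struct layout and names below are illustrative,
    not the driver's data structures or the NVMe identify layout.

```c
#include <assert.h>

/* Illustrative power-state entry; not the NVMe identify layout. */
struct toy_ps {
	int non_operational;	/* 1 if this is a sleep state */
	int entry_lat_us;
	int exit_lat_us;
};

/* Return the idle time (in us) before autonomously entering this state,
 * or -1 if the state must not be used: it is operational, its total
 * latency exceeds the tolerance, or the tolerance is 0 (which disables
 * APST entirely, matching the special case described above). */
static int toy_apst_idle_time_us(const struct toy_ps *p, int max_lat_us)
{
	int total = p->entry_lat_us + p->exit_lat_us;

	if (max_lat_us == 0 || !p->non_operational || total > max_lat_us)
		return -1;
	/* Idle time is 100% of the state's total transition latency. */
	return total;
}
```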

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Andy Lutomirski
     
  • Currently, all NVMe quirks are based on PCI IDs. Add a mechanism to
    define quirks based on identify_ctrl's vendor id, model number,
    and/or firmware revision.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Andy Lutomirski
     
  • nvmf_create_ctrl() relies on the presence of a create_ctrl callback in
    the registered nvmf_transport_ops, so make nvmf_register_transport()
    require one.

    Update the available call-sites as well to reflect these changes.

    Signed-off-by: Johannes Thumshirn
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Johannes Thumshirn
     
  • This patch defines the CNS field as an 8-bit field and avoids
    cpu_to/from_le conversions.
    Also initialize nvme_command cns value explicitly to NVME_ID_CNS_NS
    for readability (don't rely on the fact that NVME_ID_CNS_NS = 0).

    Reviewed-by: Max Gurtovoy
    Signed-off-by: Parav Pandit
    Reviewed-by: Keith Busch
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Parav Pandit
     
  • Reviewed-by: Parav Pandit
    Signed-off-by: Max Gurtovoy
    Reviewed-by: Keith Busch
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Max Gurtovoy
     
  • No need to dereference req twice to get the cmd when we already
    have it stored in a local variable.

    Signed-off-by: Max Gurtovoy
    Reviewed-by: Parav Pandit
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Max Gurtovoy
     
  • This makes state machine transitions easier to debug and test.

    Signed-off-by: Sagi Grimberg
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • We usually log the cntlid, which is confusing in case we have
    multiple subsystems, each with its own cntlid ida. Instead, make the
    cntlid ida globally unique and log the initial association.

    Signed-off-by: Sagi Grimberg
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     
  • Clean up abort flag processing in fcp_op_done. The references were
    unnecessary.

    Signed-off-by: James Smart
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    James Smart
     
  • Trivial fix to a spelling mistake in a pr_err message.

    Signed-off-by: Colin Ian King
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Colin Ian King
     


15 Feb, 2017

1 commit

  • When CONFIG_KASAN is enabled, compilation fails:

    block/sed-opal.c: In function 'sed_ioctl':
    block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

    Move all of the ioctl structures off the stack and dynamically
    allocate them using _IOC_SIZE().
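    The pattern described (heap-allocating the ioctl argument, sized by
    the length encoded in the ioctl number, instead of a large on-stack
    union) can be sketched in user space. The shift and mask below match
    the common Linux _IOC encoding (14 size bits starting at bit 16), but
    the TOY_IOC_SIZE stand-in and the helper are illustrative, not the
    sed-opal code.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the kernel's _IOC_SIZE(): on most architectures the
 * Linux ioctl number encodes the argument size in 14 bits at shift 16. */
#define TOY_IOC_SIZE(cmd)	(((cmd) >> 16) & 0x3fff)

/* Heap-allocate a zeroed buffer big enough for the ioctl argument,
 * instead of declaring every possible argument struct on the stack. */
static void *toy_ioctl_arg_alloc(unsigned int cmd)
{
	return calloc(1, TOY_IOC_SIZE(cmd));
}
```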

    Fixes: 455a7b238cd6 ("block: Add Sed-opal library")

    Reported-by: Arnd Bergmann
    Signed-off-by: Scott Bauer
    Signed-off-by: Jens Axboe

    Scott Bauer
     


01 Feb, 2017

2 commits

  • Instead of keeping two levels of indirection for requests types, fold it
    all into the operations. The little caveat here is that previously
    cmd_type only applied to struct request, while the request and bio op
    fields were set to plain REQ_OP_READ/WRITE even for passthrough
    operations.

    Instead, this patch adds new REQ_OP_* values for SCSI passthrough and
    driver private requests, although it has to add two for each so that
    we can communicate the data in/out nature of the request.
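    The in/out split can be pictured as a small enum plus a direction
    helper. This is an illustrative sketch of the idea, not the actual
    blk_types.h definitions; the names are modeled on the REQ_OP_*
    convention described above.

```c
#include <assert.h>

/* Illustrative sketch: one op per request type and data direction,
 * so the op itself carries the in/out nature of the request. */
enum toy_req_op {
	TOY_OP_READ,
	TOY_OP_WRITE,
	TOY_OP_SCSI_IN,		/* SCSI passthrough, device-to-host data */
	TOY_OP_SCSI_OUT,	/* SCSI passthrough, host-to-device data */
	TOY_OP_DRV_IN,		/* driver private, device-to-host data */
	TOY_OP_DRV_OUT,		/* driver private, host-to-device data */
};

/* With two ops per passthrough type, direction is a property of the op. */
static int toy_op_data_in(enum toy_req_op op)
{
	return op == TOY_OP_READ || op == TOY_OP_SCSI_IN ||
	       op == TOY_OP_DRV_IN;
}

/* Everything past the plain read/write ops is a passthrough request. */
static int toy_op_is_passthrough(enum toy_req_op op)
{
	return op >= TOY_OP_SCSI_IN;
}
```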

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This can be used to check for fs vs non-fs requests, and it basically
    removes all knowledge of BLOCK_PC specifics from the block layer, as
    well as preparing for the removal of the cmd_type field in struct
    request.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig