Eric Lee / smarc-fsl-linux-kernel

09 Dec, 2016

3 commits

be07e14f9 blk-wbt: don't throttle discard or write zeroes

Both of these are metadata-only commands that are not issued by the
writeback code and not directly relevant to the writeback bandwidth.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-12-09 23:29:35 +0800
a897b6664 nbd: use dev_err_ratelimited in io path

While doing stress tests we noticed that we'd get a lot of dmesg spam if
we suddenly disconnected the nbd device out of band. Rate limit the
messages in the io path in order to deal with this.
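
The idea behind rate limiting can be sketched in userspace as an
interval/burst counter. This is only an illustration: the struct and
function names are hypothetical, the caller supplies the time, and the
kernel's dev_err_ratelimited() keeps its own state internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit {
	uint64_t interval;    /* window length, in caller-defined ticks */
	unsigned int burst;   /* messages allowed per window */
	uint64_t begin;       /* start of the current window */
	unsigned int printed; /* messages emitted in the current window */
};

/* Returns true if a message may be emitted at time 'now'. */
bool ratelimit_allow(struct ratelimit *rl, uint64_t now)
{
	if (now - rl->begin >= rl->interval) {
		/* The window has expired: start a new one. */
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	/* Burst exhausted: suppress the message until the next window. */
	return false;
}
```

A caller would wrap each error print in `if (ratelimit_allow(&rl, now))`,
so a sudden disconnect produces at most `burst` messages per window.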

Signed-off-by: Josef Bacik
Signed-off-by: Jens Axboe

Josef Bacik
2016-12-09 06:28:09 +0800
20032ec38 nbd: reset the setup task for NBD_CLEAR_SOCK

If an app exits before running NBD_DO_IT but after adding sockets we can
end up not being allowed to set up a new nbd device. Fix this by making
NBD_CLEAR_SOCK reset the setup_task.

Signed-off-by: Josef Bacik
Signed-off-by: Jens Axboe

Josef Bacik
2016-12-09 06:26:57 +0800

06 Dec, 2016

22 commits

d65cfe909 Merge branch 'nvmf-4.10' of git://git.infradead.org/nvme-fabrics into for-4.10/block

Sagi writes:

The major addition here is the nvme FC transport implementation
from James.

What else:
- some cleanups and memory leak fixes in the host side fabrics code from Bart
- possible rcu violation fix from Sasha
- logging change from Max
- small include cleanup

Jens Axboe
2016-12-06 23:06:19 +0800
475d0fe79 nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME

Add FC LLDD loopback driver to test FC host and target transport within
nvme-fabrics

To aid in the development and testing of the lower-level api of the FC
transport, this loopback driver has been created to act as if it were a
FC hba driver supporting both the host interfaces as well as the target
interfaces with the nvme FC transport.

Signed-off-by: James Smart
Reviewed-by: Jay Freyensee
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig

James Smart
2016-12-06 21:51:48 +0800
c53432030 nvme-fabrics: Add target support for FC transport

Implements the FC-NVME T11 definition of how nvme fabric capsules are
performed on an FC fabric. Utilizes a lower-layer API to FC host adapters
to send/receive FC-4 LS operations and perform the FCP transactions
necessary to perform an FCP IO request for NVME.

The T11 definitions for FC-4 Link Services are implemented which create
NVMeOF connections. Implements the hooks with nvmet layer to pass NVME
commands to it for processing and posting of data/response back to the
host via the different connections.

Signed-off-by: James Smart
Reviewed-by: Jay Freyensee
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig

James Smart
2016-12-06 16:17:56 +0800
e399441de nvme-fabrics: Add host support for FC transport

Implements the FC-NVME T11 definition of how nvme fabric capsules are
performed on an FC fabric. Utilizes a lower-layer API to FC host adapters
to send/receive FC-4 LS operations and FCP operations that comprise NVME
over FC operation.

The T11 definitions for FC-4 Link Services are implemented which create
NVMeOF connections. Implements the hooks with blk-mq to then submit admin
and io requests to the different connections.

Signed-off-by: James Smart
Reviewed-by: Jay Freyensee
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig

James Smart
2016-12-06 16:17:56 +0800
d6d20012e nvme-fabrics: Add FC transport LLDD api definitions

Host:
- LLDD registration with the host transport
- registering host ports (local ports) and target ports seen on
fabric (remote ports)
- Data structures and call points for FC-4 LS's and FCP IO requests

Target:
- LLDD registration with the target transport
- registering nvme subsystem ports (target ports)
- Data structures and call points for reception of FC-4 LS's and
FCP IO requests, and callbacks to perform data and rsp transfers
for the io.

Add to MAINTAINERS file

Signed-off-by: James Smart
Reviewed-by: Christoph Hellwig
Reviewed-by: Jay Freyensee
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig

James Smart
2016-12-06 16:17:56 +0800
b1ad1475b nvme-fabrics: Add FC transport FC-NVME definitions

- Formats for Cmd, Data, Rsp IUs
- Formats FC-4 LS definitions
- Add to MAINTAINERS file

Signed-off-by: James Smart
Reviewed-by: Christoph Hellwig
Reviewed-by: Jay Freyensee
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig

James Smart
2016-12-06 16:17:56 +0800
cba3bdfd2 nvme-fabrics: Add FC transport error codes to nvme.h

Signed-off-by: James Smart
Reviewed-by: Christoph Hellwig
Reviewed-by: Jay Freyensee
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig

James Smart
2016-12-06 16:17:56 +0800
6ea76f33e Add type 0x28 NVME type code to scsi fc headers

Signed-off-by: James Smart
Acked-by: Johannes Thumshirn
Reviewed-by: Jay Freyensee
Signed-off-by: Christoph Hellwig

James Smart
2016-12-06 16:17:03 +0800
885aa4015 nvme-fabrics: patch target code in prep for FC transport support

- Add FC transport type decoding
- Add FC address family decoding

Signed-off-by: James Smart
Acked-by: Johannes Thumshirn
Reviewed-by: Jay Freyensee
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig

James Smart
2016-12-06 16:17:03 +0800
721b3917c nvme-fabrics: set sqe.command_id in core not transports

Currently, core.c sets command_id only on rd/wr commands, leaving it to
the transport to set it again to ensure the request had a command id.

Move the assignment in core so that it applies to all commands, and
remove the assignments from the transports.

Signed-off-by: James Smart
Reviewed-by: Jay Freyensee
Signed-off-by: Sagi Grimberg

James Smart
2016-12-06 16:17:03 +0800
a317178e3 parser: add u64 number parser

Will be used by the nvme-fabrics FC transport in parsing options
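
As a rough userspace illustration of what a u64 option parser has to
handle (full 64-bit range, overflow, trailing garbage), here is a
minimal sketch built on strtoull(). The name parse_u64() and its error
convention are assumptions made for this example; they are not the
kernel parser's interface.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse 'str' as an unsigned 64-bit number.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 * 'base' is passed to strtoull(); 0 auto-detects a 0x or 0 prefix.
 */
int parse_u64(const char *str, int base, uint64_t *result)
{
	char *end;
	unsigned long long val;

	/* Reject NULL, empty strings, leading whitespace and signs. */
	if (str == NULL || !isalnum((unsigned char)*str))
		return -EINVAL;

	errno = 0;
	val = strtoull(str, &end, base);
	if (errno == ERANGE)
		return -ERANGE;
	/* No digits consumed, or trailing characters left over. */
	if (end == str || *end != '\0')
		return -EINVAL;

	*result = (uint64_t)val;
	return 0;
}
```

Values such as 64-bit port names fit in the result, which a parser
limited to int would not be able to represent.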

Signed-off-by: James Smart
Signed-off-by: Sagi Grimberg

James Smart
2016-12-06 16:17:03 +0800
27a4beef0 nvme-rdma: align to generic ib_event logging helper

Signed-off-by: Max Gurtovoy
Reviewed-by: Jay Freyensee
Signed-off-by: Christoph Hellwig

Max Gurtovoy
2016-12-06 16:17:03 +0800
675796be4 nvmet-rdma: align to generic ib_event logging helper

Signed-off-by: Max Gurtovoy
Signed-off-by: Christoph Hellwig

Max Gurtovoy
2016-12-06 16:17:03 +0800
d4a5340ed nvme-rdma: remove redundant define

Reviewed-by: Christoph Hellwig
Signed-off-by: Sagi Grimberg

Sagi Grimberg
2016-12-06 16:17:03 +0800
6eb728305 nvme-fabrics: Adjust source code indentation

Adjust indentation such that arguments are aligned.

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Sagi Grimberg

Bart Van Assche
2016-12-06 16:17:03 +0800
6bcb5268d nvme/scsi: Remove set-but-not-used variables

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Sagi Grimberg

Bart Van Assche
2016-12-06 16:17:03 +0800
e4fcf07cc nvmet: Fix possible infinite loop triggered on hot namespace removal

When removing a namespace we delete it from the subsystem namespaces
list with list_del_init which allows us to know if it is enabled or
not.

The problem is that list_del_init reinitializes the entry's next pointer
and does not respect the RCU list-traversal we do on the IO path for
locating a namespace. Instead we need to use list_del_rcu, which is
allowed to run concurrently with the _rcu list-traversal primitives
(it keeps the next pointer intact) and guarantees forward progress for a
concurrent nvmet_find_namespace.

With that change, we can no longer rely on ns->dev_link to know whether
the namespace is enabled, so add an enabled indicator to nvmet_ns for
that.
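
The difference between the two removal styles can be modelled in
userspace with a tiny circular list. This is a single-threaded sketch
with hypothetical names; it has no real RCU or memory barriers and only
shows what a reader standing on the removed entry sees next.

```c
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *next, *prev; };

void list_init(struct node *head) { head->next = head->prev = head; }

void list_add_tail_node(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Models list_del_init(): unlink and point the entry back at itself. */
void del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Models list_del_rcu(): unlink but keep n->next for current readers. */
void del_keep_next(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = NULL; /* the kernel stores a poison value here instead */
}

/*
 * A reader that was standing on 'pos' when it was removed keeps
 * following next pointers. Returns true if it reaches the list head
 * within max_steps, false if it is stuck.
 */
bool reader_reaches_head(const struct node *pos, const struct node *head,
			 int max_steps)
{
	while (max_steps-- > 0) {
		pos = pos->next;
		if (pos == head)
			return true;
	}
	return false;
}
```

After del_init() the removed entry points at itself, so the reader never
leaves it; after del_keep_next() the reader walks on to the rest of the
list and terminates.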

Signed-off-by: Sagi Grimberg
Signed-off-by: Solganik Alexander
Cc: # v4.8+

Solganik Alexander
2016-12-06 16:17:03 +0800
f3116d8f1 nvme-fabrics: Fix a memory leak in an nvmf_create_ctrl() error path

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Sagi Grimberg

Bart Van Assche
2016-12-06 16:17:03 +0800
8eadfcb1b nvme-fabrics: Fix memory leaks in nvmf_parse_options()

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Sagi Grimberg

Bart Van Assche
2016-12-06 16:17:03 +0800
76c08bf46 nvme-rdma: force queue size to respect controller capability

Queue size needs to respect the Maximum Queue Entries Supported advertised by
the controller in its Capability register.
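
A minimal sketch of such a clamp, assuming the usual NVMe layout where
MQES occupies bits 15:0 of the Capability register and is a 0's based
value. The helper names are hypothetical and the exact adjustment made
by the driver may differ.

```c
#include <stdint.h>

/* CAP bits 15:0 hold MQES, a 0's based count of queue entries. */
uint32_t cap_max_queue_entries(uint64_t cap)
{
	return (uint32_t)(cap & 0xffff) + 1;
}

/* Hypothetical helper: clamp a requested queue size to the limit. */
uint32_t clamp_queue_size(uint32_t requested, uint64_t cap)
{
	uint32_t max = cap_max_queue_entries(cap);

	return requested < max ? requested : max;
}
```

For example, a controller reporting MQES = 0x3f supports 64 entries, so
a requested size of 128 would be reduced to 64.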

Signed-off-by: Samuel Jones
Reviewed-by: Christoph Hellwig
[sagig: fixed queue_size adjustment according to
Daniel Verkamp comment]
Signed-off-by: Sagi Grimberg

Samuel Jones
2016-12-06 16:17:03 +0800
70d4281c4 nvmet-rdma: Fix REJ status code

nvmet_sq_init() returns a value
Reviewed-by: Sagi Grimberg
Signed-off-by: Sagi Grimberg

Bart Van Assche
2016-12-06 16:17:03 +0800
6e85eaf30 blk-mq: blk_account_io_start() takes a bool

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn

Jens Axboe
2016-12-06 03:14:28 +0800

05 Dec, 2016

1 commit

58886785d block: fix unintended fallthrough in generic_make_request_checks()

Since commit e73c23ff736e ("block: add async variant of
blkdev_issue_zeroout") messages like the following show up:

EXT4-fs (dm-1): Delayed block allocation failed for inode 2368848 at
logical offset 0 with max blocks 1 with error 95
EXT4-fs (dm-1): This should not happen!! Data will be lost

Due to the following fallthrough introduced with
commit 2d253440b5af ("block: Define zoned block device operations"),
generic_make_request_checks() would accept a REQ_OP_WRITE_SAME bio only
if the block device supports "write same" *and* is a zoned one:

    switch (bio_op(bio)) {
    [...]
    case REQ_OP_WRITE_SAME:
        if (!bdev_write_same(bio->bi_bdev))
            goto not_supported;
    case REQ_OP_ZONE_REPORT:
    case REQ_OP_ZONE_RESET:
        if (!bdev_is_zoned(bio->bi_bdev))
            goto not_supported;
        break;
    [...]
    }

Thus, although the bio setup as done by __blkdev_issue_write_same() from
commit e73c23ff736e ("block: add async variant of blkdev_issue_zeroout")
would succeed, its actual submission would not, resulting in the
EOPNOTSUPP == 95.

Fix this by removing the fallthrough which, due to the lack of an explicit
comment, seems to be unintended anyway.
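
The effect of the missing break can be reproduced with a small
userspace model. The enum, struct and function names below are
hypothetical stand-ins for the kernel's bio operations and device
capability checks; only the control flow mirrors the commit.

```c
#include <errno.h>
#include <stdbool.h>

enum op { OP_WRITE_SAME, OP_ZONE_REPORT, OP_ZONE_RESET };

struct dev { bool write_same; bool zoned; };

/* Before the fix: WRITE_SAME also has to pass the zoned check. */
int check_buggy(enum op op, const struct dev *d)
{
	switch (op) {
	case OP_WRITE_SAME:
		if (!d->write_same)
			return -EOPNOTSUPP;
		/* unintended: continues into the zone cases */
		/* fall through */
	case OP_ZONE_REPORT:
	case OP_ZONE_RESET:
		if (!d->zoned)
			return -EOPNOTSUPP;
		break;
	}
	return 0;
}

/* After the fix: each group of cases ends with its own break. */
int check_fixed(enum op op, const struct dev *d)
{
	switch (op) {
	case OP_WRITE_SAME:
		if (!d->write_same)
			return -EOPNOTSUPP;
		break;
	case OP_ZONE_REPORT:
	case OP_ZONE_RESET:
		if (!d->zoned)
			return -EOPNOTSUPP;
		break;
	}
	return 0;
}
```

For a device that supports write same but is not zoned, check_buggy()
rejects OP_WRITE_SAME with -EOPNOTSUPP while check_fixed() accepts it.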

Fixes: e73c23ff736e ("block: add async variant of blkdev_issue_zeroout")
Fixes: 2d253440b5af ("block: Define zoned block device operations")
Signed-off-by: Nicolai Stange
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Nicolai Stange
2016-12-05 22:54:39 +0800

04 Dec, 2016

1 commit

e88f72cb9 nbd: fix 64-bit division

We have this:

ERROR: "__aeabi_ldivmod" [drivers/block/nbd.ko] undefined!
ERROR: "__divdi3" [drivers/block/nbd.ko] undefined!
nbd.c:(.text+0x247c72): undefined reference to `__divdi3'

due to a recent commit, that did 64-bit division. Use the proper
divider function so that 32-bit compiles don't break.

Fixes: ef77b515243b ("nbd: use loff_t for blocksize and nbd_set_size args")
Signed-off-by: Jens Axboe

Jens Axboe
2016-12-04 03:08:03 +0800

03 Dec, 2016

2 commits

ef77b5152 nbd: use loff_t for blocksize and nbd_set_size args

If we have large devices (say like the 40t drive I was trying to test with) we
will end up overflowing the int arguments to nbd_set_size and not get the right
size for our device. Fix this by using loff_t everywhere so I don't have to
think about this again. Thanks,
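
To see the scale of the problem, the arithmetic can be checked in
userspace. The helpers below are hypothetical and assume a 4096-byte
block size for illustration; uint32_t is used so that the truncation is
well defined in C.

```c
#include <stdint.h>

#define TIB (1024LL * 1024 * 1024 * 1024)

/* Number of blocks computed with 64-bit arithmetic. */
int64_t nr_blocks_64(int64_t bytes, int64_t blocksize)
{
	return bytes / blocksize;
}

/* Models what survives when the value is squeezed into 32 bits. */
uint32_t as_32bit(int64_t value)
{
	return (uint32_t)value;
}
```

A 40 TiB device with 4096-byte blocks has 10737418240 blocks, which
does not fit in 32 bits: only the low 32 bits (2147483648) remain, and
as a signed int that bit pattern would be negative.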

Signed-off-by: Josef Bacik
Signed-off-by: Jens Axboe

Josef Bacik
2016-12-03 12:06:29 +0800
209200efa blk-stat: fix a typo

Signed-off-by: Shaohua Li
Fixes: cf43e6be865a ("block: add scalable completion tracking of requests")
Signed-off-by: Jens Axboe

Shaohua Li
2016-12-03 11:17:43 +0800

01 Dec, 2016

9 commits

e0c723000 block: factor out req_set_nomerge

Factor out the common code for setting the REQ_NOMERGE flag, which is
used in several places, into a helper, req_set_nomerge().
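
A sketch of what such a helper might look like, using simplified
stand-in structures. The struct layouts, the flag value and the
assumption that the shared code also drops the queue's cached merge
candidate are illustrative, not taken from the patch.

```c
#include <stddef.h>

#define SKETCH_REQ_NOMERGE (1u << 0) /* hypothetical flag bit */

struct sketch_request { unsigned int cmd_flags; };

struct sketch_queue { struct sketch_request *last_merge; };

/* Mark a request as not mergeable and forget it as a merge hint. */
void sketch_req_set_nomerge(struct sketch_queue *q,
			    struct sketch_request *req)
{
	req->cmd_flags |= SKETCH_REQ_NOMERGE;
	if (req == q->last_merge)
		q->last_merge = NULL;
}
```

Callers that previously open-coded these two steps would call the
helper instead, which keeps the flag and the merge hint consistent.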

Signed-off-by: Ritesh Harjani

Get rid of the inline.

Signed-off-by: Jens Axboe

Ritesh Harjani
2016-12-01 23:36:16 +0800
af309226d block: protect iterate_bdevs() against concurrent close

If a block device is closed while iterate_bdevs() is handling it, the
following NULL pointer dereference occurs because bdev->b_disk is NULL
in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
turn called by the mapping_cap_writeback_dirty() call in
__filemap_fdatawrite_range()):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
IP: [] blk_get_backing_dev_info+0x10/0x20
PGD 9e62067 PUD 9ee8067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
RIP: 0010:[] [] blk_get_backing_dev_info+0x10/0x20
RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
Stack:
ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
Call Trace:
[] __filemap_fdatawrite_range+0x85/0x90
[] filemap_fdatawrite+0x1f/0x30
[] fdatawrite_one_bdev+0x16/0x20
[] iterate_bdevs+0xf2/0x130
[] sys_sync+0x63/0x90
[] entry_SYSCALL_64_fastpath+0x12/0x76
Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 8b 80 08 05 00 00 5d
RIP [] blk_get_backing_dev_info+0x10/0x20
RSP
CR2: 0000000000000508
---[ end trace 2487336ceb3de62d ]---

The crash is easily reproducible by running the following command, if an
msleep(100) is inserted before the call to func() in iterate_bdevs():

while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done

Fix it by holding the bd_mutex across the func() call and only calling
func() if the bdev is opened.
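
The shape of the fix (take the lock, check that the device is still
open, call the callback, release the lock) can be sketched in userspace
with a pthread mutex. The struct, its fields and the function names are
hypothetical stand-ins for the block device and bd_mutex.

```c
#include <pthread.h>
#include <stdbool.h>

struct sketch_bdev {
	pthread_mutex_t lock; /* stands in for bd_mutex */
	int openers;          /* 0 means the device has been closed */
};

typedef void (*bdev_func)(struct sketch_bdev *bdev, void *arg);

/* Call func() only while the device is open, with the lock held. */
bool call_if_open(struct sketch_bdev *bdev, bdev_func func, void *arg)
{
	bool called = false;

	pthread_mutex_lock(&bdev->lock);
	if (bdev->openers > 0) {
		func(bdev, arg);
		called = true;
	}
	pthread_mutex_unlock(&bdev->lock);
	return called;
}

/* Example callback: counts how many devices were visited. */
void count_visit(struct sketch_bdev *bdev, void *arg)
{
	(void)bdev;
	++*(int *)arg;
}
```

Because a close path would take the same lock before tearing the device
down, the callback can no longer observe a half-closed device.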

Cc: stable@vger.kernel.org
Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices")
Reported-and-tested-by: Wei Fang
Signed-off-by: Rabin Vincent
Signed-off-by: Jan Kara
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Rabin Vincent
2016-12-01 23:26:39 +0800
5b0e34e19 block: mtip32xx: set error code on failure

Fix bug https://bugzilla.kernel.org/show_bug.cgi?id=188531. In function
mtip_block_initialize(), variable rv takes the return value, and its
value should be negative on errors. rv is initialized as 0 and is not
reset when the call to ida_pre_get() fails. So 0 may be returned.
The return value 0 indicates that there is no error, which may be
inconsistent with the execution status. This patch fixes the bug by
explicitly assigning -ENOMEM to rv on the branch where ida_pre_get()
fails.
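
The bug pattern is easy to reproduce in isolation. In this sketch the
boolean parameter stands in for the outcome of the allocation step
(ida_pre_get() in the commit); the function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Before: rv keeps its initial 0 when the allocation step fails. */
int init_buggy(bool alloc_ok)
{
	int rv = 0;

	if (!alloc_ok)
		goto out; /* failure is reported as success */
	/* ... remaining setup would go here ... */
out:
	return rv;
}

/* After: the failing branch sets a negative error code first. */
int init_fixed(bool alloc_ok)
{
	int rv = 0;

	if (!alloc_ok) {
		rv = -ENOMEM;
		goto out;
	}
	/* ... remaining setup would go here ... */
out:
	return rv;
}
```

With a failed allocation, init_buggy() returns 0 and the caller carries
on as if initialization succeeded, while init_fixed() returns -ENOMEM.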

Signed-off-by: Pan Bian
Signed-off-by: Jens Axboe

Pan Bian
2016-12-01 23:01:14 +0800
d26292099 nvmet: add support for the Write Zeroes command

Add support for handling write zeroes command on target.
Call into __blkdev_issue_zeroout, which the block layer expands into the
best suitable variant of zeroing the LBAs. Allow write zeroes operation
to deallocate the LBAs when calling __blkdev_issue_zeroout.

Signed-off-by: Chaitanya Kulkarni
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Chaitanya Kulkarni
2016-12-01 22:58:40 +0800
6d31e3ba2 nvme: add support for the Write Zeroes command

Allow write zeroes operations (REQ_OP_WRITE_ZEROES) on the block
device, if the device supports optional command bit set for write
zeroes. Add support to setup write zeroes command. Set maximum possible
write zeroes sectors in one write zeroes command according to
nvme write zeroes command definition.

Signed-off-by: Chaitanya Kulkarni
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Chaitanya Kulkarni
2016-12-01 22:58:40 +0800
3b7c33b28 nvme.h: add Write Zeroes definitions

Add the command structure, optional command set support (ONCS) bit and
a new error code for the Write Zeroes command.

Signed-off-by: Chaitanya Kulkarni
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Chaitanya Kulkarni
2016-12-01 22:58:40 +0800
a6f0788ec block: add support for REQ_OP_WRITE_ZEROES

This adds a new block layer operation to zero out a range of
LBAs. This allows implementing zeroing for devices that don't use
either discard with a predictable zero pattern or WRITE SAME of zeroes.
The prominent example of that is NVMe with the Write Zeroes command,
but in the future, this should also help with improving the way
zeroing discards work. For this operation, a suitable entry is exported
in sysfs which indicates the maximum number of bytes allowed in one
write zeroes operation by the device.

Signed-off-by: Chaitanya Kulkarni
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Chaitanya Kulkarni
2016-12-01 22:58:40 +0800
e73c23ff7 block: add async variant of blkdev_issue_zeroout

Similar to __blkdev_issue_discard this variant allows submitting
the final bio asynchronously and chaining multiple ranges
into a single completion.

Signed-off-by: Chaitanya Kulkarni
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Chaitanya Kulkarni
2016-12-01 22:58:40 +0800
b02d8aaea block: Check partition alignment on zoned block devices

Both blkdev_report_zones and blkdev_reset_zones can operate on a partition of
a zoned block device. However, the first and last zones reported for a
partition make sense only if the partition start sector and size are aligned
on the device zone size. The same applies for zone reset. Resetting the first
or the last zone of a partition straddling zones may impact neighboring
partitions. Finally, if a partition start sector is not at the beginning of a
sequential zone, it will be impossible to write to the first sectors of the
partition on a host-managed device.
Avoid all these problems and incoherencies by ignoring partitions that are not
zone aligned.

Note: Even with CONFIG_BLK_DEV_ZONED disabled, bdev_is_zoned() will report the
correct disk zoning type (host-aware, host-managed or none) but
bdev_zone_size() will always return 0 for zoned block devices (i.e. the zone
size is unknown). So test this as a way to ensure that a zoned block device is
being handled as such. As a result, for host-aware devices, unaligned zone
partitions will be accepted with CONFIG_BLK_DEV_ZONED disabled. That is, the
disk will be treated as a regular block device (as it should). If zoned block
device support is enabled, only aligned partitions will be accepted.
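
The alignment rule described above can be sketched as a simple
predicate. The function name and parameters are hypothetical, all
values are in sectors, and a zone size of 0 is treated as "zone size
unknown", in which case the partition is accepted as on a regular block
device. The in-kernel check may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if a partition may be used on a device with the given
 * zone size: both its start sector and its size must be multiples of
 * the zone size.
 */
bool partition_zone_aligned(uint64_t zone_sectors, uint64_t start,
			    uint64_t nr_sectors)
{
	if (zone_sectors == 0)
		return true; /* zone size unknown: no zone constraint */
	return start % zone_sectors == 0 &&
	       nr_sectors % zone_sectors == 0;
}
```

With 524288-sector zones (256 MiB at 512 bytes per sector), a partition
starting at sector 2048 is rejected even if its size is a whole number
of zones, because it would straddle zone boundaries.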

Signed-off-by: Damien Le Moal
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Damien Le Moal
2016-12-01 22:56:53 +0800

30 Nov, 2016

2 commits

333ba053d lightnvm: transform target get/set bad block

Since targets are given a virtual target device, it is necessary to
translate all communication between targets and the backend device.
Implement the translation layer for get/set bad block table.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2016-11-30 03:12:51 +0800
da2d7cb82 lightnvm: use target nvm on target-specific ops.

On target-specific operations pass on nvm_tgt_dev instead of the generic
nvm device.

Signed-off-by: Javier González
Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Javier González
2016-11-30 03:12:51 +0800