05 Apr, 2018

1 commit

  • Pull char/misc updates from Greg KH:
    "Here is the big set of char/misc driver patches for 4.17-rc1.

    There are a lot of little things in here, nothing huge, but all
    important to the different hardware types involved:

    - thunderbolt driver updates

    - parport updates (people still care...)

    - nvmem driver updates

    - mei updates (as always)

    - hwtracing driver updates

    - hyperv driver updates

    - extcon driver updates

    - ... and a handful of even smaller driver subsystem and individual
    driver updates

    All of these have been in linux-next with no reported issues"

    * tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (149 commits)
    hwtracing: Add HW tracing support menu
    intel_th: Add ACPI glue layer
    intel_th: Allow forcing host mode through drvdata
    intel_th: Pick up irq number from resources
    intel_th: Don't touch switch routing in host mode
    intel_th: Use correct method of finding hub
    intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
    stm class: Make dummy's master/channel ranges configurable
    stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
    MAINTAINERS: Bestow upon myself the care for drivers/hwtracing
    hv: add SPDX license id to Kconfig
    hv: add SPDX license to trace
    Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
    Drivers: hv: vmbus: respect what we get from hv_get_synint_state()
    /dev/mem: Avoid overwriting "err" in read_mem()
    eeprom: at24: use SPDX identifier instead of GPL boiler-plate
    eeprom: at24: simplify the i2c functionality checking
    eeprom: at24: fix a line break
    eeprom: at24: tweak newlines
    eeprom: at24: refactor at24_probe()
    ...

    Linus Torvalds
     


16 Mar, 2018

1 commit

  • register_blkdev() and __register_chrdev_region() treat the major
    number as an unsigned int. So print it the same way to avoid
    absurd error statements such as:
    "... major requested (-1) is greater than the maximum (511) ..."
    (and also fix off-by-one bugs in the error prints).

    While at it, also update the comment describing register_blkdev().

    Signed-off-by: Srivatsa S. Bhat
    Reviewed-by: Logan Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Srivatsa S. Bhat
     

03 Mar, 2018

1 commit

  • Pull block fixes from Jens Axboe:
    "A collection of fixes for this series. This is a little larger than
    usual at this time, but that's mainly because I was out on vacation
    last week. Nothing in here is major in any way, it's just two weeks of
    fixes. This contains:

    - NVMe pull from Keith, with a set of fixes from the usual suspects.

    - mq-deadline zone unlock fix from Damien, fixing an issue with the
    SMR zone locking added for 4.16.

    - two bcache fixes sent in by Michael, with changes from Coly and
    Tang.

    - comment typo fix from Eric for blktrace.

    - return-value error handling fix for nbd, from Gustavo.

    - fix a direct-io case where we don't defer to a completion handler,
    making us sleep from IRQ device completion. From Jan.

    - a small series from Jan fixing up holes around handling of bdev
    references.

    - small set of regression fixes from Jiufei, mostly fixing problems
    around the gendisk pointer -> partition index change.

    - regression fix from Ming, fixing a boundary issue with the discard
    page cache invalidation.

    - two-patch series from Ming, fixing both a core blk-mq-sched and
    kyber issue around token freeing on a requeue condition"

    * tag 'for-linus-20180302' of git://git.kernel.dk/linux-block: (24 commits)
    block: fix a typo
    block: display the correct diskname for bio
    block: fix the count of PGPGOUT for WRITE_SAME
    mq-deadline: Make sure to always unlock zones
    nvmet: fix PSDT field check in command format
    nvme-multipath: fix sysfs dangerously created links
    nbd: fix return value in error handling path
    bcache: fix kcrashes with fio in RAID5 backend dev
    bcache: correct flash only vols (check all uuids)
    blktrace_api.h: fix comment for struct blk_user_trace_setup
    blockdev: Avoid two active bdev inodes for one device
    genhd: Fix BUG in blkdev_open()
    genhd: Fix use after free in __blkdev_get()
    genhd: Add helper put_disk_and_module()
    genhd: Rename get_disk() to get_disk_and_module()
    genhd: Fix leaked module reference for NVME devices
    direct-io: Fix sleep in atomic due to sync AIO
    nvme-pci: Fix nvme queue cleanup if IRQ setup fails
    block: kyber: fix domain token leak during requeue
    blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch
    ...

    Linus Torvalds
     

01 Mar, 2018

3 commits

  • bio_devname() uses __bdevname() to display the device name, but that
    can only show the major and minor of part0. Fix this by using
    disk_name() to display the correct name.

    Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jiufei Xue
    Signed-off-by: Jens Axboe

    Jiufei Xue
     
  • The vm counters are counted in sectors, so we should do the conversion
    in submit_bio().

    Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
    Cc: stable@vger.kernel.org
    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jiufei Xue
    Signed-off-by: Jens Axboe

    Jiufei Xue
     
  • In case of a failed write request (all retries failed) and when using
    libata, the SCSI error handler calls scsi_finish_command(). In the
    case of blk-mq this means that scsi_mq_done() does not get called,
    that blk_mq_complete_request() does not get called and also that the
    mq-deadline .completed_request() method is not called. This results in
    the target zone of the failed write request being left in a locked
    state, preventing that any new write requests are issued to the same
    zone.

    Fix this by replacing the .completed_request() method with the
    .finish_request() method as this method is always called whether or
    not a request completes successfully. Since the .finish_request()
    method is only called by the blk-mq core if a .prepare_request()
    method exists, add a dummy .prepare_request() method.

    Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
    Cc: Hannes Reinecke
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    [ bvanassche: edited patch description ]
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Damien Le Moal
     

27 Feb, 2018

4 commits

  • When two blkdev_open() calls for a partition race with device removal
    and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
    blkdev_open(). The race can happen as follows:

    CPU0: del_gendisk()
            bdev_unhash_inode(part1);
    CPU1: blkdev_open(part1, O_EXCL)
            bdev = bd_acquire()
    CPU2: blkdev_open(part1, O_EXCL)
            bdev = bd_acquire()
    CPU1: blkdev_get(bdev)
            bd_start_claiming(bdev)
            - finds old inode 'whole'
            bd_prepare_to_claim() -> 0
    CPU0: bdev_unhash_inode(whole);
          (device is removed and recreated under the same numbers)
    CPU2: blkdev_get(bdev);
            bd_start_claiming(bdev)
            - finds new inode 'whole'
            bd_prepare_to_claim()
            - this also succeeds as we have a
              different 'whole' here...
            - bad things happen now as we have
              two exclusive openers of the same bdev

    The problem here is that block device opens can see various intermediate
    states while gendisk is shutting down and then being recreated.

    We fix the problem by introducing a new lookup_sem in gendisk that
    synchronizes gendisk deletion with get_gendisk(), and furthermore by
    making sure that get_gendisk() does not return a gendisk that is being
    (or has been) deleted. This makes sure that once we manage to look up
    the newly created bdev inode, we are also guaranteed that a following
    get_gendisk() will either return failure (and we fail the open) or
    return the gendisk for the new device, and a following bdget_disk()
    will return the new bdev inode (i.e., blkdev_open() follows the path as
    if it were run entirely after the new device was created).

    Reported-and-analyzed-by: Hou Tao
    Tested-by: Hou Tao
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Add a proper counterpart to get_disk_and_module() -
    put_disk_and_module(). Currently it is open-coded in several places.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Rename get_disk() to get_disk_and_module() to make clear what the
    function does. It's not a great name, but at least it is now obvious
    that put_disk() is not its counterpart.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Commit 8ddcd653257c "block: introduce GENHD_FL_HIDDEN" added handling of
    hidden devices to get_gendisk() but forgot to drop module reference
    which is also acquired by get_disk(). Drop the reference as necessary.

    Arguably the function naming here is misleading as put_disk() is *not*
    the counterpart of get_disk() but let's fix that in the follow up
    commit since that will be more intrusive.

    Fixes: 8ddcd653257c ("block: introduce GENHD_FL_HIDDEN")
    CC: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

25 Feb, 2018

2 commits

  • When requeuing a request, the domain token should be freed before
    re-inserting the request into the io scheduler. Otherwise the assigned
    domain token is leaked, and an IO hang can result.

    Cc: Paolo Valente
    Cc: Omar Sandoval
    Cc: stable@vger.kernel.org
    Reviewed-by: Bart Van Assche
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • __blk_mq_requeue_request() covers two cases:

    - one is that the requeued request is added to hctx->dispatch, such as
    blk_mq_dispatch_rq_list()

    - another case is that the request is requeued to io scheduler, such as
    blk_mq_requeue_request().

    We should call the io scheduler's .requeue_request callback only for
    the second case.

    Cc: Paolo Valente
    Cc: Omar Sandoval
    Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
    Cc: stable@vger.kernel.org
    Reviewed-by: Bart Van Assche
    Acked-by: Paolo Valente
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     


22 Feb, 2018

1 commit

  • On lkml, suggestions were made to split up such trivial typo fixes
    into per-subsystem patches:

    --- a/arch/x86/boot/compressed/eboot.c
    +++ b/arch/x86/boot/compressed/eboot.c
    @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
    struct efi_uga_draw_protocol *uga = NULL, *first_uga;
    efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
    unsigned long nr_ugas;
    - u32 *handles = (u32 *)uga_handle;;
    + u32 *handles = (u32 *)uga_handle;
    efi_status_t status = EFI_INVALID_PARAMETER;
    int i;

    This patch is the result of the following script:

    $ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)

    ... followed by manual review to make sure it's all good.

    Splitting this up is just crazy talk, let's get over with this and just do it.

    Reported-by: Pavel Machek
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

14 Feb, 2018

1 commit

  • This removes the dependency on interrupts to wake up the polling task.
    Set the task state to TASK_RUNNING if need_resched() returns true while
    polling for IO completion. Earlier, the polling task used to sleep,
    relying on an interrupt to wake it up. This made some IO take very long
    when interrupt coalescing is enabled in NVMe.

    Reference:
    http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html

    Changes since v2->v3:
    -using __set_current_state() instead of set_current_state()

    Changes since v1->v2:
    -setting task state once in blk_poll, instead of multiple
    callers.

    Signed-off-by: Nitesh Shetty
    Signed-off-by: Jens Axboe

    Nitesh Shetty
     

12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
    L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
    for f in $L; do sed -i "-es/^\([^\"]*\)\(\\<POLL$V\\>\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But the keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Feb, 2018

1 commit

  • Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
    RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
    be re-inserted into the active I/O scheduler for that device. As a
    consequence, I/O schedulers may get the same request inserted again,
    even several times, without a finish_request invoked on that request
    before each re-insertion.

    This fact is the cause of the failure reported in [1]. For an I/O
    scheduler, every re-insertion of the same re-prepared request is
    equivalent to the insertion of a new request. For schedulers like
    mq-deadline or kyber, this fact causes no harm. In contrast, it
    confuses a stateful scheduler like BFQ, which keeps state for an I/O
    request, until the finish_request hook is invoked on the request. In
    particular, BFQ may get stuck, waiting forever for the number of
    request dispatches, of the same request, to be balanced by an equal
    number of request completions (while there will be one completion for
    that request). In this state, BFQ may refuse to serve I/O requests
    from other bfq_queues. The hang reported in [1] then follows.

    However, the above re-prepared requests undergo a requeue, thus the
    requeue_request hook of the active elevator is invoked for these
    requests, if set. This commit then addresses the above issue by
    properly implementing the hook requeue_request in BFQ.

    [1] https://marc.info/?l=linux-block&m=151211117608676

    Reported-by: Ivan Kozik
    Reported-by: Alban Browaeys
    Tested-by: Mike Galbraith
    Signed-off-by: Paolo Valente
    Signed-off-by: Serena Ziviani
    Signed-off-by: Jens Axboe

    Paolo Valente
     

07 Feb, 2018

2 commits

  • The classic error injection mechanism, should_fail_request(), does not
    support use cases where more information is required (the entire
    struct bio, for example).

    To that end, this patch introduces should_fail_bio(), which calls
    should_fail_request() under the hood but provides a convenient
    place for kprobes to hook into if they require the entire struct bio.
    This patch also replaces some existing calls to should_fail_request()
    with should_fail_bio() with no degradation in performance.

    Signed-off-by: Howard McLauchlan
    Signed-off-by: Jens Axboe

    Howard McLauchlan
     
  • Mikulas reported a workload that saw bad performance, and figured
    out what it was due to various other types of requests being
    accounted as reads. Flush requests, for instance. Due to the
    high latency of those, we heavily throttle the writes to keep
    the latencies in balance. But they really should be accounted
    as writes.

    Fix this by checking the exact type of the request. If it's a
    read, account as a read, if it's a write or a flush, account
    as a write. Any other request we disregard. Previously everything
    would have been mistakenly accounted as reads.

    Reported-by: Mikulas Patocka
    Cc: stable@vger.kernel.org # v4.12+
    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Feb, 2018

1 commit

  • Pull more block updates from Jens Axboe:
    "Most of this is fixes and not new code/features:

    - skd fix from Arnd, fixing a build error dependent on slab allocator
    type.

    - blk-mq scheduler discard merging fixes, one from me and one from
    Keith. This fixes a segment miscalculation for blk-mq-sched, where
    we mistakenly think two segments are physically contiguous even
    though the request isn't carrying real data. Also fixes a bio-to-rq
    merge case.

    - Don't re-set a bit on the buffer_head flags, if it's already set.
    This can cause scalability concerns on bigger machines and
    workloads. From Kemi Wang.

    - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to
    distinguish between a local (device related) resource starvation
    and a global one. The latter might happen without IO being in
    flight, so it has to be handled a bit differently. From Ming"

    * tag 'for-linus-20180204' of git://git.kernel.dk/linux-block:
    block: skd: fix incorrect linux/slab_def.h inclusion
    buffer: Avoid setting buffer bits that are already set
    blk-mq-sched: Enable merging discard bio into request
    blk-mq: fix discard merge with scheduler attached
    blk-mq: introduce BLK_STS_DEV_RESOURCE

    Linus Torvalds
     

02 Feb, 2018

2 commits

  • Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • I ran into an issue on my laptop that triggered a bug on the
    discard path:

    WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430
    Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt
    CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G U 4.15.0+ #176
    Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
    RIP: 0010:nvme_setup_cmd+0x3d3/0x430
    RSP: 0018:ffff880423e9f838 EFLAGS: 00010217
    RAX: 0000000000000000 RBX: ffff880423e9f8c8 RCX: 0000000000010000
    RDX: ffff88022b200010 RSI: 0000000000000002 RDI: 00000000327f0000
    RBP: ffff880421251400 R08: ffff88022b200000 R09: 0000000000000009
    R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000ffff
    R13: ffff88042341e280 R14: 000000000000ffff R15: ffff880421251440
    FS: 0000000000000000(0000) GS:ffff880441500000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055b684795030 CR3: 0000000002e09006 CR4: 00000000001606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    nvme_queue_rq+0x40/0xa00
    ? __sbitmap_queue_get+0x24/0x90
    ? blk_mq_get_tag+0xa3/0x250
    ? wait_woken+0x80/0x80
    ? blk_mq_get_driver_tag+0x97/0xf0
    blk_mq_dispatch_rq_list+0x7b/0x4a0
    ? deadline_remove_request+0x49/0xb0
    blk_mq_do_dispatch_sched+0x4f/0xc0
    blk_mq_sched_dispatch_requests+0x106/0x170
    __blk_mq_run_hw_queue+0x53/0xa0
    __blk_mq_delay_run_hw_queue+0x83/0xa0
    blk_mq_run_hw_queue+0x6c/0xd0
    blk_mq_sched_insert_request+0x96/0x140
    __blk_mq_try_issue_directly+0x3d/0x190
    blk_mq_try_issue_directly+0x30/0x70
    blk_mq_make_request+0x1a4/0x6a0
    generic_make_request+0xfd/0x2f0
    ? submit_bio+0x5c/0x110
    submit_bio+0x5c/0x110
    ? __blkdev_issue_discard+0x152/0x200
    submit_bio_wait+0x43/0x60
    ext4_process_freed_data+0x1cd/0x440
    ? account_page_dirtied+0xe2/0x1a0
    ext4_journal_commit_callback+0x4a/0xc0
    jbd2_journal_commit_transaction+0x17e2/0x19e0
    ? kjournald2+0xb0/0x250
    kjournald2+0xb0/0x250
    ? wait_woken+0x80/0x80
    ? commit_timeout+0x10/0x10
    kthread+0x111/0x130
    ? kthread_create_worker_on_cpu+0x50/0x50
    ? do_group_exit+0x3a/0xa0
    ret_from_fork+0x1f/0x30
    Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff
    ---[ end trace 50d361cc444506c8 ]---
    print_req_error: I/O error, dev nvme0n1, sector 847167488

    Decoding the assembly, the request claims to have 0xffff segments,
    while nvme counts two. This turns out to be because we don't check
    for a data carrying request on the mq scheduler path, and since
    blk_phys_contig_segment() returns true for a non-data request,
    we decrement the initial segment count of 0 and end up with
    0xffff in the unsigned short.

    There are a few issues here:

    1) We should initialize the segment count for a discard to 1.
    2) The discard merging is currently using the data limits for
    segments and sectors.

    Fix this up by having attempt_merge() correctly identify the
    request, and by initializing the segment count correctly
    for discards.

    This can only be triggered with mq-deadline on discard capable
    devices right now, which isn't a common configuration.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

31 Jan, 2018

2 commits

  • This status is returned from the driver to the block layer if a
    device-related resource is unavailable, but the driver can guarantee
    that IO dispatch will be triggered in the future when the resource is
    available.

    Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if the
    driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun the
    queue after a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.
    BLK_MQ_DELAY_QUEUE is 3 ms because both scsi-mq and nvmefc are using
    that magic value.

    If a driver can make sure there is in-flight IO, it is safe to return
    BLK_STS_DEV_RESOURCE because:

    1) If all in-flight IOs complete before examining SCHED_RESTART in
    blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
    is run immediately in this case by blk_mq_dispatch_rq_list();

    2) if there is any in-flight IO after/when examining SCHED_RESTART
    in blk_mq_dispatch_rq_list():
    - if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
    - otherwise, this request will be dispatched after any in-flight IO is
    completed via blk_mq_sched_restart()

    3) if SCHED_RESTART is set concurrently in context because of
    BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
    cases and make sure an IO hang can be avoided.

    One invariant is that queue will be rerun if SCHED_RESTART is set.

    Suggested-by: Jens Axboe
    Tested-by: Laurence Oberman
    Signed-off-by: Ming Lei
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Pull poll annotations from Al Viro:
    "This introduces a __bitwise type for POLL### bitmap, and propagates
    the annotations through the tree. Most of that stuff is as simple as
    'make ->poll() instances return __poll_t and do the same to local
    variables used to hold the future return value'.

    Some of the obvious brainos found in process are fixed (e.g. POLLIN
    misspelled as POLL_IN). At that point the amount of sparse warnings is
    low and most of them are for genuine bugs - e.g. ->poll() instance
    deciding to return -EINVAL instead of a bitmap. I hadn't touched those
    in this series - it's large enough as it is.

    Another problem it has caught was eventpoll() ABI mess; select.c and
    eventpoll.c assumed that corresponding POLL### and EPOLL### were
    equal. That's true for some, but not all of them - EPOLL### are
    arch-independent, but POLL### are not.

    The last commit in this series separates userland POLL### values from
    the (now arch-independent) kernel-side ones, converting between them
    in the few places where they are copied to/from userland. AFAICS, this
    is the least disruptive fix preserving poll(2) ABI and making epoll()
    work on all architectures.

    As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
    it will trigger only on what would've triggered EPOLLWRBAND on other
    architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
    at all on sparc. With this patch they should work consistently on all
    architectures"

    * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
    make kernel-side POLL... arch-independent
    eventpoll: no need to mask the result of epi_item_poll() again
    eventpoll: constify struct epoll_event pointers
    debugging printk in sg_poll() uses %x to print POLL... bitmap
    annotate poll(2) guts
    9p: untangle ->poll() mess
    ->si_band gets POLL... bitmap stored into a user-visible long field
    ring_buffer_poll_wait() return value used as return value of ->poll()
    the rest of drivers/*: annotate ->poll() instances
    media: annotate ->poll() instances
    fs: annotate ->poll() instances
    ipc, kernel, mm: annotate ->poll() instances
    net: annotate ->poll() instances
    apparmor: annotate ->poll() instances
    tomoyo: annotate ->poll() instances
    sound: annotate ->poll() instances
    acpi: annotate ->poll() instances
    crypto: annotate ->poll() instances
    block: annotate ->poll() instances
    x86: annotate ->poll() instances
    ...

    Linus Torvalds
     

30 Jan, 2018

1 commit

  • Pull block updates from Jens Axboe:
    "This is the main pull request for block IO related changes for the
    4.16 kernel. Nothing major in this pull request, but a good amount of
    improvements and fixes all over the map. This contains:

    - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
    Paolo.

    - Support for SMR zones for deadline and mq-deadline from Damien and
    Christoph.

    - Set of fixes for bcache by way of Michael Lyle, including fixes
    from himself, Kent, Rui, Tang, and Coly.

    - Series from Matias for lightnvm with fixes from Hans Holmberg,
    Javier, and Matias. Mostly centered around pblk, and removing rrpc 1.2
    in preparation for supporting 2.0.

    - A couple of NVMe pull requests from Christoph. Nothing major in
    here, just fixes and cleanups, and support for command tracing from
    Johannes.

    - Support for blk-throttle for tracking reads and writes separately.
    From Joseph Qi. A few cleanups/fixes also for blk-throttle from
    Weiping.

    - Series from Mike Snitzer that enables dm to register its queue more
    logically, something that's always been problematic on dm since it's a
    stacked device.

    - Series from Ming cleaning up some of the bio accessor use, in
    preparation for supporting multipage bvecs.

    - Various fixes from Ming closing up holes around queue mapping and
    quiescing.

    - BSD partition fix from Richard Narron, fixing a problem where we
    can't mount newer (10/11) FreeBSD partitions.

    - Series from Tejun reworking blk-mq timeout handling. The previous
    scheme relied on atomic bits, but it had races where we would think a
    request had timed out if it got reused at the wrong time.

    - null_blk now supports faking timeouts, to enable us to better
    exercise and test that functionality separately. From me.

    - Kill the separate atomic poll bit in the request struct. After
    this, we don't use the atomic bits on blk-mq anymore at all. From
    me.

    - sgl_alloc/free helpers from Bart.

    - Heavily contended tag case scalability improvement from me.

    - Various little fixes and cleanups from Arnd, Bart, Corentin,
    Douglas, Eryu, Goldwyn, and myself"

    * 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
    block: remove smart1,2.h
    nvme: add tracepoint for nvme_complete_rq
    nvme: add tracepoint for nvme_setup_cmd
    nvme-pci: introduce RECONNECTING state to mark initializing procedure
    nvme-rdma: remove redundant boolean for inline_data
    nvme: don't free uuid pointer before printing it
    nvme-pci: Suspend queues after deleting them
    bsg: use pr_debug instead of hand crafted macros
    blk-mq-debugfs: don't allow write on attributes with seq_operations set
    nvme-pci: Fix queue double allocations
    block: Set BIO_TRACE_COMPLETION on new bio during split
    blk-throttle: use queue_is_rq_based
    block: Remove kblockd_schedule_delayed_work{,_on}()
    blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
    blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
    lib/scatterlist: Fix chaining support in sgl_alloc_order()
    blk-throttle: track read and write request individually
    block: add bdev_read_only() checks to common helpers
    block: fail op_is_write() requests to read-only partitions
    blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
    ...

    Linus Torvalds
     

25 Jan, 2018

2 commits

  • Use pr_debug instead of hand-crafted macros. This way there is no need
    to re-compile the kernel to enable bsg debug output, and it's possible
    to selectively enable specific prints.

    Cc: Joe Perches
    Reviewed-by: Bart Van Assche
    Signed-off-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Johannes Thumshirn
     
  • Attributes that only implement .seq_ops are read-only; any write to
    them should be rejected. But currently the kernel would crash when
    writing to such debugfs entries, e.g.

    chmod +w /sys/kernel/debug/block/<dev>/requeue_list
    echo 0 > /sys/kernel/debug/block/<dev>/requeue_list
    chmod -w /sys/kernel/debug/block/<dev>/requeue_list

    Fix it by returning -EPERM in blk_mq_debugfs_write() when writing to
    such attributes.

    Cc: Ming Lei
    Signed-off-by: Eryu Guan
    Signed-off-by: Jens Axboe

    Eryu Guan
     



19 Jan, 2018

7 commits

  • In a mixed read/write workload on an SSD, write latency is much lower
    than read latency. But we currently only track and record read latency,
    and then use it as the threshold base for both read and write io
    latency accounting. As a result, write io latency will always be
    considered good and bad_bio_cnt will be much smaller than 20% of
    bio_cnt. That is to say, the tg being checked will be treated as idle
    most of the time and will let others dispatch more ios, even when it is
    truly running under its low limit and wants that limit to be
    guaranteed, which is not what we expect. So track read and write
    requests individually, which brings more precise latency control for
    low-limit idle detection.

    Signed-off-by: Joseph Qi
    Reviewed-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Joseph Qi
     
  • Similar to blkdev_write_iter(), return -EPERM if the partition is
    read-only. This covers ioctl(), fallocate() and most in-kernel users
    but isn't meant to be exhaustive -- everything else will be caught in
    generic_make_request_checks(), fail with -EIO and can be fixed later.

    Reviewed-by: Sagi Grimberg
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Jens Axboe

    Ilya Dryomov
     
  • Regular block device writes go through blkdev_write_iter(), which does
    bdev_read_only(), while zeroout/discard/etc requests are never checked,
    whether userspace- or kernel-triggered. Add a generic catch-all check
    to generic_make_request_checks() to actually enforce ioctl(BLKROSET)
    and set_disk_ro(), which is used by quite a few drivers for things like
    snapshots, read-only backing files/images, etc.

    Reviewed-by: Sagi Grimberg
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Jens Axboe

    Ilya Dryomov
     
  • Export these two interfaces for cgroup v1.

    Acked-by: Tejun Heo
    Signed-off-by: weiping zhang
    Signed-off-by: Jens Axboe

    weiping zhang
     
  • The __blk_mq_register_dev(), blk_mq_unregister_dev(),
    elv_register_queue() and elv_unregister_queue() calls need to be
    protected with sysfs_lock, but the other code in these functions does
    not. Hence protect only those calls with sysfs_lock. This patch fixes a
    locking inversion issue in blk_unregister_queue() and also in an error
    path of blk_register_queue(): it is not allowed to hold sysfs_lock
    around the kobject_del(&q->kobj) call.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • This patch does not change any functionality.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • These two functions are only called from inside the block layer so
    unexport them.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche