26 Jan, 2020

1 commit

  • [ Upstream commit ece841abbed2da71fa10710c687c9ce9efb6bf69 ]

    7c20f11680a4 ("bio-integrity: stop abusing bi_end_io") moved
    bio_integrity_free() from bio_uninit() to bio_integrity_verify_fn()
    and bio_endio(). This is wrong because a bio may be freed without
    bio_endio() ever being called; for example, blk_rq_unprep_clone() is
    called from dm_mq_queue_rq() when the underlying queue of dm-mpath
    is busy.

    As a result, commit 7c20f11680a4 leaks the bio integrity data.

    Fix this issue by re-adding the bio_integrity_free() call to
    bio_uninit(), as sketched below.

    Fixes: 7c20f11680a4 ("bio-integrity: stop abusing bi_end_io")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Justin Tee

    Add a commit log, and simplify/fix the original patch written by Justin.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Justin Tee
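
    A minimal sketch of the resulting bio_uninit(), with the rest of its
    teardown elided (the exact body is simplified here):

        void bio_uninit(struct bio *bio)
        {
                /* ... existing cgroup/blkg teardown stays as-is ... */

                /* re-added: release the integrity payload whenever the bio
                 * is torn down, not only from the bio_endio() path */
                if (bio_integrity(bio))
                        bio_integrity_free(bio);
        }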
     

12 Jan, 2020

1 commit

  • [ Upstream commit b3c6a59975415bde29cfd76ff1ab008edbf614a9 ]

    Avoid the following false-positive lockdep complaint, which is
    triggered by running test nvme/012 from the blktests suite:

    ============================================
    WARNING: possible recursive locking detected
    5.0.0-rc3-xfstests-00015-g1236f7d60242 #841 Not tainted
    --------------------------------------------
    ksoftirqd/1/16 is trying to acquire lock:
    000000000282032e (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

    but task is already holding lock:
    00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&(&fq->mq_flush_lock)->rlock);
    lock(&(&fq->mq_flush_lock)->rlock);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    1 lock held by ksoftirqd/1/16:
    #0: 00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

    stack backtrace:
    CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3-xfstests-00015-g1236f7d60242 #841
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    dump_stack+0x67/0x90
    __lock_acquire.cold.45+0x2b4/0x313
    lock_acquire+0x98/0x160
    _raw_spin_lock_irqsave+0x3b/0x80
    flush_end_io+0x4e/0x1d0
    blk_mq_complete_request+0x76/0x110
    nvmet_req_complete+0x15/0x110 [nvmet]
    nvmet_bio_done+0x27/0x50 [nvmet]
    blk_update_request+0xd7/0x2d0
    blk_mq_end_request+0x1a/0x100
    blk_flush_complete_seq+0xe5/0x350
    flush_end_io+0x12f/0x1d0
    blk_done_softirq+0x9f/0xd0
    __do_softirq+0xca/0x440
    run_ksoftirqd+0x24/0x50
    smpboot_thread_fn+0x113/0x1e0
    kthread+0x121/0x140
    ret_from_fork+0x3a/0x50

    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Bart Van Assche
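
    The complaint is a false positive because the two mq_flush_lock
    instances belong to different flush queues but share a single static
    lock class. A minimal sketch of the approach, giving each flush queue
    its own dynamically registered lockdep key (the struct field name is
    an assumption):

        struct blk_flush_queue {
                /* ... */
                spinlock_t              mq_flush_lock;
                struct lock_class_key   key;    /* per-queue lockdep class */
        };

        /* in blk_alloc_flush_queue(), simplified: */
        lockdep_register_key(&fq->key);
        spin_lock_init(&fq->mq_flush_lock);
        lockdep_set_class(&fq->mq_flush_lock, &fq->key);

        /* and the matching cleanup in blk_free_flush_queue(): */
        lockdep_unregister_key(&fq->key);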
     

27 Sep, 2019

1 commit

  • We hit a NULL pointer dereference BUG in blk_mq_rq_timed_out(),
    as follows:

    [ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
    [ 108.827059] PGD 0 P4D 0
    [ 108.827313] Oops: 0000 [#1] SMP PTI
    [ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
    [ 108.829503] Workqueue: kblockd blk_mq_timeout_work
    [ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
    [ 108.838191] Call Trace:
    [ 108.838406] bt_iter+0x74/0x80
    [ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
    [ 108.839074] ? __switch_to_asm+0x34/0x70
    [ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
    [ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
    [ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
    [ 108.840732] blk_mq_timeout_work+0x74/0x200
    [ 108.841151] process_one_work+0x297/0x680
    [ 108.841550] worker_thread+0x29c/0x6f0
    [ 108.841926] ? rescuer_thread+0x580/0x580
    [ 108.842344] kthread+0x16a/0x1a0
    [ 108.842666] ? kthread_flush_work+0x170/0x170
    [ 108.843100] ret_from_fork+0x35/0x40

    The bug is caused by a race between the timeout handler and the
    completion path for a flush request.

    When the timeout handler blk_mq_rq_timed_out() tries to read
    'req->q->mq_ops', the 'req' has already been completed and
    reinitialized by the next flush request, which calls blk_rq_init()
    to zero 'req'.

    After commit 12f5b93145 ("blk-mq: Remove generation seqeunce"), the
    lifetime of a normal request is protected by a refcount: only when
    'rq->ref' drops to zero can the request really be freed. Thus, such
    requests cannot be reused before the timeout handler finishes.

    However, a flush request defines .end_io, and rq->end_io() is still
    called even if 'rq->ref' has not dropped to zero. After that, the
    'flush_rq' can be reused to handle the next flush request, resulting
    in the NULL pointer dereference BUG above.

    Fix this problem by covering the flush request with 'rq->ref' as
    well: if the refcount is not yet zero, flush_end_io() returns and
    waits for the last holder to call it again. To record the request
    status, add a new field 'rq_status', which is used in flush_end_io()
    (sketched after this entry).

    Cc: Christoph Hellwig
    Cc: Keith Busch
    Cc: Bart Van Assche
    Cc: stable@vger.kernel.org # v4.18+
    Reviewed-by: Ming Lei
    Reviewed-by: Bob Liu
    Signed-off-by: Yufen Yu

    -------
    v2:
    - move rq_status from struct request to struct blk_flush_queue
    v3:
    - remove unnecessary '{}' pair.
    v4:
    - let the spinlock protect 'fq->rq_status'
    v5:
    - move rq_status after flush_running_idx member of struct blk_flush_queue
    Signed-off-by: Jens Axboe

    Yufen Yu
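
    A minimal sketch of the described check at the top of flush_end_io(),
    with the normal flush sequencing elided:

        static void flush_end_io(struct request *flush_rq, blk_status_t error)
        {
                struct blk_flush_queue *fq =
                        blk_get_flush_queue(flush_rq->q, flush_rq->mq_ctx);
                unsigned long flags;

                spin_lock_irqsave(&fq->mq_flush_lock, flags);
                if (!refcount_dec_and_test(&flush_rq->ref)) {
                        /* the timeout path still holds a reference: record
                         * the status and let the last holder finish up */
                        fq->rq_status = error;
                        spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
                        return;
                }

                /* ... normal flush sequencing runs here ... */
                spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
        }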
     

26 Sep, 2019

1 commit

  • Commit c48dac137a62 ("block: don't hold q->sysfs_lock in elevator_init_mq")
    removed q->sysfs_lock from elevator_init_mq(), but forgot to deal with
    the lockdep_assert_held() called in blk_mq_sched_free_requests(), which
    runs in the failure path of elevator_init_mq().

    blk_mq_sched_free_requests() is called in the following 3 functions:

    elevator_init_mq()
    elevator_exit()
    blk_cleanup_queue()

    In blk_cleanup_queue(), blk_mq_sched_free_requests() is followed exactly
    by 'mutex_lock(&q->sysfs_lock)'.

    So move the lockdep_assert_held() from blk_mq_sched_free_requests()
    into elevator_exit() to fix the report from syzbot.

    Reported-by: syzbot+da3b7677bb913dc1b737@syzkaller.appspotmail.com
    Fixes: c48dac137a62 ("block: don't hold q->sysfs_lock in elevator_init_mq")
    Reviewed-by: Bart Van Assche
    Reviewed-by: Damien Le Moal
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
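
    A minimal sketch of the move (only the relevant lines shown):

        void blk_mq_sched_free_requests(struct request_queue *q)
        {
                /* lockdep_assert_held(&q->sysfs_lock) dropped from here ... */
                /* ... */
        }

        void elevator_exit(struct request_queue *q, struct elevator_queue *e)
        {
                /* ... and asserted here instead, where the lock is always held */
                lockdep_assert_held(&q->sysfs_lock);
                /* ... */
        }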
     

06 Sep, 2019

1 commit

  • If the default elevator chosen is mq-deadline, elevator_init_mq() may
    return an error if mq-deadline initialization fails, leading to
    blk_mq_init_allocated_queue() returning an error, which in turn will
    cause the block device initialization to fail and the device not being
    exposed.

    Instead of taking such an extreme measure, handle mq-deadline
    initialization failures in the same manner as when mq-deadline is not
    available (no module to load), that is, default to the "none"
    scheduler. With this change, elevator_init_mq() can return void.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
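
    A minimal sketch of the resulting error handling (the warning text and
    surrounding lookup are simplified):

        void elevator_init_mq(struct request_queue *q)
        {
                struct elevator_type *e;
                int err;

                /* ... pick the default elevator (mq-deadline) as before ... */

                err = blk_mq_init_sched(q, e);
                if (err) {
                        /* fall back to "none" instead of failing queue init */
                        pr_warn("\"%s\" elevator initialization failed, "
                                "falling back to \"none\"\n", e->elevator_name);
                        elevator_put(e);
                }
        }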
     

28 Aug, 2019

1 commit

  • The kernfs built-in lock 'kn->count' is held in the sysfs .show/.store
    path. Meanwhile, inside the block layer's .show/.store callbacks,
    q->sysfs_lock is required.

    However, when the mq & iosched kobjects are removed via
    blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
    too. This causes an AB-BA deadlock because the kernfs built-in lock
    'kn->count' is also required inside kobject_del(); see the lockdep
    warning [1].

    On the other hand, it isn't necessary to acquire q->sysfs_lock for
    either blk_mq_unregister_dev() or elv_unregister_queue(), because
    clearing the REGISTERED flag prevents stores to 'queue/scheduler'
    from happening. Also, sysfs writes (stores) are exclusive, so there
    is no need to hold the lock for elv_unregister_queue() when it is
    called on the elevator-switching path.

    So split .sysfs_lock into two locks: one keeps the name .sysfs_lock
    and covers synchronizing .store, while the other is named
    .sysfs_dir_lock and covers kobject addition/removal and related
    status changes (a sketch of the split follows this entry).

    sysfs itself can handle the race between adding/removing kobjects and
    showing/storing attributes under kobjects. For switching the scheduler
    via a store to 'queue/scheduler', we use the queue flag
    QUEUE_FLAG_REGISTERED together with .sysfs_lock to avoid the race, so
    we can avoid holding .sysfs_lock while removing/adding kobjects.

    [1] lockdep warning
    ======================================================
    WARNING: possible circular locking dependency detected
    5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
    ------------------------------------------------------
    rmmod/777 is trying to acquire lock:
    00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72

    but task is already holding lock:
    00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&q->sysfs_lock){+.+.}:
    __lock_acquire+0x95f/0xa2f
    lock_acquire+0x1b4/0x1e8
    __mutex_lock+0x14a/0xa9b
    blk_mq_hw_sysfs_show+0x63/0xb6
    sysfs_kf_seq_show+0x11f/0x196
    seq_read+0x2cd/0x5f2
    vfs_read+0xc7/0x18c
    ksys_read+0xc4/0x13e
    do_syscall_64+0xa7/0x295
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    -> #0 (kn->count#202){++++}:
    check_prev_add+0x5d2/0xc45
    validate_chain+0xed3/0xf94
    __lock_acquire+0x95f/0xa2f
    lock_acquire+0x1b4/0x1e8
    __kernfs_remove+0x237/0x40b
    kernfs_remove_by_name_ns+0x59/0x72
    remove_files+0x61/0x96
    sysfs_remove_group+0x81/0xa4
    sysfs_remove_groups+0x3b/0x44
    kobject_del+0x44/0x94
    blk_mq_unregister_dev+0x83/0xdd
    blk_unregister_queue+0xa0/0x10b
    del_gendisk+0x259/0x3fa
    null_del_dev+0x8b/0x1c3 [null_blk]
    null_exit+0x5c/0x95 [null_blk]
    __se_sys_delete_module+0x204/0x337
    do_syscall_64+0xa7/0x295
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

    Possible unsafe locking scenario:

    CPU0                            CPU1
    ----                            ----
    lock(&q->sysfs_lock);
                                    lock(kn->count#202);
                                    lock(&q->sysfs_lock);
    lock(kn->count#202);

    *** DEADLOCK ***

    2 locks held by rmmod/777:
    #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
    #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

    stack backtrace:
    CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
    Call Trace:
    dump_stack+0x9a/0xe6
    check_noncircular+0x207/0x251
    ? print_circular_bug+0x32a/0x32a
    ? find_usage_backwards+0x84/0xb0
    check_prev_add+0x5d2/0xc45
    validate_chain+0xed3/0xf94
    ? check_prev_add+0xc45/0xc45
    ? mark_lock+0x11b/0x804
    ? check_usage_forwards+0x1ca/0x1ca
    __lock_acquire+0x95f/0xa2f
    lock_acquire+0x1b4/0x1e8
    ? kernfs_remove_by_name_ns+0x59/0x72
    __kernfs_remove+0x237/0x40b
    ? kernfs_remove_by_name_ns+0x59/0x72
    ? kernfs_next_descendant_post+0x7d/0x7d
    ? strlen+0x10/0x23
    ? strcmp+0x22/0x44
    kernfs_remove_by_name_ns+0x59/0x72
    remove_files+0x61/0x96
    sysfs_remove_group+0x81/0xa4
    sysfs_remove_groups+0x3b/0x44
    kobject_del+0x44/0x94
    blk_mq_unregister_dev+0x83/0xdd
    blk_unregister_queue+0xa0/0x10b
    del_gendisk+0x259/0x3fa
    ? disk_events_poll_msecs_store+0x12b/0x12b
    ? check_flags+0x1ea/0x204
    ? mark_held_locks+0x1f/0x7a
    null_del_dev+0x8b/0x1c3 [null_blk]
    null_exit+0x5c/0x95 [null_blk]
    __se_sys_delete_module+0x204/0x337
    ? free_module+0x39f/0x39f
    ? blkcg_maybe_throttle_current+0x8a/0x718
    ? rwlock_bug+0x62/0x62
    ? __blkcg_punt_bio_submit+0xd0/0xd0
    ? trace_hardirqs_on_thunk+0x1a/0x20
    ? mark_held_locks+0x1f/0x7a
    ? do_syscall_64+0x4c/0x295
    do_syscall_64+0xa7/0x295
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7fb696cdbe6b
    Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
    RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
    RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
    RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
    RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
    R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
    R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0

    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Greg KH
    Cc: Mike Snitzer
    Reviewed-by: Bart Van Assche
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
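
    A minimal sketch of the split (only the new lock and one converted
    path shown; the other kobject add/remove paths are converted the same
    way):

        struct request_queue {
                /* ... */
                struct mutex    sysfs_lock;      /* still serializes .store */
                struct mutex    sysfs_dir_lock;  /* new: kobject add/remove
                                                  * and related state changes */
                /* ... */
        };

        void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
        {
                mutex_lock(&q->sysfs_dir_lock);   /* was: q->sysfs_lock */
                /* ... kobject_del() of the mq/hctx kobjects ... */
                mutex_unlock(&q->sysfs_dir_lock);
        }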
     

21 Jun, 2019

3 commits

  • This function just has a few trivial assignments and has two callers,
    one of them in the fast path.

    Reviewed-by: Hannes Reinecke
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Return the segment count and let the callers assign it, which makes
    the code a little more obvious. Also pass the request instead of the
    queue plus bio chain, allowing for the use of rq_for_each_bvec.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • We only need the number of segments in the blk-mq submission path.
    Remove the field from struct bio, and return it from a variant of
    blk_queue_split instead, so that it can be passed as an argument to
    those functions that need the value.

    This also means we stop recounting segments except for cloning
    and partial segments.

    To keep the number of arguments in this hot path down, remove the
    pointless struct request_queue argument from any function that had
    it and grew a nr_segs argument.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
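
    A minimal sketch of the new calling convention in the blk-mq
    submission path (the surrounding code is elided and simplified):

        static blk_qc_t blk_mq_make_request(struct request_queue *q,
                                            struct bio *bio)
        {
                struct request *rq;
                unsigned int nr_segs;

                /* split if needed and get the segment count back, instead
                 * of caching it in the bio itself */
                __blk_queue_split(q, &bio, &nr_segs);

                /* ... rq allocated from the tag set as before ... */
                blk_mq_bio_to_request(rq, bio, nr_segs);
                /* ... */
        }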
     

07 Jun, 2019

1 commit

  • In theory, the IO scheduler belongs to the request queue, and the
    request pool of the sched tags belongs to the request queue too.

    However, the current tag allocation interfaces are reused for both
    driver tags and sched tags, and driver tags are definitely host wide
    and don't belong to any request queue; the same holds for their
    request pool. So we need the tagset instance for freeing the requests
    of the sched tags.

    Meanwhile, blk_mq_free_tag_set() often follows blk_cleanup_queue() in
    the non-BLK_MQ_F_TAG_SHARED case, which requires that the request pool
    of the sched tags be freed before blk_mq_free_tag_set() is called.

    Commit 47cdee29ef9d94e ("block: move blk_exit_queue into __blk_release_queue")
    moved blk_exit_queue into __blk_release_queue to simplify the fast
    path in generic_make_request(), and thereby causes an oops when the
    requests of the sched tags are freed in __blk_release_queue().

    Fix the above issue by moving the freeing of the sched tags request
    pool into blk_cleanup_queue(); this is safe because the queue has been
    frozen and there are no in-queue requests at that time. Freeing the
    sched tags themselves has to stay in the queue's release handler
    because there might be uncompleted dispatch activity that still refers
    to them.

    Cc: Bart Van Assche
    Cc: Christoph Hellwig
    Fixes: 47cdee29ef9d94e485eb08f962c74943023a5271 ("block: move blk_exit_queue into __blk_release_queue")
    Tested-by: Yi Zhang
    Reported-by: kernel test robot
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
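
    A minimal sketch of the hunk added to blk_cleanup_queue() (placement
    relative to the surrounding teardown is simplified):

        void blk_cleanup_queue(struct request_queue *q)
        {
                /* ... queue already frozen, no in-queue requests remain ... */

                /* the tag set still exists here, so the request pool of
                 * the sched tags can be freed safely now */
                mutex_lock(&q->sysfs_lock);
                if (q->elevator)
                        blk_mq_sched_free_requests(q);
                mutex_unlock(&q->sysfs_lock);

                /* ... */
        }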
     

29 May, 2019

1 commit

  • Commit 498f6650aec8 ("block: Fix a race between the cgroup code and
    request queue initialization") moved what blk_exit_queue does into
    blk_cleanup_queue() to fix an issue caused by switching the queue lock
    back.

    However, after the legacy request IO path was killed, the driver queue
    lock isn't used at all, and there is no longer any case of switching
    the queue lock back. So the issue addressed by commit 498f6650aec8
    doesn't exist any more.

    So move blk_exit_queue into __blk_release_queue.

    This patch basically reverts the following two commits:

    498f6650aec8 block: Fix a race between the cgroup code and request queue initialization
    24ecc3585348 block: Ensure that a request queue is dissociated from the cgroup controller

    Cc: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

02 Apr, 2019

1 commit

  • xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec
    for checking if the two bvecs can be merged, so pass page to
    xen_biovec_phys_mergeable() directly.

    No functional change.

    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: xen-devel@lists.xenproject.org
    Cc: Omar Sandoval
    Cc: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

01 Feb, 2019

1 commit

  • Currently, the queue mapping result is saved in a two-dimensional
    array. In the hot path, to get an hctx we need to do the following:
    q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]]

    This isn't very efficient. We could instead save the queue mapping
    result directly in the ctx, indexed by hctx type:

    ctx->hctxs[type]

    Signed-off-by: Jianchao Wang
    Signed-off-by: Jens Axboe

    Jianchao Wang
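
    A minimal sketch of the lookup before and after the change
    (blk_mq_map_swqueue is assumed to fill ctx->hctxs when the mapping is
    rebuilt):

        /* before: two array lookups through the tag set on every request */
        hctx = q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]];

        /* after: the result is cached in the per-cpu ctx, per hctx type */
        hctx = ctx->hctxs[type];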
     

27 Nov, 2018

1 commit

  • This isn't exactly the same as the previous count, as it includes
    requests for all devices. But that really doesn't matter: if we have
    more than the threshold (16) queued up, flush it. It's not worth
    having an expensive list loop for this.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

20 Nov, 2018

1 commit

  • bio->bi_ioc is never set so always NULL. Remove references to it in
    bio_disassociate_task() and in rq_ioc() and delete this field from
    struct bio. With this change, rq_ioc() always returns
    current->io_context without the need for a bio argument. Further
    simplify the code and make it more readable by also removing this
    helper, which also allows simplifying blk_mq_sched_assign_ioc() by
    removing its bio argument.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Adam Manzanares
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
     

19 Nov, 2018

1 commit

  • Merge in -rc3 to resolve a few conflicts, but also to get a few
    important fixes that have gone into mainline since the block
    4.21 branch was forked off (most notably the SCSI queue issue,
    which is both a conflict AND needed fix).

    Signed-off-by: Jens Axboe

    Jens Axboe
     

16 Nov, 2018

3 commits

  • The only remaining user unconditionally drops and reacquires the lock,
    which means we really don't need any additional (conditional) annotation.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • ->queue_flags is generally not set or cleared in the fast path, and also
    generally set or cleared one flag at a time. Make use of the normal
    atomic bitops for it so that we don't need to take the queue_lock,
    which is otherwise mostly unused in the core block layer now.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
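
    A minimal sketch of the helpers after the conversion (queue_flags
    becomes an unsigned long operated on with the regular atomic bitops):

        void blk_queue_flag_set(unsigned int flag, struct request_queue *q)
        {
                set_bit(flag, &q->queue_flags);
        }

        void blk_queue_flag_clear(unsigned int flag, struct request_queue *q)
        {
                clear_bit(flag, &q->queue_flags);
        }

        bool blk_queue_flag_test_and_set(unsigned int flag,
                                         struct request_queue *q)
        {
                return test_and_set_bit(flag, &q->queue_flags);
        }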
     
  • There are no users left since the removal of the legacy request
    interface, so we can remove all the magic bit stealing now and make
    it a normal field.

    But use WRITE_ONCE/READ_ONCE on the new deadline field, given that we
    don't seem to have any mechanism to guarantee a new value actually
    gets seen by other threads.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
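
    A minimal sketch of the resulting accesses (the exact call sites are
    an assumption):

        /* writer, e.g. when (re)arming the request timeout */
        WRITE_ONCE(req->deadline, jiffies + req->timeout);

        /* reader, e.g. in the timeout checker */
        unsigned long deadline = READ_ONCE(rq->deadline);

        if (time_after_eq(jiffies, deadline))
                /* ... the request has expired ... */;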
     

09 Nov, 2018

1 commit

  • Obviously the created discard bio has to be aligned to the logical
    block size.

    This patch introduces the helper bio_allowed_max_sectors() for this
    purpose (sketched below).

    Cc: stable@vger.kernel.org
    Cc: Mike Snitzer
    Cc: Christoph Hellwig
    Cc: Xiao Ni
    Cc: Mariusz Dabrowski
    Fixes: 744889b7cbb56a6 ("block: don't deal with discard limit in blkdev_issue_discard()")
    Fixes: a22c4d7e34402cc ("block: re-add discard_granularity and alignment checks")
    Reported-by: Rui Salvaterra
    Tested-by: Rui Salvaterra
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
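
    A minimal sketch of the helper and of one use when sizing a discard
    bio (the call site is simplified):

        static inline unsigned int bio_allowed_max_sectors(struct request_queue *q)
        {
                /* largest 512-byte sector count that stays aligned to the
                 * logical block size */
                return round_down(UINT_MAX, queue_logical_block_size(q)) >> 9;
        }

        /* in __blkdev_issue_discard(), simplified: */
        req_sects = min_t(sector_t, nr_sects, bio_allowed_max_sectors(q));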
     

08 Nov, 2018

7 commits

  • Prep patch for being able to place requests based not just on
    CPU location, but also on the type of request.

    Reviewed-by: Hannes Reinecke
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Reviewed-by: Hannes Reinecke
    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's now dead code, nobody uses it.

    Reviewed-by: Hannes Reinecke
    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The only user of legacy timing now is BSG, which is invoked
    from the mq timeout handler. Kill the legacy code, and rename
    the q->rq_timed_out_fn to q->bsg_job_timeout_fn.

    Reviewed-by: Hannes Reinecke
    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This removes a bunch of core and elevator related code. On the core
    front, we remove anything related to queue running, draining,
    initialization, plugging, and congestions. We also kill anything
    related to request allocation, merging, retrieval, and completion.

    Remove any checking for single queue IO schedulers, as they no
    longer exist. This means we can also delete a bunch of code related
    to request issue, adding, completion, etc - and all the SQ related
    ops and helpers.

    Also kill the load_default_modules(), as all that did was provide
    for a way to load the default single queue elevator.

    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Reviewed-by: Hannes Reinecke
    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • With drivers that are setting a virtual boundary constraint, we are
    seeing a lot of bio splitting and smaller I/Os being submitted to the
    driver.

    This happens because the bio gap detection code does not account for
    cases where PAGE_SIZE - 1 is bigger than queue_virt_boundary() and
    thus splits the bio unnecessarily.

    Cc: Jan Kara
    Cc: Bart Van Assche
    Cc: Ming Lei
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Johannes Thumshirn
    Acked-by: Keith Busch
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Johannes Thumshirn
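
    A minimal sketch of the gap check before and after, based on the
    return expression of __bvec_gap_to_prev() (the exact helper may
    differ slightly):

        /* before: any non-zero offset of the next bvec forces a split */
        return offset ||
                ((bprv->bv_offset + bprv->bv_len) & queue_virt_boundary(q));

        /* after: only an offset that actually crosses the virtual
         * boundary mask counts as a gap */
        return (offset & queue_virt_boundary(q)) ||
                ((bprv->bv_offset + bprv->bv_len) & queue_virt_boundary(q));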
     

26 Oct, 2018

2 commits

  • Drivers exposing zoned block devices have to initialize and maintain
    correctness (i.e. revalidate) of the device zone bitmaps attached to
    the device request queue (seq_zones_bitmap and seq_zones_wlock).

    To simplify coding this, introduce a generic helper function
    blk_revalidate_disk_zones() suitable for most (and likely all) cases.
    This new function always updates the seq_zones_bitmap and
    seq_zones_wlock bitmaps as well as the queue nr_zones field when
    called for a disk using a request-based queue. For a disk using a
    BIO-based queue, only the number of zones is updated since these
    queues do not have schedulers and so do not need the zone bitmaps.

    With this change, the zone bitmap initialization code in sd_zbc.c can be
    replaced with a call to this function in sd_zbc_read_zones(), which is
    called from the disk revalidate block operation method.

    A call to blk_revalidate_disk_zones() is also added to the null_blk
    driver for devices created with the zoned mode enabled.

    Finally, to ensure that zoned devices created with dm-linear or
    dm-flakey expose the correct number of zones through sysfs, a call to
    blk_revalidate_disk_zones() is added to dm_table_set_restrictions().

    The zone bitmaps allocated and initialized with
    blk_revalidate_disk_zones() are freed automatically from
    __blk_release_queue() using the block internal function
    blk_queue_free_zone_bitmaps().

    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Mike Snitzer
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
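
    A minimal usage sketch, assuming the single-argument form introduced
    here (the wrapper name is hypothetical):

        /* e.g. called from a zoned driver's disk revalidation method */
        static int mydrv_revalidate_zones(struct gendisk *disk)
        {
                /* rebuilds seq_zones_bitmap/seq_zones_wlock and nr_zones */
                return blk_revalidate_disk_zones(disk);
        }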
     
  • There is no need to synchronously execute all REQ_OP_ZONE_RESET BIOs
    necessary to reset a range of zones. Similarly to what is done for
    discard BIOs in blk-lib.c, all zone reset BIOs can be chained and
    executed asynchronously and a synchronous call done only for the last
    BIO of the chain.

    Modify blkdev_reset_zones() to operate similarly to
    blkdev_issue_discard() using the next_bio() helper for chaining BIOs. To
    avoid code duplication of that function in blk_zoned.c, rename
    next_bio() into blk_next_bio() and declare it as a block internal
    function in blk.h.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
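
    A minimal sketch of the renamed helper, which chains the previous bio
    to a freshly allocated one and submits it asynchronously:

        struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages,
                                 gfp_t gfp)
        {
                struct bio *new = bio_alloc(gfp, nr_pages);

                if (bio) {
                        /* make 'bio' a child of 'new' and kick it off; only
                         * the last bio of the chain is waited on */
                        bio_chain(bio, new);
                        submit_bio(bio);
                }

                return new;
        }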
     

21 Aug, 2018

1 commit

  • Currently, when updating nr_hw_queues, the IO scheduler's init_hctx is
    invoked before the mapping between ctx and hctx has been adjusted
    correctly by blk_mq_map_swqueue. The IO scheduler's init_hctx (kyber)
    may depend on this mapping, get a wrong result, and finally panic.

    A simple way to fix this is to switch the IO scheduler to 'none'
    before updating nr_hw_queues, and then switch it back afterwards
    (sketched below). blk_mq_sched_init_/exit_hctx are removed because
    nobody uses them any more.

    Signed-off-by: Jianchao Wang
    Signed-off-by: Jens Axboe

    Jianchao Wang
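
    A minimal sketch of the described sequence in
    __blk_mq_update_nr_hw_queues() (helper names per this change; queue
    freezing and error handling elided):

        struct request_queue *q;
        LIST_HEAD(head);

        /* detach every queue's scheduler before the remap, remembering it */
        list_for_each_entry(q, &set->tag_list, tag_set_list)
                blk_mq_elv_switch_none(&head, q);

        /* ... update nr_hw_queues and remap ctx <-> hctx ... */

        /* re-attach the remembered schedulers once the mapping is valid */
        list_for_each_entry(q, &set->tag_list, tag_set_list)
                blk_mq_elv_switch_back(&head, q);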
     

09 Aug, 2018

1 commit