08 Oct, 2020
8 commits
-
Re-use throtl_set_slice_end() to remove duplicate code.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
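
A minimal sketch of the dedup, assuming the helper shapes in
blk-throttle.c: throtl_extend_slice() calls the existing setter instead
of repeating the rounding of the slice end.
-----
/* Sketch: the setter owns the slice-end rounding. */
static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
					unsigned long jiffy_end)
{
	tg->slice_end[rw] = roundup(jiffy_end, tg->td->throtl_slice);
}

static inline void throtl_extend_slice(struct throtl_grp *tg, bool rw,
				       unsigned long jiffy_end)
{
	throtl_set_slice_end(tg, rw, jiffy_end);	/* no duplicated rounding */
	/* existing trace logging of the extended slice stays here */
}
-----
-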
The __throtl_de/enqueue_tg() functions are only called by
throtl_de/enqueue_tg(), so we can simply open code them to make the
code more readable.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
-
throtl_schedule_next_dispatch() validates that the service queue is
not empty before calling update_min_dispatch_time(), and
update_min_dispatch_time() calls throtl_rb_first(), which validates
the service queue again.

Thus we can move the service queue validation out of throtl_rb_first()
to remove the redundant validation from the fast path.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
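
A hedged sketch of the resulting fast path, assuming the service
queue's rb_root_cached pending tree; callers have already verified the
queue is non-empty, so the helper no longer re-validates it.
-----
/* Sketch: callers guarantee the pending tree is non-empty. */
static struct throtl_grp *throtl_rb_first(struct throtl_service_queue *sq)
{
	struct rb_node *n = rb_first_cached(&sq->pending_tree);

	return rb_entry(n, struct throtl_grp, rb_node);
}
-----
-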
We should move the list operation after the validation check.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
-
The limit cannot scale up in throtl_adjusted_limit() if bps or iops is
set to 1, which causes an IO hang when the low limit is enabled. Thus
we should treat 1 as an illegal value to avoid this issue.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
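
To see why a limit of 1 cannot scale, consider a sketch of the shape of
the scaling helper (the scale parameter is illustrative): with low == 1
the (low >> 1) term is zero, so the adjusted limit stays at 1 forever.
-----
/* Sketch: scale a LOW limit up over time. For low == 1 the
 * (low >> 1) term is 0, the result is always 1, and IO throttled
 * against it can hang. */
static u64 throtl_adjusted_limit(u64 low, unsigned int scale)
{
	return low + (low >> 1) * scale;
}
-----
-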
IO latency tracking is only used for the LOW limit, so we should add a
check to avoid redundant latency tracking when the LOW limit is not
valid.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
-
We only update tg->last_finish_time when the low limitation is
enabled, so we can move the tg->last_finish_time validation a little
earlier to avoid fetching an unnecessary current timestamp if the
low limitation is not enabled.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
-
throtl_downgrade_state() is always used to switch to the LIMIT_LOW
limitation, so remove the now-meaningless second parameter, which
indicates the limitation index.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
15 Sep, 2020
5 commits
-
There is no need to check the bps or iops limitation if bps or iops is unlimited.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
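
A minimal sketch of the early return, assuming the U64_MAX/UINT_MAX
encoding of "unlimited" and an illustrative helper signature.
-----
/* Sketch: skip the whole calculation when both limits are "max". */
static bool tg_within_limits(u64 bps_limit, unsigned int iops_limit,
			     unsigned long *wait)
{
	if (bps_limit == U64_MAX && iops_limit == UINT_MAX) {
		if (wait)
			*wait = 0;
		return true;	/* unlimited: always dispatchable */
	}
	/* ... otherwise fall through to the real bps/iops math ... */
	return false;
}
-----
-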
tg_may_dispatch() calls tg_with_in_bps_limit() and
tg_with_in_iops_limit() to check whether we can dispatch a bio,
which calculates the bps/iops limits multiple times. But
tg_may_dispatch() is always called under the queue lock, which means
the bps/iops limits cannot change while it runs.

So we can calculate the bps/iops limits only once and pass them to
tg_with_in_bps_limit() and tg_with_in_iops_limit() to avoid
recalculating them repeatedly.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
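
A hedged sketch of the hoisting, with illustrative signatures: the
limits are read once under the queue lock and passed down to the two
helpers.
-----
/* Sketch: read each limit once, then hand it to the helpers. */
static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
			    unsigned long *wait)
{
	u64 bps_limit = tg_bps_limit(tg, bio_data_dir(bio));
	unsigned int iops_limit = tg_iops_limit(tg, bio_data_dir(bio));

	return tg_with_in_bps_limit(tg, bio, bps_limit, wait) &&
	       tg_with_in_iops_limit(tg, bio, iops_limit, wait);
}
-----
-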
The 'throtl_grp_quantum' and 'throtl_quantum' are both read-only
variables, so it is better to use readable macros instead of static
variables, which also saves some space in the .bss area.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
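
A sketch of the change: the read-only quantum values become
compile-time constants instead of writable statics.
-----
/* Before: writable statics occupying .data/.bss */
/*   static int throtl_grp_quantum = 8;          */
/*   static int throtl_quantum = 32;             */

/* After: readable compile-time constants */
#define THROTL_GRP_QUANTUM	8
#define THROTL_QUANTUM		32
-----
-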
Use readable READ/WRITE macros instead of magic numbers.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
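
For illustration, the per-direction arrays are now indexed with the
kernel's READ/WRITE constants instead of bare 0/1 (the helper name
here is hypothetical).
-----
/* Sketch: 0 and 1 as direction indexes become READ and WRITE. */
static void tg_set_unlimited(struct throtl_grp *tg)
{
	tg->bps[READ]   = U64_MAX;	/* was tg->bps[0]  */
	tg->bps[WRITE]  = U64_MAX;	/* was tg->bps[1]  */
	tg->iops[READ]  = UINT_MAX;	/* was tg->iops[0] */
	tg->iops[WRITE] = UINT_MAX;	/* was tg->iops[1] */
}
-----
-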
Fix some typos in comments.
Signed-off-by: Baolin Wang
Signed-off-by: Jens Axboe
01 Jul, 2020
1 commit
-
generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
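
Conceptually, the relationship after the rename reads as this hedged
sketch:
-----
/* Sketch: submit_bio is submit_bio_noacct plus accounting/checks. */
blk_qc_t submit_bio(struct bio *bio)
{
	/* ... I/O accounting and a few sanity checks ... */
	return submit_bio_noacct(bio);	/* formerly generic_make_request */
}
-----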
29 Jun, 2020
3 commits
-
bios must have a valid block cgroup by the time they are submitted.
Acked-by: Tejun Heo
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
blkcg_bio_issue_check is a giant inline function that does three entirely
different things. Factor out the blk-cgroup related bio initialization
into a new helper, and open code the sequence in the only caller,
relying on the fact that all the actual functionality is stubbed out for
non-cgroup builds.
Acked-by: Tejun Heo
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
The only thing in blkcg_bio_issue_check that needs to be under
rcu_read_lock is blk_throtl_bio, so move the locking there.
Acked-by: Tejun Heo
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
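
A hedged sketch of the result: the lock/unlock pair now brackets only
the throttling decision inside blk_throtl_bio() itself.
-----
/* Sketch: RCU protection moves from the caller into blk_throtl_bio. */
bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
		    struct bio *bio)
{
	bool throttled = false;

	rcu_read_lock();
	/* ... find the throtl_grp and apply the bps/iops limits ... */
	rcu_read_unlock();

	return throttled;
}
-----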
30 May, 2020
2 commits
-
After blk_throtl_drain is removed, there is no caller of tg_drain_bios,
so remove it as well.
Signed-off-by: Guoqing Jiang
Signed-off-by: Jens Axboe
-
After commit 5addeae1bedc4 ("blk-cgroup: remove blkcg_drain_queue"),
there is no caller of blk_throtl_drain, so let's remove it.
Signed-off-by: Guoqing Jiang
Signed-off-by: Jens Axboe
08 Nov, 2019
2 commits
-
blkg_rwstat is now only used by bfq-iosched and blk-throtl when on
cgroup1. Let's move it into its own files and gate it behind a config
option.
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe
-
When used on cgroup1, blk-throtl uses the blkg->stat_bytes and
->stat_ios from blk-cgroup core to populate four stat knobs.
blk-cgroup core is moving away from blkg_rwstat to improve scalability
and won't be able to support this usage.

It isn't like the sharing gains all that much. Let's break them out
to dedicated rwstat counters which are updated when on cgroup1.
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe
16 Sep, 2019
1 commit
-
Currently rq->data_len will be decreased by partial completion or
zeroed by completion, so when blk_stat_add() is invoked, data_len
will be zero and there will never be samples in poll_cb because
blk_mq_poll_stats_bkt() will return -1 if data_len is zero.

We could move blk_stat_add() back to __blk_mq_complete_request(),
but that would make the effort of trying to call ktime_get_ns()
once in vain. Instead we can reuse the throtl_size field, use it
for both block stats and block throttling, and adjust the logic in
blk_mq_poll_stats_bkt() accordingly.
Fixes: 4bc6339a583c ("block: move blk_stat_add() to __blk_mq_end_request()")
Tested-by: Pavel Begunkov
Signed-off-by: Hou Tao
Signed-off-by: Jens Axboe
29 Aug, 2019
1 commit
-
Instead of @node, pass in @q and @blkcg so that the alloc function has
more context. This doesn't cause any behavior change and will be used
by the io.weight implementation.
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe
10 Jul, 2019
1 commit
-
After commit 991f61fe7e1d ("Blk-throttle: reduce tail io latency when
iops limit is enforced") the wait time could be zero even if a group
is throttled and cannot issue requests right now. As a result,
throtl_select_dispatch() turns into a busy-loop under the irq-safe
queue spinlock.

The fix is simple: always round up the target time to the next
throttle slice.
Fixes: 991f61fe7e1d ("Blk-throttle: reduce tail io latency when iops limit is enforced")
Signed-off-by: Konstantin Khlebnikov
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe
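
A hedged sketch of the idea using the kernel's roundup() helper, sitting
inside the wait-time calculation: the target is rounded up to the next
slice boundary, so a throttled group always reports a wait of at least
one jiffy and the dispatcher cannot spin.
-----
/* Sketch: never report a zero wait for a throttled group. */
unsigned long jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1,
					  tg->td->throtl_slice);
unsigned long jiffy_wait = jiffy_elapsed_rnd - jiffy_elapsed;	/* >= 1 */
-----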
01 Jun, 2019
1 commit
-
Commit e99e88a9d2b0 renamed a function argument without updating the
corresponding kernel-doc header. Update the kernel-doc header.
Reviewed-by: Chaitanya Kulkarni
Reviewed-by: Kees Cook
Fixes: e99e88a9d2b0 ("treewide: setup_timer() -> timer_setup()") # v4.15.
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe
08 Dec, 2018
4 commits
-
bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).
Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Reviewed-by: Liu Bo
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe
-
Previously, blkg association was handled by controller specific code in
blk-throttle and blk-iolatency. However, because a blkg represents a
relationship between a blkcg and a request_queue, it makes sense to keep
the blkg->q and bio->bi_disk->queue consistent.

This patch moves association into the bio_set_dev() macro. This should
cover the majority of cases where the device is set/changed keeping the
two pointers consistent. Fallback code is added to
blkcg_bio_issue_check() to catch any missing paths.
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe
-
There are 3 ways blkg association can happen: association with the
current css, with the page css (swap), or from the wbc css (writeback).

This patch handles how association is done for the first case, where we
are associating based on the current css. If there is already a blkg
associated, the css will be reused and association will be redone, as the
request_queue may have changed.
Signed-off-by: Dennis Zhou
Reviewed-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
-
There are several scenarios where blkg_lookup_create() can fail, such as
the blkcg dying, the request_queue dying, or simply being OOM. Most
callers handle this by simply falling back to the q->root_blkg and
calling it a day.

This patch implements the notion of the closest blkg. During
blkg_lookup_create(), if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest() is introduced and used
during association so a bio is always attached to a blkg.
Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe
16 Nov, 2018
4 commits
-
Various spots check for q->mq_ops being non-NULL, so provide
a helper to do this instead.

Where the ->mq_ops != NULL check is redundant, remove it.

Since mq == rq-based now that legacy is gone, get rid of
queue_is_rq_based() and just use queue_is_mq() everywhere.
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
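
The helper itself is tiny; a sketch of its shape:
-----
/* Sketch: one helper replaces the open-coded ->mq_ops checks. */
static inline bool queue_is_mq(struct request_queue *q)
{
	return q->mq_ops;
}
-----
-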
With the legacy request path gone there is no good reason to keep
queue_lock as a pointer; we can always use the embedded lock now.
Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig

Fixed floppy and blk-cgroup missing conversions and half done edits.
Signed-off-by: Jens Axboe
-
The only remaining user unconditionally drops and reacquires the lock,
which means we really don't need any additional (conditional) annotation.
Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Unused since the removal of the legacy request code.
Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
02 Nov, 2018
1 commit
-
This reverts a series committed earlier due to a null pointer exception
bug report in [1]. It seems there are edge case interactions that I did
not consider and will need some time to understand what causes the
adverse interactions.

The original series can be found in [2] with a follow-up series in [3].
[1] https://www.spinics.net/lists/cgroups/msg20719.html
[2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
[3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

This reverts the following commits:
d459d853c2ed, b2c3fa546705, 101246ec02b5, b3b9f24f5fcc, e2b0989954ae,
f0fcb3ec89f3, c839e7a03f92, bdc2491708c4, 74b7c02a9bc1, 5bf9a1f3b4ef,
a7b39b4e961c, 07b05bcc3213, 49f4c2dc2b50, 27e6fa996c53
Signed-off-by: Dennis Zhou
Signed-off-by: Jens Axboe
22 Sep, 2018
2 commits
-
bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).
Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Reviewed-by: Liu Bo
Signed-off-by: Jens Axboe
-
Previously, blkg's were only assigned as needed by blk-iolatency and
blk-throttle. bio->css was also always being associated while blkg was
being looked up and then thrown away in blkcg_bio_issue_check.

This patch begins the cleanup of bio->css and bio->bi_blkg by always
associating a blkg in blkcg_bio_issue_check. This tries to create the
blkg, but if it is not possible, falls back to using the root_blkg of
the request_queue. Therefore, a bio will always be associated with a
blkg. The duplicate association logic is removed from blk-throttle and
blk-iolatency.
Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
21 Sep, 2018
1 commit
-
As the rbtree has native support for caching the leftmost node,
i.e. rb_root_cached, there is no need to do the caching ourselves.
Signed-off-by: Liu Bo
Signed-off-by: Jens Axboe
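
For reference, a small hedged sketch of the rb_root_cached API that
replaces the hand-rolled leftmost caching.
-----
/* Sketch: rb_root_cached tracks the leftmost node for us. */
struct rb_root_cached pending_tree = RB_ROOT_CACHED;

/* Insertion passes a "leftmost" flag instead of updating a cache:
 *   rb_insert_color_cached(&tg->rb_node, &pending_tree, leftmost);
 * Removal:
 *   rb_erase_cached(&tg->rb_node, &pending_tree);
 * Lookup of the first node is O(1):
 *   struct rb_node *n = rb_first_cached(&pending_tree);
 */
-----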
01 Sep, 2018
1 commit
-
There is a very small chance a bio gets caught up in a really
unfortunate race between a task migration, cgroup exiting, and itself
trying to associate with a blkg. This is due to css offlining being
performed after the css->refcnt is killed, which triggers removal of
blkgs that reach a blkg->refcnt of 0.

To avoid this, association with a blkg should use tryget and fall back
to using the root_blkg.
Fixes: 08e18eab0c579 ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik
Signed-off-by: Dennis Zhou
Cc: Jiufei Xue
Cc: Joseph Qi
Cc: Tejun Heo
Cc: Josef Bacik
Cc: Jens Axboe
Signed-off-by: Jens Axboe
10 Aug, 2018
1 commit
-
When an application's iops has exceeded its cgroup's iops limit, it
is throttled and the kernel will set a timer for dispatching, thus the
IO latency includes this delay.

However, the dispatch delay, which is calculated from the limit and the
elapsed jiffies, is suboptimal. As the dispatch delay is only calculated
once the application's iops is (iops limit + 1), it doesn't need to wait
any longer than the remaining time of the current slice.

The difference can be demonstrated by the following fio job and cgroup iops
setting,
-----
$ echo 4 > /mnt/config/nullb/disk1/mbps # limit nullb's bandwidth to 4MB/s for testing.
$ echo "253:1 riops=100 rbps=max" > /sys/fs/cgroup/unified/cg1/io.max
$ cat r2.job
[global]
name=fio-rand-read
filename=/dev/nullb1
rw=randread
bs=4k
direct=1
numjobs=1
time_based=1
runtime=60
group_reporting=1

[file1]
size=4G
ioengine=libaio
iodepth=1
rate_iops=50000
norandommap=1
thinktime=4ms
-----

w/o patch:
file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7-66-gedfc
Starting 1 process

read: IOPS=99, BW=400KiB/s (410kB/s)(23.4MiB/60001msec)
slat (usec): min=10, max=336, avg=27.71, stdev=17.82
clat (usec): min=2, max=28887, avg=5929.81, stdev=7374.29
lat (usec): min=24, max=28901, avg=5958.73, stdev=7366.22
clat percentiles (usec):
| 1.00th=[ 4], 5.00th=[ 4], 10.00th=[ 4], 20.00th=[ 4],
| 30.00th=[ 4], 40.00th=[ 4], 50.00th=[ 6], 60.00th=[11731],
| 70.00th=[11863], 80.00th=[11994], 90.00th=[12911], 95.00th=[22676],
| 99.00th=[23725], 99.50th=[23987], 99.90th=[23987], 99.95th=[25035],
| 99.99th=[28967]

w/ patch:
file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7-66-gedfc
Starting 1 process

read: IOPS=100, BW=400KiB/s (410kB/s)(23.4MiB/60005msec)
slat (usec): min=10, max=155, avg=23.24, stdev=16.79
clat (usec): min=2, max=12393, avg=5961.58, stdev=5959.25
lat (usec): min=23, max=12412, avg=5985.91, stdev=5951.92
clat percentiles (usec):
| 1.00th=[ 3], 5.00th=[ 3], 10.00th=[ 4], 20.00th=[ 4],
| 30.00th=[ 4], 40.00th=[ 5], 50.00th=[ 47], 60.00th=[11863],
| 70.00th=[11994], 80.00th=[11994], 90.00th=[11994], 95.00th=[11994],
| 99.00th=[11994], 99.50th=[11994], 99.90th=[12125], 99.95th=[12125],
| 99.99th=[12387]
Signed-off-by: Liu Bo
Signed-off-by: Jens Axboe
09 Jul, 2018
1 commit
-
Currently io.low uses a bi_cg_private to stash its private data for the
blkg, however other blkcg policies may want to use this as well. Since
we can get the private data out of the blkg, move this to bi_blkg in the
bio and make it generic, then we can use bio_associate_blkg() to attach
the blkg to the bio.

Theoretically we could simply replace the bi_css with this since we can
get to all the same information from the blkg, however you have to
lookup the blkg, so for example wbc_init_bio() would have to lookup and
possibly allocate the blkg for the css it was trying to attach to the
bio. This could be problematic and result in us either not attaching
the css at all to the bio, or falling back to the root blkcg if we are
unable to allocate the corresponding blkg.

So for now do this, and in the future if possible we could just replace
the bi_css with bi_blkg and update the helpers to do the correct
translation.Signed-off-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe