21 Mar, 2019

1 commit

  • Avoid the following warnings being reported when building with W=1:

    block/blk-cgroup.c:1755: warning: Function parameter or member 'q' not described in 'blkcg_schedule_throttle'
    block/blk-cgroup.c:1755: warning: Function parameter or member 'use_memdelay' not described in 'blkcg_schedule_throttle'
    block/blk-cgroup.c:1779: warning: Function parameter or member 'blkg' not described in 'blkcg_add_delay'
    block/blk-cgroup.c:1779: warning: Function parameter or member 'now' not described in 'blkcg_add_delay'
    block/blk-cgroup.c:1779: warning: Function parameter or member 'delta' not described in 'blkcg_add_delay'
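
    The usual fix for such warnings is a kernel-doc header that documents every
    parameter. A minimal sketch (the parameter descriptions here are illustrative,
    not necessarily the exact wording that was merged):

    /**
     * blkcg_schedule_throttle - this task needs to check for throttling
     * @q: the request_queue the IO was submitted on
     * @use_memdelay: whether this delay is charged as a memory delay
     */
    void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay);

    /**
     * blkcg_add_delay - add delay to this blkg
     * @blkg: blkg of interest
     * @now: the current time in nanoseconds
     * @delta: how many nanoseconds of delay to add
     */
    void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta);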

    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

21 Dec, 2018

1 commit

  • An earlier commit 7fcf2b033b84 ("blkcg: change blkg reference counting
    to use percpu_ref") moved the release call out of blkg_put() and into the
    percpu_ref cleanup. Remove the additional, now-unused code that should
    have been removed at that point.

    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

20 Dec, 2018

1 commit

  • blkg_lookup_create() may be called from pool_map(), where irq state is
    saved, so blkg_lookup_create() has to save and restore irq state as well
    when taking the queue lock.

    Otherwise, the following lockdep warning can be triggered:

    [ 104.258537] ================================
    [ 104.259129] WARNING: inconsistent lock state
    [ 104.259725] 4.20.0-rc6+ #545 Not tainted
    [ 104.260268] --------------------------------
    [ 104.260865] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    [ 104.261727] swapper/49/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
    [ 104.262444] 00000000db365b5d (&(&pool->lock)->rlock#3){+.?.}, at: thin_endio+0xcf/0x2a3 [dm_thin_pool]
    [ 104.263747] {SOFTIRQ-ON-W} state was registered at:
    [ 104.264417] _raw_spin_unlock_irq+0x29/0x4c
    [ 104.265014] blkg_lookup_create+0xdc/0xe6
    [ 104.265609] bio_associate_blkg_from_css+0xd3/0x13f
    [ 104.266312] bio_associate_blkg+0x15a/0x1bb
    [ 104.266913] pool_map+0xe8/0x103 [dm_thin_pool]
    [ 104.267572] __map_bio+0x98/0x29c [dm_mod]
    [ 104.268162] __split_and_process_non_flush+0x29e/0x306 [dm_mod]
    [ 104.269003] __split_and_process_bio+0x16a/0x25b [dm_mod]
    [ 104.269971] __dm_make_request.isra.14+0xdc/0x124 [dm_mod]
    [ 104.270973] generic_make_request+0x3f5/0x68b
    [ 104.271676] process_prepared_mapping+0x166/0x1ef [dm_thin_pool]
    [ 104.272531] schedule_zero+0x239/0x273 [dm_thin_pool]
    [ 104.273245] process_cell+0x60c/0x6f1 [dm_thin_pool]
    [ 104.273967] do_worker+0x60c/0xca8 [dm_thin_pool]
    [ 104.274635] process_one_work+0x4eb/0x834
    [ 104.275203] worker_thread+0x318/0x484
    [ 104.275740] kthread+0x1d1/0x1e1
    [ 104.276203] ret_from_fork+0x3a/0x50
    [ 104.276714] irq event stamp: 170003
    [ 104.277201] hardirqs last enabled at (170002): [] _raw_spin_unlock_irqrestore+0x44/0x6b
    [ 104.278535] hardirqs last disabled at (170003): [] _raw_spin_lock_irqsave+0x20/0x55
    [ 104.280273] softirqs last enabled at (169978): [] irq_enter+0x4c/0x73
    [ 104.281617] softirqs last disabled at (169979): [] irq_exit+0x7e/0x11d
    [ 104.282744]
    [ 104.282744] other info that might help us debug this:
    [ 104.283640] Possible unsafe locking scenario:
    [ 104.283640]
    [ 104.284452] CPU0
    [ 104.284803] ----
    [ 104.285150] lock(&(&pool->lock)->rlock#3);
    [ 104.285762]
    [ 104.286130] lock(&(&pool->lock)->rlock#3);
    [ 104.286750]
    [ 104.286750] *** DEADLOCK ***
    [ 104.286750]
    [ 104.287564] no locks held by swapper/49/0.
    [ 104.288129]
    [ 104.288129] stack backtrace:
    [ 104.288738] CPU: 49 PID: 0 Comm: swapper/49 Not tainted 4.20.0-rc6+ #545
    [ 104.289700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
    [ 104.290858] Call Trace:
    [ 104.291204]
    [ 104.291502] dump_stack+0x9a/0xe6
    [ 104.291968] mark_lock+0x56c/0x7a6
    [ 104.292442] ? check_usage_backwards+0x209/0x209
    [ 104.293086] __lock_acquire+0x400/0x15bf
    [ 104.293662] ? check_chain_key+0x150/0x1aa
    [ 104.294236] lock_acquire+0x1a6/0x1e3
    [ 104.294768] ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
    [ 104.295444] ? _raw_spin_unlock_irqrestore+0x44/0x6b
    [ 104.296143] ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
    [ 104.297031] _raw_spin_lock_irqsave+0x46/0x55
    [ 104.297659] ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
    [ 104.298335] thin_endio+0xcf/0x2a3 [dm_thin_pool]
    [ 104.298997] ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
    [ 104.299886] ? check_flags+0x20a/0x20a
    [ 104.300408] ? lock_acquire+0x1a6/0x1e3
    [ 104.300954] ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
    [ 104.301865] clone_endio+0x1bb/0x22d [dm_mod]
    [ 104.302491] ? disable_write_zeroes+0x20/0x20 [dm_mod]
    [ 104.303200] ? bio_disassociate_blkg+0xc6/0x15f
    [ 104.303836] ? bio_endio+0x2b2/0x2da
    [ 104.304349] clone_endio+0x1f3/0x22d [dm_mod]
    [ 104.304978] ? disable_write_zeroes+0x20/0x20 [dm_mod]
    [ 104.305709] ? bio_disassociate_blkg+0xc6/0x15f
    [ 104.306333] ? bio_endio+0x2b2/0x2da
    [ 104.306853] clone_endio+0x1f3/0x22d [dm_mod]
    [ 104.307476] ? disable_write_zeroes+0x20/0x20 [dm_mod]
    [ 104.308185] ? bio_disassociate_blkg+0xc6/0x15f
    [ 104.308817] ? bio_endio+0x2b2/0x2da
    [ 104.309319] blk_update_request+0x2de/0x4cc
    [ 104.309927] blk_mq_end_request+0x2a/0x183
    [ 104.310498] blk_done_softirq+0x16a/0x1a6
    [ 104.311051] ? blk_softirq_cpu_dead+0xe2/0xe2
    [ 104.311653] ? __lock_is_held+0x2a/0x87
    [ 104.312186] __do_softirq+0x250/0x4e8
    [ 104.312705] irq_exit+0x7e/0x11d
    [ 104.313157] call_function_single_interrupt+0xf/0x20
    [ 104.313860]
    [ 104.314163] RIP: 0010:native_safe_halt+0x2/0x3
    [ 104.314792] Code: 63 02 df f0 83 44 24 fc 00 48 89 df e8 cc 3f 7a ff 48 8b 03 a8 08 74 0b 65 81 25 9d 31 45 7e ff ff ff 7f 5b 5d 41 5c c3 fb f4 f4 c3 0f 1f 44 00 00 41 56 41 55 41 54 55 53 e8 a2 0d 5c ff e8
    [ 104.317339] RSP: 0018:ffff888106c9fdc0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
    [ 104.318390] RAX: 1ffff11020d92100 RBX: 0000000000000000 RCX: ffffffff81159ac7
    [ 104.319366] RDX: 1ffffffff05d5e69 RSI: 0000000000000007 RDI: ffff888106c90d1c
    [ 104.320339] RBP: 0000000000000000 R08: dffffc0000000000 R09: 0000000000000001
    [ 104.321313] R10: ffffed1025d57ba0 R11: ffffed1025d57b9f R12: 1ffff11020d93fbf
    [ 104.322328] R13: 0000000000000031 R14: ffff888106c90040 R15: 0000000000000000
    [ 104.323307] ? lockdep_hardirqs_on+0x26b/0x278
    [ 104.323927] default_idle+0xd9/0x1a8
    [ 104.324427] do_idle+0x162/0x2b2
    [ 104.324891] ? arch_cpu_idle_exit+0x28/0x28
    [ 104.325467] ? mark_held_locks+0x28/0x7f
    [ 104.326031] ? _raw_spin_unlock_irqrestore+0x44/0x6b
    [ 104.326719] cpu_startup_entry+0x1d/0x1f
    [ 104.327261] start_secondary+0x2cb/0x308
    [ 104.327806] ? set_cpu_sibling_map+0x8a3/0x8a3
    [ 104.328421] secondary_startup_64+0xa4/0xb0
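
    The essence of the fix, as a hedged sketch (the exact function body upstream
    may differ): take the queue lock with the irq-saving variants so that
    blkg_lookup_create() is safe to call with interrupts already disabled, as it
    is from pool_map():

    struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
                                        struct request_queue *q)
    {
        struct blkcg_gq *blkg = blkg_lookup(blkcg, q);
        unsigned long flags;

        if (unlikely(!blkg)) {
            /* save/restore irq state instead of spin_lock_irq()/unlock_irq() */
            spin_lock_irqsave(&q->queue_lock, flags);
            blkg = __blkg_lookup_create(blkcg, q);
            spin_unlock_irqrestore(&q->queue_lock, flags);
        }

        return blkg;
    }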

    Fixes: b978962ad4f7f9 ("blkcg: update blkg_lookup_create() to do locking")
    Cc: Mike Snitzer
    Cc: Dennis Zhou
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

13 Dec, 2018

1 commit

  • Between v3 [1] and v4 [2] of the blkg association series, the
    association point moved from generic_make_request_checks(), which is
    called after the request enters the queue, to bio_set_dev(), which is when
    the bio is formed before submit_bio(). When the request_queue goes away,
    the blkgs supporting the request_queue are destroyed and then the
    q->root_blkg is set to %NULL.

    This patch adds a %NULL check to blkg_tryget_closest() to prevent the
    NULL pointer dereference caused by the above. It also adds a guard to
    see if the request_queue is dying when creating a blkg, to prevent
    creating a blkg for a dead request_queue.

    [1] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
    [2] https://lore.kernel.org/lkml/20181126211946.77067-1-dennis@kernel.org/
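
    A hedged sketch of the two guards described above (the exact upstream code
    and error path may differ slightly):

    /* tolerate a %NULL blkg once q->root_blkg has been torn down */
    static inline struct blkcg_gq *blkg_tryget_closest(struct blkcg_gq *blkg)
    {
        while (blkg && !blkg_tryget(blkg))
            blkg = blkg->parent;

        return blkg;
    }

    /* and in blkg creation: do not create a blkg for a dying request_queue */
    if (blk_queue_dying(q))
        return ERR_PTR(-ENODEV);    /* error path illustrative */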

    Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device")
    Reported-and-tested-by: Ming Lei
    Reviewed-by: Bart Van Assche
    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

08 Dec, 2018

4 commits

  • blkg reference counting now uses percpu_ref rather than atomic_t. Let's
    make this consistent with css_tryget. This renames blkg_try_get to
    blkg_tryget, which now returns a bool rather than the blkg or %NULL.
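
    A minimal sketch of the renamed helper, mirroring css_tryget() and assuming
    the percpu_ref-based refcnt from the commit below:

    static inline bool blkg_tryget(struct blkcg_gq *blkg)
    {
        /* true if a reference was obtained, false otherwise */
        return percpu_ref_tryget(&blkg->refcnt);
    }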

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • Every bio is now associated with a blkg, putting blkg_get, blkg_try_get,
    and blkg_put on the hot path. Switch over the refcnt in blkg to use
    percpu_ref.
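
    Roughly, the conversion amounts to the following (a sketch; error handling,
    labels, and the release callback are elided):

    /* at blkg allocation time: */
    if (percpu_ref_init(&blkg->refcnt, blkg_release, 0, GFP_KERNEL))
        goto err_free;

    /* at blkg destruction time, instead of a final atomic_dec: */
    percpu_ref_kill(&blkg->refcnt);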

    Signed-off-by: Dennis Zhou
    Acked-by: Tejun Heo
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • There are several scenarios where blkg_lookup_create() can fail, such as
    the blkcg dying, the request_queue dying, or simply being OOM. Most
    callers handle this by simply falling back to q->root_blkg and calling it
    a day.

    This patch implements the notion of closest blkg. During
    blkg_lookup_create(), if it fails to create, return the closest blkg
    found or the q->root_blkg. blkg_try_get_closest() is introduced and used
    during association so a bio is always attached to a blkg.
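
    A hedged sketch of the closest-blkg idea: walk up the blkg hierarchy until a
    reference can be taken, so association always succeeds (the 13 Dec entry
    above later adds a %NULL check to this walk):

    static inline struct blkcg_gq *blkg_tryget_closest(struct blkcg_gq *blkg)
    {
        /* fall back toward q->root_blkg until a live blkg is found */
        while (!blkg_tryget(blkg))
            blkg = blkg->parent;

        return blkg;
    }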

    Signed-off-by: Dennis Zhou
    Acked-by: Tejun Heo
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • To know when to create a blkg, the general pattern is to do a
    blkg_lookup() and, if that fails, lock and do the lookup again; if that
    still fails, finally create. It doesn't make much sense for everyone who
    wants to do creation to write this themselves.

    This changes blkg_lookup_create() to do locking and implement this
    pattern. The old blkg_lookup_create() is renamed to
    __blkg_lookup_create(). If a call site wants to do its own error
    handling or already owns the queue lock, they can use
    __blkg_lookup_create(). This will be used in upcoming patches.
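
    A hedged sketch of the pattern now folded into blkg_lookup_create() (the
    20 Dec entry above later switches these to the irq-saving lock variants):

    struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
                                        struct request_queue *q)
    {
        struct blkcg_gq *blkg = blkg_lookup(blkcg, q);

        if (unlikely(!blkg)) {
            /* lock and look up again; only create if it is still missing */
            spin_lock_irq(&q->queue_lock);
            blkg = __blkg_lookup_create(blkcg, q);
            spin_unlock_irq(&q->queue_lock);
        }

        return blkg;
    }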

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Reviewed-by: Liu Bo
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

02 Nov, 2018

1 commit

  • This reverts a series committed earlier due to a NULL pointer dereference
    bug report in [1]. It seems there are edge-case interactions that I did
    not consider, and it will take some time to understand what causes the
    adverse interactions.

    The original series can be found in [2] with a follow up series in [3].

    [1] https://www.spinics.net/lists/cgroups/msg20719.html
    [2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
    [3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

    This reverts the following commits:
    d459d853c2ed, b2c3fa546705, 101246ec02b5, b3b9f24f5fcc, e2b0989954ae,
    f0fcb3ec89f3, c839e7a03f92, bdc2491708c4, 74b7c02a9bc1, 5bf9a1f3b4ef,
    a7b39b4e961c, 07b05bcc3213, 49f4c2dc2b50, 27e6fa996c53

    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

01 Oct, 2018

1 commit

  • Merge -rc6 in, for two reasons:

    1) Resolve a trivial conflict in the blk-mq-tag.c documentation
    2) A few important regression fixes went into upstream directly, so
    they aren't in the 4.20 branch.

    Signed-off-by: Jens Axboe

    * tag 'v4.19-rc6': (780 commits)
    Linux 4.19-rc6
    MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
    cpufreq: qcom-kryo: Fix section annotations
    perf/core: Add sanity check to deal with pinned event failure
    xen/blkfront: correct purging of persistent grants
    Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
    selftests/powerpc: Fix Makefiles for headers_install change
    blk-mq: I/O and timer unplugs are inverted in blktrace
    dax: Fix deadlock in dax_lock_mapping_entry()
    x86/boot: Fix kexec booting failure in the SEV bit detection code
    bcache: add separate workqueue for journal_write to avoid deadlock
    drm/amd/display: Fix Edid emulation for linux
    drm/amd/display: Fix Vega10 lightup on S3 resume
    drm/amdgpu: Fix vce work queue was not cancelled when suspend
    Revert "drm/panel: Add device_link from panel device to DRM device"
    xen/blkfront: When purging persistent grants, keep them in the buffer
    clocksource/drivers/timer-atmel-pit: Properly handle error cases
    block: fix deadline elevator drain for zoned block devices
    ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
    drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
    ...

    Signed-off-by: Jens Axboe

    Jens Axboe
     

22 Sep, 2018

4 commits

  • blkg reference counting now uses percpu_ref rather than atomic_t. Let's
    make this consistent with css_tryget. This renames blkg_try_get to
    blkg_tryget, which now returns a bool rather than the blkg or NULL.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • Now that every bio is associated with a blkg, this puts the use of
    blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
    the refcnt in blkg to use percpu_ref.

    Signed-off-by: Dennis Zhou
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • There are several scenarios where blkg_lookup_create can fail. Examples
    include the blkcg dying, the request_queue dying, or simply being OOM. At
    the end of the day, most handle this by simply falling back to the
    q->root_blkg and calling it a day.

    This patch implements the notion of closest blkg. During
    blkg_lookup_create, if it fails to create, return the closest blkg
    found or the q->root_blkg. blkg_try_get_closest is introduced and used
    during association so a bio is always attached to a blkg.

    Acked-by: Tejun Heo
    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • To know when to create a blkg, the general pattern is to do a
    blkg_lookup and, if that fails, lock and then do a lookup again; if that
    still fails, finally create. It doesn't make much sense for everyone who
    wants to do creation to write this themselves.

    This changes blkg_lookup_create to do locking and implement this
    pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
    If a call site wants to do its own error handling or already owns the
    queue lock, they can use __blkg_lookup_create. This will be used in
    upcoming patches.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Reviewed-by: Liu Bo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     

12 Sep, 2018

1 commit

  • After merging the iolatency policy, we potentially now have 4 policies
    being registered, but only support 3. This causes one of them to fail
    loading. Takashi reports that BFQ no longer works for him, because it
    fails to load due to policy registration failure.

    Bump to 5 policies, and also add a warning for when we have exceeded
    the global amount. If we have to touch this again, we should switch
    to a dynamic scheme instead.
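
    In other words, the change is roughly the following (a sketch; the exact
    constant location and warning text may differ):

    #define BLKCG_MAX_POLS 5    /* bumped from 3 */

    /* in blkcg_policy_register(), warn instead of failing silently: */
    if (i >= BLKCG_MAX_POLS) {
        pr_warn("blkcg_policy_register: BLKCG_MAX_POLS too small\n");
        goto err_unlock;
    }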

    Reported-by: Takashi Iwai
    Reviewed-by: Jeff Moyer
    Tested-by: Takashi Iwai
    Signed-off-by: Jens Axboe

    Jens Axboe
     

01 Sep, 2018

2 commits

  • Currently, blkcg destruction relies on a sequence of events:
    1. Destruction starts. blkcg_css_offline() is called and blkgs
    release their reference to the blkcg. This immediately destroys
    the cgwbs (writeback).
    2. With blkgs giving up their reference, the blkcg ref count should
    become zero and eventually call blkcg_css_free() which finally
    frees the blkcg.

    Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
    and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
    on the completion of all writeback associated with the blkcg. A count of
    the number of cgwbs is maintained and once that goes to zero, blkg
    destruction can follow. This should prevent premature blkg destruction
    related to writeback.

    The new process for blkcg cleanup is as follows:
    1. Destruction starts. blkcg_css_offline() is called which offlines
    writeback. Blkg destruction is delayed on the cgwb_refcnt count to
    avoid punting potentially large amounts of outstanding writeback
    to root while maintaining any ongoing policies. Here, the base
    cgwb_refcnt is put back.
    2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
    and handles destruction of blkgs. This is where the css reference
    held by each blkg is released.
    3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
    This finally frees the blkg.
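
    A hedged sketch of the handoff in step 2; blkcg_destroy_blkgs() and the
    cgwb_refcnt come from the description above, while the put helper's name
    and body here are an assumption:

    static inline void blkcg_cgwb_put(struct blkcg *blkcg)
    {
        /* last cgwb gone: writeback has finished, blkgs may now be destroyed */
        if (refcount_dec_and_test(&blkcg->cgwb_refcnt))
            blkcg_destroy_blkgs(blkcg);
    }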

    It seems that in the past blk-throttle didn't do the most understandable
    things when taking data from a blkg while associating with current. So
    the simplification and unification of what blk-throttle is doing caused
    this.

    Fixes: 08e18eab0c579 ("block: add bi_blkg to the bio for cgroups")
    Reviewed-by: Josef Bacik
    Signed-off-by: Dennis Zhou
    Cc: Jiufei Xue
    Cc: Joseph Qi
    Cc: Tejun Heo
    Cc: Josef Bacik
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     
  • This reverts commit 4c6994806f708559c2812b73501406e21ae5dcd0.

    Destroying blkgs is tricky because of the nature of the relationship. A
    blkg should go away when either a blkcg or a request_queue goes away.
    However, blkg's pin the blkcg to ensure they remain valid. To break this
    cycle, when a blkcg is offlined, blkgs put back their css ref. This
    eventually lets css_free() get called which frees the blkcg.

    The above commit (4c6994806f70) breaks this order of events by trying to
    destroy blkgs in css_free(). As the blkgs still hold references to the
    blkcg, css_free() is never called.

    The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
    addressed in the following patch by delaying destruction of a blkg until
    all writeback associated with the blkcg has been finished.

    Fixes: 4c6994806f70 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
    Reviewed-by: Josef Bacik
    Signed-off-by: Dennis Zhou
    Cc: Jiufei Xue
    Cc: Joseph Qi
    Cc: Tejun Heo
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     

09 Jul, 2018

3 commits

  • Current IO controllers for the block layer are less than ideal for our
    use case. The io.max controller is great at hard limiting, but it is
    not work conserving. This patch introduces io.latency. You provide a
    latency target for your group and we monitor the io in short windows to
    make sure we are not exceeding those latency targets. This makes use of
    the rq-qos infrastructure and works much like the wbt stuff. There are
    a few differences from wbt:

    - It's bio based, so the latency covers the whole block layer in addition to
    the actual io.
    - We will throttle all IO types that come in here if we need to.
    - We use the mean latency over the 100ms window. This is because writes can
    be particularly fast, which could give us a false sense of the impact of
    other workloads on our protected workload.
    - By default there's no throttling: we set the queue_depth to INT_MAX so that
    we can have as many outstanding bios as we're allowed to. Only at
    throttle time do we pay attention to the actual queue depth.
    - We backcharge cgroups for root cg issued IO and induce artificial
    delays in order to deal with cases like metadata only or swap heavy
    workloads.
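
    Purely for illustration, a sketch of a windowed latency check in the spirit
    of the description above; the struct, field names, and scaling policy here
    are invented and are not the blk-iolatency implementation:

    struct iolat_window {              /* invented for illustration */
        u64 lat_sum_ns;                /* sum of completion latencies this window */
        u64 nr_ios;                    /* completions seen this window */
        u64 target_ns;                 /* the group's latency target */
        unsigned int qd;               /* allowed queue depth; INT_MAX == unthrottled */
    };

    static void iolat_window_check(struct iolat_window *w)
    {
        u64 mean = w->nr_ios ? w->lat_sum_ns / w->nr_ios : 0;

        if (mean > w->target_ns && w->qd > 1)
            w->qd /= 2;                                    /* target missed: throttle harder */
        else if (w->qd < INT_MAX)
            w->qd = min_t(unsigned int, w->qd + w->qd / 4 + 1, INT_MAX);

        w->lat_sum_ns = 0;                                 /* start the next 100ms window */
        w->nr_ios = 0;
    }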

    In testing this has worked out relatively well. Protected workloads
    will throttle noisy workloads down to 1 io at a time if they are doing
    normal IO on their own, or induce up to a 1 second delay per syscall if
    they are doing a lot of root-issued IO (metadata/swap IO).

    Our testing has revolved mostly around our production web servers where
    we have hhvm (the web server application) in a protected group and
    everything else in another group. We see slightly higher requests per
    second (RPS) on the test tier vs the control tier, and much more stable
    RPS across all machines in the test tier vs the control tier.

    Another test we run is a slow memory allocator in the unprotected group.
    Before this would eventually push us into swap and cause the whole box
    to die and not recover at all. With these patches we see slight RPS
    drops (usually 10-15%) before the memory consumer is properly killed and
    things recover within seconds.

    Signed-off-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • Since IO can be issued from literally anywhere, it's almost impossible to
    do throttling without having some sort of adverse effect somewhere else
    in the system because of locking or other dependencies. The best way to
    solve this is to do the throttling when we know we aren't holding any
    other kernel resources. Do this by tracking throttling on a per-blkg
    basis, and if we require throttling, flag the task so that it checks
    before returning to user space and possibly sleeps there.

    This is to address the case where a process is doing work that is
    generating IO that can't be throttled, whether that is directly with a
    lot of REQ_META IO, or indirectly by allocating so much memory that it
    is swamping the disk with REQ_SWAP. We can't use task_add_work as we
    don't want to induce a memory allocation in the IO path, so simply
    saving the request queue in the task and flagging it to do the
    notify_resume thing achieves the same result without the overhead of a
    memory allocation.
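
    A hedged sketch of the flag-now, check-on-return-to-userspace mechanism
    (field and helper names may differ slightly from the merged code, and
    taking a reference on the queue is elided):

    void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay)
    {
        if (unlikely(current->flags & PF_KTHREAD))
            return;                       /* only userspace tasks can sleep there */

        current->throttle_queue = q;      /* no allocation: just stash the queue */
        if (use_memdelay)
            current->use_memdelay = true;

        /* the throttle check then runs before returning to user space */
        set_notify_resume(current);
    }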

    Signed-off-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • blk-iolatency has a few stats that it would like to print out, and
    instead of adding a bunch of crap to the generic code just provide a
    helper so that controllers can add stuff to the stat line if they want
    to.

    Hide it behind a boot option since it changes the output of io.stat from
    normal, and these stats are only interesting to developers.

    Signed-off-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Josef Bacik
     

19 Apr, 2018

2 commits

  • The initialization of q->root_blkg currently happens outside of the queue
    lock and rcu, so the blkg may be destroyed before the initialization
    completes, which may cause dangling/NULL references. On the other hand,
    blkg destruction is protected by the queue lock or rcu. Put the
    initialization inside the queue lock and rcu to make it safer.
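
    A hedged sketch of the reordering in blkcg_init_queue() (error handling
    elided):

    rcu_read_lock();
    spin_lock_irq(q->queue_lock);
    blkg = blkg_create(&blkcg_root, q, new_blkg);
    if (!IS_ERR(blkg))
        q->root_blkg = blkg;      /* previously assigned after dropping the locks */
    spin_unlock_irq(q->queue_lock);
    rcu_read_unlock();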

    Signed-off-by: Jiang Biao
    Signed-off-by: Wen Yang
    CC: Tejun Heo
    CC: Jens Axboe
    Signed-off-by: Jens Axboe

    Jiang Biao
     
  • The comment before blkg_create() in blkcg_init_queue() was moved
    from blkcg_activate_policy() by commit ec13b1d6f0a0457312e615, but
    it no longer suits the new context.

    Signed-off-by: Jiang Biao
    Signed-off-by: Wen Yang
    CC: Tejun Heo
    CC: Jens Axboe
    Signed-off-by: Jens Axboe

    Jiang Biao
     

18 Apr, 2018

1 commit

  • As described in the comment of blkcg_activate_policy(),
    *Update of each blkg is protected by both queue and blkcg locks so
    that holding either lock and testing blkcg_policy_enabled() is
    always enough for dereferencing policy data.*
    With the queue lock held, there is therefore no need to hold the blkcg
    lock in blkcg_deactivate_policy(). A similar case is
    blkcg_activate_policy(), where holding the blkcg lock was removed in
    commit 4c55f4f9ad3001ac1fefdd8d8ca7641d18558e23.
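
    A hedged sketch of blkcg_deactivate_policy() after the change; with the
    queue lock held, the per-blkcg lock around each blkg's policy-data teardown
    is no longer taken (exact code may differ):

    spin_lock_irq(q->queue_lock);
    list_for_each_entry(blkg, &q->blkg_list, q_node) {
        /* no spin_lock(&blkg->blkcg->lock) needed here anymore */
        if (blkg->pd[pol->plid]) {
            if (pol->pd_offline_fn)
                pol->pd_offline_fn(blkg->pd[pol->plid]);
            pol->pd_free_fn(blkg->pd[pol->plid]);
            blkg->pd[pol->plid] = NULL;
        }
    }
    spin_unlock_irq(q->queue_lock);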

    Signed-off-by: Jiang Biao
    Signed-off-by: Wen Yang
    CC: Tejun Heo
    Signed-off-by: Jens Axboe

    Jiang Biao
     

17 Mar, 2018

1 commit

  • We've triggered a WARNING in blk_throtl_bio() when throttling writeback
    io: it complains that blkg->refcnt is already 0 when calling blkg_get(),
    and the kernel then crashes with an invalid page request.
    After investigating this issue, we found it is caused by a race
    between blkcg_bio_issue_check() and cgroup_rmdir(), which is described
    below:

    writeback kworker                       cgroup_rmdir
                                              cgroup_destroy_locked
                                                kill_css
                                                  css_killed_ref_fn
                                                    css_killed_work_fn
                                                      offline_css
                                                        blkcg_css_offline
    blkcg_bio_issue_check
      rcu_read_lock
      blkg_lookup
                                                        spin_trylock(q->queue_lock)
                                                        blkg_destroy
                                                        spin_unlock(q->queue_lock)
      blk_throtl_bio
        spin_lock_irq(q->queue_lock)
        ...
        spin_unlock_irq(q->queue_lock)
      rcu_read_unlock

    Since rcu can only prevent the blkg from being released while it is in
    use, blkg->refcnt can drop to 0 during blkg_destroy(), which schedules
    the blkg release.
    The subsequent blkg_get() in blk_throtl_bio() then triggers the WARNING,
    and the corresponding blkg_put() schedules the blkg release again,
    resulting in a double free.
    This race was introduced by commit ae1188963611 ("blkcg: consolidate blkg
    creation in blkcg_bio_issue_check()"). Before that commit, the code would
    look up first and then look up/create again with the queue_lock held.
    Since reviving this logic would be rather drastic, fix it by only
    offlining pd during blkcg_css_offline(), and move the rest of the
    destruction (especially blkg_put()) into blkcg_css_free(), which should
    be the right way as discussed.
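
    A hedged sketch of the split described above; the two helpers that offline
    policy data and finish blkg destruction are given illustrative names here:

    static void blkcg_css_offline(struct cgroup_subsys_state *css)
    {
        struct blkcg *blkcg = css_to_blkcg(css);

        /* only take per-policy data offline here; keep the blkgs around */
        blkcg_offline_pds(blkcg);           /* illustrative helper name */
        wb_blkcg_offline(blkcg);
    }

    static void blkcg_css_free(struct cgroup_subsys_state *css)
    {
        struct blkcg *blkcg = css_to_blkcg(css);

        /* drop the blkg references (blkg_put()) only once the css is freed */
        blkcg_destroy_all_blkgs(blkcg);     /* illustrative helper name */
        kfree(blkcg);
    }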

    Fixes: ae1188963611 ("blkcg: consolidate blkg creation in blkcg_bio_issue_check()")
    Reported-by: Jiufei Xue
    Signed-off-by: Joseph Qi
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Joseph Qi
     
