01 Mar, 2013

1 commit

  • Pull block IO core bits from Jens Axboe:
    "Below are the core block IO bits for 3.9. It was delayed a few days
    since my workstation kept crashing every 2-8h after pulling it into
    current -git, but turns out it is a bug in the new pstate code (divide
    by zero, will report separately). In any case, it contains:

    - The big cfq/blkcg update from Tejun and Vivek.

    - Additional block and writeback tracepoints from Tejun.

    - Improvement of the should sort (based on queues) logic in the plug
    flushing.

    - _io() variants of the wait_for_completion() interface, using
    io_schedule() instead of schedule() to contribute to io wait
    properly.

    - Various little fixes.

    You'll get two trivial merge conflicts, which should be easy enough to
    fix up"

    Fix up the trivial conflicts due to hlist traversal cleanups (commit
    b67bfe0d42ca: "hlist: drop the node parameter from iterators").

    * 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
    block: remove redundant check to bd_openers()
    block: use i_size_write() in bd_set_size()
    cfq: fix lock imbalance with failed allocations
    drivers/block/swim3.c: fix null pointer dereference
    block: don't select PERCPU_RWSEM
    block: account iowait time when waiting for completion of IO request
    sched: add wait_for_completion_io[_timeout]
    writeback: add more tracepoints
    block: add block_{touch|dirty}_buffer tracepoint
    buffer: make touch_buffer() an exported function
    block: add @req to bio_{front|back}_merge tracepoints
    block: add missing block_bio_complete() tracepoint
    block: Remove should_sort judgement when flush blk_plug
    block,elevator: use new hashtable implementation
    cfq-iosched: add hierarchical cfq_group statistics
    cfq-iosched: collect stats from dead cfqgs
    cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
    blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
    block: RCU free request_queue
    blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
    ...

    Linus Torvalds
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived
    differently from the list ones. While the list ones are:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

22 Feb, 2013

1 commit

  • While stress-running very-small container scenarios with the Kernel Memory
    Controller, I've run into a lockdep-detected lock imbalance in
    cfq-iosched.c.

    I'll apologize beforehand for not posting a backlog: I didn't anticipate
    it being so hard to reproduce, so I didn't save my serial output and
    went directly to debugging. It turned out not to happen again in more
    than 20 runs, making it a quite rare pattern.

    But here is my analysis:

    When we are in very low-memory situations, we will arrive at
    cfq_find_alloc_queue and may not find a queue, having to resort to the oom
    queue, in an rcu-locked condition:

    if (!cfqq || cfqq == &cfqd->oom_cfqq)
            [ ... ]

    Next, we will release the rcu lock, and try to allocate a queue, retrying
    if we succeed:

    rcu_read_unlock();
    spin_unlock_irq(cfqd->queue->queue_lock);
    new_cfqq = kmem_cache_alloc_node(cfq_pool,
                                     gfp_mask | __GFP_ZERO,
                                     cfqd->queue->node);
    spin_lock_irq(cfqd->queue->queue_lock);
    if (new_cfqq)
            goto retry;

    We are unlocked at this point, but it should be fine, since we will
    reacquire the rcu_read_lock when we retry.

    Except of course, that we may not retry: the allocation may very well fail
    and we'll keep on going through the flow:

    The next branch is:

    if (cfqq) {
            [ ... ]
    } else
            cfqq = &cfqd->oom_cfqq;

    And right before exiting, we'll issue rcu_read_unlock().

    Being already unlocked, this is the likely source of our imbalance. Since
    cfqq is either already NULL or made NULL in the first statement of the
    outer branch, the only viable alternative here seems to be to return the
    oom queue right away in case of allocation failure.

    Please review the following patch and apply if you agree with my analysis.

    Signed-off-by: Glauber Costa
    Cc: Jens Axboe
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Glauber Costa
     

10 Jan, 2013

15 commits

  • Unfortunately, at this point, there's no way to make the existing
    statistics hierarchical without creating nasty surprises for the
    existing users. Just create recursive counterpart of the existing
    stats.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • To support hierarchical stats, it's necessary to remember stats from
    dead children. Add cfqg->dead_stats and make a dying cfqg transfer
    its stats to the parent's dead-stats.

    The transfer happens from ->pd_offline_fn() and it is possible that
    there are some residual IOs completing afterwards. Currently, we lose
    these stats. Given that cgroup removal isn't a very high frequency
    operation and the amount of residual IOs on offline are likely to be
    nil or small, this shouldn't be a big deal and the complexity needed
    to handle residual IOs - another callback and rather elaborate
    synchronization to reach and lock the matching q - doesn't seem
    justified.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • Separate out cfqg_stats_reset() which takes struct cfqg_stats * from
    cfq_pd_reset_stats() and move the latter to where other pd methods are
    defined. cfqg_stats_reset() will be used to implement hierarchical
    stats.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • Rename blkg_rwstat_sum() to blkg_rwstat_total(). sum will be used for
    summing up stats from multiple blkgs.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • With the previous two patches, all cfqg scheduling decisions are based
    on vfraction and ready for hierarchy support. The only thing which
    keeps the behavior flat is cfqg_flat_parent() which makes vfraction
    calculation consider all non-root cfqgs children of the root cfqg.

    Replace it with cfqg_parent() which returns the real parent. This
    enables full blkcg hierarchy support for cfq-iosched. For example,
    consider the following hierarchy.

              root
             /    \
         A:500    B:250
         /    \
    AA:500    AB:1000

    For simplicity, let's say all the leaf nodes have active tasks and are
    on service tree. For each leaf node, vfraction would be

    AA: (500 / 1500) * (500 / 750) =~ 0.2222
    AB: (1000 / 1500) * (500 / 750) =~ 0.4444
    B: (250 / 750) =~ 0.3333

    and vdisktime will be distributed accordingly. For more detail,
    please refer to Documentation/block/cfq-iosched.txt.

    v2: cfq-iosched.txt updated to describe group scheduling as suggested
    by Vivek.

    v3: blkio-controller.txt updated.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • cfq_group_slice() calculates slice by taking a fraction of
    cfq_target_latency according to the ratio of cfqg->weight against
    service_tree->total_weight. This currently works only because all
    cfqgs are treated as being at the same level.

    To prepare for proper hierarchy support, convert cfq_group_slice() to
    base the calculation on cfqg->vfraction. As cfqg->vfraction is always
    a fraction of 1 and represents the fraction allocated to the cfqg with
    hierarchy considered, the slice can be simply calculated by
    multiplying cfqg->vfraction by cfq_target_latency (with the fixed
    point shift factored in).

    As vfraction calculation currently treats all non-root cfqgs as
    children of the root cfqg, this patch doesn't introduce noticeable
    behavior difference.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • Currently, cfqg charges are scaled directly according to cfqg->weight.
    Regardless of the number of active cfqgs or the amount of active
    weights, a given weight value always scales charge the same way. This
    works fine as long as all cfqgs are treated equally regardless of
    their positions in the hierarchy, which is what cfq currently
    implements. It can't work in hierarchical settings because the
    interpretation of a given weight value depends on where the weight is
    located in the hierarchy.

    This patch reimplements cfqg charge scaling so that it can be used to
    support hierarchy properly. The scheme is fairly simple and
    light-weight.

    * When a cfqg is added to the service tree, its vfraction is
    calculated: it walks up the tree to the root, calculating the fraction
    it has in the hierarchy. At each level, the fraction can be
    calculated as

    cfqg->weight / parent->level_weight

    By compounding these, the global fraction of vdisktime the cfqg has
    claim to - vfraction - can be determined.

    * When the cfqg needs to be charged, the charge is scaled in inverse
    proportion to the vfraction.

    The new scaling scheme uses the same CFQ_SERVICE_SHIFT for fixed point
    representation as before; however, the smallest scaling factor is now
    1 (ie. 1 << CFQ_SERVICE_SHIFT). This is different from before where 1
    was for CFQ_WEIGHT_DEFAULT and higher weight would result in smaller
    scaling factor.

    While this shifts the global scale of vdisktime a bit, it doesn't
    change the relative relationships among cfqgs and the scheduling
    result isn't different.

    cfq_group_notify_queue_add uses fixed CFQ_IDLE_DELAY when appending
    new cfqg to the service tree. The specific value of CFQ_IDLE_DELAY
    didn't have any relevance to vdisktime before and is unlikely to cause
    any visible behavior difference now especially as the scale shift
    isn't that large.

    As the new scheme now makes proper distinction between cfqg->weight
    and ->leaf_weight, reverse the weight aliasing for root cfqgs. For
    root, both weights are now mapped to ->leaf_weight instead of the
    other way around.

    Because we're still using cfqg_flat_parent(), this patch shouldn't
    change the scheduling behavior in any noticeable way.

    v2: Beefed up comments on vfraction as requested by Vivek.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • To prepare for blkcg hierarchy support, add cfqg->nr_active and
    ->children_weight. cfqg->nr_active counts the number of active cfqgs
    at the cfqg's level and ->children_weight is sum of weights of those
    cfqgs. The level covers itself (cfqg->leaf_weight) and immediate
    children.

    The two values are updated when a cfqg enters and leaves the group
    service tree. Unless the hierarchy is very deep, the added overhead
    should be negligible.

    Currently, the parent is determined using cfqg_flat_parent() which
    makes the root cfqg the parent of all other cfqgs. This is to make
    the transition to hierarchy-aware scheduling gradual. Scheduling
    logic will be converted to use cfqg->children_weight without actually
    changing the behavior. When everything is ready, cfqg_flat_parent()
    will be replaced with a proper parent function.

    This patch doesn't introduce any behavior change.

    v2: s/cfqg->level_weight/cfqg->children_weight/ as per Vivek.

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal

    Tejun Heo
     
  • cfq blkcg is about to grow proper hierarchy handling, where a child
    blkg's weight would nest inside the parent's. This makes tasks in a
    blkg compete against both tasks in the sibling blkgs and the tasks
    of child blkgs.

    We're gonna use the existing weight as the group weight which decides
    the blkg's weight against its siblings. This patch introduces a new
    weight - leaf_weight - which decides the weight of a blkg against the
    child blkgs.

    It's named leaf_weight because another way to look at it is that each
    internal blkg node has a hidden child leaf node which contains all of
    its tasks; leaf_weight is the weight of that leaf node and is handled
    the same way as the weight of the child blkgs.

    This patch only adds leaf_weight fields and exposes it to userland.
    The new weight isn't actually used anywhere yet. Note that
    cfq-iosched currently officially supports only single level hierarchy
    and root blkgs compete with the first level blkgs - ie. root weight is
    basically being used as leaf_weight. For root blkgs, the two weights
    are kept in sync for backward compatibility.

    v2: cfqd->root_group->leaf_weight initialization was missing from
    cfq_init_queue() causing divide by zero when
    !CONFIG_CFQ_GROUP_SCHED. Fix it. Reported by Fengguang.

    Signed-off-by: Tejun Heo
    Cc: Fengguang Wu

    Tejun Heo
     
  • Currently we attach a character "S" or "A" to the cfqq, to represent
    whether the queue is sync or async. Add one more character "N" to
    represent whether it is a sync-noidle queue or a sync queue. So the
    three different types of queues now look as follows.

    cfq1234S --> sync queue
    cfq1234SN --> sync-noidle queue
    cfq1234A --> async queue

    Previously the S/A classification was printed only if group scheduling
    was enabled. This patch also makes sure that the classification is
    displayed even if group scheduling is disabled.

    Signed-off-by: Vivek Goyal
    Acked-by: Jeff Moyer
    Signed-off-by: Tejun Heo

    Vivek Goyal
     
  • Use of the local variable "n" seems to be unnecessary. Remove it. This
    brings it in line with function __cfq_group_st_add(), which does a
    similar operation of adding a group to an rb tree.

    No functionality change here.

    Signed-off-by: Vivek Goyal
    Acked-by: Jeff Moyer
    Signed-off-by: Tejun Heo

    Vivek Goyal
     
  • choose_service_tree() selects/sets both wl_class and wl_type. Rename it
    to choose_wl_class_and_type() to make this very clear.

    cfq_choose_wl() only selects and sets wl_type. It is easy to confuse
    it with choose_st(). So rename it to cfq_choose_wl_type() to make
    clear what it does.

    Just renaming. No functionality change.

    Signed-off-by: Vivek Goyal
    Acked-by: Jeff Moyer
    Signed-off-by: Tejun Heo

    Vivek Goyal
     
  • At quite a few places we use the keyword "service_tree". At some places,
    especially for local variables, it has been abbreviated to "st".

    Also, at a couple of places the binary operator "+" was moved from the
    beginning of a line to the end of the previous line, as per Tejun's
    feedback.

    v2:
    Reverted most of the service tree name change based on Jeff Moyer's feedback.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Tejun Heo

    Vivek Goyal
     
  • Some more renaming. Again making the code uniform w.r.t use of
    wl_class/class to represent IO class (RT, BE, IDLE) and using
    wl_type/type to represent subclass (SYNC, SYNC-IDLE, ASYNC).

    At places this patch shortens the string "workload" to "wl".
    Renamed "saved_workload" to "saved_wl_type". Renamed
    "saved_serving_class" to "saved_wl_class".

    For uniformity with "saved_wl_*" variables, renamed "serving_class"
    to "serving_wl_class" and renamed "serving_type" to "serving_wl_type".

    Again, just trying to improve upon code uniformity and improve
    readability. No functional change.

    v2:
    - Restored the usage of keyword "service" based on Jeff Moyer's feedback.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Tejun Heo

    Vivek Goyal
     
  • Currently CFQ has three IO classes: RT, BE and IDLE. In many places,
    workloads belonging to these classes are referred to as "prio". This
    gets very confusing as one starts to associate it with ioprio.

    So this patch just does bunch of renaming so that reading code becomes
    easier. All reference to RT, BE and IDLE workload are done using keyword
    "class" and all references to subclass, SYNC, SYNC-IDLE, ASYNC are made
    using keyword "type".

    This makes me feel much better while I am reading the code. There is no
    functionality change due to this patch.

    Signed-off-by: Vivek Goyal
    Acked-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Tejun Heo

    Vivek Goyal
     

06 Nov, 2012

1 commit

  • A request is queued in the cfqq->fifo list. It looks like it's possible
    that we move a request from one cfqq to another in the request merge
    case. In such a case, adjusting the fifo list order doesn't make sense
    and is impossible unless we iterate the whole fifo list.

    My test did hit one case where the two cfqqs were different, but it
    didn't cause a kernel crash; maybe that's because the fifo list isn't
    used frequently. Anyway, from the code logic, this is buggy.

    I think we can re-enable the recursive merge logic after this is fixed.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

04 Jun, 2012

2 commits

  • cfq may be built w/ or w/o blkcg support depending on
    CONFIG_CFQ_GROUP_IOSCHED. If blkcg support is disabled, most of the
    related code is ifdef'd out but some part is left dangling -
    blkcg_policy_cfq is left zero-filled and blkcg_policy_[un]register()
    calls are made on it.

    Feeding a zero-filled policy to blkcg_policy_register() is incorrect and
    triggers the following WARN_ON() if CONFIG_BLK_CGROUP &&
    !CONFIG_CFQ_GROUP_IOSCHED.

    ------------[ cut here ]------------
    WARNING: at block/blk-cgroup.c:867
    Modules linked in:
    CPU: 3 Not tainted 3.4.0-09547-gfb21aff #1
    Process swapper/0 (pid: 1, task: 000000003ff80000, ksp: 000000003ff7f8b8)
    Krnl PSW : 0704100180000000 00000000003d76ca (blkcg_policy_register+0xca/0xe0)
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
    Krnl GPRS: 0000000000000000 00000000014b85ec 00000000014b85b0 0000000000000000
    000000000096fb60 0000000000000000 00000000009a8e78 0000000000000048
    000000000099c070 0000000000b6f000 0000000000000000 000000000099c0b8
    00000000014b85b0 0000000000667580 000000003ff7fd98 000000003ff7fd70
    Krnl Code: 00000000003d76be: a7280001 lhi %r2,1
    00000000003d76c2: a7f4ffdf brc 15,3d7680
    #00000000003d76c6: a7f40001 brc 15,3d76c8
    >00000000003d76ca: a7c8ffea lhi %r12,-22
    00000000003d76ce: a7f4ffce brc 15,3d766a
    00000000003d76d2: a7f40001 brc 15,3d76d4
    00000000003d76d6: a7c80000 lhi %r12,0
    00000000003d76da: a7f4ffc2 brc 15,3d765e
    Call Trace:
    ([] initcall_debug+0x0/0x4)
    [] cfq_init+0x62/0xd4
    [] do_one_initcall+0x3a/0x170
    [] kernel_init+0x214/0x2bc
    [] kernel_thread_starter+0x6/0xc
    [] kernel_thread_starter+0x0/0xc
    no locks held by swapper/0/1.
    Last Breaking-Event-Address:
    [] blkcg_policy_register+0xc6/0xe0
    ---[ end trace b8ef4903fcbf9dd3 ]---

    This patch fixes the problem by ensuring all blkcg support code is
    inside CONFIG_CFQ_GROUP_IOSCHED.

    * blkcg_policy_cfq declaration and blkg_to_cfqg() definition are moved
    inside the first CONFIG_CFQ_GROUP_IOSCHED block. __maybe_unused is
    dropped from blkcg_policy_cfq decl.

    * The blkcg_deactivate_policy() invocation is moved inside the ifdef.
    This also makes the activation logic match cfq_init_queue().

    * All blkcg_policy_[un]register() invocations are moved inside ifdef.

    Signed-off-by: Tejun Heo
    Reported-by: Heiko Carstens
    LKML-Reference:
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • cfq_init() would return zero after kmem cache creation failure. Fix
    so that it returns -ENOMEM.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

20 Apr, 2012

11 commits

  • There's no reason to keep blkcg_policy_ops separate. Collapse it into
    blkcg_policy.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently blkg_policy_data carries policy specific data as char flex
    array instead of being embedded in policy specific data. This was
    forced by oddities around blkg allocation which are all gone now.

    This patch makes blkg_policy_data embedded in policy specific data -
    throtl_grp and cfq_group so that it's more conventional and consistent
    with how io_cq is handled.

    * blkcg_policy->pdata_size is renamed to ->pd_size.

    * Functions which used to take void *pdata now takes struct
    blkg_policy_data *pd.

    * blkg_to_pdata/pdata_to_blkg() updated to blkg_to_pd/pd_to_blkg().

    * Dummy struct blkg_policy_data definition added. Dummy
    pdata_to_blkg() definition was unused and inconsistent with the
    non-dummy version - correct dummy pd_to_blkg() added.

    * throtl and cfq updated accordingly.

    * As dummy blkg_to_pd/pd_to_blkg() are provided,
    blkg_to_cfqg/cfqg_to_blkg() don't need to be ifdef'd. Moved outside
    ifdef block.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • During the recent blkcg cleanup, most of the blkcg API has changed to
    such an extent that mass renaming wouldn't cause any noticeable pain.
    Take the chance and clean up the naming.

    * Rename blkio_cgroup to blkcg.

    * Drop blkio / blkiocg prefixes and consistently use blkcg.

    * Rename blkio_group to blkcg_gq, which is consistent with io_cq but
    keep the blkg prefix / variable name.

    * Rename policy method type and field names to signify they're dealing
    with policy data.

    * Rename blkio_policy_type to blkcg_policy.

    This patch doesn't cause any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blkio_group->path[] stores the path of the associated cgroup and is
    used only for debug messages. Just format the path from blkg->cgroup
    when printing debug messages.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • * all_q_list is unused. Drop all_q_{mutex|list}.

    * @for_root of blkg_lookup_create() is always %false when called from
    outside blk-cgroup.c proper. Factor out __blkg_lookup_create() so
    that it doesn't check whether @q is bypassing and use the
    underscored version for the @for_root callsite.

    * blkg_destroy_all() is used only from blkcg proper and @destroy_root
    is always %true. Make it static and drop @destroy_root.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • All blkcg policies were assumed to be enabled on all request_queues.
    Due to various implementation obstacles, during the recent blkcg core
    updates, this was temporarily implemented as shooting down all !root
    blkgs on elevator switch and policy [de]registration combined with
    half-broken in-place root blkg updates. In addition to being buggy
    and racy, this meant losing all blkcg configurations across those
    events.

    Now that blkcg is cleaned up enough, this patch replaces the temporary
    implementation with proper per-queue policy activation. Each blkcg
    policy should call the new blkcg_[de]activate_policy() to enable and
    disable the policy on a specific queue. blkcg_activate_policy()
    allocates and installs policy data for the policy for all existing
    blkgs. blkcg_deactivate_policy() does the reverse. If a policy is
    not enabled for a given queue, blkg printing / config functions skip
    the respective blkg for the queue.

    blkcg_activate_policy() also takes care of root blkg creation, and
    cfq_init_queue() and blk_throtl_init() are updated accordingly.

    This makes blkcg_bypass_{start|end}() and update_root_blkg_pd()
    unnecessary. Dropped.

    v2: cfq_init_queue() was returning uninitialized @ret on root_group
    alloc failure if !CONFIG_CFQ_GROUP_IOSCHED. Fixed.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • With per-queue policy activation, root blkg creation will be moved to
    blkcg core. Add q->root_blkg in preparation. For blk-throtl, this
    replaces throtl_data->root_tg; however, cfq needs to keep
    cfqd->root_group for !CONFIG_CFQ_GROUP_IOSCHED.

    This is to prepare for per-queue policy activation and doesn't cause
    any functional difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Add @pol to blkg_conf_prep() and let it return with queue lock held
    (to be released by blkg_conf_finish()). Note that @pol isn't used
    yet.

    This is to prepare for per-queue policy activation and doesn't cause
    any visible difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate
    @pol->plid dynamically on registration. The maximum number of blkcg
    policies which can be registered at the same time is defined by
    BLKCG_MAX_POLS constant added to include/linux/blkdev.h.

    Note that blkio_policy_register() now may fail. Policy init functions
    updated accordingly and unnecessary ifdefs removed from cfq_init().

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • The two functions were taking "enum blkio_policy_id plid". Make them
    take "const struct blkio_policy_type *pol" instead.

    This is to prepare for per-queue policy activation and doesn't cause
    any functional difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • * CFQ_WEIGHT_* defined inside CONFIG_BLK_CGROUP causes cfq-iosched.c
    compile failure when the config is disabled. Move it outside the
    ifdef block.

    * Dummy cfqg_stats_*() definitions were lacking inline modifiers,
    causing unused-function warnings if !CONFIG_CFQ_GROUP_IOSCHED. Add
    them.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

02 Apr, 2012

7 commits

  • Now that all stat handling code lives in policy implementations,
    there's no need to encode policy ID in cft->private.

    * Export blkcg_prfill_[rw]stat() from blkcg, remove
    blkcg_print_[rw]stat(), and implement cfqg_print_[rw]stat(), which
    hard-code BLKIO_POLICY_PROP.

    * Use cft->private for offset of the target field directly and drop
    BLKCG_STAT_{PRIV|POL|OFF}().

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Now that all conf and stat fields are moved into policy specific
    blkio_policy_data->pdata areas, there's no reason to use
    blkio_policy_data itself in prfill functions. Pass around @pd->pdata
    instead of @pd.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blkio_group_conf->weight is owned by cfq and has no reason to be
    defined in blkcg core. Replace it with cfq_group->dev_weight and let
    conf setting functions directly set it. If dev_weight is zero, the
    cfqg doesn't have device specific weight configured.

    Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename
    blkio_cgroup->weight to blkio_cgroup->cfq_weight. We eventually want
    per-policy storage in blkio_cgroup but just mark the ownership of the
    field for now.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blkio_group_stats contains only fields used by cfq and has no reason
    to be defined in blkcg core.

    * Move blkio_group_stats to cfq-iosched.c and rename it to cfqg_stats.

    * blkg_policy_data->stats is replaced with cfq_group->stats.
    blkg_prfill_[rw]stat() are updated to use offset against pd->pdata
    instead.

    * All related macros / functions are renamed so that they have cfqg_
    prefix and the unnecessary @pol arguments are dropped.

    * All stat functions now take cfq_group * instead of blkio_group *.

    * lockdep assertion on queue lock dropped. Elevator runs under queue
    lock by default. There isn't much to be gained by adding lockdep
    assertions at stat function level.

    * cfqg_stats_reset() implemented for blkio_reset_group_stats_fn method
    so that cfqg->stats can be reset.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blkio_group_stats_cpu is used to count dispatch stats using per-cpu
    counters. This is used by both blk-throtl and cfq-iosched but the
    sharing is rather silly.

    * cfq-iosched doesn't need per-cpu dispatch stats. cfq always updates
    those stats while holding queue_lock.

    * blk-throtl needs per-cpu dispatch stats but only service_bytes and
    serviced. It doesn't make use of sectors.

    This patch makes cfq add and use global stats for service_bytes,
    serviced and sectors, removes per-cpu sectors counter and moves
    per-cpu stat printing code to blk-throttle.c.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • As with the conf/stats file handling code, there's no reason for the
    stat update code to live in blkcg core with policies calling in to
    update them. The current organization is both inflexible and complex.

    This patch moves stat update code to specific policies. All
    blkiocg_update_*_stats() functions which deal with BLKIO_POLICY_PROP
    stats are collapsed into their cfq_blkiocg_update_*_stats()
    counterparts. blkiocg_update_dispatch_stats() is used by both
    policies and duplicated as throtl_update_dispatch_stats() and
    cfq_blkiocg_update_dispatch_stats(). This will be cleaned up later.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • block/cfq.h contains some functions which interact with blkcg;
    however, this is only part of it, and cfq-iosched.c already has quite
    a few #ifdef CONFIG_CFQ_GROUP_IOSCHED blocks. With conf/stat handling
    moved to specific policies, having these relay functions isolated in
    cfq.h doesn't make much sense. Collapse cfq.h into cfq-iosched.c for
    now. Let's split blkcg support out properly later if necessary.

    Signed-off-by: Tejun Heo

    Tejun Heo