Eric Lee / smarc-fsl-linux-kernel

13 Dec, 2011

1 commit

bb9d97b6d cgroup: don't use subsys->can_attach_task() or ->attach_task() ... Browse Code »

Now that subsys->can_attach() and attach() take @tset instead of
@task, they can handle per-task operations. Convert
->can_attach_task() and ->attach_task() users to use ->can_attach()
and attach() instead. Most converions are straight-forward.
Noteworthy changes are,

* In cgroup_freezer, remove unnecessary NULL assignments to unused
methods. It's useless and very prone to get out of sync, which
already happened.

* In cpuset, PF_THREAD_BOUND test is checked for each task. This
doesn't make any practical difference but is conceptually cleaner.

Signed-off-by: Tejun Heo
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Frederic Weisbecker
Acked-by: Li Zefan
Cc: Paul Menage
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: James Morris
Cc: Ingo Molnar
Cc: Peter Zijlstra

Tejun Heo
2011-12-13 10:12:21 +0800

25 Oct, 2011

2 commits

a38eb630f blk-throttle: Take blkcg->lock while traversing blkcg->policy_list ... Browse Code »

blkcg->policy_list is protected by blkcg->lock. Its not rcu protected
list. So even for readers, they need to take blkcg->lock. There are
few functions which were reading the list without taking lock. Fix it.

Signed-off-by: Vivek Goyal
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Vivek Goyal
2011-10-25 21:48:12 +0800
e060f00be blk-throttle: Free up policy node associated with deleted rule ... Browse Code »

If a rule is being deleted, free up associated policy node. Otherwise
that memory is leaked.

Signed-off-by: Vivek Goyal
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Vivek Goyal
2011-10-25 21:48:12 +0800

19 Oct, 2011

1 commit

ece84241b block: fix genhd refcounting in blkio_policy_parse_and_set() ... Browse Code »

blkio_policy_parse_and_set() calls blkio_check_dev_num() to check
whether the given dev_t is valid. blkio_check_dev_num() uses
get_gendisk() for verification but never puts the returned genhd
leaking the reference.

This patch collapses blkio_check_dev_num() into its caller and updates
it such that the genhd is put before returning.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-10-19 20:31:15 +0800

21 Sep, 2011

1 commit

d11bb4462 blk-cgroup: be able to remove the record of unplugged device ... Browse Code »
1

The bug is we're not able to remove the device from blkio cgroup's
per-device control files if it gets unplugged.

To reproduce the bug:

# mount -t cgroup -o blkio xxx /cgroup
# cd /cgroup
# echo "8:0 1000" > blkio.throttle.read_bps_device
# unplug the device
# cat blkio.throttle.read_bps_device
8:0 1000
# echo "8:0 0" > blkio.throttle.read_bps_device
-bash: echo: write error: No such device

After patching, the device removal will succeed.

Thanks for the comments of Paul, Zefan, and Vivek.

Signed-off-by: Wanlong Gao
Cc: Li Zefan
Cc: Paul Menage
Acked-by: Vivek Goyal
Cc: Jens Axboe
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

Wanlong Gao
2011-09-21 16:22:10 +0800

27 May, 2011

1 commit

f780bdb7c cgroups: add per-thread subsystem callbacks ... Browse Code »

Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
for cgroups's subsystem interface. Unlike can_attach and attach, these
are for per-thread operations, to be called potentially many times when
attaching an entire threadgroup.

Also, the old "bool threadgroup" interface is removed, as replaced by
this. All subsystems are modified for the new interface - of note is
cpuset, which requires from/to nodemasks for attach to be globally scoped
(though per-cpuset would work too) to persist from its pre_attach to
attach_task and attach.

This is a pre-patch for cgroup-procs-writable.patch.

Signed-off-by: Ben Blum
Cc: "Eric W. Biederman"
Cc: Li Zefan
Cc: Matt Helsley
Reviewed-by: Paul Menage
Cc: Oleg Nesterov
Cc: David Rientjes
Cc: Miao Xie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2011-05-27 08:12:34 +0800

23 May, 2011

1 commit

317389a77 cfq-iosched: Make IO merge related stats per cpu ... Browse Code »

Make BLKIO_STAT_MERGED per cpu hence gettring rid of need of taking
blkg->stats_lock.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2011-05-23 16:02:19 +0800

21 May, 2011

4 commits

f0bdc8cdd blk-cgroup: Make cgroup stat reset path blkg->lock free for dispatch stats ... Browse Code »

Now dispatch stats update is lock free. But reset of these stats still
takes blkg->stats_lock and is dependent on that. As stats are per cpu,
we should be able to just reset the stats on each cpu without any locks.
(Atleast for 64bit arch).

On 32bit arch there is a small race where 64bit updates are not atomic.
The result of this race can be that in the presence of other writers,
one might not get 0 value after reset of a stat and might see something
intermediate

One can write more complicated code to cover this race like sending IPI
to other cpus to reset stats and for offline cpus, reset these directly.

Right not I am not taking that path because reset_update is more of a
debug feature and it can happen only on 32bit arch and possibility of
it happening is small. Will fix it if it becomes a real problem. For
the time being going for code simplicity.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2011-05-21 02:34:53 +0800
575969a0d blk-cgroup: Make 64bit per cpu stats safe on 32bit arch ... Browse Code »

Some of the stats are 64bit and updation will be non atomic on 32bit
architecture. Use sequence counters on 32bit arch to make reading
of stats safe.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2011-05-21 02:34:53 +0800
5624a4e44 blk-throttle: Make dispatch stats per cpu ... Browse Code »

Currently we take blkg_stat lock for even updating the stats. So even if
a group has no throttling rules (common case for root group), we end
up taking blkg_lock, for updating the stats.

Make dispatch stats per cpu so that these can be updated without taking
blkg lock.

If cpu goes offline, these stats simply disappear. No protection has
been provided for that yet. Do we really need anything for that?

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2011-05-21 02:34:52 +0800
a23e68695 blk-cgroup: move some fields of unaccounted_time file under right config option ... Browse Code »

cgroup unaccounted_time file is created only if CONFIG_DEBUG_BLK_CGROUP=y.
there are some fields which are out side this config option. Fix that.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2011-05-21 02:34:52 +0800

16 May, 2011

1 commit

70087dc38 blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup ... Browse Code »

Currentlly we first map the task to cgroup and then cgroup to
blkio_cgroup. There is a more direct way to get to blkio_cgroup
from task using task_subsys_state(). Use that.

The real reason for the fix is that it also avoids a race in generic
cgroup code. During remount/umount rebind_subsystems() is called and
it can do following with and rcu protection.

cgrp->subsys[i] = NULL;

That means if somebody got hold of cgroup under rcu and then it tried
to do cgroup->subsys[] to get to blkio_cgroup, it would get NULL which
is wrong. I was running into this race condition with ltp running on a
upstream derived kernel and that lead to crash.

So ideally we should also fix cgroup generic code to wait for rcu
grace period before setting pointer to NULL. Li Zefan is not very keen
on introducing synchronize_wait() as he thinks it will slow
down moun/remount/umount operations.

So for the time being atleast fix the kernel crash by taking a more
direct route to blkio_cgroup.

One tester had reported a crash while running LTP on a derived kernel
and with this fix crash is no more seen while the test has been
running for over 6 days.

Signed-off-by: Vivek Goyal
Reviewed-by: Li Zefan
Signed-off-by: Jens Axboe

Vivek Goyal
2011-05-16 21:24:08 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings ... Browse Code »

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

23 Mar, 2011

1 commit

9026e521c blk-cgroup: Only give unaccounted_time under debug ... Browse Code »

This change moves unaccounted_time to only be reported when
CONFIG_DEBUG_BLK_CGROUP is true.

Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe

Justin TerAvest
2011-03-23 04:26:54 +0800

12 Mar, 2011

1 commit

167400d34 blk-cgroup: Add unaccounted time to timeslice_used. ... Browse Code »

There are two kind of times that tasks are not charged for: the first
seek and the extra time slice used over the allocated timeslice. Both
of these exported as a new unaccounted_time stat.

I think it would be good to have this reported in 'time' as well, but
that is probably a separate discussion.

Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe

Justin TerAvest
2011-03-12 23:54:00 +0800

16 Nov, 2010

1 commit

bdc85df7a blk-cgroup: Allow creation of hierarchical cgroups ... Browse Code »

o Allow hierarchical cgroup creation for blkio controller

o Currently we disallow it as both the io controller policies (throttling
as well as proportion bandwidth) do not support hierarhical accounting
and control. But the flip side is that blkio controller can not be used with
libvirt as libvirt creates a cgroup hierarchy deeper than 1 level.

//libvirt/qemu/

o So this patch will allow creation of cgroup hierarhcy but at the backend
everything will be treated as flat. So if somebody created a an hierarchy
like as follows.

root
/ \
test1 test2
|
test3

CFQ and throttling will practically treat all groups at same level.

pivot
/ | \ \
root test1 test2 test3

o Once we have actual support for hierarchical accounting and control
then we can introduce another cgroup tunable file "blkio.use_hierarchy"
which will be 0 by default but if user wants to enforce hierarhical
control then it can be set to 1. This way there should not be any
ABI problems down the line.

o The only not so pretty part is introduction of extra file "use_hierarchy"
down the line. Kame-san had mentioned that hierarhical accounting is
expensive in memory controller hence they keep it off by default. I
suspect same will be the case for IO controller also as for each IO
completion we shall have to account IO through hierarchy up to the root.
if yes, then it probably is not a very bad idea to introduce this extra
file so that it will be used only when somebody needs it and some people
might enable hierarchy only in part of the hierarchy.

o This is how basically memory controller also uses "use_hierarhcy" and
they also allowed creation of hierarchies when actual backend support
was not available.

Signed-off-by: Vivek Goyal
Acked-by: Balbir Singh
Reviewed-by: Gui Jianfeng
Reviewed-by: Ciju Rajan K
Tested-by: Ciju Rajan K
Signed-off-by: Jens Axboe

Vivek Goyal
2010-11-16 02:37:36 +0800

23 Oct, 2010

1 commit

e9dd2b683 Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
cfq-iosched: Fix a gcc 4.5 warning and put some comments
block: Turn bvec_k{un,}map_irq() into static inline functions
block: fix accounting bug on cross partition merges
block: Make the integrity mapped property a bio flag
block: Fix double free in blk_integrity_unregister
block: Ensure physical block size is unsigned int
blkio-throttle: Fix possible multiplication overflow in iops calculations
blkio-throttle: limit max iops value to UINT_MAX
blkio-throttle: There is no need to convert jiffies to milli seconds
blkio-throttle: Fix link failure failure on i386
blkio: Recalculate the throttled bio dispatch time upon throttle limit change
blkio: Add root group to td->tg_list
blkio: deletion of a cgroup was causes oops
blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
block: set the bounce_pfn to the actual DMA limit rather than to max memory
block: revert bad fix for memory hotplug causing bounces
Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
block: set the bounce_pfn to the actual DMA limit rather than to max memory
block: Prevent hang_check firing during long I/O
cfq: improve fsync performance for small files
...

Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h

Linus Torvalds
2010-10-23 08:00:32 +0800

02 Oct, 2010

1 commit

9355aede5 blkio-throttle: limit max iops value to UINT_MAX ... Browse Code »

- Limit max iops value to UINT_MAX and return error to user if value is more
than that instead of accepting bigger values and truncating implicitly.

Signed-off-by: Jens Axboe

Vivek Goyal
2010-10-02 03:16:41 +0800

01 Oct, 2010

3 commits

fe0714377 blkio: Recalculate the throttled bio dispatch time upon throttle limit change ... Browse Code »

o Currently any cgroup throttle limit changes are processed asynchronousy and
the change does not take affect till a new bio is dispatched from same group.

o It might happen that a user sets a redicuously low limit on throttling.
Say 1 bytes per second on reads. In such cases simple operations like mount
a disk can wait for a very long time.

o Once bio is throttled, there is no easy way to come out of that wait even if
user increases the read limit later.

o This patch fixes it. Now if a user changes the cgroup limits, we recalculate
the bio dispatch time according to new limits.

o Can't take queueu lock under blkcg_lock, hence after the change I wake
up the dispatch thread again which recalculates the time. So there are some
variables being synchronized across two threads without lock and I had to
make use of barriers. Hoping I have used barriers correctly. Any review of
memory barrier code especially will help.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2010-10-01 20:49:49 +0800
61014e96e blkio: deletion of a cgroup was causes oops ... Browse Code »

o Now a cgroup list of blkg elements can contain blkg from multiple policies.
Before sending an unlink event, make sure blkg belongs to they policy. If
policy does not own the blkg, do not send update for this blkg.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2010-10-01 20:49:44 +0800
13f98250f blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n ... Browse Code »

Currently throttling related files were visible even if user had disabled
throttling using config options. It was switching off background throttling
of bio but not the cgroup files. This patch fixes it.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2010-10-01 20:49:41 +0800

16 Sep, 2010

4 commits

7702e8f45 blk-cgroup: cgroup changes for IOPS limit support ... Browse Code »

o cgroup changes for IOPS throttling rules.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2010-09-16 14:42:58 +0800
4c9eefa16 blk-cgroup: Introduce cgroup changes for throttling policy ... Browse Code »

o cgroup chagnes for throttle policy.

o Introduces READ and WRITE bytes per second throttling rules.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2010-09-16 14:42:12 +0800
062a644d6 blk-cgroup: Prepare the base for supporting more than one IO control policies ... Browse Code »

o This patch prepares the base for introducing new IO control policies.
Currently all the code is written knowing there is only one policy
and that is proportional bandwidth. Creating infrastructure for newer
policies to come in.

o Also there were many functions which were generated using macro. It was
very confusing. Got rid of those.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2010-09-16 14:42:04 +0800
af41d7bd9 blk-cgroup: Kill the header printed at the start of blkio.weight_device file ... Browse Code »

o Kill extra "dev weight" header which is printed when somebody reads
blkio.weight_device file. This really seems to be out of convention. No other
blkio files are printing any header at the start of file. I think it is ok
to just print values and how to interpret values should be part of
documentation.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe

Vivek Goyal
2010-09-16 14:40:42 +0800

23 Aug, 2010

1 commit

96aa1b419 blkio: Fix return code for mkdir calls ... Browse Code »

If the cgroup hierarchy for blkio control groups is deeper than two
levels, kernel should not allow the creation of further levels. mkdir
system call does not except EINVAL as a return value. This patch
replaces EINVAL with more appropriate EPERM

Signed-off-by: Ciju Rajan K
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Jens Axboe

Ciju Rajan K
2010-08-23 16:56:30 +0800

22 May, 2010

1 commit

ee9a3607f Merge branch 'master' into for-2.6.35 ... Browse Code »

Conflicts:
fs/ext3/fsync.c

Signed-off-by: Jens Axboe

Jens Axboe
2010-05-22 03:27:26 +0800

07 May, 2010

1 commit

0341509fd blk-cgroup: Fix an RCU warning in blkiocg_create() ... Browse Code »

with CONFIG_PROVE_RCU=y, a warning can be triggered:

# mount -t cgroup -o blkio xxx /mnt
# mkdir /mnt/subgroup

...
kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
...

To fix this, we avoid caling css_depth() here, which is a bit simpler
than the original code.

Signed-off-by: Li Zefan
Acked-by: Vivek Goyal
Signed-off-by: Jens Axboe

Li Zefan
2010-05-07 14:57:00 +0800

03 May, 2010

1 commit

0f3942a39 block: kill some useless goto's in blk-cgroup.c ... Browse Code »

goto has its place, but lets cut back on some of the more
frivolous uses of it.

Signed-off-by: Jens Axboe

Jens Axboe
2010-05-03 20:28:55 +0800

27 Apr, 2010

2 commits

afc24d49c blk-cgroup: config options re-arrangement ... Browse Code »

This patch fixes few usability and configurability issues.

o All the cgroup based controller options are configurable from
"Genral Setup/Control Group Support/" menu. blkio is the only exception.
Hence make this option visible in above menu and make it configurable from
there to bring it inline with rest of the cgroup based controllers.

o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.

This option currently does two things.

- Enable printing of cgroup paths in blktrace
- Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat
files in cgroup.

If we are using group scheduling, blktrace data is of not really much use
if cgroup information is not present. To get this data, currently one has to
also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of
all the additional debug stat files which is not desired.

Hence, this patch moves printing of cgroup paths under
CONFIG_CFQ_GROUP_IOSCHED.

This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all
the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which
can be enabled through config menu.

Signed-off-by: Vivek Goyal
Acked-by: Divyesh Shah
Reviewed-by: Gui Jianfeng
Signed-off-by: Jens Axboe

Vivek Goyal
2010-04-27 01:27:56 +0800
e5ff082e8 blkio: Fix another BUG_ON() crash due to cfqq movement across groups ... Browse Code »

o Once in a while, I was hitting a BUG_ON() in blkio code. empty_time was
assuming that upon slice expiry, group can't be marked empty already (except
forced dispatch).

But this assumption is broken if cfqq can move (group_isolation=0) across
groups after receiving a request.

I think most likely in this case we got a request in a cfqq and accounted
the rq in one group, later while adding the cfqq to tree, we moved the queue
to a different group which was already marked empty and after dispatch from
slice we found group already marked empty and raised alarm.

This patch does not error out if group is already marked empty. This can
introduce some empty_time stat error only in case of group_isolation=0. This
is better than crashing. In case of group_isolation=1 we should still get
same stats as before this patch.

[ 222.308546] ------------[ cut here ]------------
[ 222.309311] kernel BUG at block/blk-cgroup.c:236!
[ 222.309311] invalid opcode: 0000 [#1] SMP
[ 222.309311] last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler
[ 222.309311] CPU 1
[ 222.309311] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[ 222.309311]
[ 222.309311] Pid: 4780, comm: fio Not tainted 2.6.34-rc4-blkio-config #68 0A98h/HP xw8600 Workstation
[ 222.309311] RIP: 0010:[] [] blkiocg_set_start_empty_time+0x50/0x83
[ 222.309311] RSP: 0018:ffff8800ba6e79f8 EFLAGS: 00010002
[ 222.309311] RAX: 0000000000000082 RBX: ffff8800a13b7990 RCX: ffff8800a13b7808
[ 222.309311] RDX: 0000000000002121 RSI: 0000000000000082 RDI: ffff8800a13b7a30
[ 222.309311] RBP: ffff8800ba6e7a18 R08: 0000000000000000 R09: 0000000000000001
[ 222.309311] R10: 000000000002f8c8 R11: ffff8800ba6e7ad8 R12: ffff8800a13b78ff
[ 222.309311] R13: ffff8800a13b7990 R14: 0000000000000001 R15: ffff8800a13b7808
[ 222.309311] FS: 00007f3beec476f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
[ 222.309311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 222.309311] CR2: 000000000040e7f0 CR3: 00000000a12d5000 CR4: 00000000000006e0
[ 222.309311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 222.309311] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 222.309311] Process fio (pid: 4780, threadinfo ffff8800ba6e6000, task ffff8800b3d6bf00)
[ 222.309311] Stack:
[ 222.309311] 0000000000000001 ffff8800bab17a48 ffff8800bab17a48 ffff8800a13b7800
[ 222.309311] ffff8800ba6e7a68 ffffffff8121da35 ffff880000000001 00ff8800ba5c5698
[ 222.309311] ffff8800ba6e7a68 ffff8800a13b7800 0000000000000000 ffff8800bab17a48
[ 222.309311] Call Trace:
[ 222.309311] [] __cfq_slice_expired+0x2af/0x3ec
[ 222.309311] [] cfq_dispatch_requests+0x2c8/0x8e8
[ 222.309311] [] ? spin_unlock_irqrestore+0xe/0x10
[ 222.309311] [] ? blk_insert_cloned_request+0x70/0x7b
[ 222.309311] [] blk_peek_request+0x191/0x1a7
[ 222.309311] [] dm_request_fn+0x38/0x14c [dm_mod]
[ 222.309311] [] ? sync_page_killable+0x0/0x35
[ 222.309311] [] __generic_unplug_device+0x32/0x37
[ 222.309311] [] generic_unplug_device+0x2e/0x3c
[ 222.309311] [] dm_unplug_all+0x42/0x5b [dm_mod]
[ 222.309311] [] blk_unplug+0x29/0x2d
[ 222.309311] [] blk_backing_dev_unplug+0x12/0x14
[ 222.309311] [] block_sync_page+0x35/0x39
[ 222.309311] [] sync_page+0x41/0x4a
[ 222.309311] [] sync_page_killable+0xe/0x35
[ 222.309311] [] __wait_on_bit_lock+0x46/0x8f
[ 222.309311] [] __lock_page_killable+0x66/0x6d
[ 222.309311] [] ? wake_bit_function+0x0/0x33
[ 222.309311] [] lock_page_killable+0x2c/0x2e
[ 222.309311] [] generic_file_aio_read+0x361/0x4f0
[ 222.309311] [] do_sync_read+0xcb/0x108
[ 222.309311] [] ? security_file_permission+0x16/0x18
[ 222.309311] [] vfs_read+0xab/0x108
[ 222.309311] [] sys_read+0x4a/0x6e
[ 222.309311] [] system_call_fastpath+0x16/0x1b
[ 222.309311] Code: 58 01 00 00 00 48 89 c6 75 0a 48 83 bb 60 01 00 00 00 74 09 48 8d bb a0 00 00 00 eb 35 41 fe cc 74 0d f6 83 c0 01 00 00 04 74 04 0b eb fe 48 89 75 e8 e8 be e0 de ff 66 83 8b c0 01 00 00 04
[ 222.309311] RIP [] blkiocg_set_start_empty_time+0x50/0x83
[ 222.309311] RSP
[ 222.309311] ---[ end trace 32b4f71dffc15712 ]---

Signed-off-by: Vivek Goyal
Acked-by: Divyesh Shah
Signed-off-by: Jens Axboe

Vivek Goyal
2010-04-27 01:25:11 +0800

16 Apr, 2010

1 commit

8d2a91f89 blkio: Initialize blkg->stats_lock for the root cfqg too ... Browse Code »

This fixes the lockdep warning reported by Gui Jianfeng.

Signed-off-by: Divyesh Shah
Reviewed-by: Gui Jianfeng
Signed-off-by: Jens Axboe

Divyesh Shah
2010-04-16 14:10:51 +0800

14 Apr, 2010

3 commits

28baf4429 blkio: Fix compile errors ... Browse Code »

Fixes compile errors in blk-cgroup code for empty_time stat and a merge fix in
CFQ. The first error was when CONFIG_DEBUG_CFQ_IOSCHED is not set.

Signed-off-by: Divyesh Shah
Signed-off-by: Jens Axboe

Divyesh Shah
2010-04-14 17:22:38 +0800
4facdaec1 Merge branch 'master' into for-2.6.35 ... Browse Code »

Conflicts:
block/blk-cgroup.c
block/cfq-iosched.c

Signed-off-by: Jens Axboe

Jens Axboe
2010-04-14 02:03:21 +0800
a11cdaa7a block: Update to io-controller stats ... Browse Code »

Changelog from v1:
o Call blkiocg_update_idle_time_stats() at cfq_rq_enqueued() instead of at
dispatch time.

Changelog from original patchset: (in response to Vivek Goyal's comments)
o group blkiocg_update_blkio_group_dequeue_stats() with other DEBUG functions
o rename blkiocg_update_set_active_queue_stats() to
blkiocg_update_avg_queue_size_stats()
o s/request/io/ in blkiocg_update_request_add_stats() and
blkiocg_update_request_remove_stats()
o Call cfq_del_timer() at request dispatch() instead of
blkiocg_update_idle_time_stats()

Signed-off-by: Divyesh Shah
Acked-by: Vivek Goyal
Signed-off-by: Jens Axboe

Divyesh Shah
2010-04-14 01:59:17 +0800

13 Apr, 2010

1 commit

34d0f179d io-controller: Add a new interface "weight_device" for IO-Controller ... Browse Code »

Currently, IO Controller makes use of blkio.weight to assign weight for
all devices. Here a new user interface "blkio.weight_device" is introduced to
assign different weights for different devices. blkio.weight becomes the
default value for devices which are not configured by "blkio.weight_device"

You can use the following format to assigned specific weight for a given
device:
#echo "major:minor weight" > blkio.weight_device

major:minor represents device number.

And you can remove weight for a given device as following:
#echo "major:minor 0" > blkio.weight_device

V1->V2 changes:
- use user interface "weight_device" instead of "policy" suggested by Vivek
- rename some struct suggested by Vivek
- rebase to 2.6-block "for-linus" branch
- remove an useless list_empty check pointed out by Li Zefan
- some trivial typo fix

V2->V3 changes:
- Move policy_*_node() functions up to get rid of forward declarations
- rename related functions by adding prefix "blkio_"

Signed-off-by: Gui Jianfeng
Acked-by: Vivek Goyal
Signed-off-by: Jens Axboe

Gui Jianfeng
2010-04-13 16:14:20 +0800

09 Apr, 2010

4 commits

812df48d1 blkio: Add more debug-only per-cgroup stats ... Browse Code »

1) group_wait_time - This is the amount of time the cgroup had to wait to get a
timeslice for one of its queues from when it became busy, i.e., went from 0
to 1 request queued. This is different from the io_wait_time which is the
cumulative total of the amount of time spent by each IO in that cgroup waiting
in the scheduler queue. This stat is a great way to find out any jobs in the
fleet that are being starved or waiting for longer than what is expected (due
to an IO controller bug or any other issue).
2) empty_time - This is the amount of time a cgroup spends w/o any pending
requests. This stat is useful when a job does not seem to be able to use its
assigned disk share by helping check if that is happening due to an IO
controller bug or because the job is not submitting enough IOs.
3) idle_time - This is the amount of time spent by the IO scheduler idling
for a given cgroup in anticipation of a better request than the exising ones
from other queues/cgroups.

All these stats are recorded using start and stop events. When reading these
stats, we do not add the delta between the current time and the last start time
if we're between the start and stop events. We avoid doing this to make sure
that these numbers are always monotonically increasing when read. Since we're
using sched_clock() which may use the tsc as its source, it may induce some
inconsistency (due to tsc resync across cpus) if we included the current delta.

Signed-off-by: Divyesh Shah
Signed-off-by: Jens Axboe

Divyesh Shah
2010-04-09 14:36:08 +0800
cdc1184cf blkio: Add io_queued and avg_queue_size stats ... Browse Code »

These stats are useful for getting a feel for the queue depth of the cgroup,
i.e., how filled up its queues are at a given instant and over the existence of
the cgroup. This ability is useful when debugging problems in the wild as it
helps understand the application's IO pattern w/o having to read through the
userspace code (coz its tedious or just not available) or w/o the ability
to run blktrace (since you may not have root access and/or not want to disturb
performance).

Signed-off-by: Divyesh Shah
Signed-off-by: Jens Axboe

Divyesh Shah
2010-04-09 14:36:08 +0800
812d40264 blkio: Add io_merged stat ... Browse Code »

This includes both the number of bios merged into requests belonging to this
cgroup as well as the number of requests merged together.
In the past, we've observed different merging behavior across upstream kernels,
some by design some actual bugs. This stat helps a lot in debugging such
problems when applications report decreased throughput with a new kernel
version.

This needed adding an extra elevator function to capture bios being merged as I
did not want to pollute elevator code with blkiocg knowledge and hence needed
the accounting invocation to come from CFQ.

Signed-off-by: Divyesh Shah
Signed-off-by: Jens Axboe

Divyesh Shah
2010-04-09 14:36:07 +0800
84c124da9 blkio: Changes to IO controller additional stats patches ... Browse Code »

that include some minor fixes and addresses all comments.

Changelog: (most based on Vivek Goyal's comments)
o renamed blkiocg_reset_write to blkiocg_reset_stats
o more clarification in the documentation on io_service_time and io_wait_time
o Initialize blkg->stats_lock
o rename io_add_stat to blkio_add_stat and declare it static
o use bool for direction and sync
o derive direction and sync info from existing rq methods
o use 12 for major:minor string length
o define io_service_time better to cover the NCQ case
o add a separate reset_stats interface
o make the indexed stats a 2d array to simplify macro and function pointer code
o blkio.time now exports in jiffies as before
o Added stats description in patch description and
Documentation/cgroup/blkio-controller.txt
o Prefix all stats functions with blkio and make them static as applicable
o replace IO_TYPE_MAX with IO_TYPE_TOTAL
o Moved #define constant to top of blk-cgroup.c
o Pass dev_t around instead of char *
o Add note to documentation file about resetting stats
o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
statements
o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
rq_direction() and rq_sync() functions which are used by CFQ and when using
io-controller at a higher level, bio_* functions can be added.

Signed-off-by: Divyesh Shah
Signed-off-by: Jens Axboe

Divyesh Shah
2010-04-09 14:31:19 +0800