07 Jan, 2012

2 commits

  • commit 5eb46851de3904cd1be9192fdacb8d34deadc1fc upstream.

    cfq_cic_link() has a race condition. When processes that share an ioc issue
    I/O to the same block device simultaneously, cfq_cic_link() sometimes returns
    -EEXIST. The race can stall I/O through the following steps:

    step 1: Process A: Issue an I/O to /dev/sda
    step 2: Process A: Get an ioc (iocA here) in get_io_context(), which is not
    yet linked with a cic for the device
    step 3: Process A: Get a new cic for the device (cicA here) in
    cfq_alloc_io_context()

    step 4: Process B: Issue an I/O to /dev/sda
    step 5: Process B: Get iocA in get_io_context() since process A and B share the
    same ioc
    step 6: Process B: Get a new cic for the device (cicB here) in
    cfq_alloc_io_context() since iocA has not been linked with a
    cic for the device yet

    step 7: Process A: Link cicA to iocA in cfq_cic_link()
    step 8: Process A: Dispatch I/O to driver and finish it

    step 9: Process B: Try to link cicB to iocA in cfq_cic_link()
    This fails with the kernel message "cfq: cic link failed!",
    since iocA was already linked with cicA at step 7.
    step 10: Process B: Wait for the I/O to finish in get_request_wait()
    The wait never completes, because there is no outstanding
    I/O to the device to wake it up.

    When cfq_cic_link() returns -EEXIST, it means the ioc has already been linked
    with a cic. So when cfq_cic_link() returns -EEXIST, retry cfq_cic_lookup().
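
    The retry loop below is a minimal sketch of that idea, not the verbatim
    patch; cfq_cic_lookup(), cfq_alloc_io_context() and cfq_cic_link() are named
    in the description above, while cfq_cic_free() and the surrounding error
    handling are assumptions for illustration.

        retry:
                cic = cfq_cic_lookup(cfqd, ioc);
                if (!cic) {
                        cic = cfq_alloc_io_context(cfqd, gfp_mask);
                        if (cic && cfq_cic_link(cfqd, ioc, cic, gfp_mask) == -EEXIST) {
                                /* Lost the race from step 9: drop our cic and
                                 * pick up the one that won instead. */
                                cfq_cic_free(cic);
                                goto retry;
                        }
                }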

    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Yasuaki Ishimatsu
     
  • commit 2984ff38ccf6cbc02a7a996a36c7d6f69f3c6146 upstream.

    If we fail allocating the blkg stats, we free cfqd and cfqg.
    But we need to release the IDA index cfqd->cic_index as well.
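
    A rough sketch of such an error path (blkio_alloc_blkg_stats, cic_index_lock
    and cic_index_ida follow the cfq-iosched code of that era, but the exact
    shape here is illustrative, not the literal patch):

        if (blkio_alloc_blkg_stats(&cfqg->blkg)) {
                /* Also give back the IDA slot taken for cic_index. */
                spin_lock(&cic_index_lock);
                ida_remove(&cic_index_ida, cfqd->cic_index);
                spin_unlock(&cic_index_lock);
                kfree(cfqg);
                kfree(cfqd);
                return NULL;
        }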

    Signed-off-by: majianpeng
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    majianpeng
     

27 Jun, 2011

2 commits

  • ioc->ioc_data is RCU protected, so use the correct API to access it.
    This doesn't change any behavior; it just makes the code consistent.
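
    In practice that means pairing reads and writes with the RCU accessors,
    roughly like this (illustrative only):

        /* Read side: annotate the access for the RCU checkers. */
        cic = rcu_dereference(ioc->ioc_data);

        /* Update side: publish the new pointer with the matching API. */
        rcu_assign_pointer(ioc->ioc_data, cic);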

    Signed-off-by: Shaohua Li
    Cc: stable@kernel.org # after ab4bd22d
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • I got an RCU warning at boot: ioc->ioc_data is rcu_dereference()d, but
    without holding rcu_read_lock().
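
    The fix amounts to wrapping the lookup in a read-side critical section,
    along these lines (a sketch; the cic->key check stands in for whatever the
    caller does with the cached pointer):

        rcu_read_lock();
        cic = rcu_dereference(ioc->ioc_data);
        if (cic && cic->key == cfqd) {
                /* use the cached cic while still inside the RCU section */
        }
        rcu_read_unlock();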

    Signed-off-by: Shaohua Li
    Cc: stable@kernel.org # after ab4bd22d
    Signed-off-by: Jens Axboe

    Shaohua Li
     

13 Jun, 2011

1 commit


06 Jun, 2011

1 commit

  • Since we are modifying this RCU pointer, we need to hold
    the lock protecting it while doing so.

    This fixes a potential reuse and double free of a cfq
    io_context structure. The bug has been in CFQ for a long
    time; it hit very few people, but those it did hit seemed
    to see it a lot.

    Tracked in RH bugzilla here:

    https://bugzilla.redhat.com/show_bug.cgi?id=577968

    Credit goes to Paul Bolle for figuring out that the issue
    was around the one-hit ioc->ioc_data cache. Thanks to his
    hard work the issue is now fixed.
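
    Assuming the pointer in question is the one-hit ioc->ioc_data cache guarded
    by ioc->lock, the change boils down to a pattern like this (a sketch, not
    the exact diff):

        unsigned long flags;

        /* Clear the cache only while holding the lock that guards it. */
        spin_lock_irqsave(&ioc->lock, flags);
        rcu_assign_pointer(ioc->ioc_data, NULL);
        spin_unlock_irqrestore(&ioc->lock, flags);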

    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Jens Axboe
     

02 Jun, 2011

1 commit


01 Jun, 2011

1 commit


24 May, 2011

4 commits


23 May, 2011

1 commit


21 May, 2011

4 commits

  • Currently we take the blkg_stat lock even just to update the stats. So even
    if a group has no throttling rules (the common case for the root group), we
    end up taking blkg_lock merely to update the stats.

    Make the dispatch stats per-cpu so that they can be updated without taking
    the blkg lock.

    If a CPU goes offline, its stats simply disappear; no protection has been
    provided for that yet. Do we really need anything for that?
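
    The hot-path update then looks roughly like the following (a sketch; the
    struct layout and field names are placeholders, not the real blk-cgroup
    definitions):

        struct blkio_group_stats_cpu *s;

        /* blkg->stats_cpu was allocated once per group with alloc_percpu(). */
        s = get_cpu_ptr(blkg->stats_cpu);       /* disables preemption */
        s->serviced++;
        s->sectors += nr_sectors;
        put_cpu_ptr(blkg->stats_cpu);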

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Currently, all the cfq_group or throtl_group allocations happen while
    we are holding ->queue_lock, where sleeping is not allowed.

    Soon we will move to per-cpu stats and will also need to allocate the
    per-group stats. Since alloc_percpu() can sleep and therefore cannot be
    called from atomic context, we need to drop ->queue_lock, allocate the
    group, retake the lock and continue processing.

    In the throttling code, I check the queue DEAD flag again to make sure
    that the driver did not call blk_cleanup_queue() in the meantime.
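
    The drop/allocate/retake pattern is roughly the following (a sketch;
    alloc_group() and free_group() are stand-ins for the real allocation
    helpers):

        spin_unlock_irq(q->queue_lock);

        /* May sleep: allocate the group and its per-cpu stats here. */
        tg = alloc_group(GFP_KERNEL);

        spin_lock_irq(q->queue_lock);

        /* The queue may have been torn down while we slept. */
        if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
                free_group(tg);
                return NULL;
        }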

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • blkg->key = cfqd is an RCU-protected pointer, and hence we used to do
    call_rcu(cfqd->rcu_head) to free up cfqd after one RCU grace period.

    The problem here is that even though cfqd is around, there is no
    guarantee that the associated request queue (td->queue) or q->queue_lock
    is still around. A driver might have called blk_cleanup_queue() and
    released the lock.

    It might happen that after the lock has been released we dereference
    blkg->key->queue->queue_lock and crash. This is possible in the following
    path:

    blkiocg_destroy()
    blkio_unlink_group_fn()
    cfq_unlink_blkio_group()

    Hence, wait for an RCU grace period if there are groups which have not
    been unlinked from blkcg->blkg_list. That way, any groups taking the
    cfq_unlink_blkio_group() path can safely take the queue lock.

    This is also how the race has been taken care of in the throttling logic.
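
    Conceptually the exit path then becomes something like this (a sketch;
    cfq_release_cfq_groups() is assumed here to report whether every group
    could be unlinked synchronously):

        /* True only if all groups were unlinked from blkcg->blkg_list. */
        bool all_gone = cfq_release_cfq_groups(cfqd);

        /*
         * If some group is still visible to blkcg, a concurrent
         * cfq_unlink_blkio_group() may still dereference the queue lock,
         * so wait a grace period before tearing it down.
         */
        if (!all_gone)
                synchronize_rcu();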

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Nobody seems to be using the cfq_find_alloc_cfqg() parameter "create".
    Get rid of it.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

16 May, 2011

1 commit

  • Currently we first map the task to a cgroup and then the cgroup to the
    blkio_cgroup. There is a more direct way to get to the blkio_cgroup from
    the task, using task_subsys_state(). Use that.

    The real reason for the fix is that it also avoids a race in generic
    cgroup code. During remount/umount rebind_subsystems() is called, and it
    can do the following without waiting for an RCU grace period:

    cgrp->subsys[i] = NULL;

    That means that if somebody got hold of the cgroup under RCU and then
    tried to go through cgroup->subsys[] to get to the blkio_cgroup, it would
    get NULL, which is wrong. I was running into this race condition with LTP
    running on an upstream-derived kernel, and it led to a crash.

    So ideally we should also fix the generic cgroup code to wait for an RCU
    grace period before setting the pointer to NULL. Li Zefan is not very keen
    on introducing synchronize_rcu() there, as he thinks it will slow down
    mount/remount/umount operations.

    So for the time being, at least fix the kernel crash by taking a more
    direct route to the blkio_cgroup.

    One tester had reported a crash while running LTP on a derived kernel;
    with this fix the crash is no longer seen, with the test having run for
    over 6 days.
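
    The more direct route is essentially a helper of this shape (names are
    approximate; the exact helper lives in the blk-cgroup code):

        static inline struct blkio_cgroup *task_to_blkio_cgroup(struct task_struct *tsk)
        {
                /* One step from the task to its blkio subsystem state. */
                return container_of(task_subsys_state(tsk, blkio_subsys_id),
                                    struct blkio_cgroup, css);
        }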

    Signed-off-by: Vivek Goyal
    Reviewed-by: Li Zefan
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

19 Apr, 2011

1 commit

  • For some configurations of CONFIG_PREEMPT that is not true. So
    get rid of __call_for_each_cic() and always use the explicitly
    rcu_read_lock() protected call_for_each_cic() instead.

    This fixes a potential bug related to IO scheduler removal or
    online switching.

    Thanks to Paul McKenney for clarifying this.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 Apr, 2011

1 commit

  • Instead of overloading __blk_run_queue to force an offload to kblockd,
    add a new blk_run_queue_async helper to do it explicitly. I've kept
    the blk_queue_stopped check for now, but I suspect it's not needed,
    as the check we do when the workqueue item runs should be enough.
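
    The rough shape of such a helper, as a sketch rather than the exact
    upstream implementation (it assumes the queue carries a delayed work item,
    here q->delay_work, that kblockd runs to invoke the request_fn):

        void blk_run_queue_async(struct request_queue *q)
        {
                /* Defer to kblockd instead of recursing into ->request_fn(). */
                if (likely(!blk_queue_stopped(q)))
                        kblockd_schedule_delayed_work(q, &q->delay_work, 0);
        }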

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

31 Mar, 2011

1 commit


23 Mar, 2011

3 commits

  • Remove the think time checking. A queue with a high think time might mean
    that it dispatches several requests and then goes away. Limiting such a
    queue seems meaningless. This also simplifies the code. This was suggested
    by Vivek.

    Signed-off-by: Shaohua Li
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Li, Shaohua
     
  • For v2, I added back lines to cfq_preempt_queue() that were removed
    during updates for accounting unaccounted_time. Thanks for pointing out
    that I'd missed these, Vivek.

    Previous commit "cfq-iosched: Don't set active queue in preempt" wrongly
    cleared stats for preempting queues when it shouldn't have, because when
    we choose a queue to preempt, it still isn't necessarily scheduled next.

    Thanks to Vivek Goyal for figuring this out and understanding how the
    preemption code works.

    Signed-off-by: Justin TerAvest
    Signed-off-by: Jens Axboe

    Justin TerAvest
     
  • Commit "Add unaccounted time to timeslice_used" changed the behavior of
    cfq_preempt_queue to set cfqq active. Vivek pointed out that other
    preemption rules might get involved, so we shouldn't manually set which
    queue is active.

    This cleans up the code to just clear the queue stats at preemption
    time.

    Signed-off-by: Justin TerAvest
    Signed-off-by: Jens Axboe

    Justin TerAvest
     

17 Mar, 2011

1 commit

  • Version 3 is updated to apply to for-2.6.39/core.

    For version 2, I took Vivek's advice and made sure we update the group
    weight from cfq_group_service_tree_add().

    If a weight is updated while a group is on the service tree, the
    calculation of the service tree's total weight can be adjusted
    improperly, which either leads to bad service tree weights or
    potentially crashes (if total_weight becomes 0).

    This patch defers updates to the weight until a group is off the service
    tree.
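
    The deferred update can be sketched like this (field names such as
    new_weight and needs_update are illustrative placeholders):

        /* cgroup write path: remember the value, don't apply it yet. */
        cfqg->new_weight = val;
        cfqg->needs_update = true;

        /* In cfq_group_service_tree_add(), before the group is inserted: */
        if (cfqg->needs_update) {
                cfqg->weight = cfqg->new_weight;
                cfqg->needs_update = false;
        }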

    Signed-off-by: Justin TerAvest
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Justin TerAvest
     

12 Mar, 2011

1 commit

  • There are two kinds of time that tasks are not charged for: the first
    seek and the extra time used beyond the allocated timeslice. Both of
    these are exported as a new unaccounted_time stat.

    I think it would be good to have this reported in 'time' as well, but
    that is probably a separate discussion.

    Signed-off-by: Justin TerAvest
    Signed-off-by: Jens Axboe

    Justin TerAvest
     

10 Mar, 2011

2 commits


07 Mar, 2011

4 commits


05 Mar, 2011

1 commit

  • This merge creates two sets of conflicts. One is simple context
    conflicts caused by the removal of throtl_scheduled_delayed_work() in
    for-linus and the removal of throtl_shutdown_timer_wq() in
    for-2.6.39/core.

    The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
    call directly into q->request_fn() __blk_run_queue()) in for-linus
    clashing with the FLUSH reimplementation in for-2.6.39/core. The conflict
    isn't trivial, but the resolution is straightforward.

    * __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
    should be called with @force_kblockd set to %true.

    * elv_insert() in blk_kick_flush() should use
    %ELEVATOR_INSERT_REQUEUE.

    Both changes are to avoid invoking ->request_fn() directly from the
    request completion path, and they closely match the changes in commit
    255bb490c8.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

02 Mar, 2011

3 commits

  • __blk_run_queue() automatically either calls q->request_fn() directly
    or schedules kblockd, depending on whether the function is being entered
    recursively. The blk-flush implementation needs to be able to choose
    kblockd explicitly. Add @force_kblockd.

    All the current users are converted to specify %false for the
    parameter, so this patch doesn't introduce any behavior change.

    stable: This is a prerequisite for fixing an ide oops caused by the new
    blk-flush implementation.
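
    In other words, callers now pick the dispatch mode explicitly, along these
    lines (illustrative call sites only):

        /* Existing callers keep today's behaviour: */
        __blk_run_queue(q, false);

        /* blk-flush completion paths force the kblockd offload: */
        __blk_run_queue(q, true);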

    Signed-off-by: Tejun Heo
    Cc: Jan Beulich
    Cc: James Bottomley
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Effectively, make group_isolation=1 the default and remove the tunable.
    The group_isolation=0 setting existed because, by default, we idle on the
    sync-noidle tree, and on fast devices this can be very harmful for
    throughput.

    However, this problem can also be addressed by tuning slice_idle and
    possibly group_idle on faster storage devices.

    This change simplifies the CFQ code by removing the feature entirely.

    Signed-off-by: Justin TerAvest
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Justin TerAvest
     
  • Conflicts:
    block/cfq-iosched.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     

11 Feb, 2011

1 commit

  • Flush requests are never put on the IO scheduler. Convert request
    structure's elevator_private* into an array and have the flush fields
    share a union with it.

    Reclaim the space lost in 'struct request' by moving 'completion_data'
    back into the union with 'rb_node'.
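
    The resulting layout is roughly of this shape (a sketch; the fields inside
    the flush struct are approximate, not the exact definition):

        union {
                /* Used by the IO scheduler while the request is queued. */
                void *elevator_private[3];

                /* Used only by flush requests, which never reach the elevator. */
                struct {
                        unsigned int seq;
                        struct list_head list;
                } flush;
        };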

    Signed-off-by: Mike Snitzer
    Acked-by: Vivek Goyal
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

09 Feb, 2011

1 commit

  • Commit 7667aa0630407bc07dc38dcc79d29cc0a65553c1 added logic to wait for
    the last queue of the group to become busy (have at least one request),
    so that the group does not lose out for not being continuously
    backlogged. The commit did not check for the condition that the last
    queue already has some requests. As a result, if the queue already has
    requests, wait_busy is set. Later on, cfq_select_queue() checks the
    flag, and decides that since the queue has a request now and wait_busy
    is set, the queue is expired. This results in early expiration of the
    queue.

    This patch fixes the problem by adding a check to see if the queue already
    has requests. If it does, wait_busy is not set, and as a result time
    slices do not expire early.

    Queues with more than one request are usually buffered writers.
    Testing shows an improvement in isolation between buffered writers.
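
    The added condition is essentially of this form (an illustrative check,
    not the literal patch; cfqq->sort_list is the queue's pending-request
    tree):

        /* The last queue of the group already has work queued, so there is
         * no reason to arm wait_busy and hold the slice open for it. */
        if (!RB_EMPTY_ROOT(&cfqq->sort_list))
                return false;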

    Cc: stable@kernel.org
    Signed-off-by: Justin TerAvest
    Reviewed-by: Gui Jianfeng
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Justin TerAvest
     

19 Jan, 2011

1 commit


14 Jan, 2011

1 commit