24 Apr, 2009

3 commits

  • Currently we look up the prio_tree index from ->ioprio, but ->ioprio
    can change if either the process gets its IO priority changed
    explicitly, or if cfq decides to temporarily boost it. So if we are
    unlucky, we can end up attempting to remove a node from a different
    rbtree root than the one it was added to.

    Fix this by using ->org_ioprio as the prio_tree index, since that
    will only change for explicit IO priority settings (not for a boost).
    Additionally cache the rbtree root inside the cfqq, then we don't have
    to add code to reinsert the cfqq in the prio_tree if IO priority changes.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
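The idea can be sketched in plain C. Structure and field names below are simplified stand-ins for the kernel's cfq_queue/cfq_data, and the actual rbtree insert/erase calls are elided; only the indexing and root-caching logic from the commit message is shown.

```c
#include <assert.h>
#include <stddef.h>

#define CFQ_PRIO_LEVELS 8   /* hypothetical number of priority levels */

struct rb_root { void *rb_node; };

/* Simplified stand-ins for the kernel's cfq_queue / cfq_data. */
struct cfq_queue {
    unsigned short ioprio;      /* may change: explicit setting or boost */
    unsigned short org_ioprio;  /* changes only on explicit settings */
    struct rb_root *p_root;     /* cached root this queue was added under */
};

struct cfq_data {
    struct rb_root prio_trees[CFQ_PRIO_LEVELS];
};

/* Insert: index by org_ioprio, and cache the chosen root in the queue. */
static void prio_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
    cfqq->p_root = &cfqd->prio_trees[cfqq->org_ioprio];
    /* ... rb_link_node()/rb_insert_color() into *cfqq->p_root ... */
}

/* Remove: use the cached root, so a temporary boost of ->ioprio can no
 * longer send us to the wrong tree. */
static struct rb_root *prio_tree_remove(struct cfq_queue *cfqq)
{
    struct rb_root *root = cfqq->p_root;
    /* ... rb_erase() from *root ... */
    cfqq->p_root = NULL;
    return root;
}
```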
  • cfq_prio_tree_lookup() should return the direct match, yet it always
    returns zero. Fix that.

    cfq_prio_tree_add() assumes that we don't get a direct match, while
    it is very possible that we do. Using O_DIRECT, you can have different
    cfqqs with matching requests, since you don't have the page cache
    to serialize things for you. Fix this bug by only adding the cfqq if
    there isn't an existing match.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
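Both fixes can be sketched with an ordinary binary search tree standing in for the kernel rbtree (names and the sector key are simplified for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy BST keyed by sector, standing in for the kernel rbtree. */
struct node {
    unsigned long long sector;
    struct node *left, *right;
};

/* Fix 1: the lookup must actually return a direct match. */
static struct node *prio_tree_lookup(struct node *root,
                                     unsigned long long sector)
{
    while (root) {
        if (sector < root->sector)
            root = root->left;
        else if (sector > root->sector)
            root = root->right;
        else
            return root;   /* direct match: return it, not NULL */
    }
    return NULL;
}

/* Fix 2: only add a new node if there is no existing match.  With
 * O_DIRECT, two cfqqs can carry requests for the same position, so a
 * direct match is very possible. */
static struct node *prio_tree_add(struct node **root,
                                  unsigned long long sector)
{
    struct node **p = root;
    while (*p) {
        if (sector < (*p)->sector)
            p = &(*p)->left;
        else if (sector > (*p)->sector)
            p = &(*p)->right;
        else
            return *p;     /* existing match: do not insert a duplicate */
    }
    struct node *n = calloc(1, sizeof(*n));
    n->sector = sector;
    *p = n;
    return n;
}
```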
  • Not strictly needed, but we should make it clear that we init the
    rbtree roots here.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

22 Apr, 2009

2 commits


15 Apr, 2009

7 commits

  • If we have processes that are working in close proximity to each
    other on disk, we don't want to idle wait. Instead allow the close
    process to issue a request, getting better aggregate bandwidth.
    The anticipatory scheduler has similar checks; noop and deadline do
    not need it, since they don't care about process IO mappings.

    The code for CFQ is a little more involved though, since we split
    request queues into per-process contexts.

    This fixes a performance problem with e.g. dump(8), since it uses
    several processes in some silly attempt to speed IO up. Even if
    dump(8) isn't really a valid case (it should be fixed by using
    CLONE_IO), there are other cases where we see close processes
    and where idling ends up hurting performance.

    Credit goes to Jeff Moyer for writing the
    initial implementation.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
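The core "closeness" test reduces to a seek-distance threshold. The threshold value below is made up for illustration; CFQ's real heuristic also tracks a per-process mean seek distance.

```c
#include <assert.h>

/* Hypothetical threshold, in sectors, for considering two processes
 * "close" on disk.  If a cooperating queue's pending request is within
 * this distance of the head position, dispatch from it instead of
 * idle waiting. */
#define CFQQ_CLOSE_THR 8192ULL

static unsigned long long seek_dist(unsigned long long last_pos,
                                    unsigned long long next_pos)
{
    return next_pos > last_pos ? next_pos - last_pos
                               : last_pos - next_pos;
}

static int cfqq_is_close(unsigned long long head_pos,
                         unsigned long long rq_pos)
{
    return seek_dist(head_pos, rq_pos) <= CFQQ_CLOSE_THR;
}
```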
  • Makes it easier to read the traces.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We only kick the dispatch for an idling queue if we think it's a
    (somewhat) fully merged request. Also allow a kick if we have other
    busy queues in the system, since we don't want to risk waiting for
    a potential merge in that case. It's better to get some work done and
    proceed.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's called from the workqueue handlers from process context, so
    we always have irqs enabled when entered.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • "Zhang, Yanmin" reports that commit
    b029195dda0129b427c6e579a3bb3ae752da3a93 introduced a regression
    of about 50% with sequential threaded read workloads. The test
    case is:

    tiotest -k0 -k1 -k3 -f 80 -t 32

    which starts 32 threads, each reading an 80MB file. Twiddle the kick
    queue logic so that we do start IO immediately, if it appears to be
    a fully merged request. We can't really detect that, so just check
    if the request is bigger than a page or not. The assumption is that
    since single bio issues will first queue a single request with just
    one page attached and then later do merges on that, if we already
    have more than a page worth of data in the request, then the request
    is most likely good to go.

    Verified that this doesn't cause a regression with the test case that
    commit b029195dda0129b427c6e579a3bb3ae752da3a93 was fixing. It does not,
    we still see maximum sized requests for the queue-then-merge cases.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
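The "probably fully merged" heuristic boils down to a size check, combined with the other-busy-queues condition from the earlier commit. A 4K page is assumed here, and the names are invented for illustration:

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096u  /* assumption: 4K pages */

/* A single-bio issue starts life as a one-page request and gets merged
 * into later.  If the request already carries more than a page, treat
 * it as merged enough and start IO immediately. */
static int rq_looks_merged(unsigned int rq_bytes)
{
    return rq_bytes > TOY_PAGE_SIZE;
}

/* Kick an idling queue if the request looks merged, or if other queues
 * are busy and waiting on a potential merge would risk stalling them. */
static int should_kick_dispatch(unsigned int rq_bytes,
                                int other_busy_queues)
{
    return rq_looks_merged(rq_bytes) || other_busy_queues > 0;
}
```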
  • We can just use the block layer BLK_RW_SYNC/ASYNC defines now.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     

07 Apr, 2009

3 commits

  • When CFQ is waiting for a new request from a process, currently it'll
    immediately restart queuing when it sees such a request. This doesn't
    work very well with streamed IO, since we then end up splitting IO
    that would otherwise have been merged nicely. For a simple dd test,
    this causes 10x as many requests to be issued as we should have.
    Normally this goes unnoticed due to the low overhead of requests
    at the device side, but some hardware is very sensitive to request
    sizes and there it can cause big slowdowns.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We only manipulate the must_dispatch and queue_new flags; they are not
    tested anymore. So get rid of them.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The IO scheduler core calls into the IO scheduler dispatch_request hook
    to move requests from the IO scheduler and into the driver dispatch
    list. It only does so when the dispatch list is empty. CFQ moves several
    requests to the dispatch list, which can cause higher latencies if we
    suddenly have to switch to some important sync IO. Change the logic to
    move one request at a time instead.

    This should almost be functionally equivalent to what we did before,
    except that we now honor 'quantum' as the maximum queue depth at the
    device side from any single cfqq. If there's just a single active
    cfqq, we allow up to 4 times the normal quantum.

    Signed-off-by: Jens Axboe

    Jens Axboe
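The resulting device-side depth limit can be sketched as follows (cfq_quantum defaulted to 4 in this era's CFQ; treat the function name as illustrative):

```c
#include <assert.h>

/* Maximum number of requests a single cfqq may have on the device-side
 * dispatch list.  With only one active queue there is no fairness to
 * protect, so allow it to drive the hardware four times harder. */
static unsigned int cfq_max_dispatch(unsigned int quantum,
                                     unsigned int busy_queues)
{
    return busy_queues == 1 ? quantum * 4 : quantum;
}
```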
     

06 Apr, 2009

1 commit

  • By default, CFQ will anticipate more IO from a given io context if the
    previously completed IO was sync. This used to be fine, since the only
    sync IO was reads and O_DIRECT writes. But with more "normal" sync writes
    being used now, we don't want to anticipate for those.

    Add a bio/request flag that informs the IO scheduler that this is a sync
    request that we should not idle for. Introduce WRITE_ODIRECT specifically
    for O_DIRECT writes, and make sure that the other sync writes set this
    flag.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     
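The flag-based decision can be sketched like this. The REQ_TOY_* flag names are invented for illustration; the real kernel flags differ.

```c
#include <assert.h>

/* Invented flag bits, for illustration only. */
#define REQ_TOY_SYNC    (1u << 0)
#define REQ_TOY_NOIDLE  (1u << 1) /* sync, but don't anticipate more IO */

/* Reads and O_DIRECT writes stay plain sync; the other sync writes also
 * set NOIDLE, so the scheduler won't sit idle waiting for a follow-up. */
static int cfq_should_idle_after(unsigned int rq_flags)
{
    return (rq_flags & REQ_TOY_SYNC) && !(rq_flags & REQ_TOY_NOIDLE);
}
```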

30 Jan, 2009

1 commit

  • This patch adds the ability to pre-empt an ongoing BE timeslice when a RT
    request is waiting for the current timeslice to complete. This reduces the
    wait time to disk for RT requests from an upper bound of 4 (current value
    of cfq_quantum) to 1 disk request.

    Applied Jens' suggested changes to avoid the rb lookup and use !cfq_class_rt()
    and retested.

    Latency (secs) for the RT task when doing sequential reads from a 10GB file:

                             only RT | RT + BE | RT + BE + this patch
    small (512 byte) reads |   143   |   163   |   145
    large (1MB) reads      |   142   |   158   |   146

    Signed-off-by: Divyesh Shah
    Signed-off-by: Jens Axboe

    Divyesh Shah
     
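The preemption check reduces to a class comparison, per the !cfq_class_rt() suggestion above; sketched here with a toy enum:

```c
#include <assert.h>

enum cfq_class { CFQ_CLASS_RT, CFQ_CLASS_BE, CFQ_CLASS_IDLE };

/* An arriving RT request preempts the ongoing timeslice unless the
 * active queue is itself RT, so RT waits for at most one in-flight
 * request instead of up to cfq_quantum of them. */
static int cfq_should_preempt(enum cfq_class active,
                              enum cfq_class incoming)
{
    return incoming == CFQ_CLASS_RT && active != CFQ_CLASS_RT;
}
```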

29 Dec, 2008

4 commits

  • Original patch from Nikanth Karthikesan

    When a queue exits the queue lock is taken and cfq_exit_queue() would free all
    the cic's associated with the queue.

    But when a task exits, cfq_exit_io_context() gets the cics one by one and then
    locks the associated queue to call __cfq_exit_single_io_context(). It looks like,
    between getting a cic from the ioc and locking the queue, the queue might have
    exited on another CPU.

    Fix this by rechecking the cfq_io_context queue key inside the queue lock
    again, and not calling into __cfq_exit_single_io_context() if somebody
    beat us to it.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This basically limits the hardware queue depth to 4*quantum at any
    point in time, which is 16 with the default settings. As CFQ uses
    other means to shrink the hardware queue when necessary in the first
    place, there's really no need for this extra heuristic. Additionally,
    it ends up hurting performance in some cases.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Just use struct elevator_queue everywhere instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • After many improvements to kblockd_flush_work, it is now identical to
    cancel_work_sync, so a direct call to cancel_work_sync is suggested.

    The only difference is that cancel_work_sync is a GPL-only symbol,
    so it is no longer available to non-GPL modules.

    Signed-off-by: Cheng Renquan
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Cheng Renquan
     

09 Oct, 2008

4 commits

  • We really need to know about the hardware tagging support as well,
    since if the SSD does not do tagging then we still want to idle.
    Otherwise we have the same dependent sync IO vs flooding async IO
    problem as on rotational media.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We don't want to idle in AS/CFQ if the device doesn't have a seek
    penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational
    device, low level drivers should set this flag upon discovery of
    an SSD or similar device type.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • CFQ's detection of queueing devices assumes a non-queuing device and detects
    if the queue depth reaches a certain threshold. Under some workloads (e.g.
    synchronous reads), CFQ effectively forces a unit queue depth, thus defeating
    the detection logic. This leads to poor performance on queuing hardware,
    since the idle window remains enabled.

    This patch inverts the sense of the logic: assume a queuing-capable device,
    and detect if the depth does not exceed the threshold.

    Signed-off-by: Aaron Carroll
    Signed-off-by: Jens Axboe

    Aaron Carroll
     
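The inverted detection can be sketched as periodic sampling: assume a queuing-capable device, and only conclude otherwise when the observed depth never exceeds the threshold over a sample window while enough work was actually available. The window and threshold values here are made up; the real code's sampling conditions are more involved.

```c
#include <assert.h>

#define TOY_SAMPLE_WINDOW 50  /* hypothetical sampling window */
#define TOY_DEPTH_THR      4  /* hypothetical depth threshold */

struct hw_tag_det {
    int hw_tag;      /* start by assuming a queuing-capable device */
    int samples;
    int peak_depth;
};

static void hw_tag_init(struct hw_tag_det *d)
{
    d->hw_tag = 1;
    d->samples = d->peak_depth = 0;
}

/* Call per completion with the in-driver and scheduler-queued counts. */
static void hw_tag_sample(struct hw_tag_det *d, int in_driver, int queued)
{
    if (in_driver > d->peak_depth)
        d->peak_depth = in_driver;

    /* Too little work on offer to say anything about the hardware;
     * this is what stops a unit-depth sync workload from faking a
     * verdict either way. */
    if (queued <= TOY_DEPTH_THR && in_driver <= TOY_DEPTH_THR)
        return;

    if (++d->samples < TOY_SAMPLE_WINDOW)
        return;

    d->hw_tag = d->peak_depth > TOY_DEPTH_THR;
    d->samples = d->peak_depth = 0;
}
```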
  • Preparatory patch for checking queuing affinity.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

03 Jul, 2008

3 commits


28 May, 2008

2 commits


07 May, 2008

2 commits


10 Apr, 2008

1 commit

  • When switching scheduler from cfq, cfq_exit_queue() does not clear
    ioc->ioc_data, leaving a dangling pointer that can deceive the following
    lookups when the iosched is switched back to cfq. The pattern that can
    trigger that is the following:

    - elevator switch from cfq to something else;
    - module unloading, with elv_unregister() that calls cfq_free_io_context()
    on ioc freeing the cic (via the .trim op);
    - module gets reloaded and the elevator switches back to cfq;
    - reallocation of a cic at the same address as before (with a valid key).

    To fix it, just assign NULL to ioc_data in __cfq_exit_single_io_context(),
    that is called from the regular exit path and from the elevator switching
    code. The only path that frees a cic and is not covered is the error handling
    one, but cic's freed in this way are never cached in ioc_data.

    Signed-off-by: Fabio Checconi
    Signed-off-by: Jens Axboe

    Fabio Checconi
     
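The fix amounts to clearing the one-entry lookup cache whenever the cic it points to is torn down; sketched here with simplified stand-in structures:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins: the io_context caches the most recently used
 * cfq_io_context so the common-case lookup can skip the tree. */
struct cfq_io_context { int dummy; };
struct io_context { struct cfq_io_context *ioc_data; };

/* Corresponds to the teardown in __cfq_exit_single_io_context(): drop
 * the cached pointer, so a later cic allocated at the same address
 * (after a module reload and an elevator switch back to cfq) cannot
 * hit a stale cache entry. */
static void cic_exit(struct io_context *ioc, struct cfq_io_context *cic)
{
    if (ioc->ioc_data == cic)
        ioc->ioc_data = NULL;
    /* ... rest of the cic teardown ... */
}
```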

02 Apr, 2008

1 commit


19 Feb, 2008

1 commit


01 Feb, 2008

1 commit


28 Jan, 2008

4 commits

  • Use of inlines was a bit over the top; trim them down a bit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently you must be root to set the idle IO priority class on a
    process. This is due to the fact that the idle class is implemented as
    a true idle class, meaning that it will not make progress if someone
    else is requesting disk access. Unfortunately this means that it opens
    DoS opportunities by locking down file system resources, hence it is
    root only at the moment.

    This patch relaxes the idle class a little, by removing the truly idle
    part (which entails a grace period with an associated timer). The
    modifications make the idle class as close to zero impact as can be done
    while still guaranteeing progress. This means we can relax the root-only
    criteria as well.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The io context sharing introduced a per-ioc spinlock, that would protect
    the cfq io context lookup. That is a regression from the original, since
    we never needed any locking there because the ioc/cic were process private.

    The cic lookup is changed from an rbtree construct to a radix tree, which
    we can then use RCU to make the reader side lockless. That is the
    performance-critical path; modifying the radix tree is only done on process creation
    (when that process first does IO, actually) and on process exit (if that
    process has done IO).

    As it so happens, radix trees are also much faster for this type of
    lookup where the key is a pointer. It's a very sparse tree.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Changes in CFQ for io_context sharing.

    Signed-off-by: Jens Axboe

    Nikanth Karthikesan