Eric Lee / smarc-fsl-linux-kernel

10 Apr, 2008

1 commit

4faa3c815 cfq-iosched: do not leak ioc_data across iosched switches

When switching scheduler from cfq, cfq_exit_queue() does not clear
ioc->ioc_data, leaving a dangling pointer that can deceive the following
lookups when the iosched is switched back to cfq. The pattern that can
trigger that is the following:

- elevator switch from cfq to something else;
- module unloading, with elv_unregister() calling cfq_free_io_context()
on the ioc, which frees the cic (via the .trim op);
- module gets reloaded and the elevator switches back to cfq;
- reallocation of a cic at the same address as before (with a valid key).

To fix it, just assign NULL to ioc_data in __cfq_exit_single_io_context(),
which is called from the regular exit path and from the elevator switching
code. The only path that frees a cic and is not covered is the error handling
one, but cics freed that way are never cached in ioc_data.
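A minimal userspace model of the bug and the fix. It assumes a per-process context (struct ioc) that keeps a one-entry cache (ioc_data) in front of a slower lookup. lookup() and exit_single() are hypothetical stand-ins, and a linked list replaces CFQ's real lookup structure. The point is only that the cache must be cleared when the cached object is freed:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: an ioc owns a list of cic entries keyed by an
 * opaque pointer, plus a one-entry lookup cache (ioc_data). */
struct cic {
	const void *key;
	struct cic *next;
};

struct ioc {
	struct cic *list;
	struct cic *ioc_data;	/* last cic returned by lookup() */
};

static struct cic *lookup(struct ioc *ioc, const void *key)
{
	struct cic *c = ioc->ioc_data;

	if (c && c->key == key)		/* fast path: cache hit */
		return c;
	for (c = ioc->list; c; c = c->next) {
		if (c->key == key) {
			ioc->ioc_data = c;	/* remember for next time */
			return c;
		}
	}
	return NULL;
}

/* Unlink and free one cic. Clearing ioc_data here models the fix:
 * without it the cache would keep pointing at freed memory, and a new
 * cic later allocated at the same address could be mistaken for it. */
static void exit_single(struct ioc *ioc, struct cic *victim)
{
	struct cic **pp;

	for (pp = &ioc->list; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			break;
		}
	}
	if (ioc->ioc_data == victim)
		ioc->ioc_data = NULL;
	free(victim);
}
```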

Signed-off-by: Fabio Checconi
Signed-off-by: Jens Axboe

Fabio Checconi
2008-04-10 14:28:01 +0800

02 Apr, 2008

1 commit

34e6bbf23 cfq-iosched: fix rcu freeing of cfq io contexts

SLAB_DESTROY_BY_RCU is not a direct substitute for normal call_rcu()
freeing, since it defers page freeing but NOT object freeing. So change
cfq to do the freeing on its own.

Signed-off-by: Fabio Checconi
Acked-by: Paul E. McKenney
Signed-off-by: Jens Axboe

Fabio Checconi
2008-04-02 21:42:20 +0800

19 Feb, 2008

1 commit

ffc4e7595 cfq-iosched: add hlist for browsing parallel to the radix tree

It's cumbersome to browse a radix tree from start to finish, especially
since we modify keys when a process exits. So add a hlist for the single
purpose of browsing over all known cfq_io_contexts, used for exit,
io prio change, etc.

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=9948
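A sketch of the idea. It assumes a userspace copy of the kernel's hlist layout, where each node has a next pointer plus a pprev pointer to the previous link. The cic struct, cic_of() and set_all_ioprio() are hypothetical. They show that walking every known context does not depend on the key, which may change on exit:

```c
#include <stddef.h>

/* Userspace copy of the hlist shape: singly linked, but each node keeps
 * a pointer to the previous node's next field, so deletion needs
 * neither the head nor a traversal. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hlist_del(struct hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Hypothetical cic: it would also live in a keyed structure (omitted),
 * while the hlist linkage is used only for browsing. */
struct cic {
	unsigned long key;
	int ioprio;
	struct hlist_node cic_list;
};

#define cic_of(node) \
	((struct cic *)((char *)(node) - offsetof(struct cic, cic_list)))

/* Visit every known cic, e.g. for an io priority change. */
static int set_all_ioprio(struct hlist_head *h, int prio)
{
	struct hlist_node *n;
	int count = 0;

	for (n = h->first; n; n = n->next) {
		cic_of(n)->ioprio = prio;
		count++;
	}
	return count;
}
```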

Signed-off-by: Jens Axboe

Jens Axboe
2008-02-19 17:04:00 +0800

01 Feb, 2008

1 commit

fe094d98e cfq-iosched: make checkpatch compliant

Signed-off-by: Jens Axboe

Jens Axboe
2008-02-01 16:26:33 +0800

28 Jan, 2008

5 commits

febffd618 cfq-iosched: kill some big inlines

The use of inlines was a bit over the top; trim them down a bit.

Signed-off-by: Jens Axboe

Jens Axboe
2008-01-28 20:19:43 +0800
0871714e0 cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions

Currently you must be root to set idle io prio class on a process. This
is due to the fact that the idle class is implemented as a true idle
class, meaning that it will not make progress if someone else is
requesting disk access. Unfortunately this opens up DoS
opportunities by locking down file system resources, hence it is
root-only at the moment.

This patch relaxes the idle class a little by removing the truly idle
part (which entails a grace period with an associated timer). The
modifications make the idle class as close to zero impact as can be done
while still guaranteeing progress. This means we can relax the root-only
criterion as well.

Signed-off-by: Jens Axboe

Jens Axboe
2008-01-28 18:38:15 +0800
4ac845a2e block: cfq: make the io context sharing lockless

The io context sharing introduced a per-ioc spinlock that protects
the cfq io context lookup. That is a regression from the original, since
we never needed any locking there because the ioc/cic were process private.

The cic lookup is changed from an rbtree construct to a radix tree, which
lets us use RCU to make the reader side lockless. The reader side is the
performance-critical path; the radix tree is only modified on process
creation (when that process first does IO, actually) and on process exit
(if that process has done IO).

As it so happens, radix trees are also much faster for this type of
lookup where the key is a pointer. It's a very sparse tree.
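A toy fixed-height radix tree keyed by a pointer value, to illustrate why pointer keys suit this structure. Each level consumes a few bits of the key, so lookup cost is bounded by the key width, and nodes are only allocated along paths that are actually used. This is not lib/radix-tree.c. It has no height adjustment, no deletion and no RCU, and all names are hypothetical. The lookup is simply written to take no locks:

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy: RADIX_BITS bits of the key per level, most
 * significant first, with a fixed number of levels. */
#define RADIX_BITS 4
#define RADIX_FANOUT (1u << RADIX_BITS)
#define RADIX_LEVELS ((int)(sizeof(uintptr_t) * CHAR_BIT / RADIX_BITS))

struct radix_node {
	void *slot[RADIX_FANOUT];	/* children, or items at the last level */
};

static unsigned radix_index(uintptr_t key, int level)
{
	int shift = (RADIX_LEVELS - 1 - level) * RADIX_BITS;

	return (unsigned)((key >> shift) & (RADIX_FANOUT - 1));
}

/* Writer side: returns 0 on success, -1 on allocation failure. */
static int radix_insert(struct radix_node *root, uintptr_t key, void *item)
{
	struct radix_node *n = root;
	int level;

	for (level = 0; level < RADIX_LEVELS - 1; level++) {
		unsigned i = radix_index(key, level);

		if (!n->slot[i]) {
			n->slot[i] = calloc(1, sizeof(struct radix_node));
			if (!n->slot[i])
				return -1;
		}
		n = n->slot[i];
	}
	n->slot[radix_index(key, RADIX_LEVELS - 1)] = item;
	return 0;
}

/* Reader side: a plain walk with no locks. In the kernel, the
 * equivalent lookup is what runs under rcu_read_lock(). */
static void *radix_lookup(const struct radix_node *root, uintptr_t key)
{
	const struct radix_node *n = root;
	int level;

	for (level = 0; level < RADIX_LEVELS - 1 && n; level++)
		n = n->slot[radix_index(key, level)];
	return n ? n->slot[radix_index(key, RADIX_LEVELS - 1)] : NULL;
}
```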

Signed-off-by: Jens Axboe

Jens Axboe
2008-01-28 17:50:33 +0800
66dac98ed io_context sharing - cfq changes

Changes in CFQ for io_context sharing.

Signed-off-by: Jens Axboe

Nikanth Karthikesan
2008-01-28 17:50:32 +0800
fd0928df9 ioprio: move io priority from task_struct to io_context

This is where it belongs and then it doesn't take up space for a
process that doesn't do IO.

Signed-off-by: Jens Axboe

Jens Axboe
2008-01-28 17:50:29 +0800

18 Dec, 2007

1 commit

2fdd82bd8 block: let elv_register() return void

elv_register() always returns 0, and there isn't anything it does where
it should return an error (the only error condition is so grave that
it's handled with a BUG_ON).

Signed-off-by: Adrian Bunk
Signed-off-by: Jens Axboe

Adrian Bunk
2007-12-18 15:29:28 +0800

07 Nov, 2007

3 commits

0e7be9edb cfq_idle_class_timer: add paranoid checks for jiffies overflow

In theory, if the queue was idle long enough, cfq_idle_class_timer may have
a false (and very long) timeout because jiffies can wrap into the past with
respect to ->last_end_request.
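The kernel compares jiffies with time_after(a, b), defined as ((long)((b) - (a)) < 0), which survives a single wrap. The sketch below models 32-bit jiffies. It adds a hypothetical should_wait() showing one way to phrase a paranoid check: if now appears to be before last, the counter has moved more than half its range and the computed deadline cannot be trusted. This is an illustration, not necessarily the exact condition in the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Model 32-bit jiffies explicitly so the wrap is easy to reach. */
typedef uint32_t jiffies32;

/* Same trick as the kernel's time_after(), on a fixed-width type. */
static bool time_after32(jiffies32 a, jiffies32 b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical: should an idle-class queue keep waiting until
 * last + grace? The second test rejects the case where now looks older
 * than last, which means the signed difference has flipped sign. */
static bool should_wait(jiffies32 now, jiffies32 last, jiffies32 grace)
{
	jiffies32 end = last + grace;

	return time_after32(end, now) && !time_after32(last, now);
}
```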

Signed-off-by: Oleg Nesterov
Signed-off-by: Jens Axboe

Oleg Nesterov
2007-11-07 20:51:35 +0800
b70c864d3 cfq: fix IOPRIO_CLASS_IDLE delays

After the fresh boot:

ionice -c3 -p $$
echo cfq >> /sys/block/XXX/queue/scheduler
dd if=/dev/XXX of=/dev/null bs=512 count=1

Now dd hangs in D state and the queue is completely stalled for approximately
INITIAL_JIFFIES + CFQ_IDLE_GRACE jiffies. This is because cfq_init_queue()
forgets to initialize cfq_data->last_end_request.

(I guess this patch is not complete, overflow is still possible)
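A sketch of the arithmetic behind the stall. It assumes HZ = 1000, 32-bit jiffies, and the kernel's convention of starting jiffies 300 seconds before the counter wraps (INITIAL_JIFFIES). GRACE is a placeholder for CFQ_IDLE_GRACE, and idle_wait() is hypothetical. With a zeroed last_end_request, the deadline lands roughly 300 seconds in the future. Initializing the field to the current jiffies bounds the wait by the grace period:

```c
#include <stdint.h>

/* Assumptions: HZ = 1000, 32-bit jiffies, counter starts 300 s before
 * the wrap. GRACE is a placeholder value for CFQ_IDLE_GRACE. */
#define HZ 1000u
#define INITIAL_JIFFIES ((uint32_t)(0u - 300u * HZ))
#define GRACE (HZ / 10u)

/* Hypothetical: jiffies an idle-class queue would wait, given the
 * recorded completion time of the last request. */
static uint32_t idle_wait(uint32_t now, uint32_t last_end_request)
{
	uint32_t end = last_end_request + GRACE;
	int32_t remaining = (int32_t)(end - now);

	return remaining > 0 ? (uint32_t)remaining : 0u;
}
```

Called right after boot with a zeroed field, idle_wait(INITIAL_JIFFIES, 0) returns 300 * HZ + GRACE. With the field set to the current jiffies, it returns just GRACE.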

Signed-off-by: Oleg Nesterov
Signed-off-by: Jens Axboe

Oleg Nesterov
2007-11-07 16:46:13 +0800
2389d1ef1 cfq: fix IOPRIO_CLASS_IDLE accounting

Spotted by Nick; this hopefully explains the second trace in
http://bugzilla.kernel.org/show_bug.cgi?id=9180.

If ->async_idle_cfqq != NULL, cfq_put_async_queues() puts it IOPRIO_BE_NR
times in a loop instead of once. Fix this.
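A minimal model of the reference counting. It assumes an async_cfqq[2][IOPRIO_BE_NR] layout for the RT and BE queues plus a single idle queue. struct queue, put_queue() and put_async_queues() are hypothetical. The single idle queue is dropped once, outside the per-priority loop:

```c
#include <stddef.h>

#define IOPRIO_BE_NR 8	/* number of priority levels per class */

/* Hypothetical refcounted queue. */
struct queue {
	int ref;
};

struct sched {
	struct queue *async_cfqq[2][IOPRIO_BE_NR];	/* RT and BE rows */
	struct queue *async_idle_cfqq;			/* one idle queue */
};

static void put_queue(struct queue *q)
{
	q->ref--;
}

/* Drop the scheduler's references on its shared async queues. Putting
 * the idle queue inside the loop would drop IOPRIO_BE_NR references to
 * a single object instead of one. */
static void put_async_queues(struct sched *s)
{
	int i;

	for (i = 0; i < IOPRIO_BE_NR; i++) {
		if (s->async_cfqq[0][i])
			put_queue(s->async_cfqq[0][i]);
		if (s->async_cfqq[1][i])
			put_queue(s->async_cfqq[1][i]);
	}
	if (s->async_idle_cfqq)
		put_queue(s->async_idle_cfqq);
}
```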

Signed-off-by: Oleg Nesterov
Signed-off-by: Jens Axboe

Oleg Nesterov
2007-11-07 16:45:00 +0800

29 Oct, 2007

2 commits

0a0836a09 cfq_get_queue: fix possible NULL pointer access

cfq_get_queue()->cfq_find_alloc_queue() can fail, check the returned value.

Signed-off-by: Oleg Nesterov

Note that this isn't a bug at the moment, since the regular IO path
does not call this path without __GFP_WAIT set. However, it could be a
future bug, so I've applied it.

Signed-off-by: Jens Axboe

Oleg Nesterov
2007-10-29 18:33:05 +0800
4310864b9 cfq_exit_queue() should cancel cfq_data->unplug_work

Spotted by Nick; this perhaps explains the first trace in
http://bugzilla.kernel.org/show_bug.cgi?id=9180.

cfq_exit_queue() should cancel cfqd->unplug_work before freeing cfqd.
blk_sync_queue() seems unneeded, so it is removed.

Q: why does cfq_exit_queue() call cfq_shutdown_timer_wq() twice?

Signed-off-by: Oleg Nesterov
Signed-off-by: Jens Axboe

Oleg Nesterov
2007-10-29 18:33:05 +0800

24 Jul, 2007

1 commit

165125e1e [BLOCK] Get rid of request_queue_t typedef

Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweep of
the kernel, get rid of this typedef, and replace its uses with
the proper type.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-24 15:28:11 +0800

20 Jul, 2007

2 commits

8350163a9 cfq: Write-only stuff in CFQ data structures

There are some leftover bits from the task cooperator patch, which was
yanked out again. While it will get reintroduced, there is no point in
having this write-only stuff in the tree. So yank it.

Signed-off-by: Jens Axboe

Alexey Dobriyan
2007-07-20 16:07:50 +0800
c2dea2d1f cfq: async queue allocation per priority

If we have two processes with different ioprio_class, but the same
ioprio_data, their async requests will fall into the same queue. I guess
such behavior is not expected, because it's not right to put real-time
requests and best-effort requests in the same queue.

The attached patch fixes the problem by introducing additional *cfqq
fields on cfqd, pointing to per-(class,priority) async queues.
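A sketch of how per-(class, priority) async queues can be addressed. It assumes one array row each for the RT and BE classes and a single slot for the IDLE class. The class numbers follow the ioprio ABI (RT = 1, BE = 2, IDLE = 3), and async_slot() is a hypothetical helper name:

```c
#include <stddef.h>

enum { IOPRIO_CLASS_RT = 1, IOPRIO_CLASS_BE = 2, IOPRIO_CLASS_IDLE = 3 };
#define IOPRIO_BE_NR 8	/* priority levels per class */

struct queue;	/* opaque here; only pointers are stored */

/* Hypothetical layout: separate RT and BE rows, so RT and BE requests
 * with the same priority level no longer share a queue. */
struct sched {
	struct queue *async_cfqq[2][IOPRIO_BE_NR];
	struct queue *async_idle_cfqq;
};

/* Return the slot holding the shared async queue for (cls, prio). */
static struct queue **async_slot(struct sched *s, int cls, int prio)
{
	switch (cls) {
	case IOPRIO_CLASS_RT:
		return &s->async_cfqq[0][prio];
	case IOPRIO_CLASS_BE:
		return &s->async_cfqq[1][prio];
	case IOPRIO_CLASS_IDLE:
		return &s->async_idle_cfqq;
	default:
		return NULL;
	}
}
```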

Signed-off-by: Jens Axboe

Vasily Tarasov
2007-07-20 16:06:38 +0800

18 Jul, 2007

1 commit

94f6030ca Slab allocators: Replace explicit zeroing with __GFP_ZERO

kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
variant in the past. But with __GFP_ZERO it is possible now to do zeroing
while allocating.

Use __GFP_ZERO to remove the explicit clearing of memory via memset wherever
we can.
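A userspace analogy, since kmalloc_node() exists only in the kernel. calloc() stands in for an allocation with __GFP_ZERO, and malloc() plus memset() stands in for the explicit clearing being removed. The struct and function names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type. */
struct obj {
	int a, b;
	void *p;
};

/* Before: allocate, then clear by hand. */
static struct obj *alloc_then_memset(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		memset(o, 0, sizeof(*o));
	return o;
}

/* After: ask the allocator for zeroed memory in one call, the way
 * __GFP_ZERO does for kmalloc_node()/kmem_cache_alloc_node(). */
static struct obj *alloc_zeroed(void)
{
	return calloc(1, sizeof(struct obj));
}
```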

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-07-18 01:23:02 +0800

10 Jul, 2007

1 commit

15c31be4d cfq-iosched: fix async queue behaviour

With the cfq_queue hash removal, we inadvertently got rid of the
async queue sharing. This was not intentional, in fact CFQ purposely
shares the async queue per priority level to get good merging for
async writes.

So put some logic in cfq_get_queue() to track the shared queues.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 19:43:25 +0800

08 May, 2007

1 commit

0a31bd5f2 KMEM_CACHE(): simplify slab cache creation

This patch provides a new macro

KMEM_CACHE(<struct>, <flags>)

to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.

Example:

struct test_slab {
	int a, b, c;
	struct list_head list;
} __cacheline_aligned_in_smp;

test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);

will create a new slab named "test_slab" of size sizeof(struct
test_slab), aligned to the alignment of struct test_slab. If creation
fails, we panic.
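To show what the macro does with its arguments, here is a userspace imitation. cache_create() is a hypothetical stub that only records its parameters, and C11 _Alignof stands in for GCC's __alignof__. The real KMEM_CACHE() derives the same three values (stringified struct name, sizeof, alignment) and passes them to kmem_cache_create():

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for kmem_cache_create(): it only records the
 * parameters so the macro expansion can be inspected. */
struct cache_desc {
	const char *name;
	size_t size, align;
	unsigned long flags;
};

static struct cache_desc cache_create(const char *name, size_t size,
				      size_t align, unsigned long flags)
{
	struct cache_desc d = { name, size, align, flags };

	return d;
}

/* Same idea as KMEM_CACHE(): stringify the struct tag for the name and
 * take size and alignment from the type itself. */
#define CACHE_FOR(__struct, __flags) \
	cache_create(#__struct, sizeof(struct __struct), \
		     _Alignof(struct __struct), (__flags))

struct test_slab {
	int a, b, c;
	void *next;
};
```

For example, CACHE_FOR(test_slab, 0x1UL) yields a descriptor named "test_slab" with sizeof(struct test_slab) as its size.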

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-08 03:12:55 +0800

30 Apr, 2007

17 commits

597bc485d cfq-iosched: speedup cic rb lookup

We often lookup the same queue many times in succession, so cache
the last looked up queue to avoid browsing the rbtree.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:23 +0800
91fac317a cfq-iosched: get rid of cfqq hash

The cfq hash is no longer necessary. We can always get the cfqq from the
io context. The cfq_get_io_context_noalloc() function is introduced, because
we don't want to allocate a cic on merging and checking may_queue. To
identify the sync queue we used hash key = CFQ_KEY_ASYNC. Since the hash is
eliminated, we need another criterion, so a sync flag is added to the queue.
In all places where we search the rbtree we are in the current process's
context, so no additional locking is required.

Advantages of this patch: no additional memory for the hash, no seeking in
the hash, and cleaner code. We now have to look up the cic in the per-ioc
rbtree instead, but that is faster:
- most processes work only with a few devices
- most systems have only a few block devices
- it is an rbtree

Signed-off-by: Vasily Tarasov

Changes by me:

- Merge into CFQ devel branch
- Get rid of cfq_get_io_context_noalloc()
- Fix various bugs with dereferencing cic->cfqq[] with offset other
than 0 or 1.
- Fix bug in cfqq setup, is_sync condition was reversed.
- Fix bug where only bio_sync() is used; we need to check for a READ too

Signed-off-by: Jens Axboe

Vasily Tarasov
2007-04-30 15:01:23 +0800
cc1974797 cfq-iosched: tighten queue request overlap condition

For tagged devices, allow overlap of requests if the idle window
isn't enabled on the current active queue.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:23 +0800
3ed9a2965 cfq-iosched: improve sync vs async workloads

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:23 +0800
1be92f2fc cfq-iosched: never allow an async queue idling

We don't enable it by default, so don't let it get enabled at
runtime.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:22 +0800
20e493a8d cfq-iosched: get rid of ->dispatch_slice

We can track it fairly accurately locally, let the slice handling
take care of the rest.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:22 +0800
6084cdda0 cfq-iosched: don't pass unused preemption variable around

We don't use it anymore in the slice expiry handling.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:22 +0800
edd75ffd9 cfq-iosched: get rid of ->cur_rr and ->cfq_list

It's only used for preemption now that the IDLE and RT queues also
use the rbtree. If we pass an 'add_front' variable to
cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion
at the front of the tree.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:22 +0800
67e6b49e3 cfq-iosched: slice offset should take ioprio into account

Use max_slice - cur_slice as the multiplier for the insertion offset.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:22 +0800
498d3aa2b [PATCH] cfq-iosched: style cleanups and comments

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:22 +0800
67060e379 cfq-iosched: sort IDLE queues into the rbtree

Same treatment as the RT conversion, just put the sorted idle
branch at the end of the tree.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:22 +0800
0c534e0a4 cfq-iosched: sort RT queues into the rbtree

Currently CFQ does a linked insert into the current list for RT
queues. We can just factor the class into the rb insertion,
and then we don't have to treat RT queues in a special way. It's
faster, too.
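The ordering described in this commit and in the IDLE commit above it can be sketched as a comparator. RT queues sort first, IDLE queues sort last, and the service key decides within a class. For brevity this uses qsort() on an array rather than an rbtree, and struct q and service_cmp() are hypothetical names:

```c
#include <stdlib.h>

enum { CLASS_RT = 1, CLASS_BE = 2, CLASS_IDLE = 3 };

/* Hypothetical queue: class plus the key saying when to service it. */
struct q {
	int cls;
	unsigned long rb_key;
};

/* Factor the class into the ordering: RT before everything, IDLE after
 * everything, then rb_key within the same group. */
static int service_cmp(const void *pa, const void *pb)
{
	const struct q *a = pa, *b = pb;
	int a_rt = a->cls == CLASS_RT, b_rt = b->cls == CLASS_RT;
	int a_idle = a->cls == CLASS_IDLE, b_idle = b->cls == CLASS_IDLE;

	if (a_rt != b_rt)
		return b_rt - a_rt;		/* RT first */
	if (a_idle != b_idle)
		return a_idle - b_idle;		/* IDLE last */
	if (a->rb_key != b->rb_key)
		return a->rb_key < b->rb_key ? -1 : 1;
	return 0;
}
```

With one sorted structure, no class needs a separate list or a special case at service time.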

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:22 +0800
cc09e2990 [PATCH] cfq-iosched: speed up rbtree handling

For cases where the rbtree is mainly used for sorting and min retrieval,
a nice speedup of the rbtree code is to maintain a cache of the leftmost
node in the tree.

Also spotted in the CFS CPU scheduler code.

Improved by Alan D. Brunelle by updating the
leftmost hint in cfq_rb_first() if it isn't set, instead of only
updating it on insert.
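A sketch of the leftmost-node cache on a plain binary search tree. Rebalancing, which a real rbtree does, is omitted so the caching logic stays visible. The hint is set on insert when the new node never went right. It is refreshed lazily in tree_first() when unset, and dropped when the cached node is removed. All names here are hypothetical:

```c
#include <stddef.h>

/* Unbalanced BST node; a real rbtree would also rebalance. */
struct node {
	unsigned long key;
	struct node *left, *right, *parent;
};

struct root {
	struct node *top;
	struct node *left;	/* cached leftmost node, or NULL if unknown */
};

static void tree_insert(struct root *r, struct node *n)
{
	struct node **link = &r->top, *parent = NULL;
	int leftmost = 1;

	while (*link) {
		parent = *link;
		if (n->key < parent->key) {
			link = &parent->left;
		} else {
			link = &parent->right;
			leftmost = 0;	/* went right once: not the minimum */
		}
	}
	n->left = n->right = NULL;
	n->parent = parent;
	*link = n;
	if (leftmost)
		r->left = n;
}

/* O(1) when the hint is valid; otherwise walk down once and refresh
 * the hint, as the text describes for cfq_rb_first(). */
static struct node *tree_first(struct root *r)
{
	struct node *n = r->top;

	if (r->left)
		return r->left;
	if (!n)
		return NULL;
	while (n->left)
		n = n->left;
	r->left = n;
	return n;
}

/* Remove the minimum and drop the hint. The minimum has no left child,
 * so its right subtree takes its place. */
static struct node *tree_pop_first(struct root *r)
{
	struct node *n = tree_first(r);

	if (!n)
		return NULL;
	if (n->right)
		n->right->parent = n->parent;
	if (n->parent)
		n->parent->left = n->right;
	else
		r->top = n->right;
	r->left = NULL;
	return n;
}
```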

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:21 +0800
d9e7620e6 cfq-iosched: rework the whole round-robin list concept

Drawing on some inspiration from the CFS CPU scheduler design, overhaul
the pending cfq_queue concept list management. Currently CFQ uses a
doubly linked list per priority level for sorting and service uses.
Kill those lists and maintain an rbtree of cfq_queues, sorted by when
to service them.

This unfortunately means that the ionice levels aren't as strong
anymore; I will work on improving that later. We only scale the slice
time now, not the number of times we service. This means that latency
is better (for all priority levels), but that the distinction between
the highest and lower levels isn't as big.

The diffstat speaks for itself.

cfq-iosched.c | 363 +++++++++++++++++---------------------------------
1 file changed, 125 insertions(+), 238 deletions(-)

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:21 +0800
1afba0451 cfq-iosched: minor updates

- Move the queue_new flag clear to when the queue is selected
- Only select the non-first queue in cfq_get_best_queue(), if there's
a substantial difference between the best and first.
- Get rid of ->busy_rr
- Only select a close cooperator, if the current queue is known to take
a while to "think".

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:21 +0800
6d048f531 cfq-iosched: development update

- Implement logic for detecting cooperating processes, so we
choose the best available queue whenever possible.

- Improve residual slice time accounting.

- Remove dead code: we no longer see async requests coming in on
sync queues. That part was removed a long time ago. That means
that we can also remove the difference between cfq_cfqq_sync()
and cfq_cfqq_class_sync(); they are now identical. And we can
kill the on_dispatch array and just make it a counter.

- Allow a process to go into the current list, if it hasn't been
serviced in this scheduler tick yet.

Possible future improvements include caching the cfqq lookup
in cfq_close_cooperator(), so we don't have to look it up twice.
cfq_get_best_queue() should just use that last decision instead
of doing it again.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:21 +0800
1e3335de0 cfq-iosched: improve preemption for cooperating tasks

When testing the syslet async io approach, I discovered that CFQ
sometimes didn't perform as well as expected. cfq_should_preempt()
needs to better check for cooperating tasks, so fix that by allowing
preemption of an equal priority queue if the recently queued request
is as good a candidate for IO as the one we are currently waiting for.

Signed-off-by: Jens Axboe

Jens Axboe
2007-04-30 15:01:21 +0800

25 Apr, 2007

1 commit

5044eed48 cfq-iosched: fix alias + front merge bug

There's a really rare and obscure bug in CFQ, that causes a crash in
cfq_dispatch_insert() due to rq == NULL. One example of the resulting
oops is seen here:

http://lkml.org/lkml/2007/4/15/41

Neil correctly diagnosed how this can happen: if two concurrent
requests have the exact same sector number (due to direct IO or
aliasing between MD and raw device access), the alias handling
will add the request to the sortlist, but next_rq remains NULL.

Read the more complete analysis at:

http://lkml.org/lkml/2007/4/25/57

This looks like it requires md to trigger, even though it should
potentially be possible to do with O_DIRECT (at least if you edit the
kernel and doctor some of the unplug calls).

The fix is to move the ->next_rq update to when we add a request to the
rbtree. This removes the possibility of a request existing in the rbtree
without ->next_rq being correctly updated.

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Jens Axboe
2007-04-25 23:41:48 +0800

21 Apr, 2007

1 commit

a99380065 cfq-iosched: fix sequential write regression

We have a 10-15% performance regression for sequential writes on TCQ/NCQ
enabled drives in 2.6.21-rcX after the CFQ update went in. It has been
reported by Valerie Clement and the Intel
testing folks. The regression is because of CFQ's now more aggressive
queue control, limiting the depth available to the device.

This patch fixes that regression by allowing a greater depth when only
one queue is busy. It has been tested not to impact sync-vs-async
workloads too much; we still do a lot better than 2.6.20.
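A hypothetical dispatch-limit check illustrating the idea. While several queues are busy, a queue stops at its quantum so the others get serviced. When it is the only busy queue, it may dispatch deeper. The 4x ceiling and the parameter names are illustrative assumptions, not values taken from this commit:

```c
#include <stdbool.h>

/* Hypothetical: may a queue that has already dispatched `dispatched`
 * requests send one more to the device? */
static bool may_dispatch(unsigned dispatched, unsigned quantum,
			 unsigned busy_queues)
{
	if (dispatched < quantum)
		return true;
	/* Other queues are waiting: keep the device queue short. */
	if (busy_queues > 1)
		return false;
	/* Only one busy queue: let a deeper queue build up on TCQ/NCQ
	 * drives. The 4x ceiling is an illustrative assumption. */
	return dispatched < 4 * quantum;
}
```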

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Jens Axboe
2007-04-21 13:56:29 +0800