30 Apr, 2007

20 commits

  • Jens Axboe
     
  • It's never grabbed from irq context, so just make it plain spin_lock().

    Signed-off-by: Jens Axboe

    Jens Axboe
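
    A minimal sketch of the difference, using a made-up lock and counter
    rather than the actual CFQ data: a lock that is never taken from
    interrupt context does not need the irq-disabling variant.

        #include <linux/spinlock.h>

        static DEFINE_SPINLOCK(example_lock);   /* stand-in for the lock in question */
        static unsigned long example_count;

        static void bump_count(void)
        {
                /*
                 * spin_lock_irq() would also disable local interrupts; since
                 * this lock is never taken from irq context, plain
                 * spin_lock() is enough.
                 */
                spin_lock(&example_lock);
                example_count++;
                spin_unlock(&example_lock);
        }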
     
  • We often look up the same queue many times in succession, so cache
    the last looked-up queue to avoid browsing the rbtree.

    Signed-off-by: Jens Axboe

    Jens Axboe
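
    A sketch of the caching idea in the commit above, using the kernel
    rbtree API but with simplified, made-up types and names (struct queue,
    last_q and queue_lookup are not the CFQ ones):

        #include <linux/rbtree.h>

        struct queue {
                struct rb_node rb_node;
                unsigned long key;
        };

        static struct queue *last_q;    /* most recently found queue */

        static struct queue *queue_lookup(struct rb_root *root, unsigned long key)
        {
                struct rb_node *n = root->rb_node;

                /* Fast path: repeat lookups of the same key skip the tree walk. */
                if (last_q && last_q->key == key)
                        return last_q;

                while (n) {
                        struct queue *q = rb_entry(n, struct queue, rb_node);

                        if (key < q->key) {
                                n = n->rb_left;
                        } else if (key > q->key) {
                                n = n->rb_right;
                        } else {
                                last_q = q;     /* cache the hit for next time */
                                return q;
                        }
                }
                return NULL;
        }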
     
  • To be used by as/cfq as they see fit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The cfq hash is no longer necessary: we can always get the cfqq from
    the io context. A cfq_get_io_context_noalloc() function is introduced,
    because we don't want to allocate a cic when merging or checking
    may_queue. Sync and async queues used to be told apart by the hash key
    (with CFQ_KEY_ASYNC marking async queues); since the hash is eliminated,
    another criterion is needed, so a sync flag is added to the queue. In
    all places where we dig into the rbtree we are in the current process
    context, so no additional locking is required.

    Advantages of this patch: no additional memory for the hash, no hash
    lookups, cleaner code. The cic now has to be looked up in a per-ioc
    rbtree instead, but that is faster:
    - most processes work only with few devices
    - most systems have only few block devices
    - it is a rb-tree

    Signed-off-by: Vasily Tarasov

    Changes by me:

    - Merge into CFQ devel branch
    - Get rid of cfq_get_io_context_noalloc()
    - Fix various bugs with dereferencing cic->cfqq[] with an offset other
      than 0 or 1.
    - Fix bug in cfqq setup: the is_sync condition was reversed.
    - Fix bug where only bio_sync() was checked; we need to check for a
      READ too

    Signed-off-by: Jens Axboe

    Vasily Tarasov
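
    A sketch of the sync/async split behind the cic->cfqq[] fixups listed
    above: a two-slot array indexed by a clamped sync flag. The type and
    helper names here are made up, not the CFQ ones.

        struct cfqq;                            /* opaque stand-in for the cfq queue */

        struct cic_sketch {
                struct cfqq *cfqq[2];           /* [0] = async queue, [1] = sync queue */
        };

        static struct cfqq *cic_to_cfqq(struct cic_sketch *cic, int is_sync)
        {
                /* !! clamps any non-zero flag to 1, so the index is always 0 or 1 */
                return cic->cfqq[!!is_sync];
        }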
     
  • For tagged devices, allow overlap of requests if the idle window
    isn't enabled on the current active queue.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We don't enable it by default, don't let it get enabled during
    runtime.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We can track it fairly accurately locally; let the slice handling
    take care of the rest.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We don't use it anymore in the slice expiry handling.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's only used for preemption now that the IDLE and RT queues also
    use the rbtree. If we pass an 'add_front' variable to
    cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion
    at the front of the tree.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Use max_slice - cur_slice as the multiplier for the insertion offset.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Same treatment as the RT conversion, just put the sorted idle
    branch at the end of the tree.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently CFQ does a linked insert into the current list for RT
    queues. We can just factor the class into the rb insertion,
    and then we don't have to treat RT queues in a special way. It's
    faster, too.

    Signed-off-by: Jens Axboe

    Jens Axboe
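
    A loose sketch of folding the scheduling class into the sort key, so RT
    queues sort ahead of best-effort and IDLE sorts last without any
    special-case lists. The enum values and band width below are
    illustrative, not CFQ's actual constants.

        /* Illustrative only: give each scheduling class its own band of the
         * key space, so RT sorts first, then best-effort, then IDLE. */
        enum prio_class { CLASS_RT = 0, CLASS_BE = 1, CLASS_IDLE = 2 };

        #define CLASS_BAND      (1ULL << 32)    /* wide enough that bands never overlap */

        static unsigned long long service_key(enum prio_class pclass, unsigned long rb_key)
        {
                return (unsigned long long)pclass * CLASS_BAND + rb_key;
        }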
     
  • For cases where the rbtree is mainly used for sorting and min retrieval,
    a nice speedup of the rbtree code is to maintain a cache of the leftmost
    node in the tree.

    Also spotted in the CFS CPU scheduler code.

    Improved by Alan D. Brunelle by updating the
    leftmost hint in cfq_rb_first() if it isn't set, instead of only
    updating it on insert.

    Signed-off-by: Jens Axboe

    Jens Axboe
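
    A sketch of the leftmost-node cache with simplified, made-up types; the
    pattern is generic: remember the smallest node at insert time, and, per
    the improvement noted above, refresh the hint from rb_first() when it
    isn't set (an erase path would likewise need to clear or advance it).

        #include <linux/rbtree.h>

        struct cached_root {
                struct rb_root root;
                struct rb_node *leftmost;       /* cached smallest node, may be NULL */
        };

        struct item {
                struct rb_node rb_node;
                unsigned long key;
        };

        static void cached_insert(struct cached_root *cr, struct item *it)
        {
                struct rb_node **p = &cr->root.rb_node, *parent = NULL;
                int is_leftmost = 1;

                while (*p) {
                        struct item *cur = rb_entry(*p, struct item, rb_node);

                        parent = *p;
                        if (it->key < cur->key) {
                                p = &(*p)->rb_left;
                        } else {
                                p = &(*p)->rb_right;
                                is_leftmost = 0;        /* went right at least once */
                        }
                }
                /* Only a node that went left all the way down is the new minimum. */
                if (is_leftmost)
                        cr->leftmost = &it->rb_node;

                rb_link_node(&it->rb_node, parent, p);
                rb_insert_color(&it->rb_node, &cr->root);
        }

        static struct rb_node *cached_first(struct cached_root *cr)
        {
                /* Refresh the hint if it isn't set, instead of walking each time. */
                if (!cr->leftmost)
                        cr->leftmost = rb_first(&cr->root);
                return cr->leftmost;
        }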
     
  • Drawing on some inspiration from the CFS CPU scheduler design, overhaul
    the pending cfq_queue concept list management. Currently CFQ uses a
    doubly linked list per priority level for sorting and service uses.
    Kill those lists and maintain an rbtree of cfq_queue's, sorted by when
    to service them.

    This unfortunately means that the ionice levels aren't as strong
    anymore; I will work on improving those later. We only scale the slice
    time now, not the number of times we service. This means that latency
    is better (for all priority levels), but that the distinction between
    the highest and lower levels isn't as big.

    The diffstat speaks for itself.

    cfq-iosched.c | 363 +++++++++++++++++---------------------------------
    1 file changed, 125 insertions(+), 238 deletions(-)

    Signed-off-by: Jens Axboe

    Jens Axboe
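
    A sketch of the service-tree idea described above: each queue gets a
    key that roughly means "when it is due", the rbtree keeps queues sorted
    by that key, and picking the next queue is just taking the leftmost
    node. The types, names and key calculation are illustrative.

        #include <linux/rbtree.h>

        /* Made-up types: rb_key stands for "when this queue should be served". */
        struct squeue {
                struct rb_node rb_node;
                unsigned long long rb_key;
        };

        static void service_tree_add(struct rb_root *root, struct squeue *q,
                                     unsigned long long now, unsigned long offset)
        {
                struct rb_node **p = &root->rb_node, *parent = NULL;

                q->rb_key = now + offset;       /* lower key = serviced sooner */

                while (*p) {
                        struct squeue *cur = rb_entry(*p, struct squeue, rb_node);

                        parent = *p;
                        if (q->rb_key < cur->rb_key)
                                p = &(*p)->rb_left;
                        else
                                p = &(*p)->rb_right;
                }
                rb_link_node(&q->rb_node, parent, p);
                rb_insert_color(&q->rb_node, root);
        }

        /* The scheduler always serves the queue with the smallest key next. */
        static struct squeue *service_tree_next(struct rb_root *root)
        {
                struct rb_node *n = rb_first(root);

                return n ? rb_entry(n, struct squeue, rb_node) : NULL;
        }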
     
  • - Move the queue_new flag clear to when the queue is selected
    - Only select the non-first queue in cfq_get_best_queue(), if there's
    a substantial difference between the best and first.
    - Get rid of ->busy_rr
    - Only select a close cooperator, if the current queue is known to take
    a while to "think".

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • - Implement logic for detecting cooperating processes, so we
    choose the best available queue whenever possible.

    - Improve residual slice time accounting.

    - Remove dead code: we no longer see async requests coming in on
    sync queues. That part was removed a long time ago. That means
    that we can also remove the difference between cfq_cfqq_sync()
    and cfq_cfqq_class_sync(); they are now identical. And we can
    kill the on_dispatch array and just make it a counter.

    - Allow a process to go into the current list, if it hasn't been
    serviced in this scheduler tick yet.

    Possible future improvements include caching the cfqq lookup
    in cfq_close_cooperator(), so we don't have to look it up twice.
    cfq_get_best_queue() should just use that last decision instead
    of doing it again.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • When testing the syslet async io approach, I discovered that CFQ
    sometimes didn't perform as well as expected. cfq_should_preempt()
    needs to better check for cooperating tasks, so fix that by allowing
    preemption of an equal priority queue if the recently queued request
    is as good a candidate for IO as the one we are currently waiting for.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Apr, 2007

1 commit

  • There's a really rare and obscure bug in CFQ that causes a crash in
    cfq_dispatch_insert() due to rq == NULL. One example of the resulting
    oops is seen here:

    http://lkml.org/lkml/2007/4/15/41

    Neil correctly diagnosed how this can happen: if two concurrent
    requests arrive with the exact same sector number (due to direct IO,
    or to aliasing between MD and the raw device access), the alias
    handling will add the request to the sortlist but leave next_rq NULL.

    Read the more complete analysis at:

    http://lkml.org/lkml/2007/4/25/57

    This looks like it requires md to trigger, even though it should
    potentially be possible to do with O_DIRECT (at least if you edit the
    kernel and doctor some of the unplug calls).

    The fix is to move the ->next_rq update to when we add a request to the
    rbtree. Then we remove the possibility for a request to exist in the
    rbtree but not have ->next_rq correctly updated.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
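
    A rough sketch of the shape of the fix, with simplified stand-in types:
    by updating the back-pointer at rbtree-insert time, an aliased request
    (same sector key) can never sit in the tree while ->next_rq is still
    NULL.

        /* Made-up, simplified types; only the ordering of the update matters. */
        struct request_sketch {
                unsigned long long sector;
                /* ... rbtree linkage elided ... */
        };

        struct queue_sketch {
                struct request_sketch *next_rq; /* next request to dispatch */
        };

        /* Called from the same place that links the request into the sort tree. */
        static void rq_added_to_tree(struct queue_sketch *q, struct request_sketch *rq)
        {
                /*
                 * Keep next_rq valid here, at insert time, rather than in a
                 * later step that the alias path used to skip.
                 */
                if (!q->next_rq || rq->sector < q->next_rq->sector)
                        q->next_rq = rq;
        }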
     

21 Apr, 2007

1 commit

  • We have a 10-15% performance regression for sequential writes on TCQ/NCQ
    enabled drives in 2.6.21-rcX after the CFQ update went in. It has been
    reported by Valerie Clement and the Intel
    testing folks. The regression is because of CFQ's now more aggressive
    queue control, limiting the depth available to the device.

    This patch fixes that regression by allowing a greater depth when only
    one queue is busy. It has been tested to not impact sync-vs-async
    workloads too much - we still do a lot better than 2.6.20.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
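
    A sketch of the heuristic described above; the field names and the
    scaling factor are illustrative, not the actual CFQ dispatch logic:
    when a tagged drive has only one busy queue there is nothing to isolate
    it from, so the dispatch depth can be opened up.

        /* Illustrative only; not the actual cfq dispatch code. */
        struct sched_state {
                int busy_queues;        /* queues with requests pending */
                int hw_tagged;          /* device does TCQ/NCQ */
                int quantum;            /* normal per-queue dispatch depth */
        };

        static int max_dispatch(const struct sched_state *s)
        {
                /* Alone on a tagged drive: nobody to isolate, so go deeper. */
                if (s->hw_tagged && s->busy_queues == 1)
                        return 4 * s->quantum;  /* factor chosen for illustration */

                return s->quantum;
        }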
     

05 Apr, 2007

1 commit

  • Revert all this. It can cause device-mapper to receive a different major from
    earlier kernels and it turns out that the Amanda backup program (via GNU tar,
    apparently) checks major numbers on files when performing incremental backups.

    Which is a bit broken of Amanda (or tar), but this feature isn't important
    enough to justify the churn.

    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

27 Mar, 2007

2 commits

  • Booting 2.6.21-rc3-g45592145 I noticed the following on one of my
    machines in the bootlog:

    io scheduler noop registeredTime: jiffies clocksource has been installed.

    io scheduler deadline registered (default)

    Looking at block/elevator.c, it appears that elv_register() uses two
    consecutive printks in a non-atomic way, leading to the above glitch. The
    attached trivial patch fixes this issue by using a single printk.

    Signed-off-by: Thibaut VARENE
    Signed-off-by: Jens Axboe

    Thibaut VARENE
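
    A sketch of the shape of the fix: emitting the whole line with a single
    printk() keeps it atomic with respect to messages from other CPUs. The
    function and its arguments are stand-ins for the elv_register()
    internals.

        #include <linux/kernel.h>

        static void report_registration(const char *name, int is_default)
        {
                /*
                 * Two back-to-back printk() calls can be interleaved by
                 * another CPU's message; one call emits the complete line.
                 */
                printk(KERN_INFO "io scheduler %s registered%s\n",
                       name, is_default ? " (default)" : "");
        }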
     
  • There is a small problem in handling page bounce.

    At the moment blk_max_pfn equals max_pfn, which is in fact not the
    maximum possible page frame _number_, but the _count_ of page frames.
    For example, on a 32-bit x86 node with 4 GB of RAM, max_pfn = 0x100000,
    not 0xFFFFF.

    The request_queue structure has a member q->bounce_pfn, and the queue
    needs bounce pages for pages _above_ this limit. This is handled by
    blk_queue_bounce(), where the following check is performed:

    if (q->bounce_pfn >= blk_max_pfn)
    return;

    Assume a driver has set q->bounce_pfn to 0xFFFF, but blk_max_pfn
    equals 0x10000. In that situation the check above fails, and for each
    bio we always fall through to iterating over the pages tied to the bio.

    Note that for quite a large range of device drivers (ide, md, ...)
    this problem doesn't occur, because they use BLK_BOUNCE_ANY for
    bounce_pfn. BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT, so
    the check above doesn't fail. But for other drivers, which obtain the
    required value from the device, the problem does occur; for example,
    sata_nv uses ATA_DMA_MASK or dev->dma_mask.

    I propose using (max_pfn - 1) for blk_max_pfn, and the same for
    blk_max_low_pfn. The patch also cleans up some checks related to
    bounce_pfn.

    Signed-off-by: Vasily Tarasov
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Vasily Tarasov
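
    The count-versus-highest-number distinction is plain off-by-one
    arithmetic; a tiny standalone demonstration (ordinary userspace C, not
    block-layer code):

        #include <stdio.h>

        int main(void)
        {
                unsigned long long ram = 4ULL << 30;            /* 4 GB */
                unsigned long long page_size = 4096;            /* 4 KB pages */
                unsigned long long nr_frames = ram / page_size;

                /* frames are numbered 0 .. nr_frames - 1 */
                printf("page frames: %#llx, highest pfn: %#llx\n",
                       nr_frames, nr_frames - 1);               /* 0x100000 vs 0xfffff */
                return 0;
        }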
     

21 Feb, 2007

2 commits

  • >=============================================
    >[ INFO: possible recursive locking detected ]
    >2.6.19-1.2909.fc7 #1
    >---------------------------------------------
    >anaconda/587 is trying to acquire lock:
    > (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
    >
    >but task is already holding lock:
    > (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
    >
    >other info that might help us debug this:
    >1 lock held by anaconda/587:
    > #0: (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
    >
    >stack backtrace:
    > [] show_trace_log_lvl+0x1a/0x2f
    > [] show_trace+0x12/0x14
    > [] dump_stack+0x16/0x18
    > [] __lock_acquire+0x116/0xa09
    > [] lock_acquire+0x56/0x6f
    > [] __mutex_lock_slowpath+0xe5/0x24a
    > [] mutex_lock+0x21/0x24
    > [] blkdev_ioctl+0x600/0x76d
    > [] block_ioctl+0x1b/0x1f
    > [] do_ioctl+0x22/0x68
    > [] vfs_ioctl+0x252/0x265
    > [] sys_ioctl+0x49/0x63
    > [] syscall_call+0x7/0xb

    Annotate BLKPG_DEL_PARTITION's bd_mutex locking and add a little comment
    clarifying the bd_mutex locking, because I confused myself and initially
    thought the lock order was wrong too.

    Signed-off-by: Peter Zijlstra
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
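
    A sketch of the general annotation pattern for this kind of report,
    with a made-up parent/child pair rather than struct block_device: when
    two objects of the same lock class must be held at once, the inner lock
    is taken with mutex_lock_nested() and a distinct subclass so lockdep
    does not flag it as recursion.

        #include <linux/mutex.h>

        struct node {
                struct mutex lock;
                struct node *child;
        };

        static void lock_pair(struct node *parent)
        {
                mutex_lock(&parent->lock);
                /* Same lock class, one level deeper: give lockdep a distinct subclass. */
                mutex_lock_nested(&parent->child->lock, 1);

                /* ... operate on both objects ... */

                mutex_unlock(&parent->child->lock);
                mutex_unlock(&parent->lock);
        }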
     
  • Several people have reported failures in dynamic major device number handling
    due to the recent changes in there to avoid handing out the local/experimental
    majors.

    Rolf reports that this is due to a gcc-4.1.0 bug.

    The patch refactors that code a lot in an attempt to provoke the compiler into
    behaving.

    Cc: Rolf Eike Beer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

18 Feb, 2007

1 commit


13 Feb, 2007

2 commits

  • Many struct file_operations in the kernel can be "const". Marking them
    const moves them to the .rodata section, which avoids false sharing with
    potentially dirty data. In addition, it'll catch accidental writes to
    these shared resources at compile time.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
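
    A minimal sketch of the pattern (the handler and names are made up):
    with the const qualifier the ops table lands in .rodata, and any later
    assignment to its fields is rejected at compile time.

        #include <linux/fs.h>
        #include <linux/module.h>

        static int example_open(struct inode *inode, struct file *filp)
        {
                return 0;
        }

        /* const: read-only data, and "example_fops.open = ..." no longer builds */
        static const struct file_operations example_fops = {
                .owner  = THIS_MODULE,
                .open   = example_open,
        };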
     
  • As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic
    blockdev major allocation can hand out majors which LANANA has defined as
    being for local/experimental use.

    Cc: Torben Mathiasen
    Cc: Greg KH
    Cc: Al Viro
    Cc: Tomas Klas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

12 Feb, 2007

10 commits