Eric Lee / smarc-fsl-linux-kernel

25 Apr, 2007

1 commit

5044eed48 cfq-iosched: fix alias + front merge bug

There's a really rare and obscure bug in CFQ, that causes a crash in
cfq_dispatch_insert() due to rq == NULL. One example of the resulting
oops is seen here:

http://lkml.org/lkml/2007/4/15/41

Neil correctly diagnosed how this can happen: if two concurrent
requests have the exact same sector number (due to direct IO or
aliasing between MD and the raw device access), the alias handling
will add the request to the sortlist, but next_rq remains NULL.

Read the more complete analysis at:

http://lkml.org/lkml/2007/4/25/57

This looks like it requires md to trigger, even though it should
potentially be possible to do with O_DIRECT (at least if you edit the
kernel and doctor some of the unplug calls).

The fix is to move the ->next_rq update to when we add a request to the
rbtree. Then we remove the possibility for a request to exist in the
rbtree code, but not have ->next_rq correctly updated.
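
As a rough illustration of that idea (not the actual CFQ code), the sketch
below keeps the cached pointer in sync at the single place where a request
enters the sorted structure. A sorted linked list stands in for the rbtree;
toy_rq, toy_queue, toy_add_rq() and the "first request wins" selection
policy are hypothetical names and simplifications.

```c
#include <stddef.h>

struct toy_rq {
	unsigned long sector;
	struct toy_rq *next;	/* sorted list link, stand-in for an rbtree node */
};

struct toy_queue {
	struct toy_rq *sorted;	/* all queued requests, ordered by sector */
	struct toy_rq *next_rq;	/* cached request to dispatch next */
};

/*
 * Insert rq in sector order and update next_rq in the same function,
 * so a request can never be queued while next_rq is still NULL.
 * Requests with an equal sector (aliases) are placed after existing ones.
 */
static void toy_add_rq(struct toy_queue *q, struct toy_rq *rq)
{
	struct toy_rq **pp = &q->sorted;

	while (*pp && (*pp)->sector <= rq->sector)
		pp = &(*pp)->next;
	rq->next = *pp;
	*pp = rq;

	/* Hypothetical policy: keep the cached choice, fill it if empty. */
	if (!q->next_rq)
		q->next_rq = rq;
}
```

Because every insertion path goes through toy_add_rq(), an alias with the
same sector number still leaves next_rq pointing at a valid request.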

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Jens Axboe
2007-04-25 23:41:48 +0800

21 Apr, 2007

1 commit

a99380065 cfq-iosched: fix sequential write regression

We have a 10-15% performance regression for sequential writes on TCQ/NCQ
enabled drives in 2.6.21-rcX after the CFQ update went in. It has been
reported by Valerie Clement and the Intel
testing folks. The regression is because of CFQ's now more aggressive
queue control, limiting the depth available to the device.

This patch fixes that regression by allowing a greater depth when only
one queue is busy. It has been tested to not impact sync-vs-async
workloads too much - we still do a lot better than 2.6.20.

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Jens Axboe
2007-04-21 13:56:29 +0800

05 Apr, 2007

1 commit

2363cc026 [PATCH] remove protection of LANANA-reserved majors

Revert all this. It can cause device-mapper to receive a different major from
earlier kernels and it turns out that the Amanda backup program (via GNU tar,
apparently) checks major numbers on files when performing incremental backups.

Which is a bit broken of Amanda (or tar), but this feature isn't important
enough to justify the churn.

Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-04-05 12:12:47 +0800

27 Mar, 2007

2 commits

1ffb96c58 make elv_register() output atomic

Booting 2.6.21-rc3-g45592145 I noticed the following on one of my
machines in the bootlog:

io scheduler noop registeredTime: jiffies clocksource has been installed.

io scheduler deadline registered (default)

Looking at block/elevator.c, it appears that elv_register() uses two
consecutive printks in a non-atomic way, leading to the above glitch. The
attached trivial patch fixes this issue, by using a single printk.
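
A user-space sketch of the single-call idea, with snprintf() standing in
for printk(). format_registered() is a hypothetical helper, and the message
text follows the boot log quoted above; the real elv_register() may format
its output differently.

```c
#include <stdio.h>

/*
 * Build the whole registration line in one formatting call, so nothing
 * else can be interleaved between "registered" and " (default)".
 * Returns the number of characters snprintf() would have written.
 */
static int format_registered(char *buf, size_t len,
			     const char *name, int is_default)
{
	return snprintf(buf, len, "io scheduler %s registered%s\n",
			name, is_default ? " (default)" : "");
}
```

The optional suffix is chosen inside the same call instead of being printed
by a second statement, which is what removes the window for interleaving.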

Signed-off-by: Thibaut VARENE
Signed-off-by: Jens Axboe

Thibaut VARENE
2007-03-27 14:53:04 +0800

f772b3d9c block: blk_max_pfn is sometimes wrong

There is a small problem in handling page bounce.

At the moment blk_max_pfn equals max_pfn, which is in fact not the maximum
possible page frame _number_, but the _count_ of page frames. For example,
on a 32-bit x86 node with 4GB of RAM, max_pfn = 0x100000, not 0xFFFFF.

The request_queue structure has a member q->bounce_pfn, and the queue needs
bounce pages for the pages _above_ this limit. This is handled by
blk_queue_bounce(), which performs the following check:

	if (q->bounce_pfn >= blk_max_pfn)
		return;

Assume that a driver has set q->bounce_pfn to 0xFFFF, but blk_max_pfn
equals 0x10000. In that situation the check above fails, and for each bio
we always fall through to iterating over the pages tied to the bio.

Note that for quite a large range of device drivers (ide, md, ...) this
problem doesn't occur, because they use BLK_BOUNCE_ANY for bounce_pfn.
BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT, so the check above
doesn't fail. But for other drivers, which obtain the required value from
drivers, it fails. For example, sata_nv uses ATA_DMA_MASK or dev->dma_mask.

I propose to use (max_pfn - 1) for blk_max_pfn, and the same for
blk_max_low_pfn. The patch also cleans up some checks related to
bounce_pfn.
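
The off-by-one can be checked with a little arithmetic. The sketch below
assumes 4 KiB pages (a shift of 12); TOY_PAGE_SHIFT, pfn_count(),
highest_pfn() and can_skip_bounce() are hypothetical helper names, not
kernel symbols.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Number of page frames in ram_bytes of memory (what max_pfn holds). */
static uint64_t pfn_count(uint64_t ram_bytes)
{
	return ram_bytes >> TOY_PAGE_SHIFT;
}

/* Highest valid page frame number, i.e. the count minus one. */
static uint64_t highest_pfn(uint64_t ram_bytes)
{
	return pfn_count(ram_bytes) - 1;
}

/* Same comparison as the quoted check: nonzero means no bouncing needed. */
static int can_skip_bounce(uint64_t bounce_pfn, uint64_t limit_pfn)
{
	return bounce_pfn >= limit_pfn;
}
```

With 4GB of RAM, a driver limit of 0xFFFFF compared against the count
(0x100000) does not take the early return, while the same limit compared
against the highest frame number (0xFFFFF) does.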

Signed-off-by: Vasily Tarasov
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

Vasily Tarasov
2007-03-27 14:52:47 +0800

21 Feb, 2007

2 commits

6d740cd5b [PATCH] lockdep: annotate BLKPG_DEL_PARTITION

>=============================================
>[ INFO: possible recursive locking detected ]
>2.6.19-1.2909.fc7 #1
>---------------------------------------------
>anaconda/587 is trying to acquire lock:
> (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
>
>but task is already holding lock:
> (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
>
>other info that might help us debug this:
>1 lock held by anaconda/587:
> #0: (&bdev->bd_mutex){--..}, at: [] mutex_lock+0x21/0x24
>
>stack backtrace:
> [] show_trace_log_lvl+0x1a/0x2f
> [] show_trace+0x12/0x14
> [] dump_stack+0x16/0x18
> [] __lock_acquire+0x116/0xa09
> [] lock_acquire+0x56/0x6f
> [] __mutex_lock_slowpath+0xe5/0x24a
> [] mutex_lock+0x21/0x24
> [] blkdev_ioctl+0x600/0x76d
> [] block_ioctl+0x1b/0x1f
> [] do_ioctl+0x22/0x68
> [] vfs_ioctl+0x252/0x265
> [] sys_ioctl+0x49/0x63
> [] syscall_call+0x7/0xb

Annotate BLKPG_DEL_PARTITION's bd_mutex locking and add a little comment
clarifying the bd_mutex locking, because I confused myself and initially
thought the lock order was wrong too.

Signed-off-by: Peter Zijlstra
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2007-02-21 09:10:16 +0800

b446b60e4 [PATCH] rework reserved major handling

Several people have reported failures in dynamic major device number handling
due to the recent changes in there to avoid handing out the local/experimental
majors.

Rolf reports that this is due to a gcc-4.1.0 bug.

The patch refactors that code a lot in an attempt to provoke the compiler into
behaving.

Cc: Rolf Eike Beer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-02-21 09:10:13 +0800

18 Feb, 2007

1 commit

a8e14b950 update I/O sched Kconfig help texts - CFQ is now default, not AS.

Change I/O scheduler description to correctly show CFQ as being the default
scheduler and not the anticipatory scheduler that previously was default.

Signed-off-by: Jesper Juhl
Signed-off-by: Adrian Bunk

Jesper Juhl
2007-02-18 03:08:22 +0800

13 Feb, 2007

2 commits

2b8693c06 [PATCH] mark struct file_operations const 3

Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
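
A minimal user-space sketch of the pattern, using a hypothetical
toy_file_ops structure rather than the kernel's struct file_operations.
The const qualifier lets the toolchain place the table in read-only data
and makes any assignment to its members a compile-time error.

```c
#include <stddef.h>

/* Hypothetical, cut-down operations table. */
struct toy_file_ops {
	int (*open)(void);
	int (*release)(void);
};

static int toy_open(void)
{
	return 0;
}

static int toy_release(void)
{
	return 0;
}

/* const: typically placed in .rodata; writing toy_fops.open won't compile. */
static const struct toy_file_ops toy_fops = {
	.open = toy_open,
	.release = toy_release,
};

/* Callers take a pointer-to-const, so they cannot modify the table. */
static int toy_call_open(const struct toy_file_ops *ops)
{
	return ops->open ? ops->open() : -1;
}
```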

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2007-02-13 01:48:45 +0800

fdf892be3 [PATCH] register_blkdev(): don't hand out the LOCAL/EXPERIMENTAL majors

As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic
blockdev major allocation can hand out majors which LANANA has defined as
being for local/experimental use.
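
A sketch of what skipping those majors during dynamic allocation could look
like. It assumes the local/experimental ranges are 60-63, 120-127 and
240-254, which should be checked against the LANANA device list, and it
assumes a top-down search over a 255-entry table. toy_is_reserved_major()
and toy_pick_major() are hypothetical names, not the functions touched by
the patch.

```c
#include <stdbool.h>

#define TOY_MAX_MAJOR 255	/* assumption: majors 0..254 */

/* Assumed LOCAL/EXPERIMENTAL ranges; verify against the LANANA list. */
static bool toy_is_reserved_major(unsigned int major)
{
	return (major >= 60 && major <= 63) ||
	       (major >= 120 && major <= 127) ||
	       (major >= 240 && major <= 254);
}

/*
 * Search downward for a major that is neither in use nor reserved.
 * Returns 0 when nothing is available (0 is never handed out here).
 */
static unsigned int toy_pick_major(const bool in_use[TOY_MAX_MAJOR])
{
	for (unsigned int m = TOY_MAX_MAJOR - 1; m > 0; m--) {
		if (!in_use[m] && !toy_is_reserved_major(m))
			return m;
	}
	return 0;
}
```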

Cc: Torben Mathiasen
Cc: Greg KH
Cc: Al Viro
Cc: Tomas Klas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-02-13 01:48:27 +0800

12 Feb, 2007

15 commits

9ede209e8 cfq-iosched: improve continue or break logic in cfq_dispatch

This improves performance considerably for sync requests when you
have command queuing enabled.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:45 +0800

28f95cbc3 cfq-iosched: remove the implicit queue kicking in slice expire

We only really need it for a process going away, so move it to
those locations.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:45 +0800

3c6bd2f87 cfq-iosched: check whether a queue timed out in accounting

Makes it more fair for the residual slice count.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:45 +0800

cb8874119 cfq-iosched: tweak the FIFO checking

We currently check the FIFO once per slice. Optimize that a bit and
only do it as the first thing for a new slice, so we don't end up
doing a single request and then seek to the FIFO requests.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:45 +0800

1792669cc cfq-iosched: don't pass in queue for cfq_arm_slice_timer()

It must always be the active queue, otherwise it's a bug. So just
use the active_queue, don't pass it in explicitly.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:45 +0800

c5b680f3b cfq-iosched: account for slice over/under time

If a slice uses less than it is entitled to (or perhaps more), include
that in the decision on how much time to give it the next time it
gets serviced.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:45 +0800

44f7c1606 cfq-iosched: defer slice activation to first request being active

This better matches what time the queue is actually spending doing
IO.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:45 +0800

99f9628ab [PATCH] cfq-iosched: use last service point as the fairness criteria

Right now we use slice_start, which gives async queues an unfair
advantage. Change that to service_last, and base the resorter
on that.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:45 +0800

b0b8d7494 cfq-iosched: document the cfqq flags

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:44 +0800

98e41c7df [PATCH] cfq-iosched: move on_rr check into cfq_resort_rr_list()

Move the on_rr check into cfq_resort_rr_list(), every call site
needs to check it anyway.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:44 +0800

aaf1228dd cfq-iosched: remove cfq_io_context last_queue

It hasn't been used for a while, kill it off and remove the old
if 0 code chunk.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:44 +0800

783660b2f elevator: don't sort reads between writes

Don't allow elv_dispatch_sort() to mix reads and writes together,
it's rarely a good idea.

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:44 +0800

cad975164 elevator: abstract out the activate and deactivate functions

Signed-off-by: Jens Axboe

Jens Axboe
2007-02-12 06:14:44 +0800

c827ba4cb Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Update defconfig.
[SPARC64]: Add PCI MSI support on Niagara.
[SPARC64] IRQ: Use irq_desc->chip_data instead of irq_desc->handler_data
[SPARC64]: Add obppath sysfs attribute for SBUS and PCI devices.
[PARTITION]: Add whole_disk attribute.

Linus Torvalds
2007-02-12 03:37:45 +0800

23c887522 [PATCH] Relay: add CPU hotplug support

Mathieu originally needed to add this for tracing Xen, but it's something
that's needed for any application that can be tracing while cpus are added.

unplug isn't supported by this patch. The thought was that at minimum a new
buffer needs to be added when a cpu comes up, but it wasn't worth the effort
to remove buffers on cpu down since they'd be freed soon anyway when the
channel was closed.

[zanussi@us.ibm.com: avoid lock_cpu_hotplug deadlock]
Signed-off-by: Mathieu Desnoyers
Cc: Tom Zanussi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2007-02-12 02:51:28 +0800

11 Feb, 2007

1 commit

d18d7682c [PARTITION]: Add whole_disk attribute.

Some partitioning systems create special partitions that
span the entire disk. Sun partitions are one example, and
this whole-disk partition exists to tell the firmware the
extent of the entire device so it can load the boot block
and do other things.

Such partitions should not be treated as normal partitions,
because all the other partitions overlap this whole-disk one.
So we'd see multiple instances of the same UUID etc. which
we do not want. udev and friends can thus search for this
'whole_disk' attribute and use it to decide to ignore the
partition.

Signed-off-by: Fabio Massimo Di Nitto
Signed-off-by: David S. Miller

Fabio Massimo Di Nitto
2007-02-11 15:50:00 +0800

10 Feb, 2007

1 commit

387bb1737 [PATCH] md: fix various bugs with aligned reads in RAID5

It is possible for raid5 to be sent a bio that is too big for an underlying
device. So if it is a READ that we pass straight down to a device, it will
fail and confuse RAID5.

So in 'chunk_aligned_read' we check that the bio fits within the parameters
for the target device and if it doesn't fit, fall back on reading through
the stripe cache and making lots of one-page requests.

Note that this is the earliest time we can check against the device because
earlier we don't have a lock on the device, so it could change underneath
us.

Also, the code for handling a retry through the cache when a read fails has
not been tested and was badly broken. This patch fixes that code.

Signed-off-by: Neil Brown
Cc: "Kai"
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Neil Brown
2007-02-10 01:25:46 +0800

30 Jan, 2007

1 commit

c0d4d573f [PATCH] Fix SG_IO timeout jiffy conversion

Commit 85e04e371b5a321b5df2bc3f8e0099a64fb087d7 cleaned up the timeout
conversion, but did it exactly the wrong way. We get msecs from user
space, and should convert them into jiffies. Not the other way around.

Here is a fix with the overflow check sg.c has added in. This fixes DVD
burning with Nero.
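
The direction of the conversion, and one way to guard against overflow, can
be sketched in user space as follows. This is not the code from the patch
or from sg.c: toy_timeout_to_jiffies() is a hypothetical helper, the tick
rate is passed in as a parameter, and clamping to INT_MAX is an assumption
made for the example.

```c
#include <limits.h>

/*
 * Convert a timeout given in milliseconds (as received from user space)
 * into timer ticks at the given rate, rounding up. The multiplication is
 * done in 64 bits and the result is clamped, so a huge timeout cannot
 * wrap around into a tiny one.
 */
static unsigned int toy_timeout_to_jiffies(unsigned int msecs,
					   unsigned int hz)
{
	unsigned long long ticks =
		((unsigned long long)msecs * hz + 999) / 1000;

	return ticks > INT_MAX ? INT_MAX : (unsigned int)ticks;
}
```

At 250 ticks per second, a 60000 ms timeout becomes 15000 ticks. Converting
the other way around would treat the user's milliseconds as ticks, making
the effective timeout far longer or shorter than requested unless the tick
rate happens to be 1000.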

Signed-off-by: Mike Christie
[ "you'll be wanting a comma there" - Andrew ]
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Christie
2007-01-30 12:32:03 +0800

24 Jan, 2007

1 commit

95543179f [PATCH] elevator: move clearing of unplug flag earlier

A flag was recently added to the elevator code to avoid
performing an unplug when requests are being re-queued.
The goal of this flag was to avoid a deep recursion that
can occur when re-queueing requests after a SCSI device/host
reset. See http://lkml.org/lkml/2006/5/17/254

However, that fix added the flag near the bottom of a case
statement, where an earlier break (in an if statement) could
transport one out of the case, without setting the flag.
This patch sets the flag earlier in the case statement.
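
The control-flow problem can be shown with a toy switch statement. The
names below (toy_elv, unplug_it, TOY_INSERT_REQUEUE, is_soft_barrier) are
hypothetical and only mimic the shape described above; the real elevator
code differs.

```c
enum toy_where { TOY_INSERT_REQUEUE, TOY_INSERT_BACK };

struct toy_elv {
	int unplug_it;	/* nonzero: caller should unplug after inserting */
	int queued;	/* number of requests inserted so far */
};

static void toy_insert(struct toy_elv *e, enum toy_where where,
		       int is_soft_barrier)
{
	e->unplug_it = 1;

	switch (where) {
	case TOY_INSERT_REQUEUE:
		/* Update the flag first, before any early exit below. */
		e->unplug_it = 0;
		if (is_soft_barrier) {
			e->queued++;
			break;	/* early exit; the flag is already updated */
		}
		e->queued++;
		break;
	case TOY_INSERT_BACK:
		e->queued++;
		break;
	}
}
```

Had the flag update been the last statement of the requeue case, the early
break would have skipped it and the caller would still unplug.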

I re-discovered the deep recursion recently during testing;
I was told that it was a known problem, and the fix to it was
in the kernel I was testing. Indeed it was ... but it didn't
fix the bug. With the patch below, I no longer see the bug.

Signed-off-by: Linas Vepstas
Signed-off-by: Jens Axboe
Cc: Chris Wright
Signed-off-by: Linus Torvalds

Linas Vepstas
2007-01-24 03:01:17 +0800

03 Jan, 2007

1 commit

ec8acb690 [PATCH] cfq-iosched: merging problem

Two issues:

- The final return 1 should be a return 0, otherwise comparing cfqq is
a noop.

- bio_sync() only checks the sync flag, while rq_is_sync() checks both
for READ and sync. The latter is what we want. Expand the bio check
to include reads, and relax the restriction to allow merging of async
io into sync requests.

In the future we want to clean up the SYNC logic, right now it means
both sync request (such as READ and O_DIRECT WRITE) and unplug-on-issue.
Leave that for later.
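
The difference between the two checks in the second point can be sketched
with a toy type. toy_io, toy_flag_only_sync() and toy_is_sync() are
hypothetical stand-ins for the bio and request helpers named above, not
their actual definitions.

```c
enum toy_dir { TOY_READ, TOY_WRITE };

struct toy_io {
	enum toy_dir dir;
	int sync_flag;	/* explicit "synchronous" marker */
};

/* Looks only at the flag, so a plain read is reported as async. */
static int toy_flag_only_sync(const struct toy_io *io)
{
	return io->sync_flag;
}

/* Treats every read as sync, plus any write that carries the flag. */
static int toy_is_sync(const struct toy_io *io)
{
	return io->dir == TOY_READ || io->sync_flag;
}
```

A read without the flag is the case where the two disagree, which is why
the check on the bio side is expanded to include reads.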

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Jens Axboe
2007-01-03 01:46:16 +0800

23 Dec, 2006

2 commits

719d34027 [PATCH] cfq-iosched: tighten allow merge criteria

The logic in cfq_allow_merge() wasn't clear enough - basically allow
merging for the same queues only. Do a fast check for 'rq and bio both
sync/async' before doing the cfqq hash lookup.

This is verified to work with the fixed elv_try_merge() from commit
bb4067e34159648d394943d5e2a011f838bff22f.

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Jens Axboe
2006-12-23 06:13:08 +0800

af9997e42 [PATCH] fix kernel-doc warnings in 2.6.20-rc1

Fix kernel-doc warnings in 2.6.20-rc1.

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2006-12-23 00:55:47 +0800

22 Dec, 2006

1 commit

bb4067e34 [PATCH] elevator: fixup typo in merge logic

The recent io scheduler allow_merge commit left the block layer with
no merging, oops. This patch fixes that up.

That means the CFQ change needs to be verified again, it might not fix
the original bug now. But that's a separate thing, I'll double check
that tomorrow.

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Jens Axboe
2006-12-22 14:01:04 +0800

20 Dec, 2006

1 commit

da7752650 [PATCH] cfq-iosched: don't allow sync merges across queues

Currently we allow any merge, even if the io originates from different
processes. This can cause really bad starvation and unfairness, if those
ios happen to be synchronous (reads or direct writes).

So add an allow_merge hook to the io scheduler ops, so an io scheduler can
help decide whether a bio/process combination may be merged with an
existing request.
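
The shape of such an optional hook can be sketched as below. The types
merge_rq, merge_bio and merge_ops, the owner field, and the "same owner
only" policy are hypothetical; they illustrate a veto callback in general,
not the actual elevator ops structure.

```c
#include <stddef.h>

struct merge_rq  { int owner; };	/* id of the issuing process */
struct merge_bio { int owner; };

struct merge_ops {
	/* Optional: return nonzero to allow the merge, zero to veto it. */
	int (*allow_merge)(const struct merge_rq *rq,
			   const struct merge_bio *bio);
};

/* Example policy: only merge io that comes from the same owner. */
static int same_owner_only(const struct merge_rq *rq,
			   const struct merge_bio *bio)
{
	return rq->owner == bio->owner;
}

/* Core code asks the scheduler if it registered a hook, else allows. */
static int may_merge(const struct merge_ops *ops,
		     const struct merge_rq *rq, const struct merge_bio *bio)
{
	if (ops->allow_merge)
		return ops->allow_merge(rq, bio);
	return 1;
}
```

Schedulers that leave the hook unset keep the old "allow any merge"
behaviour, so only a scheduler that cares about fairness pays for the
extra check.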

Signed-off-by: Jens Axboe

Jens Axboe
2006-12-20 18:04:12 +0800

19 Dec, 2006

5 commits

8e5cfc45e [PATCH] Fixup blk_rq_unmap_user() API

The blk_rq_unmap_user() API is not very nice. It expects the caller to
know that rq->bio has to be reset to the original bio, and it will
silently do nothing if that is not done. Instead make it explicit that
we need to pass in the first bio, by expecting a bio argument.

Signed-off-by: Jens Axboe

Jens Axboe
2006-12-19 18:12:46 +0800

48785bb9f [PATCH] __blk_rq_unmap_user() fails to return error

If the bio is user copied, the copy back could return -EFAULT. Make
sure we return any error seen during unmapping.
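
One common way to propagate such an error while still unmapping
everything is to remember the first failure and keep going. The sketch
below is illustrative only: toy_unmap_all() is a hypothetical function,
and the per-bio results are passed in as an array instead of coming from
real unmap calls.

```c
#include <errno.h>

/*
 * Walk all per-bio unmap results. Every entry is still processed, but
 * the first nonzero result (for example -EFAULT from a failed copy back
 * to user space) is what the caller finally sees.
 */
static int toy_unmap_all(const int *results, int n)
{
	int ret = 0;

	for (int i = 0; i < n; i++) {
		if (results[i] && !ret)
			ret = results[i];
	}
	return ret;
}
```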

Signed-off-by: Jens Axboe

Jens Axboe
2006-12-19 18:07:59 +0800

9c9381f94 [PATCH] __blk_rq_map_user() doesn't need to grab the queue_lock

It was for driver private back_merge_fn hooks, but they don't exist
anymore.

Signed-off-by: Jens Axboe

Jens Axboe
2006-12-19 15:34:17 +0800

1aa4f24fe [PATCH] Remove queue merging hooks

We have full flexibility of merging parameters now, so we can remove the
hooks that define back/front/request merge strategies. Nobody is using
them anymore.

Signed-off-by: Jens Axboe

Jens Axboe
2006-12-19 15:33:11 +0800

2985259b0 [PATCH] ->nr_sectors and ->hard_nr_sectors are not used for BLOCK_PC requests

It's a file system thing, for block requests the only size used in the
io paths is ->data_len as it is in bytes, not sectors.

Signed-off-by: Jens Axboe

Jens Axboe
2006-12-19 15:27:31 +0800

13 Dec, 2006

1 commit

c65fb61b3 [PATCH] Allow as-iosched to be unloaded

We implemented the missing bits to allow this some time ago, and
they are integrated in AS. So remove the __module_get() to allow
the module to be unloaded.

Signed-off-by: Jens Axboe

Jens Axboe
2006-12-13 20:25:18 +0800