24 May, 2007

2 commits

  • Send an uevent to user space to indicate that a media change event has
    occurred.

    Signed-off-by: Kristen Carlson Accardi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kristen Carlson Accardi
     
  • Allow user space to determine if a disk supports Asynchronous Notification of
    media changes. This is done by adding a new sysfs file "capability_flags",
    which is documented in (insert file name). This sysfs file will export all
    disk capability flags to user space. We also add a new flag for the media
    change notification capability.

    Signed-off-by: Kristen Carlson Accardi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kristen Carlson Accardi
     

11 May, 2007

1 commit

  • When stacked block devices are in use (e.g. md or dm), the recursive
    calls to generic_make_request can use up a lot of space, and we would
    rather they didn't.

    As generic_make_request is a void function, and as it is generally not
    expected that it will have any effect immediately, it is safe to delay any
    call to generic_make_request until there is sufficient stack space
    available.

    As ->bi_next is reserved for the driver to use, it can have no valid value
    when generic_make_request is called, and as __make_request implicitly
    assumes it will be NULL (ELEVATOR_BACK_MERGE fork of switch) we can be
    certain that all callers set it to NULL. We can therefore safely use
    bi_next to link pending requests together, providing we clear it before
    making the real call.

    So, we choose to allow each thread to only be active in one
    generic_make_request at a time. If a subsequent (recursive) call is made,
    the bio is linked into a per-thread list, and is handled when the active
    call completes.

    As the list of pending bios is per-thread, there are no locking issues to
    worry about.

    I say above that it is "safe to delay any call...". There are, however,
    some behaviours of a make_request_fn which would make it unsafe. These
    include any behaviour that assumes anything will have changed after a
    recursive call to generic_make_request.

    These could include:
    - waiting for that call to finish and call its bi_end_io function.
      md used to sometimes do this (marking the superblock dirty before
      completing a write) but doesn't any more.
    - inspecting the bio for fields that generic_make_request might
      change, such as bi_sector or bi_bdev. It is hard to see a good
      reason for this, and I don't think anyone actually does it.
    - inspecting the queue to see if, e.g., it is 'full' yet. Again, I
      think this is very unlikely to be useful, or to be done.
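
    The deferral mechanism described above can be sketched in userspace C:
    a per-thread pending list flattens recursion so that only the outermost
    call actually processes bios. This is an illustrative sketch, not the
    kernel code; the two-field bio struct, the driver callback, and the
    instrumentation counters are all invented for the demo.

```c
#include <assert.h>
#include <stddef.h>

struct bio {
    int remaining_splits;    /* how many child bios this one spawns */
    struct bio *bi_next;     /* reserved for the driver; NULL on entry */
};

/* Per-thread state; a single set of globals suffices for this
 * single-threaded demo. */
static struct bio *pending_head, *pending_tail;
static int active;               /* inside generic_make_request()? */
static int max_depth, depth;     /* instrumentation only */

static void make_request_fn(struct bio *bio);

static void generic_make_request(struct bio *bio)
{
    if (active) {                /* recursive call: queue it and return */
        bio->bi_next = NULL;
        if (pending_tail)
            pending_tail->bi_next = bio;
        else
            pending_head = bio;
        pending_tail = bio;
        return;
    }
    active = 1;
    do {                         /* drain the pending list iteratively */
        make_request_fn(bio);
        bio = pending_head;
        if (bio) {
            pending_head = bio->bi_next;
            if (!pending_head)
                pending_tail = NULL;
            bio->bi_next = NULL; /* clear before the real call */
        }
    } while (bio);
    active = 0;
}

/* A driver that "splits" a bio by resubmitting children, the way a
 * stacked device (md/dm) would. */
static void make_request_fn(struct bio *bio)
{
    static struct bio children[16];
    static int n;

    if (++depth > max_depth)
        max_depth = depth;
    for (int i = 0; i < bio->remaining_splits; i++) {
        children[n].remaining_splits = bio->remaining_splits - 1;
        generic_make_request(&children[n]);
        n++;
    }
    depth--;
}
```

    Submitting a bio that splits three ways (and whose children split in
    turn) drives the driver callback many times, but never nested: the
    recursive submissions are queued and handled after the active call
    returns, so stack depth stays constant.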

    Signed-off-by: Neil Brown
    Cc: Jens Axboe
    Cc:

    Alasdair G Kergon said:

    I can see nothing wrong with this in principle.

    For device-mapper at the moment though it's essential that, while the bio
    mappings may now get delayed, they still get processed in exactly
    the same order as they were passed to generic_make_request().

    My main concern is whether the timing changes implicit in this patch
    will make the rare data-corrupting races in the existing snapshot code
    more likely. (I'm working on a fix for these races, but the unfinished
    patch is already several hundred lines long.)

    It would be helpful if some people on this mailing list would test
    this patch in various scenarios and report back.

    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Neil Brown
     

10 May, 2007

5 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
    sound: convert "sound" subdirectory to UTF-8
    MAINTAINERS: Add cxacru website/mailing list
    include files: convert "include" subdirectory to UTF-8
    general: convert "kernel" subdirectory to UTF-8
    documentation: convert the Documentation directory to UTF-8
    Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
    remove broken URLs from net drivers' output
    Magic number prefix consistency change to Documentation/magic-number.txt
    trivial: s/i_sem /i_mutex/
    fix file specification in comments
    drivers/base/platform.c: fix small typo in doc
    misc doc and kconfig typos
    Remove obsolete fat_cvf help text
    Fix occurrences of "the the "
    Fix minor typoes in kernel/module.c
    Kconfig: Remove reference to external mqueue library
    Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
    Correct comments in genrtc.c to refer to correct /proc file.
    Fix more "deprecated" spellos.
    Fix "deprecated" typoes.
    ...

    Fix trivial comment conflict in kernel/relay.c.

    Linus Torvalds
     
  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).
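
    As a sketch of how a subsystem can consume these notifications: the
    suspend/resume variants carry an extra flag bit, so an existing notifier
    can mask it off and fall through to its normal handling. The constant
    values below mirror the kernel's CPU_TASKS_FROZEN scheme but should be
    treated as illustrative, as should the callback and counter names.

```c
#include <assert.h>

/* Illustrative notifier action values, modeled on the kernel's scheme:
 * the *_FROZEN variants are the normal action with one extra bit set. */
enum {
    CPU_ONLINE        = 0x0002,
    CPU_DEAD          = 0x0007,
    CPU_TASKS_FROZEN  = 0x0010,
    CPU_ONLINE_FROZEN = CPU_ONLINE | CPU_TASKS_FROZEN,
    CPU_DEAD_FROZEN   = CPU_DEAD | CPU_TASKS_FROZEN,
};

static int cpus_seen_online;

/* A hotplug-aware subsystem that handles suspend/resume events the same
 * way as normal ones: mask off the FROZEN bit before switching. */
static int cpu_callback(unsigned long action)
{
    switch (action & ~CPU_TASKS_FROZEN) {
    case CPU_ONLINE:
        cpus_seen_online++;
        break;
    case CPU_DEAD:
        cpus_seen_online--;
        break;
    }
    return 0;
}
```

    A subsystem that needs to distinguish the cases (e.g. to skip work
    during a system-wide suspend) can instead test the flag bit explicitly
    before masking it.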

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • flush_work(wq, work) doesn't need the first parameter; we can use cwq->wq
    (this was possible from the very beginning, I missed this). So we can
    unify flush_work_keventd and flush_work.

    Also, rename flush_work() to cancel_work_sync() and fix all callers.
    Perhaps this is not the best name, but "flush_work" is really bad.

    (akpm: this is why the earlier patches bypassed maintainers)

    Signed-off-by: Oleg Nesterov
    Cc: Jeff Garzik
    Cc: "David S. Miller"
    Cc: Jens Axboe
    Cc: Tejun Heo
    Cc: Auke Kok ,
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Switch the kblockd flushing from a global flush to a more specific
    flush_work().

    (akpm: bypassed maintainers, sorry. There are other patches which depend on
    this)

    Cc: "Maciej W. Rozycki"
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Display all possible partitions when the root filesystem is not mounted.
    This helps to track spell'o's and missing drivers.

    Updated to work with newer kernels.

    Example output:

    VFS: Cannot open root device "foobar" or unknown-block(0,0)
    Please append a correct "root=" boot option; here are the available partitions:
    0800 8388608 sda driver: sd
    0801 192748 sda1
    0802 8193150 sda2
    0810 4194304 sdb driver: sd
    Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

    [akpm@linux-foundation.org: cleanups, fix printk warnings]
    Signed-off-by: Jan Engelhardt
    Cc: Dave Gilbert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Gilbert
     

09 May, 2007

4 commits

  • Signed-off-by: Michael Opdenacker
    Signed-off-by: Adrian Bunk

    Michael Opdenacker
     
  • * 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block:
    [PATCH] ll_rw_blk: fix missing bounce in blk_rq_map_kern()
    [PATCH] splice: always call into page_cache_readahead()
    [PATCH] splice(): fix interaction with readahead

    Linus Torvalds
     
  • Fix units mismatch (jiffies vs msecs) in as-iosched.c, spotted by
    Xiaoning Ding.

    Signed-off-by: Nick Piggin
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • I think we might just need the blk_map_kern users now. For the async
    execute I added the bounce code already, and the block SG_IO has it
    already. I think the blk_map_kern bounce code got dropped because we
    thought the correct gfp_t would be passed in. But I think all we need is
    the patch below and all the paths are taken care of. The patch is not
    tested. Patch was made against scsi-misc.

    The last place that is sending non-sg commands may just be md/dm-emc.c,
    but that is just waiting on alasdair to take some patches that fix
    that and a bunch of junk in there, including adding bounce support. If
    the patch below is ok though and dm-emc finally gets converted, then it
    will have sg and bounce buffer support.

    Signed-off-by: Mike Christie
    Signed-off-by: Jens Axboe

    Mike Christie
     

08 May, 2007

2 commits

  • This patch provides a new macro

    KMEM_CACHE(&lt;struct&gt;, &lt;flags&gt;)

    to simplify slab creation. KMEM_CACHE creates a slab with the name of the
    struct, with the size of the struct and with the alignment of the struct.
    Additional slab flags may be specified if necessary.

    Example:

    struct test_slab {
            int a,b,c;
            struct list_head;
    } __cacheline_aligned_in_smp;

    test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC)

    will create a new slab named "test_slab" of the size sizeof(struct
    test_slab) and aligned to the alignment of test_slab. If it fails then we
    panic.
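
    A userspace sketch of how such a macro can be built: stringification
    plus sizeof and __alignof__ derive the name, size, and alignment from
    the struct name alone. Here kmem_cache_create_stub is a hypothetical
    stand-in for the kernel's kmem_cache_create, recording its arguments so
    the expansion can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct kmem_cache_info {
    const char *name;
    size_t size;
    size_t align;
    unsigned long flags;
};

/* Stand-in for kmem_cache_create(): just records what it was called with. */
static struct kmem_cache_info kmem_cache_create_stub(const char *name,
                size_t size, size_t align, unsigned long flags)
{
    struct kmem_cache_info c = { name, size, align, flags };
    return c;
}

/* The macro technique: #__struct stringifies the name, while sizeof and
 * __alignof__ (a GCC extension) pull size and alignment from the type. */
#define KMEM_CACHE(__struct, __flags)                                \
    kmem_cache_create_stub(#__struct, sizeof(struct __struct),       \
                           __alignof__(struct __struct), (__flags))

struct test_slab {
    int a, b, c;
};
```

    Expanding KMEM_CACHE(test_slab, flags) thus yields a cache named
    "test_slab" whose size and alignment track the struct definition
    automatically, which is the point of the macro.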

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
    been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
            quilt add $file;
            sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
    done

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

06 May, 2007

1 commit

  • * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (87 commits)
    [SCSI] fusion: fix domain validation loops
    [SCSI] qla2xxx: fix regression on sparc64
    [SCSI] modalias for scsi devices
    [SCSI] sg: cap reserved_size values at max_sectors
    [SCSI] BusLogic: stop using check_region
    [SCSI] tgt: fix rdma transfer bugs
    [SCSI] aacraid: fix aacraid not finding device
    [SCSI] aacraid: Correct SMC products in aacraid.txt
    [SCSI] scsi_error.c: Add EH Start Unit retry
    [SCSI] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test.
    [SCSI] ipr: Driver version to 2.3.2
    [SCSI] ipr: Faster sg list fetch
    [SCSI] ipr: Return better qc_issue errors
    [SCSI] ipr: Disrupt device error
    [SCSI] ipr: Improve async error logging level control
    [SCSI] ipr: PCI unblock config access fix
    [SCSI] ipr: Fix for oops following SATA request sense
    [SCSI] ipr: Log error for SAS dual path switch
    [SCSI] ipr: Enable logging of debug error data for all devices
    [SCSI] ipr: Add new PCI-E IDs to device table
    ...

    Linus Torvalds
     

30 Apr, 2007

20 commits

  • Jens Axboe
     
  • It's never grabbed from irq context, so just make it plain spin_lock().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We often lookup the same queue many times in succession, so cache
    the last looked up queue to avoid browsing the rbtree.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • To be used by as/cfq as they see fit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The cfq hash is no longer necessary; we can always get the cfqq from the
    io context. The cfq_get_io_context_noalloc() function is introduced,
    because we don't want to allocate a cic on merging and when checking
    may_queue. To identify a sync queue we used the hash key CFQ_KEY_ASYNC;
    since the hash is eliminated, we need another criterion, so a sync flag
    for the queue is added. In all places where we dig in the rb_tree we're
    in current context, so no additional locking is required.

    Advantages of this patch: no additional memory for the hash, no seeking
    in the hash, and cleaner code. It is now necessary to seek the cic in a
    per-ioc rbtree instead, but that is faster:
    - most processes work with only a few devices
    - most systems have only a few block devices
    - it is an rb-tree

    Signed-off-by: Vasily Tarasov

    Changes by me:

    - Merge into CFQ devel branch
    - Get rid of cfq_get_io_context_noalloc()
    - Fix various bugs with dereferencing cic->cfqq[] with an offset other
      than 0 or 1
    - Fix a bug in cfqq setup: the is_sync condition was reversed
    - Fix a bug where only bio_sync() was checked; we need to check for a
      READ too

    Signed-off-by: Jens Axboe

    Vasily Tarasov
     
  • For tagged devices, allow overlap of requests if the idle window
    isn't enabled on the current active queue.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We don't enable it by default, don't let it get enabled during
    runtime.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We can track it fairly accurately locally, let the slice handling
    take care of the rest.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We don't use it anymore in the slice expiry handling.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's only used for preemption now that the IDLE and RT queues also
    use the rbtree. If we pass an 'add_front' variable to
    cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion
    at the front of the tree.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
    Use the max_slice - cur_slice as the multiplier for the insertion offset.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Same treatment as the RT conversion, just put the sorted idle
    branch at the end of the tree.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently CFQ does a linked insert into the current list for RT
    queues. We can just factor the class into the rb insertion,
    and then we don't have to treat RT queues in a special way. It's
    faster, too.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • For cases where the rbtree is mainly used for sorting and min retrieval,
    a nice speedup of the rbtree code is to maintain a cache of the leftmost
    node in the tree.

    Also spotted in the CFS CPU scheduler code.

    Improved by Alan D. Brunelle by updating the
    leftmost hint in cfq_rb_first() if it isn't set, instead of only
    updating it on insert.
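
    The idea can be sketched with a plain binary search tree standing in for
    the kernel's rbtree (no rebalancing; all names here are illustrative):
    inserts keep a leftmost hint up to date, and first() falls back to a
    full left-spine walk only when the hint is unset, as in cfq_rb_first().

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int key;
    struct node *left, *right;
};

struct tree {
    struct node *root;
    struct node *leftmost;   /* cached minimum, or NULL if not known */
};

static void tree_insert(struct tree *t, struct node *n)
{
    struct node **p = &t->root;
    int is_leftmost = 1;     /* true while we only ever descend left */

    n->left = n->right = NULL;
    while (*p) {
        if (n->key < (*p)->key) {
            p = &(*p)->left;
        } else {
            p = &(*p)->right;
            is_leftmost = 0;
        }
    }
    *p = n;
    if (is_leftmost)         /* new node is the minimum: update the hint */
        t->leftmost = n;
}

/* Like cfq_rb_first(): use the cached hint, and set it if it isn't set. */
static struct node *tree_first(struct tree *t)
{
    if (!t->leftmost) {
        struct node *n = t->root;
        while (n && n->left)
            n = n->left;
        t->leftmost = n;
    }
    return t->leftmost;
}

static void tree_remove_first(struct tree *t)
{
    struct node *n = tree_first(t);
    struct node **p = &t->root;

    if (!n)
        return;
    /* the minimum has no left child, so splice in its right subtree */
    while (*p != n)
        p = &(*p)->left;
    *p = n->right;
    t->leftmost = NULL;      /* invalidate; tree_first() recomputes */
}
```

    The payoff is that the common min-retrieval operation becomes O(1) in
    the steady state, at the cost of one comparison per insert and a hint
    invalidation on removal.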

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Drawing on some inspiration from the CFS CPU scheduler design, overhaul
    the pending cfq_queue concept list management. Currently CFQ uses a
    doubly linked list per priority level for sorting and service uses.
    Kill those lists and maintain an rbtree of cfq_queue's, sorted by when
    to service them.

    This unfortunately means that the ionice levels aren't as strong anymore;
    I will work on improving those later. We only scale the slice time now,
    not the number of times we service. This means that latency is better
    (for all priority levels), but that the distinction between the highest
    and lower levels isn't as big.

    The diffstat speaks for itself.

    cfq-iosched.c | 363 +++++++++++++++++---------------------------------
    1 file changed, 125 insertions(+), 238 deletions(-)

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • - Move the queue_new flag clear to when the queue is selected
    - Only select the non-first queue in cfq_get_best_queue(), if there's
    a substantial difference between the best and first.
    - Get rid of ->busy_rr
    - Only select a close cooperator, if the current queue is known to take
    a while to "think".

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • - Implement logic for detecting cooperating processes, so we
    choose the best available queue whenever possible.

    - Improve residual slice time accounting.

    - Remove dead code: we no longer see async requests coming in on
    sync queues. That part was removed a long time ago. That means
    that we can also remove the difference between cfq_cfqq_sync()
    and cfq_cfqq_class_sync(), they are now identical. And we can
    kill the on_dispatch array, just make it a counter.

    - Allow a process to go into the current list, if it hasn't been
    serviced in this scheduler tick yet.

    Possible future improvements include caching the cfqq lookup
    in cfq_close_cooperator(), so we don't have to look it up twice.
    cfq_get_best_queue() should just use that last decision instead
    of doing it again.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • When testing the syslet async io approach, I discovered that CFQ
    sometimes didn't perform as well as expected. cfq_should_preempt()
    needs to better check for cooperating tasks, so fix that by allowing
    preemption of an equal priority queue if the recently queued request
    is as good a candidate for IO as the one we are currently waiting for.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Apr, 2007

1 commit

  • There's a really rare and obscure bug in CFQ, that causes a crash in
    cfq_dispatch_insert() due to rq == NULL. One example of the resulting
    oops is seen here:

    http://lkml.org/lkml/2007/4/15/41

    Neil correctly diagnosed the situation for how this can happen: if two
    concurrent requests arrive with the exact same sector number (due to
    direct IO or aliasing between MD and the raw device access), the alias
    handling will add the request to the sortlist, but next_rq remains NULL.

    Read the more complete analysis at:

    http://lkml.org/lkml/2007/4/25/57

    This looks like it requires md to trigger, even though it should
    potentially be possible to do with O_DIRECT (at least if you edit the
    kernel and doctor some of the unplug calls).

    The fix is to move the ->next_rq update to when we add a request to the
    rbtree. Then we remove the possibility for a request to exist in the
    rbtree code, but not have ->next_rq correctly updated.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

21 Apr, 2007

1 commit

  • We have a 10-15% performance regression for sequential writes on TCQ/NCQ
    enabled drives in 2.6.21-rcX after the CFQ update went in. It has been
    reported by Valerie Clement and the Intel
    testing folks. The regression is because of CFQ's now more aggressive
    queue control, limiting the depth available to the device.

    This patch fixes that regression by allowing a greater depth when only
    one queue is busy. It has been tested to not impact sync-vs-async
    workloads too much - we still do a lot better than 2.6.20.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

18 Apr, 2007

1 commit

  • This patch (as857) modifies the SG_GET_RESERVED_SIZE and
    SG_SET_RESERVED_SIZE ioctls in the sg driver, capping the values at
    the device's request_queue's max_sectors value. This will permit
    cdrecord to obtain a legal value for the maximum transfer length,
    fixing Bugzilla #7026.

    The patch also caps the initial reserved_size value. There's no
    reason to have a reserved buffer larger than max_sectors, since it
    would be impossible to use the extra space.

    The corresponding ioctls in the block layer are modified similarly,
    and the initial value for the reserved_size is set as large as
    possible. This will effectively make it default to max_sectors.
    Note that the actual value is meaningless anyway, since block devices
    don't have a reserved buffer.

    Finally, the BLKSECTGET ioctl is added to sg, so that there will be a
    uniform way for users to determine the actual max_sectors value for
    any raw SCSI transport.
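
    The capping itself is simple arithmetic; a sketch (field and function
    names are illustrative, not sg's): sg counts reserved_size in bytes,
    while max_sectors is in 512-byte sectors, so the cap converts before
    clamping.

```c
#include <assert.h>

/* Illustrative queue limit, in 512-byte sectors (like max_sectors). */
struct queue_limits {
    unsigned int max_sectors;
};

/* Cap a requested reserved-buffer size (in bytes) at what the queue can
 * actually transfer in one request, as the sg ioctls now do. */
static unsigned int cap_reserved_size(unsigned int requested_bytes,
                                      const struct queue_limits *q)
{
    unsigned int max_bytes = q->max_sectors * 512;

    return requested_bytes > max_bytes ? max_bytes : requested_bytes;
}
```

    With this in place, a user asking SG_SET_RESERVED_SIZE for more than the
    device can move in one command simply gets the largest usable value back
    from SG_GET_RESERVED_SIZE, which is what cdrecord needs.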

    Signed-off-by: Alan Stern
    Acked-by: Jens Axboe
    Acked-by: Douglas Gilbert
    Signed-off-by: James Bottomley

    Alan Stern