28 Apr, 2009

7 commits

  • Impact: code reorganization

    elv_next_request() and elv_dequeue_request() are part of the public
    block layer interface rather than the actual elevator implementation.
    They mostly deal with how requests interact with the block layer and
    low level drivers at the beginning of request processing, whereas
    __elv_next_request() is the actual elevator request fetching
    interface.

    Move the two functions to blk-core.c. This prepares for further
    interface cleanup.

    Signed-off-by: Tejun Heo

    Tejun Heo
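
    A hypothetical driver-side sketch (not part of the patch) of how
    these public entry points are typically used from a ->request_fn in
    this era of the block layer:

        #include <linux/blkdev.h>

        /* illustrative strategy routine; called with q->queue_lock held */
        static void example_request_fn(struct request_queue *q)
        {
                struct request *rq;

                /* public interface: peek at the next request offered */
                while ((rq = elv_next_request(q)) != NULL) {
                        /* take it off the queue before starting the hardware */
                        blkdev_dequeue_request(rq);
                        /* ... submit rq to the device, complete it later ... */
                }
        }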
     
  • Reorder request completion functions such that

    * All request completion functions are located together.

    * Functions which are used by only one caller are put right above
    the caller.

    * end_request() is put after other completion functions but before
    blk_update_request().

    This change is for completion function cleanup which will follow.

    [ Impact: cleanup, code reorganization ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER.
    Don't set it. Combined with recent ide updates, REQ_SOFTBARRIER is
    now only used in elevator proper and for discard requests.

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
    RQ_NOMERGE_FLAGS already defines which REQ flags aren't mergeable.
    There is no reason to specify it superfluously; it only adds to
    confusion. Don't set REQ_NOMERGE for barriers and requests with a
    specific queueing directive. REQ_NOMERGE is now exclusively used
    by the merging code.

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blk_start_queueing() is identical to __blk_run_queue() except that it
    doesn't check for recursion. None of the current users depends on
    blk_start_queueing() running request_fn directly. Replace usages of
    blk_start_queueing() with [__]blk_run_queue() and kill it.

    [ Impact: removal of mostly duplicate interface function ]

    Signed-off-by: Tejun Heo

    Tejun Heo
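
    A minimal before/after sketch of the conversion (illustrative, not a
    literal hunk from the patch):

        /* before: kick request processing via the now-removed helper */
        blk_start_queueing(q);

        /* after: with q->queue_lock already held ... */
        __blk_run_queue(q);
        /* ... or, if the lock is not held */
        blk_run_queue(q);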
     
  • __blk_run_queue wraps blk_invoke_request_fn() such that it
    additionally removes plug and bails out early if the queue is empty.
    Both extra operations have their own pending mechanisms and don't
    cause any harm correctness-wise when they are done superfluously.

    The only user of blk_invoke_request_fn() being blk_start_queue(),
    there isn't much reason to keep both functions around. Merge
    blk_invoke_request_fn() into __blk_run_queue() and make
    blk_start_queue() use __blk_run_queue() instead.

    [ Impact: merge two subtly different internal functions ]

    Signed-off-by: Tejun Heo

    Tejun Heo
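
    A simplified sketch of the merged function (the stopped-queue check
    and the unplug-work rescheduling on recursion are omitted; this is
    not the exact kernel source):

        void __blk_run_queue(struct request_queue *q)
        {
                /* the extra operations __blk_run_queue() already did */
                blk_remove_plug(q);
                if (elv_queue_empty(q))
                        return;

                /* what used to be blk_invoke_request_fn(): call
                 * ->request_fn unless we are already inside it */
                if (!queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) {
                        q->request_fn(q);
                        queue_flag_clear(QUEUE_FLAG_REENTER, q);
                }
        }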
     
  • Impact: subtle behavior change

    For fs requests, the rq is only a carrier of bios and the rq error
    status as a whole doesn't mean much. This is the reason why
    rq->errors is cleared on each partial completion of a request: on
    each partial completion the error status is transferred to the
    respective bios.

    For pc requests, rq->errors is used to carry the error status to the
    issuer, and thus __end_that_request_first() doesn't clear it in such
    cases.

    The condition was fine till now as only fs and pc requests have used
    bio and thus the bio completion path. However, future changes will
    unify data accesses through bio, and all non-fs users care about the
    rq error status. Clear rq->errors on bio completion only for fs
    requests.

    In general, the implicit clearing is a bit too subtle especially as
    the meaning of rq->errors is completely dependent on low level
    drivers. Unifying / cleaning up rq->errors usage and letting llds
    manage it would be better. TODO comment added.

    Signed-off-by: Tejun Heo
    Acked-by: Jens Axboe

    Tejun Heo
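
    A hedged sketch of the resulting check in the bio completion path
    (illustrative, not the literal patch):

        /* the error status has already been transferred to the bios, so
         * the request-wide status is only reset for fs requests; pc
         * requests keep rq->errors for the issuer */
        if (blk_fs_request(rq))
                rq->errors = 0;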
     

24 Apr, 2009

1 commit

  • This simplifies I/O stat accounting switching code and separates it
    completely from I/O scheduler switch code.

    Requests are accounted according to the state of their request queue
    at the time of the request allocation. There is no longer any need
    to flush the request queue when switching the I/O accounting state.

    Signed-off-by: Jerome Marchand
    Signed-off-by: Jens Axboe

    Jerome Marchand
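
    A sketch of the idea, assuming the flag names used here
    (blk_queue_io_stat(), REQ_IO_STAT) match the patch and rw_flags is
    the flag word handed to the request constructor:

        /* snapshot the queue's iostat setting into the request at
         * allocation time; later accounting checks the request flag, so
         * flipping the sysfs knob needs no queue flush */
        if (blk_queue_io_stat(q))
                rw_flags |= REQ_IO_STAT;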
     

08 Apr, 2009

1 commit

  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y
    branch tracer: Fix for enabling branch profiling makes sparse unusable
    ftrace: Correct a text align for event format output
    Update /debug/tracing/README
    tracing/ftrace: alloc the started cpumask for the trace file
    tracing, x86: remove duplicated #include
    ftrace: Add check of sched_stopped for probe_sched_wakeup
    function-graph: add proper initialization for init task
    tracing/ftrace: fix missing include string.h
    tracing: fix incorrect return type of ns2usecs()
    tracing: remove CALLER_ADDR2 from wakeup tracer
    blktrace: fix pdu_len when tracing packet command requests
    blktrace: small cleanup in blk_msg_write()
    blktrace: NUL-terminate user space messages
    tracing: move scripts/trace/power.pl to scripts/tracing/power.pl

    Linus Torvalds
     

07 Apr, 2009

2 commits


06 Apr, 2009

3 commits


03 Apr, 2009

1 commit

  • Impact: output all of packet commands - not just the first 4 / 8 bytes

    Since commit d7e3c3249ef23b4617393c69fe464765b4ff1645 ("block: add
    large command support"), struct request->cmd has been changed from
    unsigned char cmd[BLK_MAX_CDB] to unsigned char *cmd.

    v1 -> v2 (by FUJITA Tomonori):

    - make sure rq->cmd_len is always initialized, and then we can use
      rq->cmd_len instead of BLK_MAX_CDB.

    Signed-off-by: Li Zefan
    Acked-by: FUJITA Tomonori
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Jens Axboe
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
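
    An illustrative fragment (pdu_buf is a hypothetical destination
    buffer) of why the fix is needed:

        /* rq->cmd used to be `unsigned char cmd[BLK_MAX_CDB]`, so
         * sizeof(rq->cmd) was the CDB size; now that it is a pointer,
         * the recorded length must be used instead */
        memcpy(pdu_buf, rq->cmd, rq->cmd_len);  /* not sizeof(rq->cmd) */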
     

26 Mar, 2009

1 commit

  • Put a WARN_ON in __blk_put_request if it is about to
    leak bio(s). This is a serious bug that can happen in error
    handling code paths.

    For this to work I have fixed a couple of places in block/ where
    request->bio != NULL ownership was not honored. And a small cleanup
    at sg_io() while at it.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Boaz Harrosh
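
    The guard itself is tiny; a sketch of what it amounts to in
    __blk_put_request():

        /* freeing a request that still owns bios means those bios (and
         * their completions) are leaked - make such bugs loud */
        WARN_ON(req->bio != NULL);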
     

24 Mar, 2009

2 commits


02 Feb, 2009

1 commit


30 Jan, 2009

3 commits


29 Dec, 2008

5 commits

  • We just want to hand the first bits of IO to the device as fast
    as possible. Gains a few percent on the IOPS rate.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • * Because barrier mode can be changed dynamically, whether barrier is
    supported or not can be determined only when actually issuing the
    barrier and there is no point in checking it earlier. Drop barrier
    support check in generic_make_request() and __make_request(), and
    update comment around the support check in blk_do_ordered().

    * There is no reason to check discard support in both
    generic_make_request() and __make_request(). Drop the check in
    __make_request(). While at it, move the error action block to the
    end of the function and add unlikely() to the q existence test.

    * A barrier request, be it empty or not, is never passed to a low
    level driver and thus it's meaningless to try to copy back
    req->sector to bio->bi_sector on error. In addition, the notion of a
    failed sector doesn't make any sense for an empty barrier to begin
    with. Drop the code block from __end_that_request_first().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • After many improvements on kblockd_flush_work, it is now identical to
    cancel_work_sync, so a direct call to cancel_work_sync is suggested.

    The only difference is that cancel_work_sync is a GPL symbol,
    so it is no longer available to non-GPL modules.

    Signed-off-by: Cheng Renquan
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Cheng Renquan
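
    A before/after sketch of a typical call site (q->unplug_work being
    the usual work item):

        /* before */
        kblockd_flush_work(&q->unplug_work);

        /* after: kblockd_flush_work() had become a thin wrapper
         * around this */
        cancel_work_sync(&q->unplug_work);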
     
    Allow the scsi request REQ_QUIET flag to be propagated to the buffer
    / file system layer. The basic idea is to pass the flag from the
    scsi request to the bio (block IO) and then to the buffer layer. The
    buffer layer can then suppress needless printks.

    This patch declutters the kernel log by removing the 40-50 (per lun)
    buffer I/O error messages seen during a boot in my multipath setup.
    There is a good chance any real errors would be missed in the
    "noise" in the logs without this patch.

    During boot I see blocks of messages like
    "
    __ratelimit: 211 callbacks suppressed
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242847
    Buffer I/O error on device sdm, logical block 1
    Buffer I/O error on device sdm, logical block 5242878
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242872
    "
    in my logs.

    My disk environment is multipath fiber channel using the SCSI_DH_RDAC
    code and multipathd. This topology includes an "active" and "ghost"
    path for each lun. IOs to the "ghost" path will never complete, and
    the SCSI layer, via the scsi device handler rdac code, quickly
    returns the IOs on these paths and sets the REQ_QUIET scsi flag to
    suppress the scsi layer messages.

    I want to extend the QUIET behavior to include the buffer / file
    system layer to deal with these errors as well. I have been running this
    patch for a while now on several boxes without issue. A few runs of
    bonnie++ show no noticeable difference in performance in my setup.

    Thanks to John Stultz for the quiet_error finalization.

    Submitted-by: Keith Mannthey
    Signed-off-by: Jens Axboe

    Keith Mannthey
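
    A hedged sketch of the propagation chain; the bit names (BIO_QUIET,
    BH_Quiet) are given as recalled and should be treated as
    illustrative:

        /* block layer: copy the request flag onto each completing bio */
        if (rq->cmd_flags & REQ_QUIET)
                set_bit(BIO_QUIET, &bio->bi_flags);

        /* fs/buffer.c: carry it over to the buffer head ... */
        if (bio_flagged(bio, BIO_QUIET))
                set_bit(BH_Quiet, &bh->b_state);

        /* ... and skip the "Buffer I/O error" printk for quiet buffers */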
     
  • For sync IO, we'll often do them serialized. This means we'll be touching
    the queue timer for every IO, as opposed to only occasionally like we
    do for queued IO. Instead of deleting the timer when the last request
    is removed, just let it continue running. If a new request comes in
    soon, we don't have to re-add the timer. If no new requests arrive,
    the timer will expire without side effect later.

    This improves high iops sync IO by ~1%.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Dec, 2008

1 commit


03 Dec, 2008

2 commits

  • Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
    devices.

    When stacking devices (LVM over MD over SCSI) some of the request queue
    parameters are not set up correctly in some cases by default, namely
    max_segment_size and seg_boundary mask.

    If you create an MD device over SCSI, these attributes are zeroed.

    The problem arises when another device-mapper mapping is stacked
    over this one - the queue attributes are then set up in DM this way:

    request_queue    max_segment_size    seg_boundary_mask
    SCSI             65536               0xffffffff
    MD RAID1         0                   0
    LVM              65536               -1 (64bit)

    Unfortunately bio_add_page() (resp. bio_phys_segments()) calculates
    the number of physical segments according to these parameters.

    During generic_make_request() the segment count is recalculated and
    can increase bio->bi_phys_segments over the allowed limit (after
    bio_clone() in a stacking operation).

    This is especially a problem in the CCISS driver, where it produces
    an oops here:

    BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);

    (MAXSGENTRIES is 31 by default.)

    Sometimes even this command is enough to cause oops:

    dd iflag=direct if=/dev// of=/dev/null bs=128000 count=10

    This command generates bios with 250 sectors, allocated in 32 4k-pages
    (last page uses only 1024 bytes).

    For the LVM layer, it allocates a bio with 31 segments (still OK for
    CCISS); unfortunately on a lower layer it is recalculated to 32
    segments, and this violates the CCISS restriction and triggers the
    BUG_ON().

    The patch tries to fix it by:

    * initializing the attributes above in the queue request constructor
    blk_queue_make_request()

    * making sure that blk_queue_stack_limits() inherits the settings

    (DM uses its own function to set the limits because
    blk_queue_stack_limits() was introduced later. It should probably
    switch to the generic stack limit function too.)

    * setting the default seg_boundary value in one place (blkdev.h)

    * using this mask as the default in DM (instead of -1, which differs
    on 64bit)

    Bugs related to this:
    https://bugzilla.redhat.com/show_bug.cgi?id=471639
    http://bugzilla.kernel.org/show_bug.cgi?id=8672

    Signed-off-by: Milan Broz
    Reviewed-by: Alasdair G Kergon
    Cc: Neil Brown
    Cc: FUJITA Tomonori
    Cc: Tejun Heo
    Cc: Mike Miller
    Signed-off-by: Jens Axboe

    Milan Broz
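
    A simplified sketch of the inheritance part of the fix in
    blk_queue_stack_limits(), where t is the top (stacked) queue and b
    the bottom one; min_not_zero() keeps an unset 0 from winning:

        t->max_segment_size = min_not_zero(t->max_segment_size,
                                           b->max_segment_size);
        t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
                                            b->seg_boundary_mask);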
     
    blkdev_dequeue_request() and elv_dequeue_request() are equivalent
    and both start the timeout timer. The barrier code dequeues the
    original barrier request but doesn't pass the request itself to the
    lower level driver, only the broken-down proxy requests; however, as
    the original barrier request goes through the same dequeue path, the
    timeout timer is started on it. If the barrier sequence takes long
    enough, this timer expires, but the low level driver has no idea
    about this request and an oops follows.

    The timeout timer shouldn't have been started on the original
    barrier request as it never goes through actual IO. This patch
    unexports elv_dequeue_request(), which has no external user anyway,
    makes it operate on the elevator proper without adding the timer,
    and makes blkdev_dequeue_request() call elv_dequeue_request() and
    add the timer. Internal users which don't pass the request to the
    driver - the barrier code and end_that_request_last() - are
    converted to use elv_dequeue_request().

    Signed-off-by: Tejun Heo
    Cc: Mike Anderson
    Signed-off-by: Jens Axboe

    Tejun Heo
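
    A sketch of the resulting split (close to, but not necessarily
    identical with, the final code):

        void blkdev_dequeue_request(struct request *req)
        {
                /* plain elevator-level dequeue, no timer */
                elv_dequeue_request(req->q, req);

                /* the request is now headed for the hardware, so start
                 * the timeout timer here - and only here */
                blk_add_timer(req);
        }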
     

26 Nov, 2008

2 commits

  • Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
    sites. Spread them out to the usage sites, as suggested by
    Mathieu Desnoyers.

    Signed-off-by: Ingo Molnar
    Acked-by: Mathieu Desnoyers

    Ingo Molnar
     
    This was a forward port of work done by Mathieu Desnoyers; I changed
    it to encode the 'what' parameter in the tracepoint name, so that
    one can register interest in specific events rather than in classes
    of events that then require checking the 'what' parameter.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jens Axboe
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
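
    Illustrative examples of the resulting per-event tracepoints (names
    as recalled from the block tracepoint set of that time):

        /* one tracepoint per event instead of a single tracepoint plus
         * a 'what' code, so consumers can register for exactly the
         * events they care about */
        trace_block_rq_insert(q, rq);
        trace_block_rq_issue(q, rq);
        trace_block_rq_complete(q, rq);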
     

06 Nov, 2008

1 commit


18 Oct, 2008

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: remove __generic_unplug_device() from exports
    block: move q->unplug_work initialization
    blktrace: pass zfcp driver data
    blktrace: add support for driver data
    block: fix current kernel-doc warnings
    block: only call ->request_fn when the queue is not stopped
    block: simplify string handling in elv_iosched_store()
    block: fix kernel-doc for blk_alloc_devt()
    block: fix nr_phys_segments miscalculation bug
    block: add partition attribute for partition number
    block: add BIG FAT WARNING to CONFIG_DEBUG_BLOCK_EXT_DEVT
    softirq: Add support for triggering softirq work on softirqs.

    Linus Torvalds
     

17 Oct, 2008

4 commits

  • The only out-of-core user is IDE, and that should be using
    blk_start_queueing() instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
    "modprobe loop; rmmod loop" effectively creates a blk_queue and
    destroys it, which results in q->unplug_work being canceled without
    ever having been initialized.

    Therefore, move the initialization of q->unplug_work from
    blk_queue_make_request() to blk_alloc_queue*().

    Reported-by: Alexey Dobriyan
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Jens Axboe

    Peter Zijlstra
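
    A sketch of the move, assuming the handler is the existing
    blk_unplug_work() worker:

        /* now done at queue allocation time, so the work struct is
         * valid for the queue's whole lifetime, not only once a
         * make_request_fn has been installed */
        INIT_WORK(&q->unplug_work, blk_unplug_work);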
     
  • Fix block kernel-doc warnings:

    Warning(linux-2.6.27-git4//fs/block_dev.c:1272): No description found for parameter 'path'
    Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'cpu'
    Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'part'
    Warning(/var/linsrc/linux-2.6.27-git4//block/genhd.c:544): No description found for parameter 'partno'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Jens Axboe

    Randy Dunlap
     
  • Callers should use either blk_run_queue/__blk_run_queue, or
    blk_start_queueing() to invoke request handling instead of calling
    ->request_fn() directly as that does not take the queue stopped
    flag into account.

    Also add appropriate comments on the above functions to detail
    their usage.

    Signed-off-by: Jens Axboe

    Jens Axboe
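
    In sketch form:

        /* wrong: ignores the QUEUE_FLAG_STOPPED state */
        q->request_fn(q);

        /* right, when q->queue_lock is not held */
        blk_run_queue(q);

        /* right, when q->queue_lock is already held */
        __blk_run_queue(q);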
     

13 Oct, 2008

1 commit

  • Multipath is best at handling transport errors. If it gets a device
    error then there is not much the multipath layer can do. It will just
    access the same device but from a different path.

    This patch breaks up failfast into device, transport and driver errors.
    The multipath layers (md and dm multipath) only ask the lower levels to
    fast fail transport errors. The user of failfast, read ahead, will ask
    to fast fail on all errors.

    Note that blk_noretry_request will return true if any failfast bit
    is set. This allows drivers that do not support the multipath failfast
    bits to continue to fail on any failfast error like before. Drivers
    like scsi that are able to fail fast specific errors can check
    for the specific fail fast type. In the next patch I will convert
    scsi.

    Signed-off-by: Mike Christie
    Cc: Jens Axboe
    Signed-off-by: James Bottomley

    Mike Christie
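
    A hedged sketch of how a submitter picks a failfast class (bio flag
    names as recalled from this series):

        /* multipath: only transport problems are worth failing fast;
         * a device error will look the same down every path */
        bio->bi_rw |= (1 << BIO_RW_FAILFAST_TRANSPORT);

        /* readahead: cheap to throw away, so fail fast on everything */
        bio->bi_rw |= (1 << BIO_RW_FAILFAST_DEV) |
                      (1 << BIO_RW_FAILFAST_TRANSPORT) |
                      (1 << BIO_RW_FAILFAST_DRIVER);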
     

09 Oct, 2008

1 commit

  • This patch removes end_queued_request() and end_dequeued_request(),
    which are no longer used.

    As a result, the only user of __end_request() is now end_request(),
    so the actual code in __end_request() is moved into end_request()
    and __end_request() is removed.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda