14 Sep, 2009

1 commit

  • Currently, there is a single in_flight counter measuring the number of
    requests in the request_queue. But some monitoring tools would like to
    know how many read requests and write requests are in progress. Split the
    current in_flight counter into two separate counters for read and write.

    This information is exported as a sysfs attribute, as changing the
    currently available stat files would break the existing tools.
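
    A minimal sketch of the split-counter idea (the field and helper names
    below are illustrative, not the exact identifiers used by the patch):

    #include <linux/kernel.h>
    #include <asm/atomic.h>

    /* Illustrative only: one in-flight counter per direction. */
    struct io_inflight {
        atomic_t in_flight[2];          /* [0]: read, [1]: write */
    };

    static inline void inflight_inc(struct io_inflight *s, int rw)
    {
        atomic_inc(&s->in_flight[rw ? 1 : 0]);
    }

    static inline void inflight_dec(struct io_inflight *s, int rw)
    {
        atomic_dec(&s->in_flight[rw ? 1 : 0]);
    }

    /* New sysfs attribute prints both values, leaving existing stat files alone. */
    static ssize_t inflight_show(struct io_inflight *s, char *buf)
    {
        return sprintf(buf, "%8u %8u\n",
                       atomic_read(&s->in_flight[0]),
                       atomic_read(&s->in_flight[1]));
    }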

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Jens Axboe

    Nikanth Karthikesan
     

05 Sep, 2009

1 commit

  • The patch posted at http://marc.info/?l=dm-devel&m=124539787228784&w=2
    which was merged into cec47e3d4a861e1d942b3a580d0bbef2700d2bb2 ("dm:
    prepare for request based option") introduced a regression in
    request-based dm.

    If map_request() calls dm_kill_unmapped_request() to complete a cloned
    request without dispatching it, clone->bio is still set when
    dm_end_request() is called, so the BUG_ON(clone->bio) there is incorrect.

    The patch fixes this bug by freeing the bio in dm_end_request() if the
    clone still has one. I've redone my tests to cover all I/O paths and
    confirmed there's no other regression.

    Here is the oops I hit in request-based dm when I do I/O to a multipath
    device which doesn't have any active path nor queue_if_no_path setting:

    ------------[ cut here ]------------
    kernel BUG at /root/2.6.31-rc4.rqdm/drivers/md/dm.c:828!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
    CPU 1
    Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_service_time dm_multipath scsi_dh dm_mod video output sbs sbshc battery ac sg sr_mod e1000e button cdrom serio_raw rtc_cmos rtc_core rtc_lib piix lpfc scsi_transport_fc ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
    Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.31-rc4.rqdm #1 Express5800/120Lj [N8100-1417]
    RIP: 0010:[] [] dm_softirq_done+0xbd/0x100 [dm_mod]
    RSP: 0018:ffff8800280a1f08 EFLAGS: 00010282
    RAX: ffffffffa02544e0 RBX: ffff8802aa1111d0 RCX: ffff8802aa1111e0
    RDX: ffff8802ab913e70 RSI: 0000000000000000 RDI: ffff8802ab913e70
    RBP: ffff8800280a1f28 R08: ffffc90005457040 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffb
    R13: ffff8802ab913e88 R14: ffff8802ab9c1438 R15: 0000000000000100
    FS: 0000000000000000(0000) GS:ffff88002809e000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 0000003d54a98640 CR3: 000000029f0a1000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process ksoftirqd/1 (pid: 7, threadinfo ffff8802ae50e000, task ffff8802ae4f8040)
    Stack:
    ffff8800280a1f38 0000000000000020 ffffffff814f30a0 0000000000000004
    ffff8800280a1f58 ffffffff8116b245 ffff8800280a1f38 ffff8800280a1f38
    ffff8800280a1f58 0000000000000001 ffff8800280a1fa8 ffffffff810477bc
    Call Trace:

    [] blk_done_softirq+0x75/0x90
    [] __do_softirq+0xcc/0x210
    [] ? ksoftirqd+0x0/0x110
    [] call_softirq+0x1c/0x50

    [] do_softirq+0x65/0xa0
    [] ? ksoftirqd+0x0/0x110
    [] ksoftirqd+0x70/0x110
    [] kthread+0x99/0xb0
    [] child_rip+0xa/0x20
    [] ? restore_args+0x0/0x30
    [] ? kthread+0x0/0xb0
    [] ? child_rip+0x0/0x20
    Code: 44 89 e6 48 89 df e8 23 fb f2 e0 be 01 00 00 00 4c 89 f7 e8 f6 fd ff ff 5b 41 5c 41 5d 41 5e c9 c3 4c 89 ef e8 85 fe ff ff eb ed 0b eb fe 41 8b 85 dc 00 00 00 48 83 bb 10 01 00 00 00 89 83
    RIP [] dm_softirq_done+0xbd/0x100 [dm_mod]
    RSP
    ---[ end trace 16af0a1d8542da55 ]---

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     

24 Jul, 2009

1 commit

  • This patch removes DM's bio-based vs request-based conditional setting
    of next_ordered. For bio-based DM the next_ordered check is no longer a
    concern (as that check is now in the __make_request path). For
    request-based DM the default of QUEUE_ORDERED_NONE is now appropriate.

    bio-based DM was changed to work around the previously misplaced
    next_ordered check with this commit:
    99360b4c18f7675b50d283301d46d755affe75fd

    request-based DM does not yet support barriers but reacted to the above
    bio-based DM change with this commit:
    5d67aa2366ccb8257d103d0b43df855605c3c086

    The above changes are no longer needed given Neil Brown's recent fix to
    put the next_ordered check in the __make_request path:
    db64f680ba4b5c56c4be59f0698000df89ff0281

    Signed-off-by: Mike Snitzer
    Cc: Jun'ichi Nomura
    Cc: NeilBrown
    Acked-by: Kiyoshi Ueda
    Acked-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     

01 Jul, 2009

1 commit

  • This patch restores stacking ability to the block layer integrity
    infrastructure by creating a set of dedicated bip slabs. Each bip slab
    has an embedded bio_vec array at the end. This cuts down on memory
    allocations and also simplifies the code compared to the original bvec
    version. Only the largest bip slab is backed by a mempool. The pool is
    contained in the bio_set so stacking drivers can ensure forward
    progress.
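
    A rough sketch of the "slab with an embedded vector, only the largest one
    backed by a mempool" idea (structure and names here are illustrative):

    #include <linux/bio.h>
    #include <linux/mempool.h>
    #include <linux/slab.h>

    /* Illustrative: integrity payload followed by an inline bio_vec array. */
    struct bip_sketch {
        struct bio_integrity_payload bip;
        struct bio_vec vecs[];                  /* length depends on the slab */
    };

    static const char *bip_slab_names[] = { "bip-1", "bip-4", "bip-16", "bip-256" };
    static const unsigned int bip_slab_vecs[] = { 1, 4, 16, 256 };
    static struct kmem_cache *bip_slabs[4];
    static mempool_t *bip_pool;                 /* backs only the largest slab */

    static int __init bip_slabs_init(void)
    {
        int i;

        for (i = 0; i < 4; i++) {
            size_t sz = sizeof(struct bip_sketch) +
                        bip_slab_vecs[i] * sizeof(struct bio_vec);

            bip_slabs[i] = kmem_cache_create(bip_slab_names[i], sz, 0, 0, NULL);
            if (!bip_slabs[i])
                return -ENOMEM;
        }

        /* The mempool on the largest slab guarantees forward progress for
         * stacking drivers even under memory pressure. */
        bip_pool = mempool_create_slab_pool(4, bip_slabs[3]);
        return bip_pool ? 0 : -ENOMEM;
    }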

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

22 Jun, 2009

18 commits

  • This patch disables interrupts when taking map_lock to avoid
    lockdep warnings in request-based dm.

    Request-based dm takes map_lock after taking queue_lock with
    interrupts disabled:
    spin_lock_irqsave(queue_lock)
    q->request_fn() == dm_request_fn()
      => dm_get_table()
         => read_lock(map_lock)
    while queue_lock could be (but isn't) taken in interrupt context.
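
    The resulting rule is that the map_lock read side must use the irq-saving
    lock variants. A minimal sketch of the pattern (the dm-internal details are
    paraphrased here):

    #include <linux/spinlock.h>
    #include <linux/device-mapper.h>

    static DEFINE_RWLOCK(map_lock);     /* protects the live table pointer */
    static struct dm_table *live_table;

    /* Called from request_fn context, i.e. with queue_lock held and interrupts
     * already disabled; read_lock_irqsave() keeps the irq state consistent and
     * silences the lockdep warning described above. */
    static struct dm_table *get_live_table(void)
    {
        struct dm_table *t;
        unsigned long flags;

        read_lock_irqsave(&map_lock, flags);
        t = live_table;
        read_unlock_irqrestore(&map_lock, flags);

        return t;
    }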

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Christof Schmitt
    Acked-by: Hannes Reinecke
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • Request-based dm doesn't have barrier support yet.
    So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm.
    Since the device type is decided at the first table loading time,
    setting the flag is deferred until then.
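
    A sketch of that deferred decision (hypothetical helper; the block-layer
    call shown is the one available at the time):

    #include <linux/blkdev.h>

    /* Called once the first table is bound and the device type is known. */
    static void set_ordered_mode(struct request_queue *q, bool bio_based)
    {
        if (bio_based)
            blk_queue_ordered(q, QUEUE_ORDERED_DRAIN, NULL);
        /* request-based dm stays at QUEUE_ORDERED_NONE until it grows
         * barrier support */
    }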

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Hannes Reinecke
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch enables request-based dm.

    o Request-based dm and bio-based dm coexist, since there are
    some target drivers which are better suited to bio-based dm.
    Also, there are other bio-based devices in the kernel
    (e.g. md, loop).
    Since a bio-based device can't receive a struct request,
    there are some limitations on stacking bio-based and
    request-based devices:

                           type of underlying device
     dm device             bio-based      request-based
    ----------------------------------------------------
     bio-based                 OK              OK
     request-based             --              OK

    The device type is recognized by the queue flag in the kernel,
    so dm follows that.

    o The type of a dm device is decided at the first table binding time.
    Once the type of a dm device is decided, the type can't be changed.

    o Mempool allocations are deferred until table loading time, since the
    mempools for request-based dm are different from those for bio-based
    dm, and the required mempool type is fixed by the type of the table.

    o Currently, request-based dm supports only tables that have a single
    target. To support multiple targets, we need to support request
    splitting or prevent bios/requests from spanning multiple targets.
    The former needs lots of changes in the block layer, and the latter
    requires that all target drivers support a merge() function.
    Both will take time.
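
    A sketch of the "type is decided at the first table binding and never
    changes" rule above (illustrative check, not the exact dm code):

    #include <linux/errno.h>

    enum sketch_dm_type { TYPE_NONE, TYPE_BIO_BASED, TYPE_REQUEST_BASED };

    /* Returns 0 if the new table's type is acceptable for this md. */
    static int check_table_type(enum sketch_dm_type *md_type,
                                enum sketch_dm_type table_type)
    {
        if (*md_type == TYPE_NONE) {
            *md_type = table_type;      /* first table load decides the type */
            return 0;
        }
        return (*md_type == table_type) ? 0 : -EINVAL;  /* type can't change */
    }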

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch adds core functions for request-based dm.

    When struct mapped device (md) is initialized, md->queue has
    an I/O scheduler and the following functions are used for
    request-based dm as the queue functions:
    make_request_fn: dm_make_request()
    prep_fn: dm_prep_fn()
    request_fn: dm_request_fn()
    softirq_done_fn: dm_softirq_done()
    lld_busy_fn: dm_lld_busy()
    Actual initializations are done in another patch (PATCH 2).
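
    The wiring amounts to something like the following (a sketch; the actual
    initialization lives in the later patch, and only the block-layer setter
    names are meant to be accurate here):

    #include <linux/blkdev.h>

    /* The dm callbacks listed above; bodies live elsewhere. */
    static int dm_make_request(struct request_queue *q, struct bio *bio);
    static int dm_prep_fn(struct request_queue *q, struct request *rq);
    static void dm_request_fn(struct request_queue *q);
    static void dm_softirq_done(struct request *rq);
    static int dm_lld_busy(struct request_queue *q);

    static struct request_queue *init_rq_based_queue(spinlock_t *lock)
    {
        /* blk_init_queue() attaches an I/O scheduler and drives the queue
         * through request_fn. */
        struct request_queue *q = blk_init_queue(dm_request_fn, lock);

        if (!q)
            return NULL;

        blk_queue_make_request(q, dm_make_request);
        blk_queue_prep_rq(q, dm_prep_fn);
        blk_queue_softirq_done(q, dm_softirq_done);
        blk_queue_lld_busy(q, dm_lld_busy);
        return q;
    }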

    Below is a brief summary of how request-based dm behaves, including:
    - making request from bio
    - cloning, mapping and dispatching request
    - completing request and bio
    - suspending md
    - resuming md

    bio to request
    ==============
    md->queue->make_request_fn() (dm_make_request()) calls __make_request()
    for a bio submitted to the md.
    Then, the bio is kept in the queue as a new request or merged into
    another request in the queue if possible.

    Cloning and Mapping
    ===================
    Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()),
    when requests are dispatched after they are sorted by the I/O scheduler.

    dm_request_fn() checks the busy state of the underlying devices using
    the target's busy() function and, if they are busy, stops dispatching
    requests so that they stay on the dm device's queue.
    This helps I/O merging, since no merging is done for a request
    once it has been dispatched to the underlying devices.

    Actual cloning and mapping are done in dm_prep_fn() and map_request(),
    called from dm_request_fn().
    dm_prep_fn() clones not only the request but also its bios,
    so that dm can hold back bio completion in error cases and prevent
    the bio submitter from noticing the error.
    (See the "Completion" section below for details.)

    After the cloning, the clone is mapped by target's map_rq() function
    and inserted to underlying device's queue using
    blk_insert_cloned_request().
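
    In condensed form, that step looks roughly like this (error handling and
    the map_info plumbing are elided; the naming is paraphrased):

    #include <linux/blkdev.h>
    #include <linux/device-mapper.h>

    /* Map the clone via the target, then hand it to the underlying queue. */
    static void dispatch_clone(struct dm_target *ti, struct request *clone,
                               union map_info *info)
    {
        int r = ti->type->map_rq(ti, clone, info);

        if (r == DM_MAPIO_REMAPPED)
            blk_insert_cloned_request(clone->q, clone);
    }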

    Completion
    ==========
    Request completion can be hooked by rq->end_io(), but by then all bios
    in the request will have been completed, even in error cases, and the
    bio submitter will have noticed the error.
    To prevent bio completion in error cases, request-based dm clones
    both the bio and the request and hooks both bio->bi_end_io() and
    rq->end_io():
    bio->bi_end_io(): end_clone_bio()
    rq->end_io(): end_clone_request()

    Summary of the request completion flow is below:

    blk_end_request() for a clone request
      => blk_update_request()
         => bio->bi_end_io() == end_clone_bio() for each clone bio
            => Free the clone bio
            => Success: Complete the original bio (blk_update_request())
               Error:   Don't complete the original bio
      => blk_finish_request()
         => rq->end_io() == end_clone_request()
            => blk_complete_request()
               => dm_softirq_done()
                  => Free the clone request
                  => Success: Complete the original request (blk_end_request())
                     Error:   Requeue the original request

    end_clone_bio() completes the original request by the size of
    the original bio in successful cases.
    Even if all bios in the original request are completed by that
    completion, the original request must not be completed yet, to keep
    the ordering of request completion for the stacking.
    So end_clone_bio() uses blk_update_request() instead of
    blk_end_request().
    In error cases, end_clone_bio() doesn't complete the original bio.
    It just frees the cloned bio and hands the error handling over to
    end_clone_request().

    end_clone_request(), which is called with the queue lock held, completes
    the clone request and the original request in softirq context
    (dm_softirq_done()), where no queue lock is held, to avoid a deadlock
    when completing a request triggers submission of another request:
    - The submitted request may be mapped to the same device
    - Request submission requires the queue lock, but that lock is
      already held by the submitter itself, which doesn't know that

    The clone request has no clone bio when dm_softirq_done() is called,
    so target drivers can't resubmit it, even in error cases.
    Instead, they can ask the dm core to requeue and remap
    the original request in such cases.

    suspend
    =======
    Request-based dm suspends the md by stopping md->queue.
    For a noflush suspend, it simply stops md->queue.

    For a flush suspend, it inserts a marker request at the tail of
    md->queue and dispatches all requests in md->queue until the marker
    comes to the front of md->queue. It then stops dispatching requests
    and waits for all dispatched requests to complete.
    After that, it completes the marker request, stops md->queue and
    wakes up the waiter on the suspend queue, md->wait.

    resume
    ======
    Starts md->queue.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • Currently, device-mapper maintains a separate instance of 'struct
    queue_limits' for each table of each device. When the configuration of
    a device is to be changed, first its table is loaded and this structure
    is populated, then the device is 'resumed' and the calculated
    queue_limits are applied.

    This places restrictions on how userspace may process related devices,
    where it is often advantageous to 'load' tables for several devices
    at once before 'resuming' them together. As the new queue_limits
    only take effect after the 'resume', if they are changing and one
    device uses another, the latter must be 'resumed' before the former
    may be 'loaded'.

    This patch moves the calculation of these queue_limits out of
    the 'load' operation into 'resume'. Since we are no longer
    pre-calculating this struct, we no longer need to maintain copies
    within our dm structs.

    dm_set_device_limits() now passes the 'start' of the device's
    data area (aka pe_start) as the 'offset' to blk_stack_limits().

    init_valid_queue_limits() is replaced by blk_set_default_limits().
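
    The stacking call being referred to looks roughly like this (a sketch;
    'pe_start' stands for the start of the dm data area on the underlying
    device, and the exact unit conversion is glossed over):

    #include <linux/blkdev.h>

    /* Fold one underlying device's limits into the limits being built up
     * for the table, taking the data-area offset into account. */
    static void stack_one_device(struct queue_limits *table_limits,
                                 struct block_device *bdev, sector_t pe_start)
    {
        struct request_queue *q = bdev_get_queue(bdev);

        if (blk_stack_limits(table_limits, &q->limits, pe_start) < 0)
            printk(KERN_WARNING "device limits are misaligned\n");
    }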

    Signed-off-by: Mike Snitzer
    Cc: martin.petersen@oracle.com
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Add support for passing a 32 bit "cookie" into the kernel with the
    DM_SUSPEND, DM_DEV_RENAME and DM_DEV_REMOVE ioctls. The (unsigned)
    value of this cookie is returned to userspace alongside the uevents
    issued by these ioctls in the variable DM_COOKIE.

    This means the userspace process issuing these ioctls can be notified
    by udev after udev has completed any actions triggered.

    To minimise the interface extension, we pass the cookie into the
    kernel in the event_nr field which is otherwise unused when calling
    these ioctls. Incrementing the version number allows userspace to
    determine in advance whether or not the kernel supports the cookie.
    If the kernel does support this but userspace does not, there should
    be no impact as the new variable will just get ignored.
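
    From the userspace side, the mechanism amounts to something like this
    (a sketch; the dm_ioctl structure is assumed to be otherwise fully
    populated, and the helper name is made up):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/dm-ioctl.h>

    /* Stash a 32-bit cookie in the otherwise-unused event_nr field before
     * issuing the ioctl; udev rules later see it as DM_COOKIE=<value> in the
     * uevent environment and can signal the waiting process. */
    static int dm_rename_with_cookie(int control_fd, struct dm_ioctl *io,
                                     uint32_t cookie)
    {
        io->event_nr = cookie;
        return ioctl(control_fd, DM_DEV_RENAME, io);
    }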

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     
  • Pass empty barrier flushes to the targets in dm_flush().

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Move repeated dm_target_io initialisation inside alloc_tio().

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Introduce num_flush_requests, which a target sets to say how many flush
    instructions (empty barriers) it wants to receive. These are sent by
    __clone_and_map_empty_barrier with map_info->flush_request going from 0
    to (num_flush_requests - 1).

    Old targets without flush support won't receive any flush requests.
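
    From a target's point of view the interface looks roughly like this
    (a hypothetical target; do_flush() is a made-up helper):

    #include <linux/bio.h>
    #include <linux/device-mapper.h>

    static int do_flush(struct dm_target *ti, unsigned which, struct bio *bio);

    static int example_ctr(struct dm_target *ti, unsigned int argc, char **argv)
    {
        ti->num_flush_requests = 2;     /* ask for two flushes per barrier */
        return 0;
    }

    static int example_map(struct dm_target *ti, struct bio *bio,
                           union map_info *map_context)
    {
        if (bio_empty_barrier(bio)) {
            /* flush_request runs from 0 to num_flush_requests - 1, so a
             * target can direct one flush at each underlying device. */
            return do_flush(ti, map_context->flush_request, bio);
        }

        /* normal remapping elided */
        return DM_MAPIO_REMAPPED;
    }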

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Remove the check that the size of the cloned bio is not zero because a
    subsequent patch needs to send zero-sized barriers down this path.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • If the underlying device doesn't support barriers and dm receives a
    barrier, it waits until all requests on that device drain so it no
    longer needs to report -EOPNOTSUPP to the caller.

    This patch deals with the confusing situation when moving a volume from
    one physical device to another triggers an EOPNOTSUPP on a volume that
    didn't report it before.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • With the following patches, more than one error can occur during
    processing. Change md->barrier_error so that only the first one is
    recorded and returned to the caller.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • If a barrier request is returned with DM_ENDIO_REQUEUE,
    requeue it in dm_wq_work instead of dec_pending.

    This allows us to handle correctly the situation where some targets
    ask for a requeue while other targets signal an error.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Make dm_flush return void.

    The first error during flush is stored in md->barrier_error instead.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Fix a potential deadlock when creating multiple snapshots by holding a
    reference to struct block_device for the whole lifecycle of every dm
    device instead of obtaining it independently at each point it is needed.

    bdget_disk() was called while the device was being suspended, in
    dm_suspend(). However there could be other devices already suspended,
    for example when creating additional snapshots of a device. bdget_disk()
    can wait for IO and allocate memory resulting in waiting for the
    already-suspended device - deadlock.

    This patch changes the code so that it gets the reference to struct
    block_device when struct mapped_device is allocated and initialized in
    alloc_dev() where it is always OK to allocate memory or wait for I/O.
    It drops the reference when it is destroyed in free_dev(). Thus there
    is no call to bdget_disk() while any device is suspended.

    Previously unlock_fs() was called only if bdev was held. Now it is
    called unconditionally, but the superfluous calls are harmless because
    it returns immediately if the filesystem was not previously frozen.

    This patch also now allows the device size to be changed in a
    noflush suspend because the bdev is held. This has no adverse effect.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Rename suspended_bdev to bdev.

    This patch doesn't change any functionality, just renames the variable.
    In the next patch, the variable will be used even for non-suspended devices.

    (Pre-requisite for the per-target barrier support patches.)

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • A bio that has two or more vector entries, with total size less than or
    equal to the page size, and that crosses a stripe boundary of an
    underlying md device is accepted by device mapper (it conforms to all
    of its limits) but not by the underlying device.

    The fix is: if device mapper selects the one-page maximum request size,
    it also needs to set its own q->merge_bvec_fn to reject any bios with
    multiple vector entries that would span more than one page.
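
    A sketch of such a restrictive merge_bvec_fn (not the exact dm code; it
    simply refuses to let a bio grow beyond a single page once the queue is
    limited to one-page requests):

    #include <linux/bio.h>
    #include <linux/blkdev.h>
    #include <linux/kernel.h>

    static int one_page_merge_bvec(struct request_queue *q,
                                   struct bvec_merge_data *bvm,
                                   struct bio_vec *biovec)
    {
        /* bvm->bi_size is the bio's current size in bytes; return how many
         * bytes may still be added without crossing the one-page limit. */
        if (bvm->bi_size >= PAGE_SIZE)
            return 0;

        return min_t(unsigned int, PAGE_SIZE - bvm->bi_size, biovec->bv_len);
    }

    bio_add_page() only accepts a new vector if the returned value covers it
    in full, so a caller that gets refused submits the bio it has and starts
    a new one instead of building an unsupported spanning bio.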

    The problem was discovered in the following scenario:
    * MD - RAID-0
    * LV on the top of it (raid1, snapshot or striped with chunk
    size/stripe larger than RAID-0 stripe)
    * one of the logical volumes is exported to xen domU
    * inside xen domU it is partitioned; the key point is that the partition
    must be unaligned on a page boundary (fdisk normally aligns the partition
    to 63 sectors, which triggers it)
    * install the system on the partitioned disk in domU
    This causes I/O failures in dom0.
    Reference: https://bugzilla.redhat.com/show_bug.cgi?id=223947

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Do not process sysfs attributes when the device is being destroyed.

    Otherwise the code can trigger
    BUG_ON(test_bit(DMF_FREEING, &md->flags));
    in the dm_put() call.
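
    The guard amounts to something like this on the sysfs lookup path
    (a sketch; DMF_FREEING and dm_get() are the real names, the surrounding
    helper and the md internals are paraphrased):

    /* Refuse to hand out a reference to an md that is being torn down;
     * otherwise a later dm_put() would hit BUG_ON(test_bit(DMF_FREEING, ...)). */
    static struct mapped_device *md_from_kobject(struct kobject *kobj)
    {
        struct mapped_device *md = container_of(kobj, struct mapped_device, kobj);

        if (test_bit(DMF_FREEING, &md->flags))
            return NULL;

        dm_get(md);
        return md;
    }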

    Cc: stable@kernel.org
    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     

10 Jun, 2009

1 commit

  • TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
    these new capabilities to this tracepoint:

    - zero-copy and per-cpu splice() tracing
    - binary tracing without printf overhead
    - structured logging records exposed under /debug/tracing/events
    - trace events embedded in function tracer output and other plugins
    - user-defined, per tracepoint filter expressions
    ...

    Cons:

    - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue.
    But this may change in the future.

    - A packet command is converted to a string in TP_assign, not TP_print,
    while blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

    - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().
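
    For reference, a TRACE_EVENT definition has this general shape (a generic
    sketch, not one of the actual block tracepoints):

    #include <linux/tracepoint.h>

    TRACE_EVENT(sketch_io_done,

        TP_PROTO(const char *devname, unsigned int bytes, int error),

        TP_ARGS(devname, bytes, error),

        /* binary record layout */
        TP_STRUCT__entry(
            __string(dev, devname)
            __field(unsigned int, bytes)
            __field(int, error)
        ),

        /* fill the record at trace time, no printf in the fast path */
        TP_fast_assign(
            __assign_str(dev, devname);
            __entry->bytes = bytes;
            __entry->error = error;
        ),

        /* text form, rendered only when the trace is read */
        TP_printk("%s: %u bytes, error=%d",
                  __get_str(dev), __entry->bytes, __entry->error)
    );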

    I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

         dd                   dd + ioctl blktrace    dd + TRACE_EVENT (splice)
    1    7.36s, 42.7 MB/s     7.50s, 42.0 MB/s       7.41s, 42.5 MB/s
    2    7.43s, 42.3 MB/s     7.48s, 42.1 MB/s       7.43s, 42.4 MB/s
    3    7.38s, 42.6 MB/s     7.45s, 42.2 MB/s       7.41s, 42.5 MB/s

    So the overhead of tracing is very small, and no regression when using
    those trace events vs blktrace.

    And the binary output of TRACE_EVENT is much smaller than blktrace:

    # ls -l -h
    -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
    -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
    -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

    Following are some comparisons between TRACE_EVENT and blktrace:

    plug:
    kjournald-480 [000] 303.084981: block_plug: [kjournald]
    kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

    unplug_io:
    kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
    kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

    remap:
    kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8

    Changelog from v2 -> v3:

    - use the newly introduced __dynamic_array().

    Changelog from v1 -> v2:

    - use __string() instead of __array() to minimize the memory required
    to store hex dump of rq->cmd().

    - support large pc requests.

    - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

    - some cleanups.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

09 Apr, 2009

8 commits

  • Barriers are submitted to a worker thread that issues them in-order.

    The thread is modified so that when it sees a barrier request it first
    waits for all pending IO queued before the barrier, then submits the
    barrier and waits for it to complete. (We must wait, otherwise the barrier
    could be intermixed with following requests.)

    Errors from the barrier request are recorded in a per-device barrier_error
    variable. There may be only one barrier request in progress at once.

    For now, the barrier request is converted to a non-barrier request when
    sending it to the underlying device.

    This patch guarantees correct barrier behavior if the underlying device
    doesn't perform write-back caching. The same requirement existed before
    barriers were supported in dm.

    Bottom layer barrier support (sending barriers by target drivers) and
    handling devices with write-back caches will be done in further patches.
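
    The ordering rule in the worker thread boils down to this (a sketch with
    made-up helper names):

    #include <linux/bio.h>

    static void wait_for_pending_io(struct mapped_device *md);  /* in-flight == 0 */
    static void submit_and_wait(struct mapped_device *md, struct bio *bio);

    static void process_one_deferred_bio(struct mapped_device *md, struct bio *bio)
    {
        if (bio_barrier(bio)) {
            /* Drain everything already issued, then issue the barrier alone
             * and wait for it, so it can never be intermixed with other IO. */
            wait_for_pending_io(md);
            submit_and_wait(md, bio);
        } else {
            generic_make_request(bio);
        }
    }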

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Remove the queue_io() return value and the loop in dm_request().

    IO may be submitted to a worker thread with queue_io(). queue_io() sets
    DMF_QUEUE_IO_TO_THREAD so that all further IO is queued for the thread. When
    the thread finishes its work, it clears DMF_QUEUE_IO_TO_THREAD and from this
    point on, requests are submitted from dm_request again. This will be used
    for processing barriers.

    Remove the loop in dm_request. queue_io() can submit I/Os to the worker thread
    even if DMF_QUEUE_IO_TO_THREAD was not set.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Rework shutting down on suspend and document the associated rules.

    Drop write lock in __split_and_process_bio to allow more processing
    concurrency.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Refactor the code in dm_request().

    Require the new DMF_BLOCK_IO_FOR_SUSPEND flag on readahead bios we will
    discard so we don't drop such bios while processing a barrier.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Split the DMF_BLOCK_IO flag into two.

    DMF_BLOCK_IO_FOR_SUSPEND is set when I/O must be blocked while suspending a
    device. DMF_QUEUE_IO_TO_THREAD is set when I/O must be queued to a
    worker thread for later processing.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Refactor dm_wq_work() to make later patch more readable.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Prepare for full barrier implementation: first remove the restricted support.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • This patch provides support for data integrity passthrough in the device
    mapper.

    - If one or more component devices support integrity an integrity
    profile is preallocated for the DM device.

    - If all component devices have compatible profiles the DM device is
    flagged as capable.

    - Handle integrity metadata when splitting and cloning bios.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Alasdair G Kergon

    Martin K. Petersen
     

03 Apr, 2009

4 commits

  • Set queue ordered mode. It doesn't really matter what we set here
    because we don't ever put any requests on the queue. But we need to set
    something other than QUEUE_ORDERED_NONE so that __generic_make_request
    passes barrier requests to us.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Move wait queue declaration and unplug to dm_wait_for_completion.

    The purpose is to minimize duplicate code in the further patches.

    The patch reorders functions a little bit. It doesn't change any
    functionality. For proper non-deadlock operation, add_wait_queue must
    happen before set_current_state(interruptible) and before the test for
    !atomic_read(&md->pending).

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Merge pushback and deferred lists into one list - use deferred list
    for both deferred and pushed-back bios.

    This will be needed for proper support of barrier bios: it is impossible to
    support ordering correctly with two lists because the requests on both lists
    will be mixed up.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Allow uninterruptible wait for pending IOs.

    Add argument "interruptible" to dm_wait_for_completion that specifies
    either interruptible or uninterruptible waiting.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka