10 Jun, 2009

1 commit

  • TRACE_EVENT is a more generic way to define tracepoints. Converting the
    block tracepoints to use TRACE_EVENT adds these new capabilities (see the
    sketch after the list):

    - zero-copy and per-cpu splice() tracing
    - binary tracing without printf overhead
    - structured logging records exposed under /debug/tracing/events
    - trace events embedded in function tracer output and other plugins
    - user-defined, per tracepoint filter expressions
    ...
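
    As a rough illustration of the mechanism (a minimal sketch of the macro
    shape, not necessarily the exact definition from this patch), a
    TRACE_EVENT declaration bundles the prototype, the binary record layout,
    the assignment and the text format in one place:

    TRACE_EVENT(block_plug,

        TP_PROTO(struct request_queue *q),

        TP_ARGS(q),

        TP_STRUCT__entry(
            __array(char, comm, TASK_COMM_LEN)
        ),

        TP_fast_assign(
            memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
        ),

        TP_printk("[%s]", __entry->comm)
    );

    The record is written to the ring buffer in binary form; TP_printk is
    applied only when the buffer is read, which is where the "binary tracing
    without printf overhead" above comes from.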

    Cons:

    - no dev_t info for the output of plug, unplug_timer and unplug_io events;
      no dev_t info for getrq and sleeprq events if bio == NULL;
      no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue,
    but this may change in the future.

    - A packet command is converted to a string in TP_assign, not TP_print,
    while blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

    - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().
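
    For instance (a sketch; field and length names are illustrative), a fixed
    __array() always reserves its full size in every trace entry, while
    __dynamic_array() reserves only what this particular event needs:

    TP_STRUCT__entry(
        __array(char, fixed_buf, 32)            /* always 32 bytes */
        __dynamic_array(char, cmd, cmd_len)     /* cmd_len bytes this time */
    ),

    TP_fast_assign(
        memcpy(__entry->fixed_buf, buf, 32);
        memcpy(__get_dynamic_array(cmd), cmd, cmd_len);
    ),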

    I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

    run    dd                  dd + ioctl blktrace    dd + TRACE_EVENT (splice)
    1      7.36s, 42.7 MB/s    7.50s, 42.0 MB/s       7.41s, 42.5 MB/s
    2      7.43s, 42.3 MB/s    7.48s, 42.1 MB/s       7.43s, 42.4 MB/s
    3      7.38s, 42.6 MB/s    7.45s, 42.2 MB/s       7.41s, 42.5 MB/s

    So the overhead of tracing is very small, and there is no regression when
    using these trace events versus blktrace.

    And the binary output of TRACE_EVENT is much smaller than blktrace:

    # ls -l -h
    -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
    -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
    -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

    Following are some comparisons between TRACE_EVENT output (first line of
    each pair) and blktrace output (second line):

    plug:
    kjournald-480 [000] 303.084981: block_plug: [kjournald]
    kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

    unplug_io:
    kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
    kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

    remap:
    kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8

    Changelog from v2 -> v3:

    - use the newly introduced __dynamic_array().

    Changelog from v1 -> v2:

    - use __string() instead of __array() to minimize the memory required
    to store the hex dump of rq->cmd.

    - support large pc requests.

    - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

    - some cleanups.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

09 Apr, 2009

8 commits

  • Barriers are submitted to a worker thread that issues them in-order.

    The thread is modified so that when it sees a barrier request, it first
    waits for all I/O pending before the barrier, then submits the barrier
    and waits for it to complete. (We must wait; otherwise the barrier could
    be intermixed with following requests. See the sketch below.)

    Errors from the barrier request are recorded in a per-device barrier_error
    variable. There may be only one barrier request in progress at once.

    For now, the barrier request is converted to a non-barrier request when
    sending it to the underlying device.
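
    A hypothetical sketch of that ordering (the helper names here are
    illustrative, not the actual dm functions):

    static void process_barrier(struct mapped_device *md, struct bio *bio)
    {
        /* 1. Drain: wait for all I/O submitted before the barrier. */
        wait_event(md->wait, !atomic_read(&md->pending));

        /* 2. For now, strip the barrier flag before passing it down. */
        bio->bi_rw &= ~(1 << BIO_RW_BARRIER);

        /* 3. Submit it and wait again, so nothing submitted later can
         *    be intermixed with it; errors land in md->barrier_error. */
        __split_and_process_bio(md, bio);
        wait_event(md->wait, !atomic_read(&md->pending));
    }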

    This patch guarantees correct barrier behavior if the underlying device
    doesn't perform write-back caching. The same requirement existed before
    barriers were supported in dm.

    Bottom layer barrier support (sending barriers by target drivers) and
    handling devices with write-back caches will be done in further patches.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Remove queue_io return value and a loop in dm_request.

    IO may be submitted to a worker thread with queue_io(). queue_io() sets
    DMF_QUEUE_IO_TO_THREAD so that all further IO is queued for the thread. When
    the thread finishes its work, it clears DMF_QUEUE_IO_TO_THREAD and from this
    point on, requests are submitted from dm_request again. This will be used
    for processing barriers.

    Remove the loop in dm_request: queue_io() can now submit I/Os to the
    worker thread even if DMF_QUEUE_IO_TO_THREAD was not set. (The resulting
    flow is sketched below.)
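
    The resulting control flow looks roughly like this (a sketch, not the
    verbatim code):

    static int dm_request(struct request_queue *q, struct bio *bio)
    {
        struct mapped_device *md = q->queuedata;

        down_read(&md->io_lock);

        if (unlikely(test_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags))) {
            up_read(&md->io_lock);
            queue_io(md, bio);  /* the worker thread will process it */
            return 0;
        }

        __split_and_process_bio(md, bio);
        up_read(&md->io_lock);
        return 0;
    }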

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Rework shutting down on suspend and document the associated rules.

    Drop write lock in __split_and_process_bio to allow more processing
    concurrency.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Refactor the code in dm_request().

    Require the new DMF_BLOCK_IO_FOR_SUSPEND flag to be set before discarding
    readahead bios, so we don't drop such bios while merely processing a
    barrier.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Split the DMF_BLOCK_IO flag into two.

    DMF_BLOCK_IO_FOR_SUSPEND is set when I/O must be blocked while suspending a
    device. DMF_QUEUE_IO_TO_THREAD is set when I/O must be queued to a
    worker thread for later processing.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Refactor dm_wq_work() to make later patch more readable.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Prepare for full barrier implementation: first remove the restricted support.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • This patch provides support for data integrity passthrough in the device
    mapper.

    - If one or more component devices support integrity, an integrity
    profile is preallocated for the DM device.

    - If all component devices have compatible profiles, the DM device is
    flagged as capable.

    - Handle integrity metadata when splitting and cloning bios.
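
    The cloning part reduces to something like this (a sketch; the
    bio-integrity API signatures have varied between kernel versions):

    /* When cloning a bio, clone its integrity payload as well. */
    if (bio_integrity(bio))
        bio_integrity_clone(clone, bio, GFP_NOIO);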

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Alasdair G Kergon

    Martin K. Petersen
     

03 Apr, 2009

10 commits


17 Mar, 2009

1 commit

  • The following oops has been reported when dm-crypt runs over a loop device.

    ...
    [ 70.381058] Process loop0 (pid: 4268, ti=cf3b2000 task=cf1cc1f0 task.ti=cf3b2000)
    ...
    [ 70.381058] Call Trace:
    [ 70.381058] [] ? crypt_dec_pending+0x5e/0x62 [dm_crypt]
    [ 70.381058] [] ? crypt_endio+0xa2/0xaa [dm_crypt]
    [ 70.381058] [] ? crypt_endio+0x0/0xaa [dm_crypt]
    [ 70.381058] [] ? bio_endio+0x2b/0x2e
    [ 70.381058] [] ? dec_pending+0x224/0x23b [dm_mod]
    [ 70.381058] [] ? clone_endio+0x79/0xa4 [dm_mod]
    [ 70.381058] [] ? clone_endio+0x0/0xa4 [dm_mod]
    [ 70.381058] [] ? bio_endio+0x2b/0x2e
    [ 70.381058] [] ? loop_thread+0x380/0x3b7
    [ 70.381058] [] ? do_lo_send_aops+0x0/0x165
    [ 70.381058] [] ? autoremove_wake_function+0x0/0x33
    [ 70.381058] [] ? loop_thread+0x0/0x3b7

    When a table is being replaced, dm waits for I/O to complete
    before destroying the mempool, but the endio function doesn't
    call mempool_free() until after completing the bio.

    Fix it by swapping the order of those two operations (see the sketch
    below).

    The same problem occurs in dm.c with md referenced after dec_pending.
    Again, we swap the order.
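
    The fixed ordering in dm-crypt looks roughly like this (a sketch based
    on the description above):

    static void crypt_dec_pending(struct dm_crypt_io *io)
    {
        struct crypt_config *cc = io->target->private;
        struct bio *base_bio = io->base_bio;
        int error = io->error;

        if (!atomic_dec_and_test(&io->pending))
            return;

        /* Free into the mempool BEFORE completing the bio: completion
         * may allow the old table, and with it the mempool, to be
         * destroyed. */
        mempool_free(io, cc->io_pool);
        bio_endio(base_bio, error);
    }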

    Cc: stable@kernel.org
    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     

06 Jan, 2009

5 commits

  • Implement simple read-only sysfs entry for device-mapper block device.

    This patch adds a simple sysfs directory named "dm" under block device
    properties and implements
    - name attribute (string containing mapped device name)
    - uuid attribute (string containing UUID, or empty string if not set)

    The kobject is embedded in mapped_device struct, so no additional
    memory allocation is needed for initializing sysfs entry.

    While processing a sysfs attribute we need to lock the mapped device;
    this is done by a new function, dm_get_from_kobj(), which returns the md
    associated with the kobject and increases its usage count.

    Each 'show attribute' function is responsible for its own locking.
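
    The name attribute then reduces to something like this (a sketch, close
    to but not necessarily verbatim):

    static ssize_t dm_attr_name_show(struct mapped_device *md, char *buf)
    {
        if (dm_copy_name_and_uuid(md, buf, NULL))
            return -EIO;

        strcat(buf, "\n");
        return strlen(buf);
    }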

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     
  • Rework table reference counting.

    The existing code uses a reference counter. When the last reference is
    dropped and the counter reaches zero, the table destructor is called.
    Table reference counters are acquired/released from upcalls from other
    kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
    If the reference counter reaches zero in one of the upcalls, the table
    destructor is called from almost random kernel code.

    This leads to various problems:
    * dm_any_congested being called under a spinlock, which calls the
    destructor, which calls some sleeping function.
    * the destructor attempting to take a lock that is already taken by the
    same process.
    * stale reference from some other kernel code keeps the table
    constructed, which keeps some devices open, even after successful
    return from "dmsetup remove". This can confuse lvm and prevent closing
    of underlying devices or reusing device minor numbers.

    The patch changes reference counting so that the table destructor can be
    called only at predetermined places.

    The table always has exactly one reference, from either mapped_device->map
    or hash_cell->new_map. After this patch, this reference is not counted
    in table->holders. A pair of dm_create_table/dm_destroy_table functions
    is used for table creation/destruction.

    Temporary references from the other code increase table->holders. A pair
    of dm_table_get/dm_table_put functions is used to manipulate it.

    When the table is about to be destroyed, we wait for table->holders to
    reach 0, then call the table destructor. We use active waiting with
    msleep(1) because the situation happens rarely (to one user in 5 years)
    and removing the device isn't a performance-critical task: the user
    doesn't care whether it takes one tick more or not.

    This way, the destructor is called only at specific points
    (dm_table_destroy function) and the above problems associated with lazy
    destruction can't happen.
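
    In sketch form (dm_table_free_rest() is a hypothetical stand-in for the
    actual teardown logic):

    void dm_table_destroy(struct dm_table *t)
    {
        /* Wait for transient holders (upcalls) to drop their references;
         * active waiting is acceptable because this almost never triggers. */
        while (atomic_read(&t->holders))
            msleep(1);
        smp_mb();

        dm_table_free_rest(t);  /* hypothetical name for the teardown */
    }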

    Finally remove the temporary protection added to dm_any_congested().

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Implement barrier support for single device DM devices

    This patch implements barrier support in DM for the common case of dm linear
    just remapping a single underlying device. In this case we can safely
    pass the barrier through because there can be no reordering between
    devices.

    NB. Any DM device might cease to support barriers if it gets
    reconfigured so code must continue to allow for a possible
    -EOPNOTSUPP on every barrier bio submitted. - agk

    Signed-off-by: Andi Kleen
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Andi Kleen
     
  • This patch prepares some kmem_caches for request-based dm.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • Move one dm_table_put() so that the last reference in the thread
    gets dropped in __unbind().

    This is required for a following patch,
    dm-table-rework-reference-counting.patch, which changes the logic so that
    the table destructor is called only at specific points in the code.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

29 Dec, 2008

1 commit

  • Instead of having a global bio slab cache, add a reference to one
    in each bio_set that is created. This allows for personalized slabs
    in each bio_set, so that they can have bios of different sizes.

    This means we can personalize the bios we return. File systems may want
    to embed the bio inside another structure, to avoid allocating more items
    (and stuffing them in ->bi_private) after they get a bio. Or we may want
    to embed a number of bio_vecs directly at the end of a bio, to avoid
    doing two allocations to return a bio. This is now possible.
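
    A hypothetical user of the new per-bio_set allocation might embed the
    bio like this (all names here are illustrative):

    struct my_io {
        void        *extra_state;   /* would otherwise hang off ->bi_private */
        struct bio  bio;            /* must be the last member */
    };

    static struct bio_set *my_bs;

    static int my_init(void)
    {
        /* front_pad reserves room for the wrapper fields in front
         * of every bio allocated from this set. */
        my_bs = bioset_create(64, offsetof(struct my_io, bio));
        return my_bs ? 0 : -ENOMEM;
    }

    static struct my_io *my_io_alloc(void)
    {
        struct bio *bio = bio_alloc_bioset(GFP_NOIO, 4, my_bs);

        return bio ? container_of(bio, struct my_io, bio) : NULL;
    }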

    Signed-off-by: Jens Axboe

    Jens Axboe
     

26 Nov, 2008

2 commits

  • Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
    sites. Spread them out to the usage sites, as suggested by
    Mathieu Desnoyers.

    Signed-off-by: Ingo Molnar
    Acked-by: Mathieu Desnoyers

    Ingo Molnar
     
  • This was a forward port of work done by Mathieu Desnoyers. I changed it
    to encode the 'what' parameter in the tracepoint name, so that one can
    register interest in specific events rather than in classes of events
    that then require checking the 'what' parameter.
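
    So instead of one generic tracepoint taking a 'what' argument, each
    event gets its own tracepoint (a sketch; the exact macro spellings,
    TPPROTO vs TP_PROTO, changed across kernel versions):

    DECLARE_TRACE(block_rq_insert,
        TP_PROTO(struct request_queue *q, struct request *rq),
        TP_ARGS(q, rq));

    DECLARE_TRACE(block_rq_issue,
        TP_PROTO(struct request_queue *q, struct request *rq),
        TP_ARGS(q, rq));

    /* A subscriber now registers for exactly the event it cares about,
     * e.g. register_trace_block_rq_issue(my_probe), instead of filtering
     * on a 'what' argument in one catch-all probe. */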

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jens Axboe
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

14 Nov, 2008

2 commits

  • dm_any_congested() just checks for the DMF_BLOCK_IO flag and has no
    code to make sure that suspend waits for dm_any_congested() to
    complete. This patch adds such a check.

    Without it, a race can occur with dm_table_put() attempting to
    destroy the table in the wrong thread: the one running
    dm_any_congested(), which is meant to be quick and return
    immediately.

    Two examples of problems:
    1. Sleeping functions called from congested code, the caller
    of which holds a spin lock.
    2. An ABBA deadlock between pdflush and multipathd. The two locks
    in contention are inode lock and kernel lock.

    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Chandra Seetharaman
     
  • This doesn't fix any bug; it just moves the wake_up to immediately after
    decrementing md->pending, for better code readability.

    It must be clear to anyone manipulating md->pending that the queue is to
    be woken up when md->pending reaches zero, so move the wakeup as close to
    the decrement as possible (see the snippet below).
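
    After the move, the pattern is the usual one-liner (sketch):

    /* Decrement and, if we were the last, wake any waiter immediately. */
    if (!atomic_dec_return(&md->pending))
        wake_up(&md->wait);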

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

24 Oct, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
    [PATCH] kill the rest of struct file propagation in block ioctls
    [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
    [PATCH] get rid of blkdev_locked_ioctl()
    [PATCH] get rid of blkdev_driver_ioctl()
    [PATCH] sanitize blkdev_get() and friends
    [PATCH] remember mode of reiserfs journal
    [PATCH] propagate mode through swsusp_close()
    [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
    [PATCH] pass fmode_t to blkdev_put()
    [PATCH] kill the unused bsize on the send side of /dev/loop
    [PATCH] trim file propagation in block/compat_ioctl.c
    [PATCH] end of methods switch: remove the old ones
    [PATCH] switch sr
    [PATCH] switch sd
    [PATCH] switch ide-scsi
    [PATCH] switch tape_block
    [PATCH] switch dcssblk
    [PATCH] switch dasd
    [PATCH] switch mtd_blkdevs
    [PATCH] switch mmc
    ...

    Linus Torvalds
     

22 Oct, 2008

3 commits

  • This patch tidies local_init() in preparation for request-based dm.
    No functional change.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch removes the unnecessary DM_WQ_FLUSH_ALL state.

    The dm_queue_flush(md, DM_WQ_FLUSH_ALL, NULL) in dm_suspend()
    is never invoked because:
    - 'goto flush_and_out' is equivalent to 'goto out', because
    'goto flush_and_out' is taken only when '!noflush';
    - if r is non-zero, the code above will invoke 'goto out'
    and skip this code.

    No functional change.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • When a bio gets split, mark its fragments with the BIO_CLONED flag.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Alasdair G Kergon

    Martin K. Petersen
     

21 Oct, 2008

2 commits

  • ioctl() doesn't need BKL here

    Signed-off-by: Al Viro

    Al Viro
     
  • To keep the size of changesets sane we split the switch by drivers;
    to keep the damn thing bisectable we do the following:
    1) rename the affected methods, add ones with correct
    prototypes, make (few) callers handle both. That's this changeset.
    2) for each driver convert to new methods. *ALL* drivers
    are converted in this series.
    3) kill the old (renamed) methods.

    Note that it _is_ a flagday; all in-tree drivers are converted and by the
    end of this series no trace of the old methods remains. The only reason
    we do it this way is to keep the damn thing bisectable and to allow
    per-driver debugging if anything goes wrong.

    New methods:
    open(bdev, mode)
    release(disk, mode)
    ioctl(bdev, mode, cmd, arg) /* Called without BKL */
    compat_ioctl(bdev, mode, cmd, arg)
    locked_ioctl(bdev, mode, cmd, arg) /* Called with BKL, legacy */
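
    A driver converted in step 2 ends up filling in the structure like this
    (hypothetical driver, illustrative names):

    static struct block_device_operations my_fops = {
        .owner   = THIS_MODULE,
        .open    = my_open,     /* my_open(struct block_device *, fmode_t) */
        .release = my_release,  /* my_release(struct gendisk *, fmode_t) */
        .ioctl   = my_ioctl,    /* called without the BKL */
    };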

    Signed-off-by: Al Viro

    Al Viro