04 Jan, 2012

1 commit

  • Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
    kill_bdev as well, so brd doesn't have to open code it. Reduce
    buffer_head.h requirement accordingly.

    Removed a rather large comment from invalidate_bdev, as it looked a bit
    obsolete to bother moving. The small comment replacing it says enough.

    Signed-off-by: Nick Piggin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Al Viro
     

05 Nov, 2011

1 commit

  • * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
    block: don't call blk_drain_queue() if elevator is not up
    blk-throttle: use queue_is_locked() instead of lockdep_is_held()
    blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
    blk-throttle: Free up policy node associated with deleted rule
    block: warn if tag is greater than real_max_depth.
    block: make gendisk hold a reference to its queue
    blk-flush: move the queue kick into
    blk-flush: fix invalid BUG_ON in blk_insert_flush
    block: Remove the control of complete cpu from bio.
    block: fix a typo in the blk-cgroup.h file
    block: initialize the bounce pool if high memory may be added later
    block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
    block: drop @tsk from attempt_plug_merge() and explain sync rules
    block: make get_request[_wait]() fail if queue is dead
    block: reorganize throtl_get_tg() and blk_throtl_bio()
    block: reorganize queue draining
    block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
    block: pass around REQ_* flags instead of broken down booleans during request alloc/free
    block: move blk_throtl prototypes to block/blk.h
    block: fix genhd refcounting in blkio_policy_parse_and_set()
    ...

    Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
    and making the request functions be of type "void" instead of "int" in
    - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
    - drivers/staging/zram/zram_drv.c

    Linus Torvalds
     

02 Aug, 2011

4 commits

  • DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities
    regardless of whether or not a given DM device's underlying devices
    also advertised a need for them.

    Block's flush-merge changes from 2.6.39 have proven to be more costly
    for DM devices. Performance regressions have been reported even when
    DM's underlying devices do not advertise that they have a write cache.

    Fix the performance regressions by configuring a DM device's flushing
    capabilities based on those of the underlying devices' capabilities.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
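
    A minimal sketch of the capability walk described above, not the actual
    patch: device_flush_capable() and dm_table_set_flush() are hypothetical
    helpers, and the member queue's flush_flags field is assumed to carry
    the REQ_FLUSH/REQ_FUA bits.

    #include <linux/device-mapper.h>
    #include <linux/blkdev.h>

    /* iterate_devices callout: does this underlying device advertise 'flush'? */
    static int device_flush_capable(struct dm_target *ti, struct dm_dev *dev,
                                    sector_t start, sector_t len, void *data)
    {
            unsigned flush = *(unsigned *)data;
            struct request_queue *q = bdev_get_queue(dev->bdev);

            return q && (q->flush_flags & flush);
    }

    /* hypothetical helper: does any target device advertise 'flush'? */
    static bool dm_table_supports_flush(struct dm_table *t, unsigned flush)
    {
            struct dm_target *ti;
            unsigned i;

            for (i = 0; i < dm_table_get_num_targets(t); i++) {
                    ti = dm_table_get_target(t, i);
                    if (ti->num_flush_requests && ti->type->iterate_devices &&
                        ti->type->iterate_devices(ti, device_flush_capable, &flush))
                            return true;
            }
            return false;
    }

    /* applied when the table is bound, instead of an unconditional FLUSH|FUA */
    static void dm_table_set_flush(struct dm_table *t, struct request_queue *q)
    {
            unsigned flush = 0;

            if (dm_table_supports_flush(t, REQ_FLUSH)) {
                    flush |= REQ_FLUSH;
                    if (dm_table_supports_flush(t, REQ_FUA))
                            flush |= REQ_FUA;
            }
            blk_queue_flush(q, flush);
    }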
     
  • Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
    whether the device can accept bios larger than the size its merge
    function returns. When set, use this to send large bios to snapshots
    which can split them if necessary. Snapshot I/O may be significantly
    fragmented and this approach seems to improve performance.

    Before the patch, dm_set_device_limits restricted bio size to page size
    if the underlying device had a merge function and the target didn't
    provide a merge function. After the patch, dm_set_device_limits
    restricts bio size to page size if the underlying device has a merge
    function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
    provide a merge function.

    The snapshot target can't provide a merge function because when the merge
    function is called, it is impossible to determine where the bio will be
    remapped. Previously this led us to impose a 4k limit, which we can
    now remove if the snapshot store is located on a device without a merge
    function. Together with another patch for optimizing full chunk writes,
    it improves performance from 29MB/s to 40MB/s when writing to the
    filesystem on snapshot store.

    If the snapshot store is placed on a non-dm device with a merge function
    (such as md-raid), device mapper still limits all bios to page size.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
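
    A hedged sketch of the resulting restriction in dm_set_device_limits(),
    not the exact kernel code; the merge_is_optional argument stands in for
    testing the new DMF_MERGE_IS_OPTIONAL flag on the mapped_device.

    #include <linux/device-mapper.h>
    #include <linux/blkdev.h>

    static void restrict_bio_size(struct dm_target *ti,
                                  struct request_queue *q,   /* underlying queue */
                                  struct queue_limits *limits,
                                  bool merge_is_optional)
    {
            /* only clamp to PAGE_SIZE when a merge function on the underlying
             * device cannot be honoured and splitting is not optional */
            if (q->merge_bvec_fn && !ti->type->merge && !merge_is_optional)
                    blk_limits_max_hw_sectors(limits,
                                              (unsigned int)(PAGE_SIZE >> 9));
    }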
     
  • Remove 'discards_supported' from the dm_table structure. The same
    information can be easily discovered from the table's target(s) in
    dm_table_supports_discards().

    Before this fix dm_table_supports_discards() would skip checking the
    individual targets' 'discards_supported' flag if any one target in the
    table didn't set num_discard_requests > 0. Now the per-target
    'discards_supported' flag is effective at ensuring the final DM device
    advertises discard support. But, to be clear, targets that don't
    support discards (!num_discard_requests) will not receive discard
    requests.

    Also DMWARN if a target sets 'discards_supported' override but forgets
    to set 'num_discard_requests'.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
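
    A minimal sketch of the reworked check, assuming dm internals such as
    ti->num_discard_requests and ti->discards_supported; not the verbatim
    implementation of dm_table_supports_discards().

    #include <linux/device-mapper.h>
    #include <linux/blkdev.h>

    static int device_discard_capable(struct dm_target *ti, struct dm_dev *dev,
                                      sector_t start, sector_t len, void *data)
    {
            struct request_queue *q = bdev_get_queue(dev->bdev);

            return q && blk_queue_discard(q);
    }

    bool dm_table_supports_discards(struct dm_table *t)
    {
            struct dm_target *ti;
            unsigned i;

            for (i = 0; i < dm_table_get_num_targets(t); i++) {
                    ti = dm_table_get_target(t, i);

                    if (!ti->num_discard_requests)
                            continue;       /* target never receives discards */

                    if (ti->discards_supported)     /* per-target override */
                            return true;

                    if (ti->type->iterate_devices &&
                        ti->type->iterate_devices(ti, device_discard_capable, NULL))
                            return true;
            }

            return false;
    }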
     
  • Destroy _minor_idr when unloading the core dm module. (Found by kmemleak.)

    Cc: stable@kernel.org
    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     

17 Mar, 2011

1 commit

  • MD and DM create a new bio_set for every metadevice. Each bio_set has an
    integrity mempool attached regardless of whether the metadevice is
    capable of passing integrity metadata. This is a waste of memory.

    Instead we defer the allocation decision to MD and DM since we know at
    metadevice creation time whether integrity passthrough is needed or not.

    Automatic integrity mempool allocation can then be removed from
    bioset_create() and we make an explicit integrity allocation for the
    fs_bio_set.

    Signed-off-by: Martin K. Petersen
    Reported-by: Zdenek Kabelac
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
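
    A hedged sketch of the new split, using a hypothetical wrapper name;
    bioset_integrity_create() is the explicit opt-in that replaces the
    automatic allocation previously done inside bioset_create().

    #include <linux/bio.h>

    static struct bio_set *md_bioset_create(unsigned pool_size,
                                            unsigned front_pad,
                                            bool integrity_capable)
    {
            struct bio_set *bs = bioset_create(pool_size, front_pad);

            if (!bs)
                    return NULL;

            /* attach an integrity mempool only when metadata can be passed
             * through to the underlying devices */
            if (integrity_capable && bioset_integrity_create(bs, pool_size)) {
                    bioset_free(bs);
                    return NULL;
            }

            return bs;
    }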
     

10 Mar, 2011

1 commit

  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
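
    The replacement referred to above is the explicit on-stack plug; a
    minimal usage sketch (submit_many() is a hypothetical caller):

    #include <linux/blkdev.h>
    #include <linux/fs.h>

    static void submit_many(struct bio **bios, int nr)
    {
            struct blk_plug plug;
            int i;

            blk_start_plug(&plug);          /* batch I/O on this task's stack */
            for (i = 0; i < nr; i++)
                    submit_bio(bios[i]->bi_rw, bios[i]);
            blk_finish_plug(&plug);         /* flush the batch to the queue(s) */
    }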
     

14 Jan, 2011

5 commits

  • This patch changes spin_lock_irq() to spin_lock() in dm_request_fn().
    This patch is just a clean-up; there is no functional change.

    The spin_lock_irq() was leftover from the early request-based dm code,
    where map_request() used to enable interrupts.
    Since current map_request() never enables interrupts, we can change it
    to spin_lock() to match the prior spin_unlock().

    Auditing the dm and block-layer code called from map_request(), I
    confirmed that all functions save/restore the interrupt status, so no
    function returns with interrupts enabled.
    Also, I haven't observed any problems in my test environment, which
    uses the scsi and lpfc drivers, after heavy I/O testing with occasional
    path down/up events.

    Added a BUG_ON() to detect breakage in the future.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
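
    A stripped-down sketch of the locking pattern described above; the real
    dm_request_fn() does more, and map_request() is elided here.

    #include <linux/blkdev.h>

    static void dm_request_fn(struct request_queue *q)
    {
            struct request *rq;

            /* ->request_fn() is entered with q->queue_lock held and IRQs off */
            while ((rq = blk_peek_request(q)) != NULL) {
                    blk_start_request(rq);
                    spin_unlock(q->queue_lock);

                    /* map_request() would run here; it never enables IRQs, so */
                    BUG_ON(!irqs_disabled());

                    spin_lock(q->queue_lock);       /* was: spin_lock_irq() */
            }
    }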
     
  • kmirrord_wq, kcopyd_work and md->wq are created per dm instance and
    serve only a single work item from the dm instance, so non-reentrant
    workqueues would provide the same ordering guarantees as ordered ones
    while allowing CPU affinity and use of the workqueues for other
    purposes. Switch them to non-reentrant workqueues.

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
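
    A hedged before/after fragment for one of the workqueues mentioned above
    (flag choice and error handling simplified; md->wq is a dm internal):

    /* before (ordered, single threaded):
     *      md->wq = create_singlethread_workqueue("kdmflush");
     * after (non-reentrant, not bound to one CPU): */
    md->wq = alloc_workqueue("kdmflush", WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);
    if (!md->wq)
            return -ENOMEM;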
     
  • Convert all create[_singlethread]_work() users to the new
    alloc[_ordered]_workqueue(). This conversion is mechanical and
    doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
     
  • This patch replaces dm_mutex with _minor_lock in dm_blk_close()
    and then removes it.

    During the BKL conversion, commit 6e9624b8caec290d28b4c6d9ec75749df6372b87
    (block: push down BKL into .open and .release) pushed lock_kernel()
    down into dm_blk_open/close calls.
    Commit 2a48fc0ab24241755dc93bfd4f01d68efab47f5a
    (block: autoconvert trivial BKL users to private mutex) converted it to a
    local mutex, but _minor_lock is sufficient.

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     
  • No longer needlessly hold md->bdev->bd_inode->i_mutex when changing the
    size of a DM device. This additional locking is unnecessary because
    i_size_write() is already protected by the existing critical section in
    dm_swap_table(). DM already has a reference on md->bdev so the
    associated bd_inode may be changed without lifetime concerns.

    A negative side-effect of having held md->bdev->bd_inode->i_mutex was
    that a concurrent DM device resize and flush (via fsync) would deadlock.
    Dropping md->bdev->bd_inode->i_mutex eliminates this potential for
    deadlock. The following reproducer no longer deadlocks:
    https://www.redhat.com/archives/dm-devel/2009-July/msg00284.html

    Signed-off-by: Mike Snitzer
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon
    Cc: stable@kernel.org

    Mike Snitzer
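
    A hedged sketch of the resize helper after the change (md->disk and
    md->bdev are mapped_device internals; 512-byte sectors assumed):

    #include <linux/fs.h>
    #include <linux/genhd.h>

    static void __set_size(struct mapped_device *md, sector_t size)
    {
            set_capacity(md->disk, size);

            /* already serialised by dm_swap_table()'s critical section,
             * so bd_inode->i_mutex is no longer taken around this */
            i_size_write(md->bdev->bd_inode, (loff_t)size << 9);
    }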
     

07 Jan, 2011

1 commit

  • The "error" field in block_bio_complete is not assigned, leaving the memory area
    uninitialized (keeping garbage data). Pass an additional tracepoint argument to
    this event to initialize this field.

    Signed-off-by: Jeff Moyer
    Signed-off-by: Mathieu Desnoyers
    CC: Steven Rostedt
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: Thomas Gleixner
    CC: Li Zefan
    CC: Alan.Brunelle@hp.com
    Signed-off-by: Jens Axboe

    Jeff Moyer
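
    A simplified sketch of how a tracepoint gains the extra argument so the
    field is actually assigned (field list trimmed; not the verbatim kernel
    definition of block_bio_complete):

    TRACE_EVENT(block_bio_complete,

            TP_PROTO(struct request_queue *q, struct bio *bio, int error),

            TP_ARGS(q, bio, error),

            TP_STRUCT__entry(
                    __field( dev_t,    dev    )
                    __field( sector_t, sector )
                    __field( int,      error  )
            ),

            TP_fast_assign(
                    __entry->dev    = bio->bi_bdev->bd_dev;
                    __entry->sector = bio->bi_sector;
                    __entry->error  = error;        /* previously left unset */
            ),

            TP_printk("%d,%d sector=%llu error=%d",
                      MAJOR(__entry->dev), MINOR(__entry->dev),
                      (unsigned long long)__entry->sector, __entry->error)
    );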
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     

05 Oct, 2010

1 commit

  • The block device drivers have all gained new lock_kernel
    calls from a recent pushdown, and some of the drivers
    were already using the BKL before.

    This turns the BKL into a set of per-driver mutexes.
    Still need to check whether this is safe to do.

    file=$1
    name=$2
    if grep -q lock_kernel ${file} ; then
        if grep -q 'include.*linux.mutex.h' ${file} ; then
            sed -i '/include.*/d' ${file}
        else
            sed -i 's/include.*.*$/include /g' ${file}
        fi
        sed -i ${file} \
            -e "/^#include.*linux.mutex.h/,$ {
                    1,/^\(static\|int\|long\)/ {
                            /^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);

                    } }" \
            -e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
            -e '/[ ]*cycle_kernel_lock();/d'
    else
        sed -i -e '/include.*\/d' ${file} \
               -e '/cycle_kernel_lock()/d'
    fi

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
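
    In C terms, the transformation the script applies to each block driver
    looks roughly like this (driver and function names are hypothetical):

    #include <linux/mutex.h>
    #include <linux/blkdev.h>

    static DEFINE_MUTEX(example_mutex);     /* inserted before the first function */

    static int example_open(struct block_device *bdev, fmode_t mode)
    {
            mutex_lock(&example_mutex);     /* was: lock_kernel();   */
            /* ... driver-specific open logic ... */
            mutex_unlock(&example_mutex);   /* was: unlock_kernel(); */
            return 0;
    }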
     

10 Sep, 2010

6 commits

  • Rename __clone_and_map_flush to __clone_and_map_empty_flush for added
    clarity.

    Simplify logic associated with REQ_FLUSH conditionals.

    Introduce a BUG_ON() and add a few more helpful comments to the code
    so that it is clear that all flushes are empty.

    Cleanup __split_and_process_bio() so that an empty flush isn't processed
    by a 'sector_count' focused while loop.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Mike Snitzer
     
  • Now queue_io() is called from dec_pending(), which may be called with
    interrupts disabled, so queue_io() must not enable interrupts
    unconditionally and must save/restore the current interrupt status.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
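
    A hedged sketch of queue_io() after the fix (md->deferred_lock,
    md->deferred, md->wq and md->work are mapped_device internals):

    static void queue_io(struct mapped_device *md, struct bio *bio)
    {
            unsigned long flags;

            /* may be reached from dec_pending() with interrupts already off,
             * so save and restore the state instead of blindly enabling */
            spin_lock_irqsave(&md->deferred_lock, flags);  /* was: spin_lock_irq() */
            bio_list_add(&md->deferred, bio);
            spin_unlock_irqrestore(&md->deferred_lock, flags);

            queue_work(md->wq, &md->work);
    }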
     
  • Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA doesn't mandate any ordering
    against other bio's. This patch relaxes ordering around flushes.

    * A flush bio is no longer deferred to workqueue directly. It's
    processed like other bio's but __split_and_process_bio() uses
    md->flush_bio as the clone source. md->flush_bio is initialized to
    empty flush during md initialization and shared for all flushes.

    * As a flush bio now travels through the same execution path as other
    bio's, there's no need for dedicated error handling path either. It
    can use the same error handling path in dec_pending(). Dedicated
    error handling removed along with md->flush_error.

    * When dec_pending() detects that a flush has completed, it checks
    whether the original bio has data. If so, the bio is queued to the
    deferred list w/ REQ_FLUSH cleared; otherwise, it's completed.

    * As flush sequencing is handled in the usual issue/completion path,
    dm_wq_work() no longer needs to handle flushes differently. Now its
    only responsibility is re-issuing deferred bio's the same way as
    _dm_request() would. REQ_FLUSH handling logic including
    process_flush() is dropped.

    * There's no reason for queue_io() and dm_wq_work() to write-lock
    dm->io_lock. queue_io() now only uses md->deferred_lock and
    dm_wq_work() read-locks dm->io_lock.

    * bio's no longer need to be queued on the deferred list while a flush
    is in progress, making DMF_QUEUE_IO_TO_THREAD unnecessary. Drop it.

    This avoids stalling the device during flushes and simplifies the
    implementation.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
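
    A hedged fragment of the completion-side handling described above, as it
    might appear in a dec_pending()-style path; md, bio and io_error are
    assumed to be in scope:

    if ((bio->bi_rw & REQ_FLUSH) && bio->bi_size) {
            /* preflush of a flush+data bio is done: strip REQ_FLUSH and
             * requeue so the data portion is issued like any other bio */
            bio->bi_rw &= ~REQ_FLUSH;
            queue_io(md, bio);
    } else {
            /* flush (or ordinary bio) finished: use the normal error path */
            bio_endio(bio, io_error);
    }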
     
  • This patch converts request-based dm to support the new REQ_FLUSH/FUA.

    The original request-based flush implementation depended on
    request_queue blocking other requests while a barrier sequence is in
    progress, which is no longer true for the new REQ_FLUSH/FUA.

    In general, request-based dm doesn't have infrastructure for cloning
    one source request to multiple targets, but the original flush
    implementation had a special mostly independent path which can issue
    flushes to multiple targets and sequence them. However, the
    capability isn't currently in use and adds a lot of complexity.
    Moreover, it's unlikely to be useful in its current form as it
    doesn't make sense to be able to send out flushes to multiple targets
    when write requests can't be.

    This patch rips out the special flush code path and handles
    REQ_FLUSH/FUA requests the same way as other requests. The only
    special treatment is that REQ_FLUSH requests use block address 0
    when finding the target, which is enough for now.

    * added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as
    suggested by Mike Snitzer

    Signed-off-by: Tejun Heo
    Acked-by: Mike Snitzer
    Tested-by: Kiyoshi Ueda
    Signed-off-by: Jens Axboe

    Tejun Heo
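
    A minimal sketch of the "block address 0" rule mentioned above;
    rq_find_target() is a hypothetical helper, while dm_table_find_target()
    and dm_target_is_valid() are existing dm internals:

    static struct dm_target *rq_find_target(struct dm_table *map,
                                            struct request *rq)
    {
            sector_t pos = 0;
            struct dm_target *ti;

            /* flushes carry no sector, so look the target up at 0 */
            if (!(rq->cmd_flags & REQ_FLUSH))
                    pos = blk_rq_pos(rq);

            ti = dm_table_find_target(map, pos);
            BUG_ON(!dm_target_is_valid(ti));  /* as suggested by Mike Snitzer */

            return ti;
    }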
     
  • This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
    now deprecated REQ_HARDBARRIER.

    * -EOPNOTSUPP handling logic dropped.

    * Preflush is handled as before but postflush is dropped and replaced
    with passing down REQ_FUA to member request_queues. This replaces
    one array wide cache flush w/ member specific FUA writes.

    * __split_and_process_bio() now calls __clone_and_map_flush() directly
    for flushes and guarantees all FLUSH bio's going to targets are zero
    length.

    * It's now guaranteed that all FLUSH bio's which are passed onto dm
    targets are zero length. bio_empty_barrier() tests are replaced
    with REQ_FLUSH tests.

    * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

    * Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
    enough to be marked with unlikely().

    * Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
    doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
    capability.

    * Request-based dm isn't converted yet. dm_init_request_based_queue()
    resets flush support to 0 for now. To avoid disturbing request-based
    dm code, dm->flush_error is added for bio-based dm while
    request-based dm continues to use dm->barrier_error.

    Lightly tested linear, stripe, raid1, snap and crypt targets. Please
    proceed with caution as I'm not familiar with the code base.

    Signed-off-by: Tejun Heo
    Cc: dm-devel@redhat.com
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
    requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with
    -EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
    blk_queue_flush().

    blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a
    device has write cache and can flush it, it should set REQ_FLUSH. If
    the device can handle FUA writes, it should also set REQ_FUA.

    All blk_queue_ordered() users are converted.

    * ORDERED_DRAIN is mapped to 0 which is the default value.
    * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
    * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.

    Signed-off-by: Tejun Heo
    Acked-by: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Nick Piggin
    Cc: Michael S. Tsirkin
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Cc: FUJITA Tomonori
    Cc: Geert Uytterhoeven
    Cc: David S. Miller
    Cc: Alasdair G Kergon
    Cc: Pierre Ossman
    Cc: Stefan Weinhuber
    Signed-off-by: Jens Axboe

    Tejun Heo
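
    A usage sketch of the new interface for the three mappings listed above
    (q is the driver's request_queue):

    blk_queue_flush(q, 0);                   /* ORDERED_DRAIN: no cache  */
    blk_queue_flush(q, REQ_FLUSH);           /* ORDERED_DRAIN_FLUSH      */
    blk_queue_flush(q, REQ_FLUSH | REQ_FUA); /* ORDERED_DRAIN_FLUSH_FUA  */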
     

12 Aug, 2010

9 commits

  • Update __clone_and_map_discard to loop across all targets in a DM
    device's table when it processes a discard bio. If a discard crosses a
    target boundary it must be split accordingly.

    Update __issue_target_requests and __issue_target_request to allow a
    cloned discard bio to have a custom start sector and size.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Split max_io_len_target_boundary out of max_io_len so that the discard
    support can make use of it without duplicating max_io_len code.

    Avoiding max_io_len's split_io logic enables DM's discard support to
    submit the entire discard request to a target. But discards must still
    be split on target boundaries.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Rename __flush_target to __issue_target_request now that it is used to
    issue both flush and discard requests.

    Introduce __issue_target_requests as a convenient wrapper that calls
    __issue_target_request 'num_flush_requests' or 'num_discard_requests'
    times per target.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
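
    A hedged sketch of the wrapper described above; struct clone_info and
    __issue_target_request() are dm.c internals and the exact signatures may
    differ:

    static void __issue_target_requests(struct clone_info *ci,
                                        struct dm_target *ti,
                                        unsigned num_requests)
    {
            unsigned request_nr;

            for (request_nr = 0; request_nr < num_requests; request_nr++)
                    __issue_target_request(ci, ti, request_nr);
    }

    /* callers pass ti->num_flush_requests or ti->num_discard_requests */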
     
  • Allow discards to be passed through to linear mappings if at least one
    underlying device supports it. Discards will be forwarded only to
    devices that support them.

    A target that supports discards should set num_discard_requests to
    indicate how many times each discard request must be submitted to it.

    Verify table's underlying devices support discards prior to setting the
    associated DM device as capable of discards (via QUEUE_FLAG_DISCARD).

    Signed-off-by: Mike Snitzer
    Signed-off-by: Mikulas Patocka
    Reviewed-by: Joe Thornber
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • 'target_request_nr' is a more generic name that reflects the fact that
    it will be used for both flush and discard support.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Change bio-based mapped devices so that they no longer have a fully
    initialized request_queue (request_fn, elevator, etc). This means bio-based DM
    devices no longer register elevator sysfs attributes ('iosched/' tree
    or 'scheduler' other than "none").

    In contrast, a request-based DM device will continue to have a full
    request_queue and will register elevator sysfs attributes. Therefore
    a user can determine a DM device's type by checking if elevator sysfs
    attributes exist.

    First allocate a minimalist request_queue structure for a DM device
    (needed for both bio and request-based DM).

    Initialization of a full request_queue is deferred until it is known
    that the DM device is request-based, at the end of the table load
    sequence.

    Factor DM device's request_queue initialization:
    - common to both request-based and bio-based into dm_init_md_queue().
    - specific to request-based into dm_init_request_based_queue().

    The md->type_lock mutex is used to protect md->queue, in addition to
    md->type, during table_load().

    A DM device's first table_load will establish the immutable md->type.
    But md->queue initialization, based on md->type, may fail at that time
    (because blk_init_allocated_queue cannot allocate memory). Therefore
    any subsequent table_load must (re)try dm_setup_md_queue independently of
    establishing md->type.

    Signed-off-by: Mike Snitzer
    Acked-by: Kiyoshi Ueda
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Determine whether a mapped device is bio-based or request-based when
    loading its first (inactive) table and don't allow that to be changed
    later.

    This patch performs different device initialisation in each of the two
    cases. (We don't think it's necessary to add code to support changing
    between the two types.)

    Allowed md->type transitions:
    DM_TYPE_NONE to DM_TYPE_BIO_BASED
    DM_TYPE_NONE to DM_TYPE_REQUEST_BASED

    We now prevent table_load from replacing the inactive table with a
    conflicting type of table even after an explicit table_clear.

    Introduce 'type_lock' into the struct mapped_device to protect md->type
    and to prepare for the next patch that will change the queue
    initialization and allocate memory while md->type_lock is held.

    Signed-off-by: Mike Snitzer
    Acked-by: Kiyoshi Ueda
    Signed-off-by: Alasdair G Kergon

    drivers/md/dm-ioctl.c | 15 +++++++++++++++
    drivers/md/dm.c | 37 ++++++++++++++++++++++++++++++-------
    drivers/md/dm.h | 5 +++++
    include/linux/dm-ioctl.h | 4 ++--
    4 files changed, 52 insertions(+), 9 deletions(-)

    Mike Snitzer
     
  • When processing barriers, skip the second flush if processing the bio
    failed with -EOPNOTSUPP. This can happen with discard+barrier requests.
    If the device doesn't support discard, there would be two useless
    SYNCHRONIZE CACHE commands. The first dm_flush cannot be so easily
    optimized out, so we leave it there.

    Previously, -EOPNOTSUPP could be received in dec_pending only with empty
    barriers and we ignored that error, assuming the device not supporting
    cache flushes has cache always consistent. With the addition of discard
    barriers, this -EOPNOTSUPP can also be generated by discards and we
    must record it in md->barrier_error for process_barrier.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • This patch separates the device deletion code from dm_put()
    to make sure the deletion happens in the process context.

    By this patch, device deletion always occurs in an ioctl (process)
    context and dm_put() can be called in interrupt context.
    As a result, the request-based dm's bad dm_put() usage pointed out
    by Mikulas below disappears.
    http://marc.info/?l=dm-devel&m=126699981019735&w=2

    Without this patch, I confirmed there is a case that crashes the system:
    dm_put() => dm_table_destroy() => vfree() => BUG_ON(in_interrupt())

    Some more backgrounds and details:
    In request-based dm, a device opener can remove a mapped_device
    while the last request is still completing, because bios in the last
    request complete first and then the device opener can close and remove
    the mapped_device before the last request completes:
    CPU0                                        CPU1
    =================================================================
    <>
    blk_end_request_all(clone_rq)
      blk_update_request(clone_rq)
        bio_endio(clone_bio) == end_clone_bio
          blk_update_request(orig_rq)
            bio_endio(orig_bio)
                                                <>
                                                dm_blk_close()
                                                dev_remove()
                                                  dm_put(md)
                                                    <>
    blk_finish_request(clone_rq)
      ....
      dm_end_request(clone_rq)
        free_rq_clone(clone_rq)
        blk_end_request_all(orig_rq)
        rq_completed(md)

    So request-based dm used dm_get()/dm_put() to hold md for each I/O
    until its request completion handling is fully done.
    However, the final dm_put() can call the device deletion code which
    must not be run in interrupt context and may cause kernel panic.

    To solve the problem, this patch moves the device deletion code,
    dm_destroy(), to predetermined places that actually delete the
    mapped_device in ioctl (process) context, and changes dm_put() so that
    it just decrements the reference count of the mapped_device.
    By this change, dm_put() can be used in any context and the symmetric
    model below is introduced:
    dm_create(): create a mapped_device
    dm_destroy(): destroy a mapped_device
    dm_get(): increment the reference count of a mapped_device
    dm_put(): decrement the reference count of a mapped_device

    dm_destroy() waits for all references of the mapped_device to disappear,
    then deletes the mapped_device.

    dm_destroy() uses active waiting with msleep(1), since deleting
    the mapped_device isn't a performance-critical task.
    And since at this point nobody can open the mapped_device and no new
    references will be taken, the pending counts exist only because of
    racing completion activity and will eventually decrease to zero.

    For the unlikely case of the forced module unload, dm_destroy_immediate(),
    which doesn't wait and forcibly deletes the mapped_device, is also
    introduced and used in dm_hash_remove_all(). Otherwise, "rmmod -f"
    may be stuck and never return.
    And now, because the mapped_device is deleted at this point, subsequent
    accesses to the mapped_device may cause NULL pointer references.

    Cc: stable@kernel.org
    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
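
    A hedged sketch of the resulting split between reference counting and
    deletion; md->holders is a mapped_device internal, free_dev() stands in
    for the real teardown, and the step that blocks new opens/references is
    omitted:

    #include <linux/delay.h>

    void dm_put(struct mapped_device *md)
    {
            /* safe in any context now: only drops a reference */
            atomic_dec(&md->holders);
    }

    void dm_destroy(struct mapped_device *md)
    {
            /* called only from ioctl (process) context */
            while (atomic_read(&md->holders))
                    msleep(1);      /* deletion is not performance-critical */

            free_dev(md);
    }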