Eric Lee / linux-smarc-t335x-v3.2

24 Oct, 2011

1 commit

d136f2efd dm kcopyd: fix job_pool leak

Fix memory leak introduced by commit a6e50b409d3f9e0833e69c3c9cca822e8fa4adbb
(dm snapshot: skip reading origin when overwriting complete chunk).

When allocating a set of jobs from kc->job_pool, job->master_job must be
set (to point to itself) so that the mempool item gets freed when the
master_job completes.
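
The ownership rule described above can be sketched in userspace C. This is a minimal model, not the kernel code: `job_set_alloc`, `job_complete`, `SUB_JOBS` and the `pool_items_in_use` counter are hypothetical names, and `calloc`/`free` stand in for the mempool.

```c
#include <assert.h>
#include <stdlib.h>

#define SUB_JOBS 8 /* hypothetical number of sub jobs per set */

struct job {
	struct job *master_job; /* first job of the set this job belongs to */
};

static int pool_items_in_use; /* stand-in for mempool accounting */

/* Allocate one pool item holding a master job plus its sub jobs. */
static struct job *job_set_alloc(void)
{
	struct job *set = calloc(SUB_JOBS + 1, sizeof(*set));

	if (!set)
		return NULL;
	pool_items_in_use++;
	set->master_job = set; /* the master points to itself */
	return set;
}

/* Completion path: only the master job releases the pool item. */
static void job_complete(struct job *job)
{
	assert(job != NULL);
	if (job->master_job == job) {
		pool_items_in_use--;
		free(job);
	}
}
```

If the self-pointer assignment in `job_set_alloc` were omitted, `job_complete` would never take the freeing branch and the item would leak, which is the kind of leak this commit describes.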

master_job was introduced by commit c6ea41fbbe08f270a8edef99dc369faf809d1bd6
(dm kcopyd: preallocate sub jobs to avoid deadlock).

Reported-by: Michael Leun
Cc: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2011-10-24 03:55:17 +0800

06 Oct, 2011

1 commit

6367f1775 Merge branch 'for-linus' of http://people.redhat.com/agk/git/linux-dm

* 'for-linus' of http://people.redhat.com/agk/git/linux-dm:
  dm crypt: always disable discard_zeroes_data
  dm: raid fix write_mostly arg validation
  dm table: avoid crash if integrity profile changes
  dm: flakey fix corrupt_bio_byte error path

Linus Torvalds
2011-10-06 23:31:47 +0800

26 Sep, 2011

4 commits

983c7db34 dm crypt: always disable discard_zeroes_data

If optional discard support in dm-crypt is enabled, discard requests
bypass the crypt queue and blocks of the underlying device are discarded.
For the read path, discarded blocks are handled the same as normal
ciphertext blocks, and are therefore decrypted.

So if the underlying device announces that discarded regions return
zeroes, dm-crypt must disable this flag because after decryption there
is just random noise instead of zeroes.

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Milan Broz
2011-09-26 06:26:21 +0800

823248094 dm: raid fix write_mostly arg validation

Fix off-by-one error in validation of write_mostly.

The user-supplied value given for the 'write_mostly' argument must be an
index starting at 0. The validation of the supplied argument failed to
check for 'N' ('>' vs '>='), which would have caused an access beyond the
end of the array.
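
A minimal sketch of the corrected bounds check, assuming an array of `raid_disks` devices indexed from 0. The function name `validate_write_mostly` and its parameters are illustrative, not the dm-raid code.

```c
#include <errno.h>

/*
 * Hypothetical helper: accept 'index' only if it names one of the
 * 'raid_disks' devices, i.e. 0 .. raid_disks - 1.
 */
static int validate_write_mostly(unsigned int index, unsigned int raid_disks)
{
	/* The off-by-one form would be: if (index > raid_disks) */
	if (index >= raid_disks)
		return -EINVAL;
	return 0;
}
```

With four devices, indexes 0 to 3 pass and index 4 is rejected. The `>` form would have let index 4 through.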

Reported-by: Doug Ledford
Signed-off-by: Jonathan Brassow
Signed-off-by: Alasdair G Kergon

Jonathan Brassow
2011-09-26 06:26:19 +0800

876fbba1d dm table: avoid crash if integrity profile changes

Commit a63a5cf (dm: improve block integrity support) introduced a
two-phase initialization of a DM device's integrity profile. This
patch avoids dereferencing a NULL 'template_disk' pointer in
blk_integrity_register() if there is an integrity profile mismatch in
dm_table_set_integrity().

This can occur if the integrity profiles for stacked devices in a DM
table are changed between the call to dm_table_prealloc_integrity() and
dm_table_set_integrity().

Reported-by: Zdenek Kabelac
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon
Cc: stable@kernel.org # 2.6.39

Mike Snitzer
2011-09-26 06:26:17 +0800

68e58a294 dm: flakey fix corrupt_bio_byte error path

If no arguments were provided to the corrupt_bio_byte feature an error
should be returned immediately.

Reported-by: Zdenek Kabelac
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-09-26 06:26:15 +0800

21 Sep, 2011

1 commit

01f96c0a9 md: Avoid waking up a thread after it has been freed.

Two related problems:

1/ some error paths call "md_unregister_thread(mddev->thread)"
without subsequently clearing ->thread. A subsequent call
to mddev_unlock will try to wake the thread, and crash.

2/ Most calls to md_wakeup_thread are protected against the thread
disappearing either by:
- holding the ->mutex
- having an active request, so something else must be keeping
  the array active.
However mddev_unlock calls md_wakeup_thread after dropping the
mutex and without any certainty of an active request, so the
->thread could theoretically disappear.
So we need a spinlock to provide some protection.

So change md_unregister_thread to take a pointer to the thread
pointer, and ensure that it always does the required locking, and
clears the pointer properly.
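
The pointer-to-pointer idea can be sketched in userspace C. This is an illustrative model only: `struct worker`, `unregister_worker`, `wakeup_worker` and `thread_lock` are hypothetical names, a pthread mutex stands in for the spinlock, and `free` stands in for stopping and freeing the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct worker {
	int id;
};

/* Stand-in for the spinlock that guards every ->thread slot. */
static pthread_mutex_t thread_lock = PTHREAD_MUTEX_INITIALIZER;

/* Takes the address of the slot so that it can clear the slot itself. */
static void unregister_worker(struct worker **slot)
{
	struct worker *w;

	assert(slot != NULL);
	pthread_mutex_lock(&thread_lock);
	w = *slot;
	*slot = NULL; /* no caller can forget to clear the pointer */
	pthread_mutex_unlock(&thread_lock);
	free(w);
}

/* Returns 1 if a worker was present to be woken, 0 otherwise. */
static int wakeup_worker(struct worker **slot)
{
	int woke = 0;

	assert(slot != NULL);
	pthread_mutex_lock(&thread_lock);
	if (*slot)
		woke = 1; /* a real implementation would wake the thread here */
	pthread_mutex_unlock(&thread_lock);
	return woke;
}
```

Because the slot is cleared under the same lock that the wakeup path takes, a wakeup either sees a live worker or sees NULL, never a freed one.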

Reported-by: "Moshe Melnikov"
Signed-off-by: NeilBrown
cc: stable@kernel.org

NeilBrown
2011-09-21 13:30:20 +0800

10 Sep, 2011

3 commits

27a7b260f md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.

0.90 metadata uses an unsigned 32bit number to count the number of
kilobytes used from each device.
This should allow up to 4TB per device.
However we multiply this by 2 (to get sectors) before casting to a
larger type, so sizes above 2TB get truncated.
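
The truncation can be reproduced with a short C sketch. The function names are illustrative, and the example assumes a platform where `int` is 32 bits wide.

```c
#include <stdint.h>

/* Buggy order: the multiply happens in 32 bits and wraps before widening. */
static uint64_t kib_to_sectors_truncating(uint32_t size_kib)
{
	return (uint32_t)(size_kib * 2);
}

/* Fixed order: widen first, then multiply. */
static uint64_t kib_to_sectors(uint32_t size_kib)
{
	return (uint64_t)size_kib * 2;
}
```

For a 3TB device, 3221225472 kilobytes, the fixed version returns 6442450944 sectors, while the truncating version returns 2147483648, which corresponds to 1TB.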

Also we allow rdev->sectors to be larger than 4TB, so it is possible
for the array to be resized larger than the metadata can handle.
So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in
use.

Also the sanity check at the end of super_90_load should include level
1 as it uses ->size too. (RAID0 and Linear don't use ->size at all).

Reported-by: Pim Zandbergen
Cc: stable@kernel.org
Signed-off-by: NeilBrown

NeilBrown
2011-09-10 15:21:28 +0800

079fa166a md/raid1,10: Remove use-after-free bug in make_request.

A single request to RAID1 or RAID10 might result in multiple
requests if there are known bad blocks that need to be avoided.

To detect if we need to submit another write request we test:
if (sectors_handled < (bio->bi_size >> 9)) {

However this is after we call **_write_done() so the 'bio' no longer
belongs to us - the writes could have completed and the bio freed.

So move the **_write_done call to after the test against
bio->bi_size.

This addresses https://bugzilla.kernel.org/show_bug.cgi?id=41862

Reported-by: Bruno Wolff III
Tested-by: Bruno Wolff III
Signed-off-by: NeilBrown

NeilBrown
2011-09-10 15:21:23 +0800

19d5f834d md/raid10: unify handling of write completion.

A write can complete at two different places:
1/ when the last member-device write completes, through
raid10_end_write_request
2/ in make_request() when we remove the initial bias from ->remaining.

These two should do exactly the same thing and the comment says they
do, but they don't.

So factor the correct code out into a function and call it in both
places. This makes the code much more similar to RAID1.

The difference is only significant if there is an error, and errors
usually take a while, so it is unlikely that there will be an error
already when make_request is completing. This is therefore unlikely to
cause real problems.

Signed-off-by: NeilBrown

NeilBrown
2011-09-10 15:21:17 +0800

31 Aug, 2011

1 commit

43220aa0f md/raid5: fix a hang on device failure.

Waiting for a 'blocked' rdev to become unblocked in the raid5d thread
cannot work with internal metadata as it is the raid5d thread which
will clear the blocked flag.
This wasn't a problem in 3.0 and earlier as we only set the blocked
flag when external metadata was used then.
However we now set it always, so we need to be more careful.

Signed-off-by: NeilBrown

NeilBrown
2011-08-31 10:49:14 +0800

30 Aug, 2011

1 commit

7da64a0ab md: fix clearing of 'blocked' flag in the presence of bad blocks.

When the 'blocked' flag on a device is cleared while there are
unacknowledged bad blocks we must fail the device. This is needed for
backwards compatibility of the interface.

The code currently uses the wrong test for "unacknowledged bad blocks
exist". Change it to the right test.

Signed-off-by: NeilBrown

NeilBrown
2011-08-30 14:20:17 +0800

25 Aug, 2011

4 commits

1b6afa175 md/linear: avoid corrupting structure while waiting for rcu_free to complete.

I don't know what I was thinking putting 'rcu' after a dynamically
sized array! The array could still be in use when we call rcu_free()
(That is the point) so we mustn't corrupt it.
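
The layout rule can be illustrated with a C99 flexible array member. All names below (`struct deferred_free`, `struct linear_conf`, `linear_conf_alloc`) are hypothetical stand-ins and not the md/linear definitions. Fixed-size members, including the deferred-free head, come first, and the dynamically sized array comes last.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for an RCU callback head. */
struct deferred_free {
	void (*func)(struct deferred_free *head);
};

struct dev_info {
	long long end_sector;
};

struct linear_conf {
	struct deferred_free rcu; /* fixed-size members go first */
	size_t nr_disks;
	struct dev_info disks[];  /* dynamically sized array goes last */
};

/* Allocate a conf with room for nr_disks entries after the header. */
static struct linear_conf *linear_conf_alloc(size_t nr_disks)
{
	struct linear_conf *conf;

	conf = calloc(1, sizeof(*conf) + nr_disks * sizeof(conf->disks[0]));
	if (conf)
		conf->nr_disks = nr_disks;
	return conf;
}
```

With this ordering, writing to any valid `disks[i]` cannot overlap `rcu`, and writing to `rcu` cannot overlap the array.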

Cc: stable@kernel.org
Signed-off-by: NeilBrown

NeilBrown
2011-08-25 12:43:53 +0800

a5bf4df0c md: use REQ_NOIDLE flag in md_super_write()

Queue idling is used for the anticipation of immediate
sequential I/Os, but md_super_write() is a kind of one-shot
operation, coupled with md_super_wait(), so the
idling in this case will be just a waste of time.

Specifying REQ_NOIDLE prevents it. Instead of adding
the flag to submit_bio() directly, use the pre-defined
macro WRITE_FLUSH_FUA.

Signed-off-by: Namhyung Kim
Signed-off-by: NeilBrown

Namhyung Kim
2011-08-25 12:43:34 +0800

aeb9b2118 md: ensure changes to 'write-mostly' are reflected in metadata.

The 'write-mostly' flag can be changed through sysfs.
With 0.90 metadata, those changes are reflected in the metadata.
For 1.x metadata, they aren't.

So fix super_1_sync to record 'write-mostly' status.

Signed-off-by: NeilBrown

NeilBrown
2011-08-25 12:43:08 +0800

5ef56c8fe md: report failure if a 'set faulty' request doesn't.

Sometimes a device will refuse to be set faulty. For example, RAID1
will never let the last working device become faulty.

So check if "md_error()" did manage to set the faulty flag and fail
with EBUSY if it didn't.

Resolves-Debian-Bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=601198
Reported-by: Mike Hommey
Signed-off-by: NeilBrown

NeilBrown
2011-08-25 12:42:51 +0800

02 Aug, 2011

24 commits

ed8b752bc dm table: set flush capability based on underlying devices

DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities
regardless of whether or not a given DM device's underlying devices
also advertised a need for them.

Block's flush-merge changes from 2.6.39 have proven to be more costly
for DM devices. Performance regressions have been reported even when
DM's underlying devices do not advertise that they have a write cache.

Fix the performance regressions by configuring a DM device's flushing
capabilities based on those of the underlying devices' capabilities.
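
One way to derive a table's flush capabilities from its devices is sketched below. The `CAP_FLUSH` and `CAP_FUA` bit values and the function `table_flush_caps` are hypothetical. The rule that a capability is advertised when any underlying device advertises it is an assumption, since the commit message only says the capabilities are based on those of the underlying devices.

```c
#include <stddef.h>

#define CAP_FLUSH 0x1u /* hypothetical bit: device needs cache flushes */
#define CAP_FUA   0x2u /* hypothetical bit: device supports FUA writes */

/* Assumed rule: advertise a capability if any underlying device has it. */
static unsigned int table_flush_caps(const unsigned int *dev_caps, size_t n)
{
	unsigned int caps = 0;
	size_t i;

	for (i = 0; i < n; i++)
		caps |= dev_caps[i] & (CAP_FLUSH | CAP_FUA);
	return caps;
}
```

A table whose devices advertise nothing then advertises nothing, which avoids the flush overhead described above.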

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-08-02 19:32:08 +0800

772ae5f54 dm crypt: optionally support discard requests

Add an optional parameter field to the dm-crypt table and support the
"allow_discards" option.

Discard requests bypass crypt queue processing. The bio is simply
remapped to the underlying device.

Note that discard will never be enabled by default because of security
consequences. It is up to the administrator to enable it for encrypted
devices.

(Note that userspace cryptsetup does not understand new optional
parameters yet. Support for this will come later. Until then, you
should use 'dmsetup' to enable and disable this.)

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Milan Broz
2011-08-02 19:32:08 +0800

327372797 dm raid: add md raid1 support

Support the MD RAID1 personality through dm-raid.

Signed-off-by: Jonathan Brassow
Signed-off-by: Alasdair G Kergon

Jonathan Brassow
2011-08-02 19:32:07 +0800

b12d437b7 dm raid: support metadata devices

Add the ability to parse and use metadata devices to dm-raid. Although
not strictly required, without the metadata devices, many features of
RAID are unavailable. They are used to store a superblock and bitmap.

The role, or position in the array, of each device must be recorded in
its superblock. This is to help with fault handling, array reshaping,
and sanity checks. RAID 4/5/6 devices must be loaded in a specific order:
in this way, the 'array_position' field helps validate the correctness
of the mapping when it is loaded. It can be used during reshaping to
identify which devices are added/removed. Fault handling is impossible
without this field. For example, when a device fails it is recorded in
the superblock. If this is a RAID1 device and the offending device is
removed from the array, there must be a way during subsequent array
assembly to determine that the failed device was the one removed. This
is done by correlating the 'array_position' field and the bit-field
variable 'failed_devices'.

Signed-off-by: Jonathan Brassow
Signed-off-by: Alasdair G Kergon

Jonathan Brassow
2011-08-02 19:32:07 +0800

46bed2b5c dm raid: add write_mostly parameter

Add the write_mostly parameter to RAID1 dm-raid tables.

This allows the user to set the WriteMostly flag on a RAID1 device that
should normally be avoided for read I/O.

Signed-off-by: Jonathan Brassow
Signed-off-by: Alasdair G Kergon

Jonathan Brassow
2011-08-02 19:32:07 +0800

c1084561b dm raid: add region_size parameter

Allow the user to specify the region_size.

Ensures that the supplied value meets md's constraints, viz. the number of
regions does not exceed 2^21.
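
The constraint can be sketched as a small validation routine. The function name, the use of sectors as the unit, and the rounding up of a partial final region are assumptions made for illustration. Only the 2^21 limit comes from the commit message.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_REGIONS (UINT64_C(1) << 21)

/*
 * Hypothetical check: reject a region size that would split the device
 * into more than 2^21 regions. Both sizes are in sectors.
 */
static int validate_region_size(uint64_t dev_sectors, uint64_t region_sectors)
{
	uint64_t regions;

	if (region_sectors == 0)
		return -EINVAL;
	/* Round up so that a partial final region is counted. */
	regions = dev_sectors / region_sectors +
		  (dev_sectors % region_sectors != 0);
	return regions > MAX_REGIONS ? -EINVAL : 0;
}
```

For a device of 2^30 sectors, a region size of 512 sectors gives exactly 2^21 regions and is accepted, while 256 sectors gives 2^22 regions and is rejected.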

Signed-off-by: Jonathan Brassow
Signed-off-by: Alasdair G Kergon

Jonathan Brassow
2011-08-02 19:32:07 +0800

759dea204 dm ioctl: forbid multiple device specifiers

Exactly one of name, uuid or device must be specified when referencing
an existing device. This removes the ambiguity (risking the wrong
device being updated) if two conflicting parameters were specified.
Previously one parameter got used and any others were ignored silently.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-08-02 19:32:06 +0800

ba2e19b0f dm ioctl: introduce __get_dev_cell

Move logic to find device based on major/minor number to a separate
function __get_dev_cell (similar to __get_uuid_cell and __get_name_cell).
This makes the function __find_device_hash_cell more straightforward.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-08-02 19:32:06 +0800

0ddf9644c dm ioctl: fill in device parameters in more ioctls

Move parameter filling from find_device to __find_device_hash_cell.

This patch causes ioctls using __find_device_hash_cell
(DM_DEV_REMOVE_CMD, DM_DEV_SUSPEND_CMD - resume, DM_TABLE_CLEAR_CMD)
to return device parameters, bringing them into line with the other
ioctls.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-08-02 19:32:06 +0800

a3998799f dm flakey: add corrupt_bio_byte feature

Add corrupt_bio_byte feature to simulate corruption by overwriting a byte at a
specified position with a specified value during intervals when the device is
"down".
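
The behaviour can be modelled on a plain buffer. `corrupt_byte` is a hypothetical helper, and treating the position as a 0-based offset is an assumption of this sketch, not a statement about the target's argument format.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Overwrite one byte of 'data' with 'value', but only while the device
 * is "down" and only if 'offset' lies inside the buffer.
 * Returns 1 if a byte was overwritten, 0 otherwise.
 */
static int corrupt_byte(unsigned char *data, size_t len, size_t offset,
			unsigned char value, int device_down)
{
	assert(data != NULL);
	if (!device_down || offset >= len)
		return 0;
	data[offset] = value;
	return 1;
}
```

While the device is "up" the buffer is left untouched, so only I/O during a "down" interval sees the simulated corruption.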

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-08-02 19:32:06 +0800

b26f5e3d7 dm flakey: add drop_writes

Add 'drop_writes' option to drop writes silently while the
device is 'down'. Reads are not touched.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-08-02 19:32:05 +0800

dfd068b01 dm flakey: support feature args

Add the ability to specify arbitrary feature flags when creating a
flakey target. This code uses the same target argument helpers that
the multipath target does.

Also remove the superfluous 'dm-flakey' prefixes from the error messages,
as they already contain the prefix 'flakey'.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-08-02 19:32:05 +0800

30e4171bf dm flakey: use dm_target_offset and support discards

Use dm_target_offset() and support discards.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-08-02 19:32:05 +0800

498f0103e dm table: share target argument parsing functions

Move multipath target argument parsing code into dm-table so other
targets can share it.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-08-02 19:32:04 +0800

a6e50b409 dm snapshot: skip reading origin when overwriting complete chunk

If we write a full chunk in the snapshot, skip reading the origin device
because the whole chunk will be overwritten anyway.

This patch changes the snapshot write logic when a full chunk is written.
In this case:
1. allocate the exception
2. dispatch the bio (but don't report the bio completion to device mapper)
3. write the exception record
4. report bio completed

Callbacks must be done through the kcopyd thread, because callbacks must not
race with each other. So we create two new functions:

dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback.
(This function must not be called from interrupt context.)

dm_kcopyd_do_callback: submit callback.
(This function may be called from interrupt context.)

Performance test (on snapshots with 4k chunk size):
without the patch:
non-direct-io sequential write (dd): 17.7MB/s
direct-io sequential write (dd): 20.9MB/s
non-direct-io random write (mkfs.ext2): 0.44s

with the patch:
non-direct-io sequential write (dd): 26.5MB/s
direct-io sequential write (dd): 33.2MB/s
non-direct-io random write (mkfs.ext2): 0.27s

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-08-02 19:32:04 +0800

d5b9dd04b dm: ignore merge_bvec for snapshots when safe

Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
whether the device can accept bios larger than the size its merge
function returns. When set, use this to send large bios to snapshots
which can split them if necessary. Snapshot I/O may be significantly
fragmented and this approach seems to improve performance.

Before the patch, dm_set_device_limits restricted bio size to page size
if the underlying device had a merge function and the target didn't
provide a merge function. After the patch, dm_set_device_limits
restricts bio size to page size if the underlying device has a merge
function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
provide a merge function.
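
The before and after conditions can be written as two predicates. The function and parameter names are illustrative. Only the boolean logic is taken from the description above.

```c
#include <stdbool.h>

/* Before the patch: limit bios to page size in this case. */
static bool limit_to_page_size_before(bool underlying_has_merge,
				      bool target_has_merge)
{
	return underlying_has_merge && !target_has_merge;
}

/* After the patch: DMF_MERGE_IS_OPTIONAL lifts the restriction. */
static bool limit_to_page_size_after(bool underlying_has_merge,
				     bool merge_is_optional,
				     bool target_has_merge)
{
	return underlying_has_merge && !merge_is_optional &&
	       !target_has_merge;
}
```

The two predicates differ only when the underlying device has a merge function, the target does not, and the flag is set. In that case the page-size limit no longer applies.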

The snapshot target can't provide a merge function because when the merge
function is called, it is impossible to determine where the bio will be
remapped. Previously this led us to impose a 4k limit, which we can
now remove if the snapshot store is located on a device without a merge
function. Together with another patch for optimizing full chunk writes,
it improves performance from 29MB/s to 40MB/s when writing to the
filesystem on snapshot store.

If the snapshot store is placed on a non-dm device with a merge function
(such as md-raid), device mapper still limits all bios to page size.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-08-02 19:32:04 +0800

086490125 dm table: clean dm_get_device and move exports

There is no need for __table_get_device to be factored out.
Also move the exports to the end of their respective functions.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-08-02 19:32:04 +0800

3e8dbb7f3 dm raid: tidy includes

A dm target only needs to use include/linux dm headers.

Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2011-08-02 19:32:03 +0800

2ca4c92f5 dm ioctl: prevent empty message

Detect invalid empty messages in core dm instead of requiring every target to
check this.

Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2011-08-02 19:32:03 +0800

13c87583e dm raid: cleanup parameter handling

Re-order the parameters so they are handled consistently in the same order
where defined, parsed and output.

Only include rebuild parameters in the STATUSTYPE_TABLE output if they were
supplied in the original table line.

Correct the parameter count when outputting rebuild: there are two words,
not one.

Use case-independent checks for keywords (as in other device-mapper targets).

Signed-off-by: Jonathan Brassow
Signed-off-by: Alasdair G Kergon

Jonathan Brassow
2011-08-02 19:32:03 +0800

a2d2b0345 dm snapshot: style cleanups

Coding style cleanups.

Signed-off-by: Alasdair G Kergon
Signed-off-by: Jonathan Brassow

Jonathan Brassow
2011-08-02 19:32:03 +0800

aa3f0794d dm snapshot: remove unused definitions

Remove a couple of unused #defines.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-08-02 19:32:03 +0800

5bf45a3dc dm kcopyd: remove nr_pages field from job structure

The nr_pages field in struct kcopyd_job is only used temporarily in
run_pages_job() to count the number of required pages.
We can use a local variable instead.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-08-02 19:32:02 +0800

4622afb3f dm kcopyd: remove offset field from job structure

The offset field in struct kcopyd_job is always zero so remove it.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-08-02 19:32:02 +0800