Eric Lee / smarc-fsl-linux-kernel

14 Jan, 2011

40 commits

9c4bc1c2b Merge branch 'stable/gntdev' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

* 'stable/gntdev' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/p2m: Fix module linking error.
xen p2m: clear the old pte when adding a page to m2p_override
xen gntdev: use gnttab_map_refs and gnttab_unmap_refs
xen: introduce gnttab_map_refs and gnttab_unmap_refs
xen p2m: transparently change the p2m mappings in the m2p override
xen/gntdev: Fix circular locking dependency
xen/gntdev: stop using "token" argument
xen: gntdev: move use of GNTMAP_contains_pte next to the map_op
xen: add m2p override mechanism
xen: move p2m handling to separate file
xen/gntdev: add VM_PFNMAP to vma
xen/gntdev: allow usermode to map granted pages
xen: define gnttab_set_map_op/unmap_op

Fix up trivial conflict in drivers/xen/Kconfig

Linus Torvalds
2011-01-14 10:46:48 +0800
2c0076d8c Merge branch 'stable/platform-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

* 'stable/platform-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen-platform: Fix compile errors if CONFIG_PCI is not enabled.
xen: rename platform-pci module to xen-platform-pci.
xen-platform: use PCI interfaces to request IO and MEM resources.

Linus Torvalds
2011-01-14 10:44:52 +0800
05b258e99 thp: transparent hugepage sysfs meminfo

Add hugepage statistics to per-node sysfs meminfo

Reviewed-by: Rik van Riel
Signed-off-by: David Rientjes
Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2011-01-14 09:32:46 +0800
5dfbd1d73 atmel_serial: fix RTS high after initialization in RS485 mode

When working in RS485 mode, the atmel_serial driver keeps RTS high after
the initialization of the serial port. It goes low only after the first
character has been sent.

[akpm@linux-foundation.org: simplify code]
Signed-off-by: Claudio Scordino
Signed-off-by: Arkadiusz Bubala
Tested-by: Arkadiusz Bubala
Cc: Nicolas Ferre
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Claudio Scordino
2011-01-14 09:32:31 +0800
f6bcfd94c Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (32 commits)
dm: raid456 basic support
dm: per target unplug callback support
dm: introduce target callbacks and congestion callback
dm mpath: delay activate_path retry on SCSI_DH_RETRY
dm: remove superfluous irq disablement in dm_request_fn
dm log: use PTR_ERR value instead of ENOMEM
dm snapshot: avoid storing private suspended state
dm snapshot: persistent make metadata_wq multithreaded
dm: use non reentrant workqueues if equivalent
dm: convert workqueues to alloc_ordered
dm stripe: switch from local workqueue to system_wq
dm: dont use flush_scheduled_work
dm snapshot: remove unused dm_snapshot queued_bios_work
dm ioctl: suppress needless warning messages
dm crypt: add loop aes iv generator
dm crypt: add multi key capability
dm crypt: add post iv call to iv generator
dm crypt: use io thread for reads only if mempool exhausted
dm crypt: scale to multiple cpus
dm crypt: simplify compatible table output
...

Linus Torvalds
2011-01-14 09:30:47 +0800
509e4aef4 Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
md: Fix removal of extra drives when converting RAID6 to RAID5
md: range check slot number when manually adding a spare.
md/raid5: handle manually-added spares in start_reshape.
md: fix sync_completed reporting for very large drives (>2TB)
md: allow suspend_lo and suspend_hi to decrease as well as increase.
md: Don't let implementation detail of curr_resync leak out through sysfs.
md: separate meta and data devs
md-new-param-to_sync_page_io
md-new-param-to-calc_dev_sboffset
md: Be more careful about clearing flags bit in ->recovery
md: md_stop_writes requires mddev_lock.
md/raid5: use sysfs_notify_dirent_safe to avoid NULL pointer
md: Ensure no IO request to get md device before it is properly initialised.
md: Fix single printks with multiple KERN_s
md: fix regression resulting in delays in clearing bits in a bitmap
md: fix regression with re-adding devices to arrays with no metadata

Linus Torvalds
2011-01-14 09:30:20 +0800
bf2cb0dab md: Fix removal of extra drives when converting RAID6 to RAID5

When a RAID6 is converted to a RAID5, the extra drive should
be discarded. However it isn't due to a typo in a comparison.

This bug was introduced in commit e93f68a1fc6 in 2.6.35-rc4;
the fix is suitable for any -stable since then.

As the extra drive is not removed, the 'degraded' counter is wrong and
so the RAID5 will not respond correctly to a subsequent failure.

Cc: stable@kernel.org
Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
ba1b41b6b md: range check slot number when manually adding a spare.

When adding a spare to an active array, we should check the slot
number, but allow it to be larger than raid_disks if a reshape
is being prepared.

Apply the same test when adding a device to an
array-under-construction. It already had most of the test in place,
but not quite all.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
1a940fcee md/raid5: handle manually-added spares in start_reshape.

It is possible to manually add spares to specific slots before
starting a reshape.
raid5_start_reshape should recognise this possibility and include
it in the accounting.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
13ae864bc md: fix sync_completed reporting for very large drives (>2TB)

The values exported in the sync_completed file are unsigned long, which
overflows with very large drives, resulting in wrong values reported.

Since sync_completed uses sectors as unit, we'll start getting wrong
values with components larger than 2TB.

This patch simply replaces the use of unsigned long by unsigned long long.
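
The 2TB threshold follows from the arithmetic, assuming 512-byte sectors and a build where unsigned long is 32 bits wide (neither is stated in the message itself). A minimal userspace sketch of the truncation, with hypothetical `ex_` helper names that are not part of md:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sector size in bytes; the commit message only says "sectors". */
#define EX_SECTOR_SIZE 512ULL

/* Convert a sector count to bytes, guarding against 64-bit overflow. */
static uint64_t ex_sectors_to_bytes(uint64_t sectors)
{
    assert(sectors <= UINT64_MAX / EX_SECTOR_SIZE);
    return sectors * EX_SECTOR_SIZE;
}

/* Models reporting through a 32-bit unsigned long: high bits are lost. */
static uint32_t ex_report_u32(uint64_t sectors)
{
    return (uint32_t)sectors;
}

/* Models reporting through unsigned long long: the value is preserved. */
static uint64_t ex_report_u64(uint64_t sectors)
{
    return sectors;
}
```

The first sector count that no longer fits in 32 bits is 2^32, which at 512 bytes per sector is 2^41 bytes (2 TiB). Any position at or beyond that wraps around when stored in the narrower type.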

Signed-off-by: Rémi Rérolle
Signed-off-by: NeilBrown

Rémi Rérolle
2011-01-14 06:14:34 +0800
23ddff379 md: allow suspend_lo and suspend_hi to decrease as well as increase.

The sysfs attributes 'suspend_lo' and 'suspend_hi' describe a region
to which read/writes are suspended so that the underlying data can be
manipulated without user-space noticing.
Currently the window they describe can only move forwards along the
device. However this is an unnecessary restriction which will cause
problems with planned developments.
So relax this restriction and allow these endpoints to move
arbitrarily.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
75d3da43c md: Don't let implementation detail of curr_resync leak out through sysfs.

mddev->curr_resync has artificial values of '1' and '2' which are used
by the code which ensures only one resync is happening at a time on
any given device.

These values are internal and should never be exposed to user-space
(except when translated appropriately as in the 'pending' status in
/proc/mdstat).

Unfortunately they are exposed, as ->curr_resync is assigned to
->curr_resync_completed and that value is directly visible through
sysfs.

So change the assignments to ->curr_resync_completed to get the same
value from elsewhere in a form that doesn't have the magic '1' or '2'
values.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
a6ff7e089 md: separate meta and data devs

Allow the metadata to be on a separate device from the
data.

This doesn't mean the data and metadata will be on separate
physical devices - it simply gives device-mapper and userspace
tools more flexibility.

Signed-off-by: NeilBrown

Jonathan Brassow
2011-01-14 06:14:34 +0800
ccebd4c41 md-new-param-to_sync_page_io

Add new parameter to 'sync_page_io'.

The new parameter allows us to distinguish between metadata and data
operations. This becomes important later when we add the ability to
use separate devices for data and metadata.

Signed-off-by: Jonathan Brassow

Jonathan Brassow
2011-01-14 06:14:33 +0800
57b2caa39 md-new-param-to-calc_dev_sboffset

When we allow for separate devices for data and metadata
in a later patch, we will need to be able to calculate
the superblock offset based on more than the bdev.

Signed-off-by: Jonathan Brassow

Jonathan Brassow
2011-01-14 06:14:33 +0800
7ebc0be7f md: Be more careful about clearing flags bit in ->recovery

Setting ->recovery to 0 is generally not a good idea as it could clear
bits that shouldn't be cleared. In particular, MD_RECOVERY_FROZEN
should only be cleared on explicit request from user-space.

So when we need to clear things, just clear the bits that need
clearing.
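
The difference between the two approaches can be sketched in userspace C. The bit numbers and `ex_`/`EX_` names below are illustrative assumptions only; the real flag definitions live in the md driver and are not reproduced in this message:

```c
#include <assert.h>
#include <limits.h>

/* Illustrative bit numbers, NOT the kernel's actual values. */
enum {
    EX_RECOVERY_RUNNING = 0,
    EX_RECOVERY_SYNC    = 1,
    EX_RECOVERY_FROZEN  = 2,
};

/* Mask for bit n of an unsigned long. */
static unsigned long ex_bit(unsigned n)
{
    assert(n < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << n;
}

/* The pattern being avoided: every flag is wiped, FROZEN included. */
static unsigned long ex_reap_clear_all(unsigned long recovery)
{
    (void)recovery;
    return 0;
}

/* Clear only the bits describing the finished resync; leave the rest. */
static unsigned long ex_reap_clear_selected(unsigned long recovery)
{
    unsigned long done = ex_bit(EX_RECOVERY_RUNNING) | ex_bit(EX_RECOVERY_SYNC);
    return recovery & ~done;
}
```

With all three example bits set, the selective version leaves only the frozen bit, which is the behaviour the message asks for.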

As there are a few different places which reap a resync process - and
some do an incomplete job - factor out the code for doing that from
md_check_recovery and call that function instead of open coding part
of it.

Signed-off-by: NeilBrown
Reported-by: Jonathan Brassow

NeilBrown
2011-01-14 06:14:33 +0800
defad61a5 md: md_stop_writes requires mddev_lock.

As md_stop_writes manipulates the sync_thread and calls md_update_sb,
it needs to be called with mddev_lock held.

In all internal cases it is, but the symbol is exported for dm-raid to
call and in that case the lock won't be held.
So make an exported version which takes the lock, and an internal
version which does not.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:33 +0800
43c73ca43 md/raid5: use sysfs_notify_dirent_safe to avoid NULL pointer

With the module parameter 'start_dirty_degraded' set,
raid5_spare_active() previously called sysfs_notify_dirent() with a NULL
argument (rdev->sysfs_state) when a rebuild finished.

Signed-off-by: Jonathan Brassow
Signed-off-by: Mike Snitzer

Jonathan Brassow
2011-01-14 06:14:33 +0800
0ca69886a md: Ensure no IO request to get md device before it is properly initialised.

When an md device is in the process of coming on line it is possible
for an IO request (typically a partition table probe) to get through
before the array is fully initialised, which can cause unexpected
behaviour (e.g. a crash).

So explicitly record when the array is ready for IO and don't allow IO
through until then.

There is no possibility for a similar problem when the array is going
off-line as there must only be one 'open' at that time, and it is busy
off-lining the array and so cannot send IO requests. So no memory
barrier is needed in md_stop().

This has been a bug since commit 409c57f3801 in 2.6.30 which
introduced md_make_request. Before then, each personality would
register its own make_request_fn when it was ready.
This is suitable for any stable kernel from 2.6.30.y onwards.

Cc:
Signed-off-by: NeilBrown
Reported-by: "Hawrylewicz Czarnowski, Przemyslaw"

NeilBrown
2011-01-14 06:14:33 +0800
067032bc6 md: Fix single printks with multiple KERN_<level>s

Noticed-by: Russell King
Signed-off-by: Joe Perches
Signed-off-by: NeilBrown

Joe Perches
2011-01-14 06:14:33 +0800
6c9879101 md: fix regression resulting in delays in clearing bits in a bitmap

commit 589a594be1fb (2.6.37-rc4) fixed a problem where md_thread would
sometimes call the ->run function at a bad time.

If an error is detected during array start up after the md_thread has
been started, the md_thread is killed. This resulted in the ->run
function being called once. However the array may not be in a state
that it is safe to call ->run.

However the fix imposed meant that ->run was not called on a timeout.
This means that when an array goes idle, bitmap bits do not get
cleared promptly. While the array is busy the bits will still be
cleared when appropriate so this is not very serious. There is no
risk to data.

Change the test so that we only avoid calling ->run when the thread
is being stopped. This more explicitly addresses the problem situation.

This is suitable for 2.6.37-stable and any -stable kernel to which
589a594be1fb was applied.

Cc: stable@kernel.org
Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:13:53 +0800
9d09e663d dm: raid456 basic support

This patch is the skeleton for the DM target that will be
the bridge from DM to MD (initially RAID456 and later RAID1). It
provides a way to use device-mapper interfaces to the MD RAID456
drivers.

As with all device-mapper targets, the nominal public interfaces are the
constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO
and STATUSTYPE_TABLE). The CTR table looks like the following:

1: raid \
2: \
3: ..

Line 1 contains the standard first three arguments to any device-mapper
target - the start, length, and target type fields. The target type in
this case is "raid".

Line 2 contains the arguments that define the particular raid
type/personality/level, the required arguments for that raid type, and
any optional arguments. Possible raid types include: raid4, raid5_la,
raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc. (again, raid1 is
planned for the future.) The list of required and optional parameters
is the same for all the current raid types. The required parameters are
positional, while the optional parameters are given as key/value pairs.
The possible parameters are as follows:
Chunk size in sectors.
[[no]sync] Force/Prevent RAID initialization
[rebuild ] Rebuild the drive indicated by the index
[daemon_sleep ] Time between bitmap daemon work to clear bits
[min_recovery_rate ] Throttle RAID initialization
[max_recovery_rate ] Throttle RAID initialization
[max_write_behind ] See '-write-behind=' (man mdadm)
[stripe_cache ] Stripe cache size for higher RAIDs

Line 3 contains the list of devices that compose the array in
metadata/data device pairs. If the metadata is stored separately, a '-'
is given for the metadata device position. If a drive has failed or is
missing at creation time, a '-' can be given for both the metadata and
data drives for a given position.

Examples:
# RAID4 - 4 data drives, 1 parity
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)
0 1960893648 raid \
raid4 1 2048 \
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

# RAID4 - 4 data drives, 1 parity (no metadata devices)
# Chunk size of 1MiB, force RAID initialization,
# min recovery rate at 20 kiB/sec/disk
0 1960893648 raid \
raid4 4 2048 min_recovery_rate 20 sync \
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

Performing a 'dmsetup table' should display the CTR table used to
construct the mapping (with possible reordering of optional
parameters).

Performing a 'dmsetup status' will yield information on the state and
health of the array. The output is as follows:
1: raid \
2:

Line 1 is standard DM output. Line 2 is best shown by example:
0 1960893648 raid raid4 5 AAAAA 2/490221568
Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with recovery.
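
A userspace tool could pick the example status line apart roughly as follows. This is a sketch based only on the single example above; the struct, the `ex_` function names, and the buffer sizes are assumptions, and health characters other than 'A' are not described in this message:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one 'dmsetup status' line for a raid target (hypothetical layout). */
struct ex_raid_status {
    unsigned long long start, length;  /* standard DM start and length */
    char raid_type[16];                /* e.g. "raid4" */
    unsigned devices;                  /* number of devices in the array */
    char health[64];                   /* one character per device */
    unsigned long long done, total;    /* recovery progress */
};

/* Returns 0 on success, -1 if the line does not match the expected shape. */
static int ex_parse_raid_status(const char *line, struct ex_raid_status *st)
{
    assert(line != NULL && st != NULL);
    int n = sscanf(line, "%llu %llu raid %15s %u %63s %llu/%llu",
                   &st->start, &st->length, st->raid_type,
                   &st->devices, st->health, &st->done, &st->total);
    if (n != 7)
        return -1;
    /* The health string should carry exactly one character per device. */
    if (strlen(st->health) != st->devices)
        return -1;
    return 0;
}

/* Count devices marked 'A' (alive) in the health string. */
static unsigned ex_alive_count(const struct ex_raid_status *st)
{
    unsigned alive = 0;
    for (const char *p = st->health; *p != '\0'; p++)
        if (*p == 'A')
            alive++;
    return alive;
}
```

Checking that the health string length matches the device count is a cheap sanity test before trusting the remaining fields.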

Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown
Signed-off-by: Jonathan Brassow
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

NeilBrown
2011-01-14 04:00:02 +0800
99d03c141 dm: per target unplug callback support

Add per-target unplug callback support.

Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown
Signed-off-by: Jonathan Brassow
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

NeilBrown
2011-01-14 04:00:02 +0800
9d357b078 dm: introduce target callbacks and congestion callback

DM currently implements congestion checking by checking on congestion
in each component device. For raid456 we need to also check if the
stripe cache is congested.

Add per-target congestion checker callback support.

Extending the target_callbacks structure with additional callback
functions allows for establishing multiple callbacks per-target (a
callback is also needed for unplug).

Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown
Signed-off-by: Jonathan Brassow
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

NeilBrown
2011-01-14 04:00:01 +0800
4e2d19e46 dm mpath: delay activate_path retry on SCSI_DH_RETRY

This patch adds a user-configurable 'pg_init_delay_msecs' feature. Use
this feature to specify the number of milliseconds to delay before
retrying scsi_dh_activate, when SCSI_DH_RETRY is returned.

SCSI Device Handlers return SCSI_DH_IMM_RETRY if we could retry
activation immediately and SCSI_DH_RETRY in cases where it is better to
retry after some delay.

Currently we immediately retry scsi_dh_activate irrespective of
SCSI_DH_IMM_RETRY and SCSI_DH_RETRY.

The 'pg_init_delay_msecs' feature may be provided during table create or
load, e.g.:
dmsetup create --table "0 20971520 multipath 3 queue_if_no_path \
pg_init_delay_msecs 2500 ..." mpatha

The default for 'pg_init_delay_msecs' is 2000 milliseconds.
Maximum configurable delay is 60000 milliseconds. Specifying a
'pg_init_delay_msecs' of 0 will cause immediate retry.
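
The retry policy described above can be modelled in a few lines of userspace C. The default and maximum come from this message; everything else (the `ex_`/`EX_` names, and the choice to reject rather than clamp an out-of-range value) is an assumption of the sketch, not a statement about the kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_PG_INIT_DELAY_DEFAULT_MSECS 2000U   /* default from the message */
#define EX_PG_INIT_DELAY_MAX_MSECS     60000U  /* maximum from the message */

/* Hypothetical stand-ins for the device handler return codes. */
enum ex_dh_result { EX_DH_OK, EX_DH_IMM_RETRY, EX_DH_RETRY };

/*
 * Store a requested delay if it is within range.
 * Assumption: out-of-range values are rejected and the old value is kept.
 */
static bool ex_set_pg_init_delay(unsigned requested_msecs, unsigned *delay_msecs)
{
    assert(delay_msecs != NULL);
    if (requested_msecs > EX_PG_INIT_DELAY_MAX_MSECS)
        return false;
    *delay_msecs = requested_msecs;
    return true;
}

/* Delay before the next activation attempt: only EX_DH_RETRY waits. */
static unsigned ex_retry_delay(enum ex_dh_result result, unsigned configured_msecs)
{
    return result == EX_DH_RETRY ? configured_msecs : 0;
}
```

A configured delay of 0 makes both retry codes behave identically, which matches the "immediate retry" case in the message.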

Signed-off-by: Nikanth Karthikesan
Signed-off-by: Chandra Seetharaman
Acked-by: Mike Christie
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Chandra Seetharaman
2011-01-14 04:00:01 +0800
052189a2e dm: remove superfluous irq disablement in dm_request_fn

This patch changes spin_lock_irq() to spin_lock() in dm_request_fn().
This patch is just a clean-up and no functional change.

The spin_lock_irq() was leftover from the early request-based dm code,
where map_request() used to enable interrupts.
Since current map_request() never enables interrupts, we can change it
to spin_lock() to match the prior spin_unlock().

Auditing through the dm and block-layer code called from
map_request(), I confirmed all functions save/restore interrupt
status, so no function returns with interrupts enabled.
Also I haven't observed any problem on my test environment which
uses scsi and lpfc driver after heavy I/O testing with occasional
path down/up.

Added BUG_ON() to detect breakage in future.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2011-01-14 04:00:00 +0800
dbc883f15 dm log: use PTR_ERR value instead of ENOMEM

It's nicer to return the PTR_ERR() value instead of just returning
-ENOMEM. In the current code the PTR_ERR() value is always equal to
-ENOMEM so this doesn't actually affect anything, but still...

In addition, dm_dirty_log_create() doesn't check for a specific -ENOMEM
return. So this change is safe relative to potential for a non -ENOMEM
return in the future.
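
For readers unfamiliar with the error-pointer idiom, here is a userspace model of it. The `ex_` helpers imitate the spirit of the kernel's ERR_PTR/PTR_ERR/IS_ERR but are simplified re-implementations written for this sketch, and `ex_create` is a hypothetical allocator:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EX_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *ex_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long ex_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top EX_MAX_ERRNO addresses. */
static int ex_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-EX_MAX_ERRNO;
}

/* Hypothetical creator: fails with -fail_with, or succeeds when it is 0. */
static void *ex_create(long fail_with)
{
    static int object;
    assert(fail_with >= 0 && fail_with <= EX_MAX_ERRNO);
    return fail_with ? ex_err_ptr(-fail_with) : (void *)&object;
}

/* Old style: every failure is reported as -ENOMEM. */
static long ex_old_style(long fail_with)
{
    void *p = ex_create(fail_with);
    return ex_is_err(p) ? -ENOMEM : 0;
}

/* New style: the actual error code is passed through. */
static long ex_new_style(long fail_with)
{
    void *p = ex_create(fail_with);
    return ex_is_err(p) ? ex_ptr_err(p) : 0;
}
```

Both styles agree while the only possible failure is -ENOMEM; they diverge as soon as the creator can fail for another reason.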

Signed-off-by: Dan Carpenter
Acked-by: Jonathan Brassow
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Dan Carpenter
2011-01-14 04:00:00 +0800
b83b2f295 dm snapshot: avoid storing private suspended state

Use dm_suspended() rather than having each snapshot target maintain a
private 'suspended' flag in struct dm_snapshot.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-01-14 03:59:59 +0800
239c8dd53 dm snapshot: persistent make metadata_wq multithreaded

metadata_wq serves on-stack work items from chunk_io(). Even if
multiple chunk_io() are simultaneously in progress, each is
independent and queued only once, so multithreaded workqueue can be
safely used.

Switch metadata_wq to multithread and flush the work item instead of
the workqueue in chunk_io().

Signed-off-by: Tejun Heo
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Tejun Heo
2011-01-14 03:59:59 +0800
9c4376de9 dm: use non reentrant workqueues if equivalent

kmirrord_wq, kcopyd_work and md->wq are created per dm instance and
serve only a single work item from the dm instance, so non-reentrant
workqueues would provide the same ordering guarantees as ordered ones
while allowing CPU affinity and use of the workqueues for other
purposes. Switch them to non-reentrant workqueues.

Signed-off-by: Tejun Heo
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Tejun Heo
2011-01-14 03:59:58 +0800
4d4d66ab5 dm: convert workqueues to alloc_ordered

Convert all create[_singlethread]_workqueue() users to the new
alloc[_ordered]_workqueue(). This conversion is mechanical and
doesn't introduce any behavior change.

Signed-off-by: Tejun Heo
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Tejun Heo
2011-01-14 03:59:57 +0800
f521f074a dm stripe: switch from local workqueue to system_wq

kstriped only serves sc->kstriped_ws which runs dm_table_event().
This doesn't need to be executed from an ordered workqueue w/ rescuer.
Drop kstriped and use the system_wq instead. While at it, rename
kstriped_ws to trigger_event so that it's consistent with other dm
modules.

Signed-off-by: Tejun Heo
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Tejun Heo
2011-01-14 03:59:57 +0800
d5ffa387e dm: dont use flush_scheduled_work

flush_scheduled_work() is being deprecated. Flush the used work
directly instead. In all dm targets, the only work which uses
system_wq is ->trigger_event.

Signed-off-by: Tejun Heo
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Tejun Heo
2011-01-14 03:59:56 +0800
fecec20e5 dm snapshot: remove unused dm_snapshot queued_bios_work

dm_snapshot->queued_bios_work isn't used. Remove ->queued_bios[_work]
from dm_snapshot structure, the flush_queued_bios work function and
ksnapd workqueue.

The DM snapshot changes that were going to use the ksnapd workqueue were
either superseded (fix for origin write races) or never completed
(deallocation of invalid snapshot's memory via workqueue).

Signed-off-by: Tejun Heo
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Tejun Heo
2011-01-14 03:59:56 +0800
810b49237 dm ioctl: suppress needless warning messages

The device-mapper should not send warning messages to syslog
if a device is not found. This can be done by userspace
according to the returned dm-ioctl error code.

So move these messages to debug level and use rate limiting
to not flood syslog.

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Milan Broz
2011-01-14 03:59:55 +0800
347457859 dm crypt: add loop aes iv generator

This patch adds a compatible implementation of the block
chaining mode used by the Loop-AES block device encryption
system (http://loop-aes.sourceforge.net/) designed
by Jari Ruusu.

It operates on full 512 byte sectors and uses CBC
with an IV derived from the sector number, the data and
optionally an extra IV seed.

This means that after CBC decryption the first block of the sector
must be tweaked according to the decrypted data.

Loop-AES can use three encryption schemes:
version 1: is plain aes-cbc mode (already compatible)
version 2: uses 64 multikey scheme with own IV generator
version 3: the same as version 2 with additional IV seed
(it uses 65 keys, last key is used as IV seed)

The IV generator is here named lmk (Loop-AES multikey)
and for the cipher specification looks like: aes:64-cbc-lmk

Versions 2 and 3 are recognised according to the length
of the provided multi-key string (which is just the hex-encoded
"raw key" used in the original Loop-AES ioctl).

Configuration of the device and decoding key string will
be done in userspace (cryptsetup).
(Loop-AES stores keys in gpg encrypted file, raw keys are
output of simple hashing of lines in this file).

Based on an implementation by Max Vozeler:
http://article.gmane.org/gmane.linux.kernel.cryptoapi/3752/

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon
CC: Max Vozeler

Milan Broz
2011-01-14 03:59:55 +0800
d1f964238 dm crypt: add multi key capability

This patch adds generic multikey handling to be used
in the following patch for Loop-AES mode compatibility.

This patch extends the mapping table with an optional keycount and
implements generic multi-key capability.

With more keys defined the string is divided into
several sections and these are used for tfms.

The tfm is used according to sector offset
(sector 0->tfm[0], sector 1->tfm[1], sector N->tfm[N modulo keycount])
(only power of two values supported for keycount here).
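
The sector-to-key mapping can be written as a one-line mask because keycount is a power of two. A minimal sketch with hypothetical `ex_` helper names (the actual dm-crypt code is not shown in this message):

```c
#include <assert.h>
#include <stdint.h>

/* True if n is a power of two (1, 2, 4, ...). */
static int ex_is_power_of_two(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Index of the key (tfm) used for a sector: sector modulo keycount.
 * For a power-of-two keycount the modulo reduces to a bit mask.
 */
static unsigned ex_key_index(uint64_t sector, unsigned keycount)
{
    assert(ex_is_power_of_two(keycount));
    return (unsigned)(sector & (keycount - 1));
}
```

With 64 keys, sectors 0, 64, 128 and so on all use key 0, sector 65 uses key 1, and a keycount of 1 degenerates to the ordinary single-key case.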

Because of tfms per-cpu allocation, this mode can take
a lot of memory on large smp systems.

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon
Cc: Max Vozeler

Milan Broz
2011-01-14 03:59:54 +0800
2dc5327d3 dm crypt: add post iv call to iv generator

IV (initialisation vector) can in principle depend not only
on sector but also on plaintext data (or other attributes).

Change IV generator interface to work directly with dmreq
structure to allow such dependence in generator.

Also add post() function which is called after the crypto
operation.

This allows tricky modification of decrypted data or IV
internals.

In asynchronous mode the post() can be called after
ctx->sector count was increased so it is needed
to add iv_sector copy directly to dmreq structure.
(N.B. dmreq always includes only one sector in scatterlists)

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Milan Broz
2011-01-14 03:59:54 +0800
20c82538e dm crypt: use io thread for reads only if mempool exhausted

If there is enough memory, code can directly submit bio
instead of queueing this operation in a separate thread.

Try to alloc bio clone with GFP_NOWAIT and only if it
fails use separate queue (map function cannot block here).

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Milan Broz
2011-01-14 03:59:53 +0800
c02977212 dm crypt: scale to multiple cpus

Currently dm-crypt does all the encryption work for a single dm-crypt
mapping in a single workqueue. This does not scale well when multiple
CPUs are submitting IO at a high rate. The single CPU running the single
thread cannot keep up with the encryption and encrypted IO performance
tanks.

This patch changes the crypto workqueue to be per CPU. This means
that as long as the IO submitter (or the interrupt target CPUs
for reads) runs on different CPUs the encryption work will also
run in parallel.

To avoid a bottleneck on the IO worker I also changed those to be
per-CPU threads.

There is still some shared data, so I suspect some bouncing
cache lines. But I haven't done a detailed study on that yet.

Signed-off-by: Andi Kleen
Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Andi Kleen
2011-01-14 03:59:53 +0800