Eric Lee / smarc-fsl-linux-kernel

06 Mar, 2019

2 commits

de7180ff9 dm cache: add support for discard passdown to the origin device ... Browse Code »

DM cache now defaults to passing discards down to the origin device.
User may disable this using the "no_discard_passdown" feature when
creating the cache device.

If the cache's underlying origin device doesn't support discards then
passdown is disabled (with warning). Similarly, if the underlying
origin device's max_discard_sectors is less than a cache block discard
passdown will be disabled (this is required because sizing of the cache
internal discard bitset depends on it).

Signed-off-by: Mike Snitzer

Mike Snitzer
2019-03-06 03:53:52 +0800
6bbc923df dm: add support to directly boot to a mapped device ... Browse Code »

Add a "create" module parameter, which allows device-mapper targets to
be configured at boot time. This enables early use of DM targets in the
boot process (as the root device or otherwise) without the need of an
initramfs.

The syntax used in the boot param is based on the concise format from
the dmsetup tool to follow the rule of least surprise:

dmsetup table --concise /dev/mapper/lroot

Which is:
dm-mod.create=,,,,[,+][;,,,,[,+]+]

Where,
::= The device name.
::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
::= The device minor number | ""
::= "ro" | "rw"
::=
::= "verity" | "linear" | ...

For example, the following could be added in the boot parameters:
dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0

Only the targets that were tested are allowed and the ones that don't
change any block device when the device is create as read-only. For
example, mirror and cache targets are not allowed. The rationale behind
this is that if the user makes a mistake, choosing the wrong device to
be the mirror or the cache can corrupt data.

The only targets initially allowed are:
* crypt
* delay
* linear
* snapshot-origin
* striped
* verity

Co-developed-by: Will Drewry
Co-developed-by: Kees Cook
Co-developed-by: Enric Balletbo i Serra
Signed-off-by: Helen Koike
Reviewed-by: Kees Cook
Signed-off-by: Mike Snitzer

Helen Koike
2019-03-06 03:53:50 +0800

21 Nov, 2018

1 commit

806654a96 Documentation: Use "while" instead of "whilst" ... Browse Code »

Whilst making an unrelated change to some Documentation, Linus sayeth:

| Afaik, even in Britain, "whilst" is unusual and considered more
| formal, and "while" is the common word.
|
| [...]
|
| Can we just admit that we work with computers, and we don't need to
| use þe eald Englisc spelling of words that most of the world never
| uses?

dictionary.com refers to the word as "Chiefly British", which is
probably an undesirable attribute for technical documentation.

Replace all occurrences under Documentation/ with "while".

Cc: David Howells
Cc: Liam Girdwood
Cc: Chris Wilson
Cc: Michael Halcrow
Cc: Jonathan Corbet
Reported-by: Linus Torvalds
Signed-off-by: Will Deacon
Signed-off-by: Jonathan Corbet

Will Deacon
2018-11-21 00:30:43 +0800

25 Oct, 2018

1 commit

01aa9d518 Merge tag 'docs-4.20' of git://git.lwn.net/linux ... Browse Code »

Pull documentation updates from Jonathan Corbet:
"This is a fairly typical cycle for documentation. There's some welcome
readability improvements for the formatted output, some LICENSES
updates including the addition of the ISC license, the removal of the
unloved and unmaintained 00-INDEX files, the deprecated APIs document
from Kees, more MM docs from Mike Rapoport, and the usual pile of typo
fixes and corrections"

* tag 'docs-4.20' of git://git.lwn.net/linux: (41 commits)
docs: Fix typos in histogram.rst
docs: Introduce deprecated APIs list
kernel-doc: fix declaration type determination
doc: fix a typo in adding-syscalls.rst
docs/admin-guide: memory-hotplug: remove table of contents
doc: printk-formats: Remove bogus kobject references for device nodes
Documentation: preempt-locking: Use better example
dm flakey: Document "error_writes" feature
docs/completion.txt: Fix a couple of punctuation nits
LICENSES: Add ISC license text
LICENSES: Add note to CDDL-1.0 license that it should not be used
docs/core-api: memory-hotplug: add some details about locking internals
docs/core-api: rename memory-hotplug-notifier to memory-hotplug
docs: improve readability for people with poorer eyesight
yama: clarify ptrace_scope=2 in Yama documentation
docs/vm: split memory hotplug notifier description to Documentation/core-api
docs: move memory hotplug description into admin-guide/mm
doc: Fix acronym "FEKEK" in ecryptfs
docs: fix some broken documentation references
iommu: Fix passthrough option documentation
...

Linus Torvalds
2018-10-25 01:01:11 +0800

13 Oct, 2018

1 commit

0c6c987f3 dm flakey: Document "error_writes" feature ... Browse Code »

Commit ef548c551e72 ("dm flakey: introduce "error_writes" feature")
added the ability to dm flakey to error out writes in contrast to
silently dropping it with 'drop_writes'. Unfortunately this feature
is not currently documented and one has to be either familiar with the
source code of dm flakey or check out xfstests sources to know of
this parameter. So document it.

Signed-off-by: Nikolay Borisov
Signed-off-by: Jonathan Corbet

Nikolay Borisov
2018-10-13 01:31:39 +0800

04 Oct, 2018

1 commit

9305455ac block: Finish renaming REQ_DISCARD into REQ_OP_DISCARD ... Browse Code »

Some time ago REQ_DISCARD was renamed into REQ_OP_DISCARD. Some comments
and documentation files were not updated however. Update these comments
and documentation files. See also commit 4e1b2d52a80d ("block, fs,
drivers: remove REQ_OP compat defs and related code").

Signed-off-by: Bart Van Assche
Cc: Mike Christie
Cc: Martin K. Petersen
Cc: Philipp Reisner
Cc: Lars Ellenberg
Signed-off-by: Jens Axboe

Bart Van Assche
2018-10-04 06:12:28 +0800

07 Sep, 2018

1 commit

5380c05b6 dm raid: bump target version, update comments and documentation ... Browse Code »

Bump target version to reflect the documented fixes are available.
Also fix some code comments (typos and clarity).

Signed-off-by: Heinz Mauelshagen
Signed-off-by: Mike Snitzer

Heinz Mauelshagen
2018-09-07 05:07:58 +0800

30 Jul, 2018

1 commit

63c8ecb62 dm thin: include metadata_low_watermark threshold in pool status ... Browse Code »

The metadata low watermark threshold is set by the kernel. But the
kernel depends on userspace to extend the thinpool metadata device when
the threshold is crossed.

Since the metadata low watermark threshold is not visible to userspace,
upon receiving an event, userspace cannot tell that the kernel wants the
metadata device extended, instead of some other eventing condition.
Making it visible (but not settable) enables userspace to affirmatively
know the kernel is asking for a metadata device extension, by comparing
metadata_low_watermark against nr_free_blocks_metadata, also reported in
status.

Current solutions like dmeventd have their own thresholds for extending
the data and metadata devices, and both devices are checked against
their thresholds on each event. This lessens the value of the kernel-set
threshold, since userspace will either extend the metadata device sooner,
when receiving another event; or will receive the metadata lowater event
and do nothing, if dmeventd's threshold is less than the kernel's.
(This second case is dangerous. The metadata lowater event will not be
re-sent, so no further event will be generated before the metadata
device is out if space, unless some other event causes userspace to
recheck its thresholds.)

Signed-off-by: Andy Grover
Signed-off-by: Mike Snitzer

Andy Grover
2018-07-30 23:49:08 +0800

28 Jul, 2018

3 commits

a3fcf7253 dm integrity: recalculate checksums on creation ... Browse Code »

When using external metadata device and internal hash, recalculate the
checksums when the device is created - so that dm-integrity doesn't
have to overwrite the device. The superblock stores the last position
when the recalculation ended, so that it is properly restarted.

Integrity tags that haven't been recalculated yet are ignored.

Also bump the target version.

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2018-07-28 03:24:27 +0800
cda6b5ab7 dm delay: add flush as a third class of IO ... Browse Code »

Add a new class for dm-delay that delays flush requests. Previously,
flushes were delayed as writes, but it caused problems if the user
needed to create a device with one or a few slow sectors for the purpose
of testing - all flushes would be forwarded to this device and delayed,
and that skews the test results. Fix this by allowing to select 0 delay
for flushes.

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2018-07-28 03:24:19 +0800
6c7413c0f dm thin: update stale "Status" Documentation ... Browse Code »

Documentation/device-mapper-/thin-provisioning.txt's "Status" section no
longer reflected the current fitness level of DM thin-provisioning.
That is, DM thinp is no longer "EXPERIMENTAL". It has since seen
considerable improvement, has been fairly widely deployed and has
performed in a robust manner.

Update Documentation to dispel concern raised by potential DM thinp
users.

Reported-by: Drew Hastings
Signed-off-by: Mike Snitzer

Mike Snitzer
2018-07-28 03:24:03 +0800

03 Jul, 2018

1 commit

d284f8248 dm writecache: support optional offset for start of device ... Browse Code »

Add an optional parameter "start_sector" to allow the start of the
device to be offset by the specified number of 512-byte sectors. The
sectors below this offset are not used by the writecache device and are
left to be used for disk labels and/or userspace metadata (e.g. lvm).

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2018-07-03 04:14:02 +0800

08 Jun, 2018

1 commit

48debafe4 dm: add writecache target ... Browse Code »

The writecache target caches writes on persistent memory or SSD.
It is intended for databases or other programs that need extremely low
commit latency.

The writecache target doesn't cache reads because reads are supposed to
be cached in page cache in normal RAM.

If persistent memory isn't available this target can still be used in
SSD mode.

Signed-off-by: Mikulas Patocka
Signed-off-by: Colin Ian King # fix missing goto
Signed-off-by: Ross Zwisler # fix compilation issue with !DAX
Signed-off-by: Dan Carpenter # use msecs_to_jiffies
Acked-by: Dan Williams # reworks to unify ARM and x86 flushing
Signed-off-by: Mike Snitzer

Mikulas Patocka
2018-06-08 23:59:51 +0800

10 May, 2018

1 commit

28700a362 dm thin: update Documentation to clarify when "read_only" is valid ... Browse Code »

Due to user confusion, clarify that it doesn't make sense to try to
create a thin-pool with "read_only" mode enabled.

Signed-off-by: Mike Snitzer

Mike Snitzer
2018-05-10 23:18:49 +0800

04 Apr, 2018

1 commit

843f38d38 dm verity: add 'check_at_most_once' option to only validate hashes once ... Browse Code »

This allows platforms that are CPU/memory contrained to verify data
blocks only the first time they are read from the data device, rather
than every time. As such, it provides a reduced level of security
because only offline tampering of the data device's content will be
detected, not online tampering.

Hash blocks are still verified each time they are read from the hash
device, since verification of hash blocks is less performance critical
than data blocks, and a hash block will not be verified any more after
all the data blocks it covers have been verified anyway.

This option introduces a bitset that is used to check if a block has
been validated before or not. A block can be validated more than once
as there is no thread protection for the bitset.

These changes were developed and tested on entry-level Android Go
devices.

Signed-off-by: Patrik Torstensson
Signed-off-by: Mike Snitzer

Patrik Torstensson
2018-04-04 03:04:29 +0800

31 Jan, 2018

1 commit

9614e2ba9 dm cache: Documentation: update default migration_throttling value ... Browse Code »

In commit f8350daf7af0 ("dm cache: tune migration throttling") the
value for DEFAULT_MIGRATION_THRESHOLD was decreased from 204800 to
2048. Edit device-mapper/cache.txt to reflect the correct default
value for migration_threshold.

Signed-off-by: John Pittman
Signed-off-by: Mike Snitzer

John Pittman
2018-01-31 05:55:47 +0800

17 Jan, 2018

9 commits

7efd5fed6 dm thin: extend thinpool status format string with omitted fields ... Browse Code »

Signed-off-by: mulhern
Signed-off-by: Mike Snitzer

mulhern
2018-01-17 22:16:12 +0800
cc3ff0af1 dm thin: fixes in thin-provisioning.txt ... Browse Code »

Make the format string for thinpool status more correct.

Swap the order of two items to correspond with reality.

Signed-off-by: mulhern
Signed-off-by: Mike Snitzer

mulhern
2018-01-17 22:16:12 +0800
2bc8a61c6 dm thin: document representation of <highest mapped sector> when there is none ... Browse Code »

Signed-off-by: mulhern
Signed-off-by: Mike Snitzer

mulhern
2018-01-17 22:16:11 +0800
9b28a1102 dm thin: fix documentation relative to low water mark threshold ... Browse Code »

Fixes:
1. The use of "exceeds" when the opposite of exceeds, falls below,
was meant.
2. Properly speaking, a table can not exceed a threshold.

It emphasizes the important point, which is that it is the userspace
daemon's responsibility to check for low free space when a device
is resumed, since it won't get a special event indicating low free
space in that situation.

Signed-off-by: mulhern
Signed-off-by: Mike Snitzer

mulhern
2018-01-17 22:16:10 +0800
1346638e5 dm cache: be consistent in specifying sectors and SI units in cache.txt ... Browse Code »

Signed-off-by: mulhern
Signed-off-by: Mike Snitzer

mulhern
2018-01-17 22:16:09 +0800
3716e20af dm cache: delete obsoleted paragraph in cache.txt ... Browse Code »

The 'mq' policy is no longer the default policy, and the default policy,
'smq', does not store hit counts.

Signed-off-by: mulhern
Signed-off-by: Mike Snitzer

mulhern
2018-01-17 22:16:08 +0800
677210462 dm cache: fix grammar in cache-policies.txt ... Browse Code »

Use possessive pronoun where appropriate, instead of contraction.

Signed-off-by: mulhern
Signed-off-by: Mike Snitzer

mulhern
2018-01-17 22:16:07 +0800
424da29c5 dm snapshot: improve documentation relative to origin suspend requirements ... Browse Code »

Add a note to snapshot.txt that the origin target must be suspended when
loading or unloading the snapshot target.

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2018-01-17 22:16:06 +0800
18a5bf270 dm: add unstriped target ... Browse Code »

This device mapper "unstriped" target remaps and unstripes I/O so it
is issued solely on a single drive in a HW RAID0 or dm-striped target.

In a 4 drive HW RAID0 the striped target exposes 1/4th of the LBA range
as a virtual drive. Each I/O to that virtual drive will only be issued
to the 1 drive that was selected of the 4 drives in the HW RAID0.

This unstriped target is most useful for Intel NVMe drives that have
multiple cores but that do not have firmware control to pin separate LBA
ranges to each discrete cpu core.

Signed-off-by: Scott Bauer
Signed-off-by: Heinz Mauelshagen
Acked-by: Keith Busch
Signed-off-by: Mike Snitzer

Scott Bauer
2018-01-17 22:16:00 +0800

14 Dec, 2017

1 commit

11e472320 dm raid: stop keeping raid set frozen altogether ... Browse Code »

In order to avoid redoing synchronization/recovery/reshape partially,
the raid set got frozen until after all passed in table line flags had
been cleared. The related table reload sequence had to be precisely
followed, or reshaping may lead to data corruption caused by the active
mapping carrying on with a reshape when the inactive mapping already
had retrieved a stale reshape position.

Harden by retrieving the actual resync/recovery/reshape position
during resume whilst the active table is suspended thus avoiding
to keep the raid set frozen altogether. This prevents superfluous
redoing of an already resynchronized or recovered segment and,
most importantly, potential for redoing of an already reshaped
segment causing data corruption.

Fixes: d39f0010e ("dm raid: fix raid_resume() to keep raid set frozen as needed")
Signed-off-by: Heinz Mauelshagen
Signed-off-by: Mike Snitzer

Heinz Mauelshagen
2017-12-14 00:52:02 +0800

08 Dec, 2017

1 commit

b84cf2692 dm raid: bump target version to reflect numerous fixes ... Browse Code »

Also update Documentation accordingly.

Signed-off-by: Mike Snitzer

Mike Snitzer
2017-12-08 23:59:58 +0800

06 Oct, 2017

1 commit

41dcf197a dm raid: fix incorrect status output at the end of a "recover" process ... Browse Code »

There are three important fields that indicate the overall health and
status of an array: dev_health, sync_ratio, and sync_action. They tell
us the condition of the devices in the array, and the degree to which
the array is synchronized.

This commit fixes a condition that is reported incorrectly. When a member
of the array is being rebuilt or a new device is added, the "recover"
process is used to synchronize it with the rest of the array. When the
process is complete, but the sync thread hasn't yet been reaped, it is
possible for the state of MD to be:
mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ]
curr_resync_completed = (but not MaxSector)
and all rdevs to be In_sync.
This causes the 'array_in_sync' output parameter that is passed to
rs_get_progress() to be computed incorrectly and reported as 'false' --
or not in-sync. This in turn causes the dev_health status characters to
be reported as all 'a', rather than the proper 'A'.

This can cause erroneous output for several seconds at a time when tools
will want to be checking the condition due to events that are raised at
the end of a sync process. Fix this by properly calculating the
'array_in_sync' return parameter in rs_get_progress().

Also, remove an unnecessary intermediate 'recovery_cp' variable in
rs_get_progress().

Signed-off-by: Jonathan Brassow
Signed-off-by: Mike Snitzer

Jonathan Brassow
2017-10-06 04:21:30 +0800

26 Jul, 2017

1 commit

ac6a31888 dm raid: bump target version ... Browse Code »

Bumo dm-raid target version to 1.12.1 to reflect that commit cc27b0c78c
("md: fix deadlock between mddev_suspend() and md_write_start()") is
available.

This version change allows userspace to detect that MD fix is available.

Signed-off-by: Heinz Mauelshagen
Signed-off-by: Mike Snitzer

Heinz Mauelshagen
2017-07-26 02:54:20 +0800

19 Jun, 2017

1 commit

3b1a94c88 dm zoned: drive-managed zoned block device target ... Browse Code »

The dm-zoned device mapper target provides transparent write access
to zoned block devices (ZBC and ZAC compliant block devices).
dm-zoned hides to the device user (a file system or an application
doing raw block device accesses) any constraint imposed on write
requests by the device, equivalent to a drive-managed zoned block
device model.

Write requests are processed using a combination of on-disk buffering
using the device conventional zones and direct in-place processing for
requests aligned to a zone sequential write pointer position.
A background reclaim process implemented using dm_kcopyd_copy ensures
that conventional zones are always available for executing unaligned
write requests. The reclaim process overhead is minimized by managing
buffer zones in a least-recently-written order and first targeting the
oldest buffer zones. Doing so, blocks under regular write access (such
as metadata blocks of a file system) remain stored in conventional
zones, resulting in no apparent overhead.

dm-zoned implementation focus on simplicity and on minimizing overhead
(CPU, memory and storage overhead). For a 14TB host-managed disk with
256 MB zones, dm-zoned memory usage per disk instance is at most about
3 MB and as little as 5 zones will be used internally for storing metadata
and performing buffer zone reclaim operations. This is achieved using
zone level indirection rather than a full block indirection system for
managing block movement between zones.

dm-zoned primary target is host-managed zoned block devices but it can
also be used with host-aware device models to mitigate potential
device-side performance degradation due to excessive random writing.

Zoned block devices can be formatted and checked for use with the dm-zoned
target using the dmzadm utility available at:

https://github.com/hgst/dm-zoned-tools

Signed-off-by: Damien Le Moal
Reviewed-by: Hannes Reinecke
Reviewed-by: Bart Van Assche
[Mike Snitzer partly refactored Damien's original work to cleanup the code]
Signed-off-by: Mike Snitzer

Damien Le Moal
2017-06-19 23:05:20 +0800

04 May, 2017

1 commit

d35a878ae Merge tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git… ... Browse Code »

…/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- A major update for DM cache that reduces the latency for deciding
whether blocks should migrate to/from the cache. The bio-prison-v2
interface supports this improvement by enabling direct dispatch of
work to workqueues rather than having to delay the actual work
dispatch to the DM cache core. So the dm-cache policies are much more
nimble by being able to drive IO as they see fit. One immediate
benefit from the improved latency is a cache that should be much more
adaptive to changing workloads.

- Add a new DM integrity target that emulates a block device that has
additional per-sector tags that can be used for storing integrity
information.

- Add a new authenticated encryption feature to the DM crypt target
that builds on the capabilities provided by the DM integrity target.

- Add MD interface for switching the raid4/5/6 journal mode and update
the DM raid target to use it to enable aid4/5/6 journal write-back
support.

- Switch the DM verity target over to using the asynchronous hash
crypto API (this helps work better with architectures that have
access to off-CPU algorithm providers, which should reduce CPU
utilization).

- Various request-based DM and DM multipath fixes and improvements from
Bart and Christoph.

- A DM thinp target fix for a bio structure leak that occurs for each
discard IFF discard passdown is enabled.

- A fix for a possible deadlock in DM bufio and a fix to re-check the
new buffer allocation watermark in the face of competing admin
changes to the 'max_cache_size_bytes' tunable.

- A couple DM core cleanups.

* tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits)
dm bufio: check new buffer allocation watermark every 30 seconds
dm bufio: avoid a possible ABBA deadlock
dm mpath: make it easier to detect unintended I/O request flushes
dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()
dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH
dm: introduce enum dm_queue_mode to cleanup related code
dm mpath: verify __pg_init_all_paths locking assumptions at runtime
dm: verify suspend_locking assumptions at runtime
dm block manager: remove an unused argument from dm_block_manager_create()
dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
dm mpath: delay requeuing while path initialization is in progress
dm mpath: avoid that path removal can trigger an infinite loop
dm mpath: split and rename activate_path() to prepare for its expanded use
dm ioctl: prevent stack leak in dm ioctl call
dm integrity: use previously calculated log2 of sectors_per_block
dm integrity: use hex2bin instead of open-coded variant
dm crypt: replace custom implementation of hex2bin()
dm crypt: remove obsolete references to per-CPU state
dm verity: switch to using asynchronous hash crypto API
dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
...

Linus Torvalds
2017-05-04 01:31:20 +0800

25 Apr, 2017

2 commits

9d609f85b dm integrity: support larger block sizes ... Browse Code »

The DM integrity block size can now be 512, 1k, 2k or 4k. Using larger
blocks reduces metadata handling overhead. The block size can be
configured at table load time using the "block_size:" option;
where is expressed in bytes (defult is still 512 bytes).

It is safe to use larger block sizes with DM integrity, because the
DM integrity journal makes sure that the whole block is updated
atomically even if the underlying device doesn't support atomic writes
of that size (e.g. 4k block ontop of a 512b device).

Depends-on: 2859323e ("block: fix blk_integrity_register to use template's interval_exp if not 0")
Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2017-04-25 00:04:33 +0800
56b67a4f2 dm integrity: various small changes and cleanups ... Browse Code »

Some coding style changes.

Fix a bug that the array test_tag has insufficient size if the digest
size of internal has is bigger than the tag size.

The function __fls is undefined for zero argument, this patch fixes
undefined behavior if the user sets zero interleave_sectors.

Fix the limit of optional arguments to 8.

Don't allocate crypt_data on the stack to avoid a BUG with debug kernel.

Rename all optional argument names to have underscores rather than
dashes.

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2017-04-25 00:04:32 +0800

28 Mar, 2017

1 commit

6e53636fe dm raid: add raid4/5/6 journal write-back support via journal_mode option ... Browse Code »

Commit 63c32ed4afc ("dm raid: add raid4/5/6 journaling support") added
journal support to close the raid4/5/6 "write hole" -- in terms of
writethrough caching.

Introduce a "journal_mode" feature and use the new
r5c_journal_mode_set() API to add support for switching the journal
device's cache mode between write-through (the current default) and
write-back.

NOTE: If the journal device is not layered on resilent storage and it
fails, write-through mode will cause the "write hole" to reoccur. But
if the journal fails while in write-back mode it will cause data loss
for any dirty cache entries unless resilent storage is used for the
journal.

Signed-off-by: Heinz Mauelshagen
Signed-off-by: Mike Snitzer

Heinz Mauelshagen
2017-03-28 00:08:07 +0800

27 Mar, 2017

1 commit

4464e36e0 dm raid: fix table line argument order in status ... Browse Code »

Commit 3a1c1ef2f ("dm raid: enhance status interface and fixup
takeover/raid0") added new table line arguments and introduced an
ordering flaw. The sequence of the raid10_copies and raid10_format
raid parameters got reversed which causes lvm2 userspace to fail by
falsely assuming a changed table line.

Sequence those 2 parameters as before so that old lvm2 can function
properly with new kernels by adjusting the table line output as
documented in Documentation/device-mapper/dm-raid.txt.

Also, add missing version 1.10.1 highlight to the documention.

Fixes: 3a1c1ef2f ("dm raid: enhance status interface and fixup takeover/raid0")
Signed-off-by: Heinz Mauelshagen
Signed-off-by: Mike Snitzer

Heinz Mauelshagen
2017-03-27 23:45:26 +0800

25 Mar, 2017

5 commits

c2bcb2b70 dm integrity: add recovery mode ... Browse Code »

In recovery mode, we don't:
- replay the journal
- check checksums
- allow writes to the device

This mode can be used as a last resort for data recovery. The
motivation for recovery mode is that when there is a single error in the
journal, the user should not lose access to the whole device.

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer

Mikulas Patocka
2017-03-25 03:54:23 +0800
8f0009a22 dm crypt: optionally support larger encryption sector size ... Browse Code »

Add optional "sector_size" parameter that specifies encryption sector
size (atomic unit of block device encryption).

Parameter can be in range 512 - 4096 bytes and must be power of two.
For compatibility reasons, the maximal IO must fit into the page limit,
so the limit is set to the minimal page size possible (4096 bytes).

NOTE: this device cannot yet be handled by cryptsetup if this parameter
is set.

IV for the sector is calculated from the 512 bytes sector offset unless
the iv_large_sectors option is used.

Test script using dmsetup:

DEV="/dev/sdb"
DEV_SIZE=$(blockdev --getsz $DEV)
KEY="9c1185a5c5e9fc54612808977ee8f548b2258d31ddadef707ba62c166051b9e3cd0294c27515f2bccee924e8823ca6e124b8fc3167ed478bca702babe4e130ac"
BLOCK_SIZE=4096

# dmsetup create test_crypt --table "0 $DEV_SIZE crypt aes-xts-plain64 $KEY 0 $DEV 0 1 sector_size:$BLOCK_SIZE"
# dmsetup table --showkeys test_crypt

Signed-off-by: Milan Broz
Signed-off-by: Mike Snitzer

Milan Broz
2017-03-25 03:54:21 +0800
33d2f09fc dm crypt: introduce new format of cipher with "capi:" prefix ... Browse Code »

For the new authenticated encryption we have to support generic composed
modes (combination of encryption algorithm and authenticator) because
this is how the kernel crypto API accesses such algorithms.

To simplify the interface, we accept an algorithm directly in crypto API
format. The new format is recognised by the "capi:" prefix. The
dmcrypt internal IV specification is the same as for the old format.

The crypto API cipher specifications format is:
capi:cipher_api_spec-ivmode[:ivopts]
Examples:
capi:cbc(aes)-essiv:sha256 (equivalent to old aes-cbc-essiv:sha256)
capi:xts(aes)-plain64 (equivalent to old aes-xts-plain64)
Examples of authenticated modes:
capi:gcm(aes)-random
capi:authenc(hmac(sha256),xts(aes))-random
capi:rfc7539(chacha20,poly1305)-random

Authenticated modes can only be configured using the new cipher format.
Note that this format allows user to specify arbitrary combinations that
can be insecure. (Policy decision is done in cryptsetup userspace.)

Authenticated encryption algorithms can be of two types, either native
modes (like GCM) that performs both encryption and authentication
internally, or composed modes where user can compose AEAD with separate
specification of encryption algorithm and authenticator.

For composed mode with HMAC (length-preserving encryption mode like an
XTS and HMAC as an authenticator) we have to calculate HMAC digest size
(the separate authentication key is the same size as the HMAC digest).
Introduce crypt_ctr_auth_cipher() to parse the crypto API string to get
HMAC algorithm and retrieve digest size from it.

Also, for HMAC composed mode we need to parse the crypto API string to
get the cipher mode nested in the specification. For native AEAD mode
(like GCM), we can use crypto_tfm_alg_name() API to get the cipher
specification.

Because the HMAC composed mode is not processed the same as the native
AEAD mode, the CRYPT_MODE_INTEGRITY_HMAC flag is no longer needed and
"hmac" specification for the table integrity argument is removed.

Signed-off-by: Milan Broz
Signed-off-by: Mike Snitzer

Milan Broz
2017-03-25 03:54:20 +0800
ef43aa380 dm crypt: add cryptographic data integrity protection (authenticated encryption) ... Browse Code »

Allow the use of per-sector metadata, provided by the dm-integrity
module, for integrity protection and persistently stored per-sector
Initialization Vector (IV). The underlying device must support the
"DM-DIF-EXT-TAG" dm-integrity profile.

The per-bio integrity metadata is allocated by dm-crypt for every bio.

Example of low-level mapping table for various types of use:
DEV=/dev/sdb
SIZE=417792

# Additional HMAC with CBC-ESSIV, key is concatenated encryption key + HMAC key
SIZE_INT=389952
dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 32 J 0"
dmsetup create y --table "0 $SIZE_INT crypt aes-cbc-essiv:sha256 \
11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
0 /dev/mapper/x 0 1 integrity:32:hmac(sha256)"

# AEAD (Authenticated Encryption with Additional Data) - GCM with random IVs
# GCM in kernel uses 96bits IV and we store 128bits auth tag (so 28 bytes metadata space)
SIZE_INT=393024
dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 28 J 0"
dmsetup create y --table "0 $SIZE_INT crypt aes-gcm-random \
11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
0 /dev/mapper/x 0 1 integrity:28:aead"

# Random IV only for XTS mode (no integrity protection but provides atomic random sector change)
SIZE_INT=401272
dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 16 J 0"
dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
0 /dev/mapper/x 0 1 integrity:16:none"

# Random IV with XTS + HMAC integrity protection
SIZE_INT=377656
dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 48 J 0"
dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
0 /dev/mapper/x 0 1 integrity:48:hmac(sha256)"

Both AEAD and HMAC protection authenticates not only data but also
sector metadata.

HMAC protection is implemented through autenc wrapper (so it is
processed the same way as an authenticated mode).

In HMAC mode there are two keys (concatenated in dm-crypt mapping
table). First is the encryption key and the second is the key for
authentication (HMAC). (It is userspace decision if these keys are
independent or somehow derived.)

The sector request for AEAD/HMAC authenticated encryption looks like this:
|----- AAD -------|------ DATA -------|-- AUTH TAG --|
| (authenticated) | (auth+encryption) | |
| sector_LE | IV | sector in/out | tag in/out |

For writes, the integrity fields are calculated during AEAD encryption
of every sector and stored in bio integrity fields and sent to
underlying dm-integrity target for storage.

For reads, the integrity metadata is verified during AEAD decryption of
every sector (they are filled in by dm-integrity, but the integrity
fields are pre-allocated in dm-crypt).

There is also an experimental support in cryptsetup utility for more
friendly configuration (part of LUKS2 format).

Because the integrity fields are not valid on initial creation, the
device must be "formatted". This can be done by direct-io writes to the
device (e.g. dd in direct-io mode). For now, there is available trivial
tool to do this, see: https://github.com/mbroz/dm_int_tools

Signed-off-by: Milan Broz
Signed-off-by: Ondrej Mosnacek
Signed-off-by: Vashek Matyas
Signed-off-by: Mike Snitzer

Milan Broz
2017-03-25 03:49:41 +0800
7eada909b dm: add integrity target ... Browse Code »

The dm-integrity target emulates a block device that has additional
per-sector tags that can be used for storing integrity information.

A general problem with storing integrity tags with every sector is that
writing the sector and the integrity tag must be atomic - i.e. in case of
crash, either both sector and integrity tag or none of them is written.

To guarantee write atomicity the dm-integrity target uses a journal. It
writes sector data and integrity tags into a journal, commits the journal
and then copies the data and integrity tags to their respective location.

The dm-integrity target can be used with the dm-crypt target - in this
situation the dm-crypt target creates the integrity data and passes them
to the dm-integrity target via bio_integrity_payload attached to the bio.
In this mode, the dm-crypt and dm-integrity targets provide authenticated
disk encryption - if the attacker modifies the encrypted device, an I/O
error is returned instead of random data.

The dm-integrity target can also be used as a standalone target, in this
mode it calculates and verifies the integrity tag internally. In this
mode, the dm-integrity target can be used to detect silent data
corruption on the disk or in the I/O path.

Signed-off-by: Mikulas Patocka
Signed-off-by: Milan Broz
Signed-off-by: Mike Snitzer

Mikulas Patocka
2017-03-25 03:49:07 +0800