

22 Jul, 2020

1 commit

  • Convert the macros STRIPE_SIZE, STRIPE_SECTORS and STRIPE_SHIFT to
    RAID5_STRIPE_SIZE(), RAID5_STRIPE_SECTORS() and RAID5_STRIPE_SHIFT().

    This patch prepares for the upcoming adjustable stripe_size.
    It does not change any existing functionality.

    Signed-off-by: Yufen Yu
    Signed-off-by: Song Liu

    Yufen Yu
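As a sketch of the conversion (illustrative stand-in struct and values, not the kernel source), the fixed compile-time constants become accessors on the per-array configuration:

```c
#include <assert.h>

/* Illustrative sketch only: DEMO_PAGE_SIZE and the stub struct stand in
 * for the kernel's PAGE_SIZE and struct r5conf. */
#define DEMO_PAGE_SIZE 4096UL

struct r5conf {
    unsigned long stripe_size;  /* today always PAGE_SIZE; adjustable later */
};

/* Old: #define STRIPE_SIZE PAGE_SIZE -- a fixed compile-time constant.
 * New: the value is read from the per-array configuration. */
#define RAID5_STRIPE_SIZE(conf)    ((conf)->stripe_size)
#define RAID5_STRIPE_SECTORS(conf) (RAID5_STRIPE_SIZE(conf) >> 9)

unsigned long demo_stripe_sectors(void)
{
    struct r5conf conf = { .stripe_size = DEMO_PAGE_SIZE };
    return RAID5_STRIPE_SECTORS(&conf);  /* 4096 bytes -> sectors */
}
```

Because every call site now takes a conf argument, a later patch can vary stripe_size per array without touching callers again.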
     



05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 263 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Jan, 2019

1 commit

  • This fixes the case where md array assembly fails because raid cache recovery
    is unable to allocate a stripe, despite attempts to replay stripes and increase
    the cache size. This happens because stripes released by r5c_recovery_replay_stripes
    and raid5_set_cache_size do not become available for allocation immediately.
    Released stripes are first placed on the conf->released_stripes list and require
    the md thread to merge them onto conf->inactive_list before they can be allocated.

    The patch allows the final allocation attempt during cache recovery to wait for
    new stripes to become available for allocation.

    Cc: linux-raid@vger.kernel.org
    Cc: Shaohua Li
    Cc: linux-stable # 4.10+
    Fixes: b4c625c67362 ("md/r5cache: r5cache recovery: part 1")
    Signed-off-by: Alexei Naberezhnov
    Signed-off-by: Song Liu

    Alexei Naberezhnov
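The released-vs-inactive distinction can be sketched in userspace C (names and the merge step are illustrative; in the kernel the final attempt waits via wait_event() for the md thread to perform the merge):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's r5conf lists and
 * raid5_get_active_stripe(). */
struct demo_conf {
    int released_stripes;   /* freed, but not yet reusable */
    int inactive_stripes;   /* reusable */
};

/* md-thread work: merge released stripes onto the inactive list */
static void demo_merge_released(struct demo_conf *conf)
{
    conf->inactive_stripes += conf->released_stripes;
    conf->released_stripes = 0;
}

/* noblock=true mirrors the old final attempt: fail if nothing is inactive.
 * noblock=false mirrors the fix: wait for the merge (modeled by calling it
 * directly; the kernel uses wait_event()) before retrying. */
static bool demo_get_stripe(struct demo_conf *conf, bool noblock)
{
    if (!conf->inactive_stripes) {
        if (noblock)
            return false;
        demo_merge_released(conf);
    }
    if (!conf->inactive_stripes)
        return false;
    conf->inactive_stripes--;
    return true;
}

static bool demo_old_final_attempt(void)
{
    struct demo_conf conf = { .released_stripes = 1 };
    return demo_get_stripe(&conf, true);   /* fails: stripe not merged yet */
}

static bool demo_new_final_attempt(void)
{
    struct demo_conf conf = { .released_stripes = 1 };
    return demo_get_stripe(&conf, false);  /* waits, then succeeds */
}
```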
     

11 Oct, 2018

1 commit

  • An earlier commit removed the error label, leaving two statements that
    are now never reachable. Since this code is now dead code, remove it.

    Detected by CoverityScan, CID#1462409 ("Structurally dead code")

    Fixes: d5d885fd514f ("md: introduce new personality function start()")
    Signed-off-by: Colin Ian King
    Signed-off-by: Shaohua Li

    Colin Ian King
     

19 Aug, 2018

1 commit

  • Pull input updates from Dmitry Torokhov:

    - a new driver for Rohm BU21029 touch controller

    - new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free

    - updates to Atmel, eeti, pxrc and iforce drivers

    - assorted driver cleanups and fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits)
    MAINTAINERS: Add PhoenixRC Flight Controller Adapter
    Input: do not use WARN() in input_alloc_absinfo()
    Input: mark expected switch fall-throughs
    Input: raydium_i2c_ts - use true and false for boolean values
    Input: evdev - switch to bitmap API
    Input: gpio-keys - switch to bitmap_zalloc()
    Input: elan_i2c_smbus - cast sizeof to int for comparison
    bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free()
    md: Avoid namespace collision with bitmap API
    dm: Avoid namespace collision with bitmap API
    Input: pm8941-pwrkey - add resin entry
    Input: pm8941-pwrkey - abstract register offsets and event code
    Input: iforce - reorganize joystick configuration lists
    Input: atmel_mxt_ts - move completion to after config crc is updated
    Input: atmel_mxt_ts - don't report zero pressure from T9
    Input: atmel_mxt_ts - zero terminate config firmware file
    Input: atmel_mxt_ts - refactor config update code to add context struct
    Input: atmel_mxt_ts - config CRC may start at T71
    Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM
    Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE
    ...

    Linus Torvalds
     

02 Aug, 2018

1 commit

  • bitmap API (include/linux/bitmap.h) has 'bitmap' prefix for its methods.

    The MD bitmap API, on the other hand, is a special case.
    Add an 'md' prefix to it to avoid the namespace collision.

    No functional changes intended.

    Signed-off-by: Andy Shevchenko
    Acked-by: Shaohua Li
    Signed-off-by: Dmitry Torokhov

    Andy Shevchenko
     



16 Jan, 2018

1 commit

  • In order to provide data consistency with PPL for disks with write-back
    cache enabled all data has to be flushed to disks before next PPL
    entry. The disks to be flushed are marked in the bitmap. It's modified
    under a mutex and it's only read after PPL io unit is submitted.

    A limitation of 64 disks in the array has been introduced to keep the data
    structures and implementation simple. RAID5 arrays with that many disks are
    unlikely due to the high risk of multiple disk failures, so this restriction
    should not be a real-life limitation.

    With write-back cache disabled, the next PPL entry is submitted when the data
    write for the current one completes. A data flush defers the next log
    submission, so trigger it when no stripes are found for handling.

    As PPL assures all data is flushed to disk at request completion, just
    acknowledge flush request when PPL is enabled.

    Signed-off-by: Tomasz Majchrzak
    Signed-off-by: Shaohua Li

    Tomasz Majchrzak
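A minimal sketch of the per-disk flush bitmap (illustrative names, not the kernel code): one bit per member disk in a 64-bit word, which is where the 64-disk cap comes from:

```c
#include <assert.h>
#include <stdint.h>

/* One bit per member disk; a uint64_t caps the array at 64 disks. */
static uint64_t ppl_mark_disk(uint64_t bitmap, int disk)
{
    return bitmap | (UINT64_C(1) << disk);
}

static int ppl_disk_needs_flush(uint64_t bitmap, int disk)
{
    return (int)((bitmap >> disk) & 1);
}
```

In the patch the bitmap is modified under a mutex and only read after the PPL io unit is submitted, so no atomics are needed for the read side.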
     



12 Dec, 2017

1 commit

  • In do_md_run(), md threads should not wake up until the array is fully
    initialized in md_run(). However, in raid5_run(), raid5-cache may wake
    up mddev->thread to flush stripes that need to be written back. This
    design doesn't break badly right now, but it could lead to a bad bug in
    the future.

    This patch tries to resolve this problem by splitting the start-up work
    into two personality functions, run() and start(). Tasks that do not
    require the md threads should go into run(), while tasks that require
    the md threads go into start().

    r5l_load_log() is moved to raid5_start(), so it is not called until
    the md threads are started in do_md_run().

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
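The run()/start() split can be sketched as follows (stub types; the real hooks live in struct md_personality and take a struct mddev argument):

```c
#include <assert.h>
#include <stddef.h>

/* Stub sketch of the split personality entry points. */
struct demo_personality {
    int (*run)(void);    /* md threads not running yet: build data structures */
    int (*start)(void);  /* md threads running: e.g. r5l_load_log() */
};

static int demo_order;
static int demo_run(void)   { demo_order = demo_order * 10 + 1; return 0; }
static int demo_start(void) { demo_order = demo_order * 10 + 2; return 0; }

/* do_md_run() pattern: run(), start the md threads, then start() */
static int demo_do_md_run(const struct demo_personality *pers)
{
    int err = pers->run();
    if (err)
        return err;
    /* ... md threads are started here ... */
    return pers->start ? pers->start() : 0;
}

static int demo_call_sequence(void)
{
    static const struct demo_personality pers = { demo_run, demo_start };
    demo_order = 0;
    demo_do_md_run(&pers);
    return demo_order;   /* records run() strictly before start() */
}
```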
     

02 Dec, 2017

1 commit

  • r5c_journal_mode_set() is called by r5c_journal_mode_store() and
    raid_ctr() in dm-raid. We don't need mddev_lock() when calling from
    raid_ctr(). This patch fixes this by moving the mddev_lock() to
    r5c_journal_mode_store().

    Cc: stable@vger.kernel.org (v4.13+)
    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     

02 Nov, 2017

3 commits

  • lockdep_assert_held is a better way to assert lock held, and it works
    for UP.

    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • The '2' argument means "wake up anything that is waiting".
    This is an inelegant part of the design and was added
    to help support management of the suspend_lo/suspend_hi setting.
    Now that suspend_lo/hi is managed in mddev_suspend/resume,
    that need is gone.
    There are still a couple of places where we call 'quiesce'
    with an argument of '2', but they can safely be changed to
    call ->quiesce(.., 1); ->quiesce(.., 0), which
    achieves the same result at the small cost of pausing IO
    briefly.

    This removes a small "optimization" from suspend_{hi,lo}_store,
    but it isn't clear that optimization served a useful purpose.
    The code now is a lot clearer.

    Suggested-by: Shaohua Li
    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
     
  • Most often mddev_suspend() is called with
    reconfig_mutex held. Make this a requirement in
    preparation for a subsequent patch. Also require
    reconfig_mutex to be held for mddev_resume(),
    partly for symmetry and partly to guarantee
    no races with incr/decr of mddev->suspend.

    Taking the mutex in r5c_disable_writeback_async() is
    a little tricky as this is called from a work queue
    via log->disable_writeback_work, and flush_work()
    is called on that while holding ->reconfig_mutex.
    If the work item hasn't run before flush_work()
    is called, the work function will not be able to
    get the mutex.

    So we use mddev_trylock() inside the wait_event() call, and have that
    abort when conf->log is set to NULL, which happens before
    flush_work() is called.
    We wait in mddev->sb_wait and ensure this is woken
    when any of the conditions change. This requires
    waking mddev->sb_wait in mddev_unlock(). This is only
    likely to trigger extra wake-ups of threads that needn't
    be woken when metadata is being written, and that
    doesn't happen often enough for the cost to be
    noticeable.

    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
     

17 Oct, 2017

1 commit

  • Motivated by the desire to eliminate the imprecise nature of
    DM-specific patches being unnecessarily sent to both the MD maintainer
    and mailing list, which is borne out of the fact that DM files also
    reside in drivers/md/.

    Now all MD-specific files in drivers/md/ start with either "raid" or
    "md-" and the MAINTAINERS file has been updated accordingly.

    Shaohua: don't change module name

    Signed-off-by: Mike Snitzer
    Signed-off-by: Shaohua Li

    Mike Snitzer
     

08 Sep, 2017

2 commits

  • Pull MD updates from Shaohua Li:
    "This update mainly fixes bugs:

    - Make raid5 ppl support several ppl from Pawel

    - Several raid5-cache bug fixes from Song

    - Bitmap fixes from Neil and Me

    - One raid1/10 regression fix since 4.12 from Me

    - Other small fixes and cleanup"

    * tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md/bitmap: disable bitmap_resize for file-backed bitmaps.
    raid5-ppl: Recovery support for multiple partial parity logs
    md: Runtime support for multiple ppls
    md/raid0: attach correct cgroup info in bio
    lib/raid6: align AVX512 constants to 512 bits, not bytes
    raid5: remove raid5_build_block
    md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
    md: replace seq_release_private with seq_release
    md: notify about new spare disk in the container
    md/raid1/10: reset bio allocated from mempool
    md/raid5: release/flush io in raid5_do_work()
    md/bitmap: copy correct data for bitmap super

    Linus Torvalds
     
  • Pull block layer updates from Jens Axboe:
    "This is the first pull request for 4.14, containing most of the code
    changes. It's a quiet series this round, which I think we needed after
    the churn of the last few series. This contains:

    - Fix for a registration race in loop, from Anton Volkov.

    - Overflow complaint fix from Arnd for DAC960.

    - Series of drbd changes from the usual suspects.

    - Conversion of the stec/skd driver to blk-mq. From Bart.

    - A few BFQ improvements/fixes from Paolo.

    - CFQ improvement from Ritesh, allowing idling for group idle.

    - A few fixes found by Dan's smatch, courtesy of Dan.

    - A warning fixup for a race between changing the IO scheduler and
    device removal. From David Jeffery.

    - A few nbd fixes from Josef.

    - Support for cgroup info in blktrace, from Shaohua.

    - Also from Shaohua, new features in the null_blk driver to allow it
    to actually hold data, among other things.

    - Various corner cases and error handling fixes from Weiping Zhang.

    - Improvements to the IO stats tracking for blk-mq from me. Can
    drastically improve performance for fast devices and/or big
    machines.

    - Series from Christoph removing bi_bdev as being needed for IO
    submission, in preparation for nvme multipathing code.

    - Series from Bart, including various cleanups and fixes for switch
    fall through case complaints"

    * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
    kernfs: checking for IS_ERR() instead of NULL
    drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
    drbd: Fix allyesconfig build, fix recent commit
    drbd: switch from kmalloc() to kmalloc_array()
    drbd: abort drbd_start_resync if there is no connection
    drbd: move global variables to drbd namespace and make some static
    drbd: rename "usermode_helper" to "drbd_usermode_helper"
    drbd: fix race between handshake and admin disconnect/down
    drbd: fix potential deadlock when trying to detach during handshake
    drbd: A single dot should be put into a sequence.
    drbd: fix rmmod cleanup, remove _all_ debugfs entries
    drbd: Use setup_timer() instead of init_timer() to simplify the code.
    drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
    drbd: new disk-option disable-write-same
    drbd: Fix resource role for newly created resources in events2
    drbd: mark symbols static where possible
    drbd: Send P_NEG_ACK upon write error in protocol != C
    drbd: add explicit plugging when submitting batches
    drbd: change list_for_each_safe to while(list_first_entry_or_null)
    drbd: introduce drbd_recv_header_maybe_unplug
    ...

    Linus Torvalds
     



24 Aug, 2017

1 commit

  • This way we don't need a block_device structure to submit I/O. The
    block_device has different life time rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
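A rough sketch of the remapping this enables (all types below are stand-ins for the kernel's gendisk, partition table and bio fields):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Stand-ins for struct gendisk / partition / bio. */
struct demo_part    { sector_t start_sect; };
struct demo_gendisk { struct demo_part part[4]; };
struct demo_bio {
    struct demo_gendisk *bi_disk;   /* replaces bi_bdev */
    int bi_partno;                  /* partition index for remapping */
    sector_t bi_sector;             /* relative to the partition */
};

/* generic_make_request-style remap: partition-relative to whole-disk */
static sector_t demo_remap(const struct demo_bio *bio)
{
    return bio->bi_disk->part[bio->bi_partno].start_sect + bio->bi_sector;
}

static sector_t demo_remap_example(void)
{
    struct demo_gendisk disk = { .part = { { 0 }, { 2048 } } };
    struct demo_bio bio = { .bi_disk = &disk, .bi_partno = 1, .bi_sector = 8 };
    return demo_remap(&bio);   /* partition start + offset */
}
```

The point of the change is that submitters only need the gendisk plus a partition index, not an open block_device.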
     

08 Aug, 2017

2 commits

  • In r5l_log_endio(), once log->io_list_lock is released, the io unit
    may be accessed (or even freed) by other threads. Current code
    doesn't handle the io_unit properly, which leads to potential race
    conditions.

    This patch solves this race condition by:

    1. Add a pending_stripe count for flush_payload. Multiple flush_payloads
    are counted as only one pending_stripe. A has_flush_payload flag is
    added to show whether the io unit has a flush_payload;
    2. In r5l_log_endio(), check flags has_null_flush and
    has_flush_payload with log->io_list_lock held. After the lock
    is released, this IO unit is only accessed when we know the
    pending_stripe counter cannot be zeroed by other threads.

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     
  • In r5c_journal_mode_set(), it is necessary to call mddev_lock()
    before accessing conf and conf->log. Otherwise, the conf->log
    may change (and become NULL).

    Shaohua: fix unlock in failure cases

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     

19 Jun, 2017

1 commit

  • "flags" arguments are often seen as good API design as they allow
    easy extensibility.
    bioset_create_nobvec() is implemented internally as a variation in
    flags passed to __bioset_create().

    To support future extension, make the internal structure part of the
    API.
    i.e. add a 'flags' argument to bioset_create() and discard
    bioset_create_nobvec().

    Note that the bio_split allocations in drivers/md/raid* do not need
    the bvec mempool - they should have used bioset_create_nobvec().

    Suggested-by: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
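A sketch of the API change (stubbed, not the block layer; the real constructor allocates mempools and the flag is BIOSET_NEED_BVECS):

```c
#include <assert.h>

/* Stub sketch: one constructor with a flags word replaces the
 * bioset_create()/bioset_create_nobvec() pair. The flag asks for the
 * bvec mempool that the old bioset_create() always set up. */
#define DEMO_BIOSET_NEED_BVECS (1u << 0)

struct demo_bioset { int has_bvec_pool; };

static void demo_bioset_create(struct demo_bioset *bs, unsigned int flags)
{
    bs->has_bvec_pool = !!(flags & DEMO_BIOSET_NEED_BVECS);
}

static int demo_has_bvec_pool(unsigned int flags)
{
    struct demo_bioset bs;
    demo_bioset_create(&bs, flags);
    return bs.has_bvec_pool;
}
```

Callers that previously used bioset_create_nobvec() now simply pass flags without the bvec bit.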
     

12 Jun, 2017

1 commit

  • We've already got a few conflicts and upcoming work depends on some of the
    changes that have gone into mainline as regression fixes for this series.

    Pull in 4.12-rc5 to resolve these conflicts and make it easier on downstream
    trees to continue working on 4.13 changes.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
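The idea can be sketched like this (values are illustrative; the kernel defines its own BLK_STS_* table and blk_status_to_errno()):

```c
#include <assert.h>

/* Sketch: a dedicated status type instead of a raw negative errno in
 * bi_error; conversion to errno happens only at the boundaries. */
typedef unsigned char demo_blk_status_t;
enum { DEMO_BLK_STS_OK = 0, DEMO_BLK_STS_IOERR = 10 };
#define DEMO_EIO 5

static int demo_blk_status_to_errno(demo_blk_status_t status)
{
    return status == DEMO_BLK_STS_OK ? 0 : -DEMO_EIO;
}
```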
     

01 Jun, 2017

1 commit

  • Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
    synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
    definitions. generic_make_request_checks() however strips REQ_FUA and
    REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
    write cache, and thus the write effectively becomes asynchronous, which can
    lead to performance regressions.

    Fix the problem by making sure all bios which are synchronous are
    properly marked with REQ_SYNC.

    CC: linux-raid@vger.kernel.org
    CC: Shaohua Li
    Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Shaohua Li

    Jan Kara
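The regression can be sketched with stand-in flag bits (values invented for the sketch; the kernel's op_is_sync() plays the role of demo_is_sync()):

```c
#include <assert.h>

/* Invented flag values; op_is_sync() treats any of these as synchronous. */
#define DEMO_REQ_SYNC     (1u << 0)
#define DEMO_REQ_FUA      (1u << 1)
#define DEMO_REQ_PREFLUSH (1u << 2)

static int demo_is_sync(unsigned int opf)
{
    return !!(opf & (DEMO_REQ_SYNC | DEMO_REQ_FUA | DEMO_REQ_PREFLUSH));
}

/* What generic_make_request_checks() does on a device without a volatile
 * write cache: FUA/PREFLUSH are meaningless there and get stripped. */
static unsigned int demo_strip_flush_flags(unsigned int opf)
{
    return opf & ~(DEMO_REQ_FUA | DEMO_REQ_PREFLUSH);
}
```

A FUA-only write is synchronous before stripping but not after; OR-ing in REQ_SYNC at submission (the fix) keeps it synchronous either way.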
     

12 May, 2017

2 commits

  • Currently, sync of raid456 array cannot make progress when hitting
    data in writeback r5cache.

    This patch fixes this issue by flushing the cached data of the stripe
    before processing the sync request. This is achieved by:

    1. In handle_stripe(), do not set STRIPE_SYNCING if the stripe is
    in write back cache;
    2. In r5c_try_caching_write(), handle the stripe in sync with write
    through;
    3. In do_release_stripe(), make stripe in sync write out and send
    it to the state machine.

    Shaohua: explicitly set STRIPE_HANDLE after write out completed

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     
  • For raid456 with writeback cache, when the journal device fails during
    normal operation, it is still possible to persist all data, as all
    pending data is still in the stripe cache. However, it is necessary to
    handle journal failure gracefully.

    During journal failures, the following logic handles the graceful shutdown
    of journal:
    1. raid5_error() marks the device as Faulty and schedules async work
    log->disable_writeback_work;
    2. In disable_writeback_work (r5c_disable_writeback_async), the mddev is
    suspended, set to write through, and then resumed. mddev_suspend()
    flushes all cached stripes;
    3. All cached stripes need to be flushed carefully to the RAID array.

    This patch fixes issues within the process above:
    1. In r5c_update_on_rdev_error() schedule disable_writeback_work for
    journal failures;
    2. In r5c_disable_writeback_async(), wait for MD_SB_CHANGE_PENDING,
    since raid5_error() updates superblock.
    3. In handle_stripe(), allow stripes with data in journal (s.injournal > 0)
    to make progress during log_failed;
    4. In delay_towrite(), if log failed only process data in the cache (skip
    new writes in dev->towrite);
    5. In __get_priority_stripe(), process loprio_list during journal device
    failures.
    6. In raid5_remove_disk(), wait until all cached stripes are flushed
    before calling log_exit().

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     



04 May, 2017

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - A major update for DM cache that reduces the latency for deciding
    whether blocks should migrate to/from the cache. The bio-prison-v2
    interface supports this improvement by enabling direct dispatch of
    work to workqueues rather than having to delay the actual work
    dispatch to the DM cache core. So the dm-cache policies are much more
    nimble by being able to drive IO as they see fit. One immediate
    benefit from the improved latency is a cache that should be much more
    adaptive to changing workloads.

    - Add a new DM integrity target that emulates a block device that has
    additional per-sector tags that can be used for storing integrity
    information.

    - Add a new authenticated encryption feature to the DM crypt target
    that builds on the capabilities provided by the DM integrity target.

    - Add MD interface for switching the raid4/5/6 journal mode and update
    the DM raid target to use it to enable raid4/5/6 journal write-back
    support.

    - Switch the DM verity target over to using the asynchronous hash
    crypto API (this helps work better with architectures that have
    access to off-CPU algorithm providers, which should reduce CPU
    utilization).

    - Various request-based DM and DM multipath fixes and improvements from
    Bart and Christoph.

    - A DM thinp target fix for a bio structure leak that occurs for each
    discard IFF discard passdown is enabled.

    - A fix for a possible deadlock in DM bufio and a fix to re-check the
    new buffer allocation watermark in the face of competing admin
    changes to the 'max_cache_size_bytes' tunable.

    - A couple DM core cleanups.

    * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits)
    dm bufio: check new buffer allocation watermark every 30 seconds
    dm bufio: avoid a possible ABBA deadlock
    dm mpath: make it easier to detect unintended I/O request flushes
    dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()
    dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH
    dm: introduce enum dm_queue_mode to cleanup related code
    dm mpath: verify __pg_init_all_paths locking assumptions at runtime
    dm: verify suspend_locking assumptions at runtime
    dm block manager: remove an unused argument from dm_block_manager_create()
    dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
    dm mpath: delay requeuing while path initialization is in progress
    dm mpath: avoid that path removal can trigger an infinite loop
    dm mpath: split and rename activate_path() to prepare for its expanded use
    dm ioctl: prevent stack leak in dm ioctl call
    dm integrity: use previously calculated log2 of sectors_per_block
    dm integrity: use hex2bin instead of open-coded variant
    dm crypt: replace custom implementation of hex2bin()
    dm crypt: remove obsolete references to per-CPU state
    dm verity: switch to using asynchronous hash crypto API
    dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
    ...

    Linus Torvalds
     

27 Mar, 2017

1 commit

  • Commit 2ded370373a4 ("md/r5cache: State machine for raid5-cache write
    back mode") added support for "write-back" caching on the raid journal
    device.

    In order to allow the dm-raid target to switch between the available
    "write-through" and "write-back" modes, provide a new
    r5c_journal_mode_set() API.

    Use the new API in existing r5c_journal_mode_store()

    Signed-off-by: Heinz Mauelshagen
    Acked-by: Shaohua Li
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     



23 Mar, 2017

3 commits

  • md/raid5 needs to keep track of how many stripe_heads are processing a
    bio so that it can delay calling bio_endio() until all stripe_heads
    have completed. It currently uses 16 bits of ->bi_phys_segments for
    this purpose.

    16 bits is only enough for 256M requests, and it is possible for a
    single bio to be larger than this, which causes problems. Also, the
    bio struct contains a larger counter, __bi_remaining, which has a
    purpose very similar to the purpose of our counter. So stop using
    ->bi_phys_segments, and instead use __bi_remaining.

    This means we don't need to initialize the counter, as our caller
    initializes it to '1'. It also means we can call bio_endio() directly
    as it tests this counter internally.

    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
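The counting scheme can be sketched as follows (a plain int stands in for the atomic __bi_remaining; bio_inc_remaining() and bio_endio() are the real kernel names):

```c
#include <assert.h>

/* Sketch: the bio completes only when every attached stripe_head has
 * called endio. The counter starts at 1 (the caller's initialization). */
struct demo_bio { int remaining; int done; };

static void demo_bio_inc_remaining(struct demo_bio *bio)
{
    bio->remaining++;
}

static void demo_bio_endio(struct demo_bio *bio)
{
    if (--bio->remaining == 0)
        bio->done = 1;   /* really: complete the bio */
}

static int demo_two_stripes(void)
{
    struct demo_bio bio = { .remaining = 1 };  /* initialized by the caller */
    demo_bio_inc_remaining(&bio);   /* second stripe_head attaches */
    demo_bio_endio(&bio);           /* first stripe finishes: not done yet */
    int early = bio.done;
    demo_bio_endio(&bio);           /* last reference: bio completes */
    return early * 10 + bio.done;   /* completes only at the very end */
}
```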
     
  • We currently gather bios that need to be returned into a bio_list
    and call bio_endio() on them all together.
    The original reason for this was to avoid making the calls while
    holding a spinlock.
    Locking has changed a lot since then, and that reason is no longer
    valid.

    So discard return_io() and various return_bi lists, and just call
    bio_endio() directly as needed.

    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
     
  • We use md_write_start() to increase the count of pending writes, and
    md_write_end() to decrement the count. We currently count bios
    submitted to md/raid5. Change it to count the stripe_heads that a WRITE
    bio has been attached to.

    So now, raid5_make_request() calls md_write_start() and then
    md_write_end() to keep the count elevated during the setup of the
    request.

    add_stripe_bio() calls md_write_start() for each stripe_head, and the
    completion routines always call md_write_end(), instead of only
    calling it when raid5_dec_bi_active_stripes() returns 0.
    make_discard_request also calls md_write_start/end().

    The parallel between md_write_{start,end} and use of bi_phys_segments
    can be seen in that:
    Whenever we set bi_phys_segments to 1, we now call md_write_start.
    Whenever we increment it on non-read requests with
    raid5_inc_bi_active_stripes(), we now call md_write_start().
    Whenever we decrement bi_phys_segments on non-read requests with
    raid5_dec_bi_active_stripes(), we now call md_write_end().

    This reduces our dependence on keeping a per-bio count of active
    stripes in bi_phys_segments.

    md_write_inc() is added which parallels md_write_start(), but requires
    that a write has already been started, and is certain never to sleep.
    This can be used inside a spinlocked region when adding to a write
    request.

    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
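The new counting can be sketched with stand-in types (a plain int replaces the kernel's writes_pending machinery; function names follow the patch):

```c
#include <assert.h>

struct demo_mddev { int writes_pending; };

static void demo_md_write_start(struct demo_mddev *m) { m->writes_pending++; }
/* never-sleeping variant, callable under a spinlock once a write exists */
static void demo_md_write_inc(struct demo_mddev *m)   { m->writes_pending++; }
static void demo_md_write_end(struct demo_mddev *m)   { m->writes_pending--; }

/* raid5_make_request pattern: hold one reference across request setup,
 * take one per attached stripe_head, then drop the setup reference. */
static int demo_make_request(struct demo_mddev *m, int nr_stripes)
{
    demo_md_write_start(m);              /* may sleep; taken up front */
    for (int i = 0; i < nr_stripes; i++)
        demo_md_write_inc(m);            /* one per stripe_head attached */
    demo_md_write_end(m);                /* drop the setup reference */
    return m->writes_pending;            /* stripes still in flight */
}

static int demo_pending_after_setup(int nr_stripes)
{
    struct demo_mddev mddev = { 0 };
    return demo_make_request(&mddev, nr_stripes);
}
```

Each stripe_head's completion routine then calls md_write_end() once, bringing the count back to zero when all writes finish.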
     

17 Mar, 2017

2 commits

  • In r5c_finish_stripe_write_out(), R5LOG_PAYLOAD_FLUSH is appended to
    log->current_io.

    Appending R5LOG_PAYLOAD_FLUSH during quiesce needs extra writes to the
    journal. To simplify the logic, we just skip R5LOG_PAYLOAD_FLUSH in
    quiesce.

    Although R5LOG_PAYLOAD_FLUSH supports multiple stripes per payload, the
    current implementation writes one stripe per R5LOG_PAYLOAD_FLUSH, which
    is simpler.

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
     
  • This patch adds handling of R5LOG_PAYLOAD_FLUSH in journal recovery.
    The next patch will add logic that generates R5LOG_PAYLOAD_FLUSH on
    flush finish.

    When R5LOG_PAYLOAD_FLUSH is seen in recovery, pending data and parity
    will be dropped from recovery. This will reduce the number of stripes
    to replay, and thus accelerate the recovery process.

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu