09 Jan, 2017

1 commit

  • commit e8d7c33232e5fdfa761c3416539bc5b4acd12db5 upstream.

    The current implementation employs a 16-bit counter of active stripes in
    the lower bits of bio->bi_phys_segments. If a request is big enough to
    overflow this counter, the bio will be completed and freed too early.

    Fortunately this does not happen in the default configuration, because
    several other limits prevent it: stripe_cache_size * nr_disks effectively
    bounds the number of active stripes, and the small max_sectors_kb of the
    underlying disks prevents it during normal read/write operations.

    The overflow easily happens during discard if it is enabled via the module
    parameter "devices_handle_discard_safely" and stripe_cache_size is set
    large enough.

    This patch limits the request size to 256MB - 8KB to prevent overflows
    (see the arithmetic sketch after this entry).

    Signed-off-by: Konstantin Khlebnikov
    Cc: Shaohua Li
    Cc: Neil Brown
    Signed-off-by: Shaohua Li
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
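
    For reference, here is a minimal userspace sketch of the arithmetic behind
    the chosen limit, assuming the usual 4 KiB stripe unit (STRIPE_SIZE on
    common configurations). With a 16-bit active-stripe counter, capping a
    request at 0xfffe stripes works out to exactly 256 MiB - 8 KiB; the real
    kernel call sites are not reproduced here.

      #include <stdio.h>

      int main(void)
      {
          const unsigned long stripe_bytes = 4096;    /* 4 KiB handled per active stripe */
          const unsigned long counter_max  = 0xffff;  /* 16-bit counter in bi_phys_segments */

          /* Stay one stripe below the counter limit so it can never wrap. */
          unsigned long max_bytes = (counter_max - 1) * stripe_bytes;

          printf("max request size: %lu bytes (256 MiB - %lu KiB)\n",
                 max_bytes, ((256UL << 20) - max_bytes) >> 10);
          return 0;
      }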
     

08 Oct, 2016

1 commit

  • Pull MD updates from Shaohua Li:
    "This update includes:

    - new AVX512 instruction based raid6 gen/recovery algorithm

    - a couple of md-cluster related bug fixes

    - fix a potential deadlock

    - set nonrotational bit for raid array with SSD

    - set correct max_hw_sectors for raid5/6, which hopefully can improve
    performance a little bit

    - other minor fixes"

    * tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md: set rotational bit
    raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
    raid5: handle register_shrinker failure
    raid5: fix to detect failure of register_shrinker
    md: fix a potential deadlock
    md/bitmap: fix wrong cleanup
    raid5: allow arbitrary max_hw_sectors
    lib/raid6: Add AVX512 optimized xor_syndrome functions
    lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
    lib/raid6: Add AVX512 optimized recovery functions
    lib/raid6: Add AVX512 optimized gen_syndrome functions
    md-cluster: make resync lock also could be interruptted
    md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
    md-cluster: convert the completion to wait queue
    md-cluster: protect md_find_rdev_nr_rcu with rcu lock
    md-cluster: clean related infos of cluster
    md: changes for MD_STILL_CLOSED flag
    md-cluster: remove some unnecessary dlm_unlock_sync
    md-cluster: use FORCEUNLOCK in lockres_free
    md-cluster: call md_kick_rdev_from_array once ack failed

    Linus Torvalds
     

04 Oct, 2016

1 commit

  • Pull CPU hotplug updates from Thomas Gleixner:
    "Yet another batch of cpu hotplug core updates and conversions:

    - Provide core infrastructure for multi instance drivers so the
    drivers do not have to keep custom lists.

    - Convert custom lists to the new infrastructure. The block-mq custom
    list conversion comes through the block tree and makes the diffstat
    tip over to more lines removed than added.

    - Handle unbalanced hotplug enable/disable calls more gracefully.

    - Remove the obsolete CPU_STARTING/DYING notifier support.

    - Convert another batch of notifier users.

    The relayfs changes which conflicted with the conversion have been
    shipped to me by Andrew.

    The remaining lot is targeted for 4.10 so that we finally can remove
    the rest of the notifiers"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    cpufreq: Fix up conversion to hotplug state machine
    blk/mq: Reserve hotplug states for block multiqueue
    x86/apic/uv: Convert to hotplug state machine
    s390/mm/pfault: Convert to hotplug state machine
    mips/loongson/smp: Convert to hotplug state machine
    mips/octeon/smp: Convert to hotplug state machine
    fault-injection/cpu: Convert to hotplug state machine
    padata: Convert to hotplug state machine
    cpufreq: Convert to hotplug state machine
    ACPI/processor: Convert to hotplug state machine
    virtio scsi: Convert to hotplug state machine
    oprofile/timer: Convert to hotplug state machine
    block/softirq: Convert to hotplug state machine
    lib/irq_poll: Convert to hotplug state machine
    x86/microcode: Convert to hotplug state machine
    sh/SH-X3 SMP: Convert to hotplug state machine
    ia64/mca: Convert to hotplug state machine
    ARM/OMAP/wakeupgen: Convert to hotplug state machine
    ARM/shmobile: Convert to hotplug state machine
    arm64/FP/SIMD: Convert to hotplug state machine
    ...

    Linus Torvalds
     

22 Sep, 2016

3 commits


10 Sep, 2016

1 commit

  • commit 5f9d1fde7d54a5 ("raid5: fix memory leak of bio integrity data")
    moves bio_reset to bio_endio, but it introduces a small race condition:
    it does bio_reset after raid5_release_stripe, which could make the stripe
    reusable and hence let the bio be reused just before bio_reset. Moving
    bio_reset before raid5_release_stripe is called should fix the race (the
    intended ordering is sketched after this entry).

    Reported-and-tested-by: Stefan Priebe - Profihost AG
    Signed-off-by: Shaohua Li

    Shaohua Li
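
    A toy standalone model of the ordering point, with made-up types and
    helper names (bio_reset_model, release_stripe_model); it only illustrates
    that the per-bio cleanup must happen while the stripe is still owned,
    before it can be handed to another user.

      #include <stdbool.h>
      #include <stdio.h>

      struct bio    { bool dirty; };
      struct stripe { struct bio bio; bool available; };

      static void bio_reset_model(struct bio *b)         { b->dirty = false; }
      static void release_stripe_model(struct stripe *s) { s->available = true; }

      /* Fixed ordering: reset the bio first, release the stripe second. */
      static void endio_fixed(struct stripe *s)
      {
          bio_reset_model(&s->bio);   /* clean up while we still own the stripe */
          release_stripe_model(s);    /* only now may another thread reuse it   */
      }

      int main(void)
      {
          struct stripe s = { .bio = { .dirty = true }, .available = false };
          endio_fixed(&s);
          printf("available=%d dirty=%d\n", s.available, s.bio.dirty);
          return 0;
      }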
     

07 Sep, 2016

1 commit

  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Peter Zijlstra
    Cc: Neil Brown
    Cc: linux-raid@vger.kernel.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160818125731.27256-10-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
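
    The general shape of such a conversion looks roughly as follows; this is a
    hedged, generic kernel fragment (not the exact raid456 diff), using the
    dynamic CPUHP_AP_ONLINE_DYN state and hypothetical per-CPU alloc/free
    work in the callbacks. cpuhp_setup_state() registers the pair and also
    runs the startup callback on every CPU that is already online, which is
    the behaviour the entry refers to.

      #include <linux/cpuhotplug.h>
      #include <linux/init.h>

      static int example_cpu_online(unsigned int cpu)
      {
          /* hypothetical: allocate this CPU's scratch space */
          return 0;
      }

      static int example_cpu_offline(unsigned int cpu)
      {
          /* hypothetical: free this CPU's scratch space */
          return 0;
      }

      static int __init example_init(void)
      {
          /* Registers the callbacks and invokes example_cpu_online() on all
           * CPUs that are already online; returns the allocated state (or a
           * negative error), to be passed to cpuhp_remove_state() later. */
          int ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "example:online",
                                      example_cpu_online, example_cpu_offline);
          return ret < 0 ? ret : 0;
      }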
     

01 Sep, 2016

1 commit


31 Aug, 2016

1 commit

  • Pull MD fixes from Shaohua Li:
    "This includes several bug fixes:

    - Alexey Obitotskiy fixed a hang for faulty raid5 array with external
    management

    - Song Liu fixed two raid5 journal related bugs

    - Tomasz Majchrzak fixed a bad block recording issue and an
    accounting issue for raid10

    - ZhengYuan Liu fixed an accounting issue for raid5

    - I fixed a potential race condition and memory leak with DIF/DIX
    enabled

    - other trivial fixes"

    * tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    raid5: avoid unnecessary bio data set
    raid5: fix memory leak of bio integrity data
    raid10: record correct address of bad block
    md-cluster: fix error return code in join()
    r5cache: set MD_JOURNAL_CLEAN correctly
    md: don't print the same repeated messages about delayed sync operation
    md: remove obsolete ret in md_start_sync
    md: do not count journal as spare in GET_ARRAY_INFO
    md: Prevent IO hold during accessing to faulty raid5 array
    MD: hold mddev lock to change bitmap location
    raid5: fix incorrectly counter of conf->empty_inactive_list_nr
    raid10: increment write counter after bio is split

    Linus Torvalds
     

25 Aug, 2016

3 commits

  • bio_reset doesn't change bi_io_vec and bi_max_vecs, so we don't need to
    set them every time. bi_private will be set before the bio is
    dispatched.

    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • Yi reported a memory leak in raid5 with DIF/DIX-enabled disks. raid5
    doesn't alloc/free bios; instead it reuses them. There are two issues in
    the current code:
    1. the code calls bio_init (from
    init_stripe->raid5_build_block->bio_init) and then bio_reset (in
    ops_run_io). The bio is reused, so integrity data is likely attached;
    bio_init clears the pointer to the integrity data, so bio_reset can't
    release it.
    2. bio_reset is called before dispatching the bio. After the bio finishes,
    it's possible we never free its integrity data (e.g. if we don't call
    bio_reset again).
    Both issues cause a memory leak. The patch moves bio_init to stripe
    creation and bio_reset to the bio end_io path, fixing both issues (a toy
    model of the resulting lifecycle follows this entry).

    Reported-by: Yi Zhang
    Signed-off-by: Shaohua Li

    Shaohua Li
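
    A toy userspace model of the resulting bio lifecycle, with a made-up type
    (toy_bio) standing in for the kernel object: "init" blindly clears the
    object, while "reset" releases attached integrity data before clearing,
    which is why init belongs at stripe creation and reset at end_io.

      #include <stdlib.h>
      #include <string.h>
      #include <stdio.h>

      struct toy_bio { void *integrity; };

      /* Like bio_init: wipes the object without releasing anything, so calling
       * it on a recycled bio would drop (leak) attached integrity data. */
      static void toy_bio_init(struct toy_bio *b)  { memset(b, 0, sizeof(*b)); }

      /* Like bio_reset: release attached integrity data, then clear for reuse. */
      static void toy_bio_reset(struct toy_bio *b)
      {
          free(b->integrity);
          memset(b, 0, sizeof(*b));
      }

      int main(void)
      {
          struct toy_bio bio;

          toy_bio_init(&bio);                 /* once, at stripe creation */

          for (int io = 0; io < 3; io++) {
              bio.integrity = malloc(16);     /* integrity data attached per IO */
              /* ... IO is dispatched and completes ... */
              toy_bio_reset(&bio);            /* at end_io: freed, ready to reuse */
          }
          puts("three IOs, integrity data freed each time");
          return 0;
      }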
     
  • Currently, the code sets MD_JOURNAL_CLEAN when the array has
    MD_FEATURE_JOURNAL and recovery_cp is MaxSector. The array will be
    marked MD_JOURNAL_CLEAN even if the journal device is missing.

    With this patch, MD_JOURNAL_CLEAN is only set when the journal device
    is present (a sketch of the condition follows this entry).

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu
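
    A heavily hedged sketch of the intended condition; journal_device_present()
    is a hypothetical stand-in for however the code actually checks for a
    configured journal device, and the surrounding call site is not shown.

      /* Sketch only: mark the journal clean only when a journal device is
       * actually there, not merely when the feature bit says one should be. */
      if (mddev->recovery_cp == MaxSector && journal_device_present(mddev))
              set_bit(MD_JOURNAL_CLEAN, &mddev->flags);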
     

08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member to force old and out-of-tree code to break
    at compile time instead of at runtime (a toy illustration of the
    packing follows this entry).

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
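
    A standalone toy illustration of the packing the entry describes, with the
    op code in the high bits and the request flags in the low bits; the shift
    width and the names here are invented for the example and are not the
    kernel's actual layout or macros.

      #include <stdio.h>

      enum { OP_SHIFT = 28 };                       /* illustrative split point */
      enum { OP_READ = 0, OP_WRITE = 1 };           /* toy op codes             */
      enum { FLAG_SYNC = 1u << 0, FLAG_RAHEAD = 1u << 1 };

      static unsigned int pack(unsigned int op, unsigned int flags)
      {
          return (op << OP_SHIFT) | flags;          /* op high, flags low */
      }

      int main(void)
      {
          unsigned int rw = pack(OP_WRITE, FLAG_SYNC);

          printf("op=%u flags=%#x\n",
                 rw >> OP_SHIFT, rw & ((1u << OP_SHIFT) - 1));
          return 0;
      }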
     

06 Aug, 2016

1 commit

  • After the array enters a faulty state (e.g. the number of failed drives
    exceeds what is acceptable for the raid5 level), it sets error flags
    (one of these flags being MD_CHANGE_PENDING). For internal-metadata
    arrays MD_CHANGE_PENDING is cleared in md_update_sb, but not for
    external-metadata arrays. A set MD_CHANGE_PENDING flag prevents all new
    or unfinished IOs to the array from completing and holds them in a
    pending state. In some cases this can lead to a deadlock.

    For example, we have a faulty array (2 of 4 drives failed), udev handles
    the array state change and blkid is started (or any other userspace
    application that uses the array for read/write) but is unable to finish
    its reads because of the IO hold. At the same time we are unable to get
    exclusive access to the array (to stop the array, in our case) because
    another external application is still using it.

    The fix makes it possible to return IO with errors immediately, so the
    external application can finish working with the array and hand
    exclusive access to other applications to perform the required
    management actions.

    Signed-off-by: Alexey Obitotskiy
    Signed-off-by: Shaohua Li

    Alexey Obitotskiy
     

02 Aug, 2016

1 commit

  • The counter conf->empty_inactive_list_nr is only used to determine
    whether the raid5 array is congested, which is handled in
    raid5_congested(). It is increased in get_free_stripe() when
    conf->inactive_list becomes empty and decreased in
    release_inactive_stripe_list() when temp_inactive_list is spliced back
    onto conf->inactive_list. However, there is a problem when
    raid5_get_active_stripe or stripe_add_to_batch_list is called, because
    these two functions may call list_del_init(&sh->lru) to delete sh from
    "conf->inactive_list + hash", which may leave "conf->inactive_list + hash"
    empty when atomic_inc_not_zero(&sh->count) returns false. So a check
    should be done at these two points and empty_inactive_list_nr increased
    accordingly; otherwise the counter may become negative, which would
    affect async readahead from the VFS (see the fragment after this entry).

    Signed-off-by: ZhengYuan Liu
    Signed-off-by: Shaohua Li

    ZhengYuan Liu
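
    Based on the description above, the shape of the fix at those two call
    sites is roughly the following hedged fragment (not the verbatim kernel
    diff):

      /* After taking sh off its hashed inactive list, account for the list
       * possibly having just become empty, so raid5_congested() and the
       * counter stay consistent. */
      list_del_init(&sh->lru);
      if (list_empty(conf->inactive_list + hash))
              atomic_inc(&conf->empty_inactive_list_nr);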
     

29 Jul, 2016

1 commit


21 Jul, 2016

1 commit

  • These two are confusing leftovers of the old world order, combining
    values of the REQ_OP_ and REQ_ namespaces. For callers that don't
    special-case, we mostly just replace bi_rw with bio_data_dir or
    op_is_write, except for the few cases where a switch over the REQ_OP_
    values makes more sense. Any check for READA is replaced with an
    explicit check for REQ_RAHEAD. Also remove the READA alias for
    REQ_RAHEAD (a before/after fragment follows this entry).

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
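
    An illustrative before/after fragment for a caller that only needs the
    readahead hint or the data direction; it is not a specific call site from
    this series and is not compilable on its own (READA is exactly what the
    series removes).

      /* Before: poking at the mixed-namespace bi_rw value directly. */
      bool was_readahead = bio->bi_rw & READA;

      /* After: the explicit flag and the helpers. */
      bool is_readahead  = bio->bi_rw & REQ_RAHEAD;
      bool is_write      = op_is_write(bio_op(bio));  /* or: bio_data_dir(bio) == WRITE */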
     

14 Jun, 2016

5 commits


08 Jun, 2016

3 commits


26 May, 2016

1 commit


10 May, 2016

2 commits

  • Some code waits for a metadata update by:

    1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
    2. setting MD_CHANGE_PENDING and waking the management thread
    3. waiting for MD_CHANGE_PENDING to be cleared

    If the first two are done without locking, the code in md_update_sb()
    which checks if it needs to repeat might test if an update is needed
    before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
    in the wait returning early.

    So make sure all places that set MD_CHANGE_PENDING do so atomically, and
    introduce bit_clear_unless (suggested by Neil) for the purpose (its
    semantics are modeled after this entry).

    Cc: Martin Kepplinger
    Cc: Andrew Morton
    Cc: Denys Vlasenko
    Cc: Sasha Levin
    Cc:
    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
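
    The semantics of the new helper can be modeled in userspace roughly as
    below (a sketch with C11 atomics, not the kernel implementation):
    atomically clear the "clear" bits unless any of the "test" bits is set,
    and report whether the clear happened. The flag bit numbers in main() are
    made up for the example.

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Model of bit_clear_unless(ptr, clear, test): clear the "clear" bits
       * only if none of the "test" bits are set, in one atomic step, and
       * return true if the clear actually happened. */
      static bool bit_clear_unless_model(_Atomic unsigned long *ptr,
                                         unsigned long clear, unsigned long test)
      {
          unsigned long old = atomic_load(ptr);

          do {
              if (old & test)
                  return false;       /* the guarded bit is set: leave it alone */
          } while (!atomic_compare_exchange_weak(ptr, &old, old & ~clear));

          return true;
      }

      int main(void)
      {
          _Atomic unsigned long flags = (1UL << 0) | (1UL << 2);  /* "devs" | "pending" */

          /* Refuses to clear bit 0 while bit 2 is still set. */
          printf("%d\n", bit_clear_unless_model(&flags, 1UL << 0, 1UL << 2));

          atomic_fetch_and(&flags, ~(1UL << 2));                  /* drop "pending"    */
          printf("%d\n", bit_clear_unless_model(&flags, 1UL << 0, 1UL << 2));
          return 0;
      }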
     
  • When md runs underneath the dm-raid target, the mddev does not have
    a request queue or gendisk, so avoid accessing them.

    This patch adds a missing conditional to the raid5 personality (see the
    guard sketch after this entry).

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Shaohua Li

    Heinz Mauelshagen
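
    A hedged sketch of the kind of guard the patch adds; tune_request_queue()
    is a hypothetical stand-in for whatever queue access needs protecting, and
    the real call sites are not reproduced.

      if (mddev->queue)                         /* absent when driven by dm-raid */
              tune_request_queue(mddev->queue); /* hypothetical guarded access   */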
     

30 Apr, 2016

1 commit

  • If a device has R5_LOCKED set, it is legitimate for it to also have
    R5_SkipCopy set and page != orig_page. After R5_LOCKED is cleared,
    handle_stripe_clean_event will clear the SkipCopy flag and set page back
    to orig_page, so the warning is unnecessary.

    Reported-by: Joey Liao
    Signed-off-by: Shaohua Li

    Shaohua Li
     

18 Mar, 2016

1 commit

  • The raid456_cpu_notify() hotplug callback lacks handling of the
    CPU_UP_CANCELED case. That means if CPU_UP_PREPARE fails, the scratch
    buffer is leaked.

    Add handling for CPU_UP_CANCELED[_FROZEN] hotplug notifier transitions
    to free the scratch buffer.

    CC: Shaohua Li
    CC: linux-raid@vger.kernel.org
    Signed-off-by: Anna-Maria Gleixner
    Signed-off-by: Shaohua Li

    Anna-Maria Gleixner
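
    Pre-state-machine hotplug notifiers have the following general shape; the
    fix amounts to handling the cancel case so a failed CPU_UP_PREPARE does
    not leak the per-CPU scratch buffer. This is a generic, hedged sketch
    (alloc_scratch/free_scratch are hypothetical), not the raid456 code.

      static int example_cpu_notify(struct notifier_block *nb,
                                    unsigned long action, void *hcpu)
      {
          unsigned int cpu = (unsigned long)hcpu;

          switch (action & ~CPU_TASKS_FROZEN) {
          case CPU_UP_PREPARE:
              if (alloc_scratch(cpu))             /* hypothetical allocation */
                  return notifier_from_errno(-ENOMEM);
              break;
          case CPU_UP_CANCELED:                   /* UP_PREPARE failed elsewhere... */
          case CPU_DEAD:                          /* ...or the CPU went away        */
              free_scratch(cpu);                  /* hypothetical: release buffer   */
              break;
          }
          return NOTIFY_OK;
      }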
     

10 Mar, 2016

2 commits

  • Neil recently fixed an obscure race in break_stripe_batch_list.
    Debugging would be much easier if we knew the stripe state, which is
    what this patch adds.

    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • break_stripe_batch_list breaks up a batch and copies some flags from
    the batch head to the members, preserving others.

    It doesn't preserve or copy STRIPE_PREREAD_ACTIVE. This is not
    normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
    stripe_head is added to a batch, and is not set on stripe_heads
    already in a batch.

    However there is no locking to ensure one thread doesn't set the flag
    after it has just been cleared in another. This does occasionally happen.

    md/raid5 maintains a count of the number of stripe_heads with
    STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes. When
    break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently,
    this count becomes incorrect and will never again return to zero.

    md/raid5 delays the handling of some stripe_heads until
    preread_active_stripes becomes zero. So when the above-mentioned race
    happens, those stripe_heads become blocked and never progress,
    resulting in writes to the array hanging.

    So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
    in the members of a batch.

    URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
    URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
    URL: http://thread.gmane.org/5649C0E9.2030204@zoner.cz
    Reported-by: Martin Svec (and others)
    Tested-by: Tom Weber
    Fixes: 1b956f7a8f9a ("md/raid5: be more selective about distributing flags across batch.")
    Cc: stable@vger.kernel.org (v4.1 and later)
    Signed-off-by: NeilBrown
    Signed-off-by: Shaohua Li

    NeilBrown
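
    Schematically, the change keeps each member's own STRIPE_PREREAD_ACTIVE
    bit out of the set of flags that break_stripe_batch_list overwrites;
    COPY_MASK below is a hypothetical stand-in for the flags actually copied
    from the batch head, so this shows only the shape of the fix, not the
    diff itself.

      /* Preserve the member's STRIPE_PREREAD_ACTIVE so that
       * conf->preread_active_stripes stays balanced. */
      unsigned long keep = sh->state & (1UL << STRIPE_PREREAD_ACTIVE);

      sh->state = (head_sh->state & COPY_MASK) | keep;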
     

27 Feb, 2016

2 commits

  • Revert commit
    e9e4c377e2f563 ("md/raid5: per hash value and exclusive wait_for_stripe")

    The problem is that raid5_get_active_stripe waits on
    conf->wait_for_stripe[hash]. Assume hash is 0. My test releases stripes
    in this order:
    - release all stripes with hash 0
    - raid5_get_active_stripe still sleeps since active_stripes >
    max_nr_stripes * 3 / 4
    - release all stripes with hash other than 0. active_stripes becomes 0
    - raid5_get_active_stripe still sleeps, since nobody wakes up
    wait_for_stripe[0]
    The system livelocks. The problem is that active_stripes isn't a per-hash
    count. Reverting the patch makes the livelock go away.

    Cc: stable@vger.kernel.org (v4.2+)
    Cc: Yuanhan Liu
    Cc: NeilBrown
    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • check_reshape() is called from the raid5d thread. The raid5d thread
    shouldn't call mddev_suspend(), because mddev_suspend() waits for all IO
    to finish, but IO is handled by the raid5d thread itself, so we could
    easily deadlock here.

    This issue was introduced by
    738a273 ("md/raid5: fix allocation of 'scribble' array.")

    Cc: stable@vger.kernel.org (v4.1+)
    Reported-and-tested-by: Artur Paszkiewicz
    Reviewed-by: NeilBrown
    Signed-off-by: Shaohua Li

    Shaohua Li
     

26 Feb, 2016

1 commit

  • 'max_discard_sectors' is in sectors, while 'stripe' is in bytes.

    This fixes the problem where DISCARD would get disabled on some larger
    RAID5 configurations (6 or more drives in my testing), while it worked
    as expected with smaller configurations.

    Fixes: 620125f2bf8 ("MD: raid5 trim support")
    Cc: stable@vger.kernel.org v3.7+
    Signed-off-by: Jes Sorensen
    Signed-off-by: Shaohua Li

    Jes Sorensen
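
    This is a classic unit mismatch: the queue discard limits are expressed in
    512-byte sectors while the computed full-stripe size is in bytes. A hedged
    sketch of the corrected comparison (not necessarily the verbatim kernel
    line; the code that actually enables discard on the queue is omitted):

      /* 'stripe' is the full-stripe size in bytes; max_discard_sectors is in
       * 512-byte sectors, so convert before comparing. */
      bool discard_ok =
              mddev->queue->limits.max_discard_sectors >= (stripe >> 9) &&
              mddev->queue->limits.discard_granularity >= stripe;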
     

21 Jan, 2016

1 commit


06 Jan, 2016

2 commits

  • Add support for journal disk hot add/remove. The md part is mostly
    trivial checks; the raid5 part is a little tricky. For hot-remove, we
    can't wait for pending writes since this is called from raid5d; the wait
    would cause a deadlock, so we simply fail the hot-remove. A hot-remove
    retry can eventually succeed, since if the journal disk is faulty all
    pending writes will fail and finish. For hot-add, since an array that
    supports a journal but has no journal disk is marked read-only, we are
    safe to hot-add the journal without stopping IO (it should be read IO
    only, while the journal only handles write IO).

    Signed-off-by: Shaohua Li
    Signed-off-by: NeilBrown

    Shaohua Li
     
  • The stripe_add_to_batch_list() function is called only if
    stripe_can_batch() returned true, so there is no need to double-check.

    Signed-off-by: Roman Gushchin
    Cc: Neil Brown
    Cc: linux-raid@vger.kernel.org
    Signed-off-by: NeilBrown

    Roman Gushchin
     

05 Nov, 2015

1 commit

  • Pull md updates from Neil Brown:
    "Two major components to this update.

    1) The clustered-raid1 support from SUSE is nearly complete. There
    are a few outstanding issues being worked on. Maybe half a dozen
    patches will bring this to a usable state.

    2) The first stage of journalled-raid5 support from Facebook makes an
    appearance. With a journal device configured (typically NVRAM or
    SSD), the "RAID5 write hole" should be closed - a crash during
    degraded operations cannot result in data corruption.

    The next stage will be to use the journal as a write-behind cache
    so that latency can be reduced and in some cases throughput
    increased by performing more full-stripe writes.

    * tag 'md/4.4' of git://neil.brown.name/md: (66 commits)
    MD: when RAID journal is missing/faulty, block RESTART_ARRAY_RW
    MD: set journal disk ->raid_disk
    MD: kick out journal disk if it's not fresh
    raid5-cache: start raid5 readonly if journal is missing
    MD: add new bit to indicate raid array with journal
    raid5-cache: IO error handling
    raid5: journal disk can't be removed
    raid5-cache: add trim support for log
    MD: fix info output for journal disk
    raid5-cache: use bio chaining
    raid5-cache: small log->seq cleanup
    raid5-cache: new helper: r5_reserve_log_entry
    raid5-cache: inline r5l_alloc_io_unit into r5l_new_meta
    raid5-cache: take rdev->data_offset into account early on
    raid5-cache: refactor bio allocation
    raid5-cache: clean up r5l_get_meta
    raid5-cache: simplify state machine when caches flushes are not needed
    raid5-cache: factor out a helper to run all stripes for an I/O unit
    raid5-cache: rename flushed_ios to finished_ios
    raid5-cache: free I/O units earlier
    ...

    Linus Torvalds