08 Oct, 2016

1 commit

  • Pull MD updates from Shaohua Li:
    "This update includes:

    - new AVX512 instruction based raid6 gen/recovery algorithm

    - a couple of md-cluster related bug fixes

    - fix a potential deadlock

    - set nonrotational bit for raid array with SSD

    - set correct max_hw_sectors for raid5/6, which hopefully can improve
    performance a little bit

    - other minor fixes"

    * tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md: set rotational bit
    raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
    raid5: handle register_shrinker failure
    raid5: fix to detect failure of register_shrinker
    md: fix a potential deadlock
    md/bitmap: fix wrong cleanup
    raid5: allow arbitrary max_hw_sectors
    lib/raid6: Add AVX512 optimized xor_syndrome functions
    lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
    lib/raid6: Add AVX512 optimized recovery functions
    lib/raid6: Add AVX512 optimized gen_syndrome functions
    md-cluster: make resync lock also could be interruptted
    md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
    md-cluster: convert the completion to wait queue
    md-cluster: protect md_find_rdev_nr_rcu with rcu lock
    md-cluster: clean related infos of cluster
    md: changes for MD_STILL_CLOSED flag
    md-cluster: remove some unnecessary dlm_unlock_sync
    md-cluster: use FORCEUNLOCK in lockres_free
    md-cluster: call md_kick_rdev_from_array once ack failed

    Linus Torvalds
     

04 Oct, 2016

2 commits

  • Pull CPU hotplug updates from Thomas Gleixner:
    "Yet another batch of cpu hotplug core updates and conversions:

    - Provide core infrastructure for multi instance drivers so the
    drivers do not have to keep custom lists.

    - Convert custom lists to the new infrastructure. The block-mq custom
    list conversion comes through the block tree and makes the diffstat
    tip over to more lines removed than added.

    - Handle unbalanced hotplug enable/disable calls more gracefully.

    - Remove the obsolete CPU_STARTING/DYING notifier support.

    - Convert another batch of notifier users.

    The relayfs changes which conflicted with the conversion have been
    shipped to me by Andrew.

    The remaining lot is targeted for 4.10 so that we finally can remove
    the rest of the notifiers"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    cpufreq: Fix up conversion to hotplug state machine
    blk/mq: Reserve hotplug states for block multiqueue
    x86/apic/uv: Convert to hotplug state machine
    s390/mm/pfault: Convert to hotplug state machine
    mips/loongson/smp: Convert to hotplug state machine
    mips/octeon/smp: Convert to hotplug state machine
    fault-injection/cpu: Convert to hotplug state machine
    padata: Convert to hotplug state machine
    cpufreq: Convert to hotplug state machine
    ACPI/processor: Convert to hotplug state machine
    virtio scsi: Convert to hotplug state machine
    oprofile/timer: Convert to hotplug state machine
    block/softirq: Convert to hotplug state machine
    lib/irq_poll: Convert to hotplug state machine
    x86/microcode: Convert to hotplug state machine
    sh/SH-X3 SMP: Convert to hotplug state machine
    ia64/mca: Convert to hotplug state machine
    ARM/OMAP/wakeupgen: Convert to hotplug state machine
    ARM/shmobile: Convert to hotplug state machine
    arm64/FP/SIMD: Convert to hotplug state machine
    ...

    Linus Torvalds
     
  • If all disks in an array are non-rotational, set the array
    non-rotational.

    This only works for arrays with all disks populated at startup.
    Support for disk hot-add/hot-remove could be added later if
    necessary.

    Acked-by: Tejun Heo
    Signed-off-by: Shaohua Li

    Shaohua Li
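
    The all-disks rule can be sketched in plain C (a userspace
    illustration with invented names, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: the array is marked non-rotational only if
 * every member disk is non-rotational. */
struct disk { bool nonrot; };

static bool array_nonrot(const struct disk *disks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!disks[i].nonrot)
            return false;   /* one rotational disk keeps the array rotational */
    return n > 0;           /* an empty array is not non-rotational */
}
```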
     

22 Sep, 2016

14 commits

  • register_shrinker() can now fail. When it does, shrinker.nr_deferred
    is NULL; we use that to determine whether unregister_shrinker() is
    required.

    Signed-off-by: Shaohua Li

    Shaohua Li
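
    The nr_deferred check described above can be sketched as a
    userspace illustration (names borrowed from the kernel, but the
    code is hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/* Registration allocates per-registrant state; on failure the pointer
 * stays NULL, and teardown uses that to decide whether unregistration
 * is needed at all. */
struct shrinker { long *nr_deferred; };

static int register_shrinker_sketch(struct shrinker *s, int fail)
{
    if (fail)
        return -12;                      /* simulate -ENOMEM */
    s->nr_deferred = calloc(1, sizeof(long));
    return s->nr_deferred ? 0 : -12;
}

static void teardown_sketch(struct shrinker *s)
{
    if (s->nr_deferred) {                /* only unregister after success */
        free(s->nr_deferred);
        s->nr_deferred = NULL;
    }
}
```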
     
  • register_shrinker() can fail after commit 1d3d4437eae1 ("vmscan:
    per-node deferred work"); we should detect its failure, otherwise we
    may silently fail to register the shrinker even though the raid5
    configuration was set up successfully.

    Signed-off-by: Chao Yu
    Signed-off-by: Shaohua Li

    Chao Yu
     
  • lockdep reports a potential deadlock. Fix this by dropping the
    mutex before md_import_device.

    [ 1137.126601] ======================================================
    [ 1137.127013] [ INFO: possible circular locking dependency detected ]
    [ 1137.127013] 4.8.0-rc4+ #538 Not tainted
    [ 1137.127013] -------------------------------------------------------
    [ 1137.127013] mdadm/16675 is trying to acquire lock:
    [ 1137.127013] (&bdev->bd_mutex){+.+.+.}, at: [] __blkdev_get+0x63/0x450
    [ 1137.127013]
    but task is already holding lock:
    [ 1137.127013] (detected_devices_mutex){+.+.+.}, at: [] md_ioctl+0x2ac/0x1f50
    [ 1137.127013]
    which lock already depends on the new lock.

    [ 1137.127013]
    the existing dependency chain (in reverse order) is:
    [ 1137.127013]
    -> #1 (detected_devices_mutex){+.+.+.}:
    [ 1137.127013] [] lock_acquire+0xb9/0x220
    [ 1137.127013] [] mutex_lock_nested+0x67/0x3d0
    [ 1137.127013] [] md_autodetect_dev+0x3f/0x90
    [ 1137.127013] [] rescan_partitions+0x1a8/0x2c0
    [ 1137.127013] [] __blkdev_reread_part+0x71/0xb0
    [ 1137.127013] [] blkdev_reread_part+0x25/0x40
    [ 1137.127013] [] blkdev_ioctl+0x51b/0xa30
    [ 1137.127013] [] block_ioctl+0x41/0x50
    [ 1137.127013] [] do_vfs_ioctl+0x96/0x6e0
    [ 1137.127013] [] SyS_ioctl+0x41/0x70
    [ 1137.127013] [] entry_SYSCALL_64_fastpath+0x18/0xa8
    [ 1137.127013]
    -> #0 (&bdev->bd_mutex){+.+.+.}:
    [ 1137.127013] [] __lock_acquire+0x1662/0x1690
    [ 1137.127013] [] lock_acquire+0xb9/0x220
    [ 1137.127013] [] mutex_lock_nested+0x67/0x3d0
    [ 1137.127013] [] __blkdev_get+0x63/0x450
    [ 1137.127013] [] blkdev_get+0x227/0x350
    [ 1137.127013] [] blkdev_get_by_dev+0x36/0x50
    [ 1137.127013] [] lock_rdev+0x35/0x80
    [ 1137.127013] [] md_import_device+0xb4/0x1b0
    [ 1137.127013] [] md_ioctl+0x2f6/0x1f50
    [ 1137.127013] [] blkdev_ioctl+0x283/0xa30
    [ 1137.127013] [] block_ioctl+0x41/0x50
    [ 1137.127013] [] do_vfs_ioctl+0x96/0x6e0
    [ 1137.127013] [] SyS_ioctl+0x41/0x70
    [ 1137.127013] [] entry_SYSCALL_64_fastpath+0x18/0xa8
    [ 1137.127013]
    other info that might help us debug this:

    [ 1137.127013] Possible unsafe locking scenario:

    [ 1137.127013]        CPU0                    CPU1
    [ 1137.127013]        ----                    ----
    [ 1137.127013]   lock(detected_devices_mutex);
    [ 1137.127013]                                lock(&bdev->bd_mutex);
    [ 1137.127013]                                lock(detected_devices_mutex);
    [ 1137.127013]   lock(&bdev->bd_mutex);
    [ 1137.127013]
    *** DEADLOCK ***

    Cc: Cong Wang
    Signed-off-by: Shaohua Li

    Shaohua Li
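
    The fix pattern, dropping the outer mutex before calling into code
    that takes another lock, can be sketched as a userspace analogue
    (all names invented; atomic flags stand in for the mutexes):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* md_ioctl held the detected-devices mutex while md_import_device
 * internally took the bdev mutex, inverting the order used by the
 * partition-rescan path. The fix drops the first lock before calling
 * into the second. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;
static atomic_flag dev_lock  = ATOMIC_FLAG_INIT;
static int imported;

static bool try_lock(atomic_flag *l) { return !atomic_flag_test_and_set(l); }
static void unlock(atomic_flag *l)   { atomic_flag_clear(l); }

static bool import_device(void)      /* takes dev_lock internally */
{
    if (!try_lock(&dev_lock))
        return false;
    imported++;
    unlock(&dev_lock);
    return true;
}

static bool autodetect_one(void)
{
    if (!try_lock(&list_lock))
        return false;
    /* ... pop the next detected device off the list ... */
    unlock(&list_lock);              /* drop before taking dev_lock */
    return import_device();
}
```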
     
  • If bitmap_create fails, the bitmap is already cleaned up and the
    returned value is an error number; we can't do the cleanup again.

    Reported-by: Christophe JAILLET
    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • raid5 splits bios to the proper size internally, so there is no
    point in using the underlying disks' max_hw_sectors. In my qemu
    system, without this change raid5 only receives 128k-size bios,
    which reduces the chance of bio merging when sending to the
    underlying disks.

    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • While one node performs resync or recovery, other nodes can't take
    the resync lock and may block for a while before acquiring it, so
    the array can't be stopped immediately in this scenario.

    To let the array be stopped quickly, we check MD_CLOSING in
    dlm_lock_sync_interruptible so that the lock request can be
    interrupted.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
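
    The interruptible wait can be sketched as a simplified userspace
    model (the poll counters stand in for wait-queue wakeups, and all
    names here are invented):

```c
#include <assert.h>

/* A lock wait that also watches a "closing" condition: a pending
 * resync-lock request is abandoned with -EINTR when the array is being
 * stopped, instead of blocking indefinitely. */
static int lock_sync_interruptible(int polls_until_granted,
                                   int polls_until_closing)
{
    for (int poll = 0; ; poll++) {
        if (poll >= polls_until_granted)
            return 0;                    /* lock granted */
        if (poll >= polls_until_closing)
            return -4;                   /* MD_CLOSING seen: -EINTR */
        /* real code sleeps on a wait queue here until woken */
    }
}
```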
     
  • When a node leaves the cluster, its bitmap needs to be synced by
    another node, so the "md*_recover" thread is triggered for that
    purpose. However, with the steps below we can observe tasks hanging
    on either B or C.

    1. Node A creates a resyncing cluster raid1 and it is assembled on
    the other two nodes (B and C).
    2. Stop the array on B and C.
    3. Stop the array on A.

    linux44:~ # ps aux|grep md|grep D
    root 5938 0.0 0.1 19852 1964 pts/0 D+ 14:52 0:00 mdadm -S md0
    root 5939 0.0 0.0 0 0 ? D 14:52 0:00 [md0_recover]

    linux44:~ # cat /proc/5939/stack
    [] dlm_lock_sync+0x71/0x90 [md_cluster]
    [] recover_bitmaps+0x125/0x220 [md_cluster]
    [] md_thread+0x16d/0x180 [md_mod]
    [] kthread+0xb4/0xc0
    [] ret_from_fork+0x58/0x90

    linux44:~ # cat /proc/5938/stack
    [] kthread_stop+0x6e/0x120
    [] md_unregister_thread+0x40/0x80 [md_mod]
    [] leave+0x70/0x120 [md_cluster]
    [] md_cluster_stop+0x14/0x30 [md_mod]
    [] bitmap_free+0x14b/0x150 [md_mod]
    [] do_md_stop+0x35b/0x5a0 [md_mod]
    [] md_ioctl+0x873/0x1590 [md_mod]
    [] blkdev_ioctl+0x214/0x7d0
    [] block_ioctl+0x3d/0x40
    [] do_vfs_ioctl+0x2d4/0x4b0
    [] SyS_ioctl+0x88/0xa0
    [] system_call_fastpath+0x16/0x1b

    The problem is that recover_bitmaps can't reliably abort when its
    thread is unregistered, so dlm_lock_sync_interruptible is
    introduced to detect the thread's state and fix the problem.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • Previously, we used a completion to synchronize between requesting
    the dlm lock and sync_ast. However, we would have to expose
    completion.wait and completion.done in dlm_lock_sync_interruptible
    (introduced later), which is not a common usage of completion, so
    convert the related code to a wait queue.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • We need to use rcu_read_lock/unlock to avoid a potential race.

    Reported-by: Shaohua Li
    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • cluster_info and bitmap_info.nodes also need to be cleared when
    the array is stopped.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • When stopping a clustered raid while it is pending on resync, the
    MD_STILL_CLOSED flag can be cleared because a udev rule is
    triggered to open the mddev, so the array can't be stopped promptly
    and returns EBUSY.

    mdadm -Ss                      md-raid-arrays.rules
    set MD_STILL_CLOSED            md_open()
                                   ... clear MD_STILL_CLOSED
    do_md_stop

    We make the changes below to resolve this issue:

    1. Rename MD_STILL_CLOSED to MD_CLOSING, since it is set when
    stopping the array and means we are stopping it.
    2. Let md_open return early if MD_CLOSING is set, so no other
    thread will open the array while one thread is trying to close it.
    3. There is no need to clear the MD_CLOSING bit in md_open because
    step 1 ensures the bit is cleared, and then we also don't need to
    test the bit in do_md_stop.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
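
    The resulting open/stop semantics can be sketched as a userspace
    model (invented names):

```c
#include <assert.h>
#include <stdatomic.h>

/* While a stop is in progress the CLOSING flag is set and any open
 * attempt fails early with -EBUSY; the open path no longer clears the
 * flag. */
static atomic_int md_closing;

static int md_open_sketch(void)
{
    if (atomic_load(&md_closing))
        return -16;                      /* -EBUSY: stop in progress */
    return 0;
}

static void begin_stop(void)  { atomic_store(&md_closing, 1); }
static void finish_stop(void) { atomic_store(&md_closing, 0); }
```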
     
  • Since DLM_LKF_FORCEUNLOCK is used in lockres_free, we don't need
    to call dlm_unlock_sync before freeing the lock resource.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • For dlm_unlock, we need to pass the flag as the third parameter to
    dlm_unlock instead of setting res->flags.

    Also, DLM_LKF_FORCEUNLOCK is more suitable for dlm_unlock since it
    works even when the lock is on the waiting or convert queue.

    Acked-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • new_disk_ack could return failure if WAITING_FOR_NEWDISK is not
    set, so we need to kick the device from the array in case a failure
    happens.

    We also missed checking err before calling new_disk_ack; otherwise
    we could kick an rdev which isn't in the array. Thanks to Shaohua
    for the reminder.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     

14 Sep, 2016

1 commit

  • Pull MD fixes from Shaohua Li:
    "A few bug fixes for MD:

    - Guoqing fixed a bug compiling md-cluster in kernel

    - I fixed a potential deadlock in raid5-cache superblock write, a
    hang in raid5 reshape resume and a race condition introduced in
    rc4"

    * tag 'md/4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    raid5: fix a small race condition
    md-cluster: make md-cluster also can work when compiled into kernel
    raid5: guarantee enough stripes to avoid reshape hang
    raid5-cache: fix a deadlock in superblock write

    Linus Torvalds
     

10 Sep, 2016

1 commit

  • commit 5f9d1fde7d54a5 ("raid5: fix memory leak of bio integrity
    data") moves bio_reset to bio_endio, but it introduces a small race
    condition: it does bio_reset after raid5_release_stripe, which
    could make the stripe reusable and hence reuse the bio just before
    bio_reset. Moving bio_reset before raid5_release_stripe is called
    should fix the race.

    Reported-and-tested-by: Stefan Priebe - Profihost AG
    Signed-off-by: Shaohua Li

    Shaohua Li
     

09 Sep, 2016

1 commit

  • md-cluster is compiled as a module by default; if it is built into
    the kernel instead, md-cluster doesn't work.

    [64782.630008] md/raid1:md127: active with 2 out of 2 mirrors
    [64782.630528] md-cluster module not found.
    [64782.630530] md127: Could not setup cluster service (-2)

    Fixes: edb39c9 ("Introduce md_cluster_operations to handle cluster functions")
    Cc: stable@vger.kernel.org (v4.1+)
    Reported-by: Marc Smith
    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     

07 Sep, 2016

1 commit

  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Peter Zijlstra
    Cc: Neil Brown
    Cc: linux-raid@vger.kernel.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160818125731.27256-10-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     

04 Sep, 2016

1 commit

  • Pull device mapper fixes from Mike Snitzer:

    - a stable fix in both DM crypt and DM log-writes for too large bios
    (as generated by bcache)

    - two other stable fixes for DM log-writes

    - a stable fix for a DM crypt bug that could result in freeing pointers
    from uninitialized memory in the tfm allocation error path

    - a DM bufio cleanup to discontinue using create_singlethread_workqueue()

    * tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm bufio: remove use of deprecated create_singlethread_workqueue()
    dm crypt: fix free of bad values after tfm allocation failure
    dm crypt: fix error with too large bios
    dm log writes: fix check of kthread_run() return value
    dm log writes: fix bug with too large bios
    dm log writes: move IO accounting earlier to fix error path

    Linus Torvalds
     

01 Sep, 2016

2 commits

  • If there aren't enough stripes, reshape will hang. We have a check
    for this when starting a new reshape, but it is missing for reshape
    resume, so we could see a hang when a reshape resumes. This patch
    ensures enough stripes exist when a reshape resumes.

    Reviewed-by: NeilBrown
    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • There is a potential deadlock in superblock write. Discard could
    zero data, so before a discard we must make sure the superblock is
    updated to the new log tail. Updating the superblock (either by
    calling md_update_sb() directly or by depending on the md thread)
    must hold the reconfig mutex. On the other hand, raid5_quiesce() is
    called with reconfig_mutex held. The first step of raid5_quiesce()
    is waiting for all IO to finish, hence waiting for the reclaim
    thread, while the reclaim thread is calling this function and
    waiting for the reconfig mutex. So there is a deadlock. We work
    around this issue with a trylock. The downside of the solution is
    that we could miss a discard if we can't take the reconfig mutex,
    but this should happen rarely (mainly on raid array stop), so a
    missed discard shouldn't be a big problem.

    Cc: NeilBrown
    Signed-off-by: Shaohua Li

    Shaohua Li
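
    The trylock workaround can be sketched as a userspace analogue (an
    atomic flag stands in for the reconfig mutex; names are
    illustrative):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Before issuing a discard, try to take the lock without blocking; on
 * failure, skip this discard instead of deadlocking. */
static atomic_flag reconfig_lock = ATOMIC_FLAG_INIT;

static bool try_reconfig_lock(void)
{
    return !atomic_flag_test_and_set(&reconfig_lock);
}

static void reconfig_unlock(void) { atomic_flag_clear(&reconfig_lock); }

static bool discard_with_trylock(void)
{
    if (!try_reconfig_lock())
        return false;     /* lock busy: miss this discard, no deadlock */
    /* ... update superblock to the new log tail, then discard ... */
    reconfig_unlock();
    return true;
}
```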
     

31 Aug, 2016

7 commits

  • The workqueue "dm_bufio_wq" queues a single work item &dm_bufio_work so
    it doesn't require execution ordering. Hence, alloc_workqueue() has
    been used to replace the deprecated create_singlethread_workqueue().

    The WQ_MEM_RECLAIM flag has been set since DM requires forward progress
    under memory pressure.

    Since there is a fixed number of work items, an explicit
    concurrency limit is unnecessary here.

    Signed-off-by: Bhaktipriya Shridhar
    Signed-off-by: Mike Snitzer

    Bhaktipriya Shridhar
     
  • If crypt_alloc_tfms() had to allocate multiple tfms and it failed before
    the last allocation, then it would call crypt_free_tfms() and could free
    pointers from uninitialized memory -- due to the crypt_free_tfms() check
    for non-zero cc->tfms[i]. Fix by allocating zeroed memory.

    Signed-off-by: Eric Biggers
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Eric Biggers
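
    The failure mode and the calloc fix can be sketched generically
    (this is not the dm-crypt code itself; names are invented):

```c
#include <assert.h>
#include <stdlib.h>

/* Teardown frees every non-NULL slot, so the slot array must start
 * zeroed (calloc); with plain malloc a partial-allocation failure
 * would free garbage pointers from uninitialized slots. */
static void free_slots(void **slots, int n)
{
    for (int i = 0; i < n; i++)
        if (slots[i])                    /* safe only if unused slots are NULL */
            free(slots[i]);
    free(slots);
}

static void **alloc_slots(int n, int fail_at)
{
    void **slots = calloc(n, sizeof(*slots));   /* zeroed: the fix */
    if (!slots)
        return NULL;
    for (int i = 0; i < n; i++) {
        if (i == fail_at) {              /* simulate mid-way failure */
            free_slots(slots, n);
            return NULL;
        }
        slots[i] = malloc(16);
    }
    return slots;
}
```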
     
  • When dm-crypt processes writes, it allocates a new bio in
    crypt_alloc_buffer(). The bio is allocated from a bio set and it can
    have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
    larger (e.g. if it was allocated by bcache). If the incoming bio is
    larger, bio_alloc_bioset() fails and an error is returned.

    To avoid the error, we test for a too large bio in the function
    crypt_map() and use dm_accept_partial_bio() to split the bio.
    dm_accept_partial_bio() trims the current bio to the desired size and
    asks DM core to send another bio with the rest of the data.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # v3.16+

    Mikulas Patocka
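
    The trim-and-resubmit idea can be sketched at a high level
    (illustrative only; this is not the real dm_accept_partial_bio
    signature, and MAX_PAGES merely mirrors BIO_MAX_PAGES):

```c
#include <assert.h>

/* Accept at most what a single bio can carry and report how much
 * remains for DM core to resubmit in a follow-up bio. */
#define MAX_PAGES 256

static int accept_partial(int requested_pages, int *accepted_pages)
{
    if (requested_pages <= MAX_PAGES) {
        *accepted_pages = requested_pages;   /* whole request fits */
        return 0;
    }
    *accepted_pages = MAX_PAGES;             /* trim to the bio limit */
    return requested_pages - MAX_PAGES;      /* remainder to resubmit */
}
```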
     
  • The kthread_run() function returns either a valid task_struct or
    an ERR_PTR() value; a check for NULL is invalid. This change fixes
    a potential oops, e.g. in an OOM situation.

    Signed-off-by: Vladimir Zapolskiy
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Vladimir Zapolskiy
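
    The distinction can be shown with minimal userspace copies of the
    kernel's ERR_PTR/IS_ERR helpers (simplified re-implementations;
    MAX_ERRNO as in the kernel):

```c
#include <assert.h>
#include <stdint.h>

/* Failure from kthread_run() comes back as an encoded error pointer,
 * which is never NULL, so a NULL check can never catch it. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;      /* encode errno in the pointer */
}

static int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```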
     
  • bio_alloc() can allocate a bio with at most BIO_MAX_PAGES (256) vector
    entries. However, the incoming bio may have more vector entries if it
    was allocated by other means. For example, bcache submits bios with
    more than BIO_MAX_PAGES entries. This results in bio_alloc() failure.

    To avoid the failure, change the code so that it allocates bio with at
    most BIO_MAX_PAGES entries. If the incoming bio has more entries,
    bio_add_page() will fail and a new bio will be allocated - the code that
    handles bio_add_page() failure already exists in the dm-log-writes
    target.

    Signed-off-by: Mikulas Patocka
    Reviewed-by: Josef Bacik
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # v4.1+

    Mikulas Patocka
     
  • Move log_one_block()'s atomic_inc(&lc->io_blocks) before bio_alloc() to
    fix a bug that the target hangs if bio_alloc() fails. The error path
    does put_io_block(lc), so atomic_inc(&lc->io_blocks) must occur before
    invoking the error path to avoid underflow of lc->io_blocks.

    Signed-off-by: Mikulas Patocka
    Reviewed-by: Josef Bacik
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Mikulas Patocka
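
    The ordering can be sketched as (illustrative names, not the
    dm-log-writes code):

```c
#include <assert.h>

/* The error path always calls put_io_block(), so the matching
 * increment must happen before the step that can fail, or the counter
 * underflows. */
static int io_blocks;

static void put_io_block(void) { io_blocks--; }

static int log_one_block(int alloc_fails)
{
    io_blocks++;                         /* moved before the allocation */
    if (alloc_fails) {
        put_io_block();                  /* error path pairs with the inc */
        return -12;                      /* -ENOMEM */
    }
    /* ...submit the bio; put_io_block() would run at end_io... */
    put_io_block();
    return 0;
}
```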
     
  • Pull MD fixes from Shaohua Li:
    "This includes several bug fixes:

    - Alexey Obitotskiy fixed a hang for faulty raid5 array with external
    management

    - Song Liu fixed two raid5 journal related bugs

    - Tomasz Majchrzak fixed a bad block recording issue and an
    accounting issue for raid10

    - ZhengYuan Liu fixed an accounting issue for raid5

    - I fixed a potential race condition and memory leak with DIF/DIX
    enabled

    - other trivial fixes"

    * tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    raid5: avoid unnecessary bio data set
    raid5: fix memory leak of bio integrity data
    raid10: record correct address of bad block
    md-cluster: fix error return code in join()
    r5cache: set MD_JOURNAL_CLEAN correctly
    md: don't print the same repeated messages about delayed sync operation
    md: remove obsolete ret in md_start_sync
    md: do not count journal as spare in GET_ARRAY_INFO
    md: Prevent IO hold during accessing to faulty raid5 array
    MD: hold mddev lock to change bitmap location
    raid5: fix incorrectly counter of conf->empty_inactive_list_nr
    raid10: increment write counter after bio is split

    Linus Torvalds
     

27 Aug, 2016

2 commits

  • Pull device mapper fixes from Mike Snitzer:

    - another stable fix for DM flakey (tweaking the previous fix,
    which didn't factor in the expected 'drop_writes' behavior for read
    IO).

    - a dm-log bio operation flags fix for the broader block changes that
    were merged during the 4.8 merge window.

    * tag 'dm-4.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm log: fix unitialized bio operation flags
    dm flakey: fix reads to be issued if drop_writes configured

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Here's a set of block fixes for the current 4.8-rc release. This
    contains:

    - a fix for a secure erase regression, from Adrian.

    - a fix for an mmc use-after-free bug regression, also from Adrian.

    - a potential NULL pointer dereference in bdev freezing, from Andrey.

    - a race fix for blk_set_queue_dying() from Bart.

    - a set of xen blkfront fixes from Bob Liu.

    - three small fixes for bcache, from Eric and Kent.

    - a fix for a potential invalid NVMe state transition, from Gabriel.

    - blk-mq CPU offline fix, preventing us from issuing and completing a
    request on the wrong queue. From me.

    - revert two previous floppy changes, since they caused a user
    visible regression. A better fix is in the works.

    - ensure that we don't send down bios that have more than 256
    elements in them. Fixes a crash with bcache, for example. From
    Ming.

    - a fix for dereferencing an error pointer with cgroup writeback.
    Fixes a regression. From Vegard"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    mmc: fix use-after-free of struct request
    Revert "floppy: refactor open() flags handling"
    Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"
    fs/block_dev: fix potential NULL ptr deref in freeze_bdev()
    blk-mq: improve warning for running a queue on the wrong CPU
    blk-mq: don't overwrite rq->mq_ctx
    block: make sure a big bio is split into at most 256 bvecs
    nvme: Fix nvme_get/set_features() with a NULL result pointer
    bdev: fix NULL pointer dereference
    xen-blkfront: free resources if xlvbd_alloc_gendisk fails
    xen-blkfront: introduce blkif_set_queue_limits()
    xen-blkfront: fix places not updated after introducing 64KB page granularity
    bcache: pr_err: more meaningful error message when nr_stripes is invalid
    bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two.
    bcache: register_bcache(): call blkdev_put() when cache_alloc() fails
    block: Fix race triggered by blk_set_queue_dying()
    block: Fix secure erase
    nvme: Prevent controller state invalid transition

    Linus Torvalds
     

25 Aug, 2016

7 commits

  • Commit e6047149db ("dm: use bio op accessors") switched DM over to
    using bio_set_op_attrs() but didn't take care to initialize
    lc->io_req.bi_op_flags in dm-log.c:rw_header(). This caused
    rw_header()'s call to dm_io() to make bio->bi_op_flags be uninitialized
    in dm-io.c:do_region(), which ultimately resulted in a SCSI BUG() in
    sd_init_command().

    Also, adjust rw_header() and its callers to use REQ_OP_{READ|WRITE}.

    Fixes: e6047149db ("dm: use bio op accessors")
    Signed-off-by: Heinz Mauelshagen
    Reviewed-by: Shaun Tancheff
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     
  • v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the
    down_interval") overlooked the 'drop_writes' feature, which is meant to
    allow reads to be issued rather than errored, during the down_interval.

    Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval")
    Reported-by: Qu Wenruo
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Mike Snitzer
     
  • bio_reset doesn't change bi_io_vec and bi_max_vecs, so we don't need to
    set them every time. bi_private will be set before the bio is
    dispatched.

    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • Yi reported a memory leak in raid5 with DIF/DIX-enabled disks.
    raid5 doesn't alloc/free bios; instead it reuses them. There are
    two issues in the current code:

    1. The code calls bio_init (from
    init_stripe->raid5_build_block->bio_init) and then bio_reset
    (ops_run_io). The bio is reused, so integrity data is likely
    attached; bio_init clears the pointer to the integrity data, which
    prevents bio_reset from releasing it.
    2. bio_reset is called before dispatching the bio. After the bio is
    finished, it's possible we don't free the bio's integrity data
    (e.g., we don't call bio_reset again).

    Both issues cause memory leaks. The patch moves bio_init to stripe
    creation and bio_reset to bio end io, which fixes both issues.

    Reported-by: Yi Zhang
    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • For a failed write request, record the block address on the
    device, not the block address in the array.

    Signed-off-by: Tomasz Majchrzak
    Signed-off-by: Shaohua Li

    Tomasz Majchrzak
     
  • Fix to return error code -ENOMEM from the lockres_init() error
    handling case instead of 0, as done elsewhere in this function.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Shaohua Li

    Wei Yongjun
     
  • Currently, the code sets MD_JOURNAL_CLEAN when the array has
    MD_FEATURE_JOURNAL and the recovery_cp is MaxSector. The array
    will be MD_JOURNAL_CLEAN even if the journal device is missing.

    With this patch, MD_JOURNAL_CLEAN is only set when the journal
    device is present.

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu