08 Oct, 2016

1 commit

  • Pull MD updates from Shaohua Li:
    "This update includes:

    - new AVX512 instruction based raid6 gen/recovery algorithm

    - a couple of md-cluster related bug fixes

    - fix a potential deadlock

    - set nonrotational bit for raid array with SSD

    - set correct max_hw_sectors for raid5/6, which hopefully can improve
    performance a little bit

    - other minor fixes"

    * tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md: set rotational bit
    raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
    raid5: handle register_shrinker failure
    raid5: fix to detect failure of register_shrinker
    md: fix a potential deadlock
    md/bitmap: fix wrong cleanup
    raid5: allow arbitrary max_hw_sectors
    lib/raid6: Add AVX512 optimized xor_syndrome functions
    lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
    lib/raid6: Add AVX512 optimized recovery functions
    lib/raid6: Add AVX512 optimized gen_syndrome functions
    md-cluster: make resync lock also could be interruptted
    md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
    md-cluster: convert the completion to wait queue
    md-cluster: protect md_find_rdev_nr_rcu with rcu lock
    md-cluster: clean related infos of cluster
    md: changes for MD_STILL_CLOSED flag
    md-cluster: remove some unnecessary dlm_unlock_sync
    md-cluster: use FORCEUNLOCK in lockres_free
    md-cluster: call md_kick_rdev_from_array once ack failed

    Linus Torvalds
     

04 Oct, 2016

2 commits

  • Pull CPU hotplug updates from Thomas Gleixner:
    "Yet another batch of cpu hotplug core updates and conversions:

    - Provide core infrastructure for multi instance drivers so the
    drivers do not have to keep custom lists.

    - Convert custom lists to the new infrastructure. The block-mq custom
    list conversion comes through the block tree and makes the diffstat
    tip over to more lines removed than added.

    - Handle unbalanced hotplug enable/disable calls more gracefully.

    - Remove the obsolete CPU_STARTING/DYING notifier support.

    - Convert another batch of notifier users.

    The relayfs changes which conflicted with the conversion have been
    shipped to me by Andrew.

    The remaining lot is targeted for 4.10 so that we finally can remove
    the rest of the notifiers"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    cpufreq: Fix up conversion to hotplug state machine
    blk/mq: Reserve hotplug states for block multiqueue
    x86/apic/uv: Convert to hotplug state machine
    s390/mm/pfault: Convert to hotplug state machine
    mips/loongson/smp: Convert to hotplug state machine
    mips/octeon/smp: Convert to hotplug state machine
    fault-injection/cpu: Convert to hotplug state machine
    padata: Convert to hotplug state machine
    cpufreq: Convert to hotplug state machine
    ACPI/processor: Convert to hotplug state machine
    virtio scsi: Convert to hotplug state machine
    oprofile/timer: Convert to hotplug state machine
    block/softirq: Convert to hotplug state machine
    lib/irq_poll: Convert to hotplug state machine
    x86/microcode: Convert to hotplug state machine
    sh/SH-X3 SMP: Convert to hotplug state machine
    ia64/mca: Convert to hotplug state machine
    ARM/OMAP/wakeupgen: Convert to hotplug state machine
    ARM/shmobile: Convert to hotplug state machine
    arm64/FP/SIMD: Convert to hotplug state machine
    ...

    Linus Torvalds
     
  • If all disks in an array are non-rotational, set the array
    non-rotational.

    This only works for arrays with all disks populated at startup.
    Support for disk hot-add/hot-remove could be added later if
    necessary.

    Acked-by: Tejun Heo
    Signed-off-by: Shaohua Li

    Shaohua Li
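
    The all-disks rule can be sketched in plain C (a userspace
    illustration with invented names, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: the array is marked non-rotational only if
 * every member disk is non-rotational. */
struct disk { bool nonrot; };

static bool array_nonrot(const struct disk *disks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!disks[i].nonrot)
            return false;   /* one rotational disk keeps the array rotational */
    return n > 0;           /* an empty array is not non-rotational */
}
```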
     

22 Sep, 2016

14 commits

  • register_shrinker() can now fail. When it does, shrinker.nr_deferred
    is NULL; we use that to determine whether unregister_shrinker() is
    required.

    Signed-off-by: Shaohua Li

    Shaohua Li
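
    The nr_deferred check described above can be sketched as a
    userspace illustration (names borrowed from the kernel, but the
    code is hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/* Registration allocates per-registrant state; on failure the pointer
 * stays NULL, and teardown uses that to decide whether unregistration
 * is needed at all. */
struct shrinker { long *nr_deferred; };

static int register_shrinker_sketch(struct shrinker *s, int fail)
{
    if (fail)
        return -12;                      /* simulate -ENOMEM */
    s->nr_deferred = calloc(1, sizeof(long));
    return s->nr_deferred ? 0 : -12;
}

static void teardown_sketch(struct shrinker *s)
{
    if (s->nr_deferred) {                /* only unregister after success */
        free(s->nr_deferred);
        s->nr_deferred = NULL;
    }
}
```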
     
  • register_shrinker() can fail after commit 1d3d4437eae1 ("vmscan:
    per-node deferred work"); we should detect its failure, otherwise we
    may silently fail to register the shrinker even though the raid5
    configuration was set up successfully.

    Signed-off-by: Chao Yu
    Signed-off-by: Shaohua Li

    Chao Yu
     
  • lockdep reports a potential deadlock. Fix this by dropping the
    mutex before md_import_device.

    [ 1137.126601] ======================================================
    [ 1137.127013] [ INFO: possible circular locking dependency detected ]
    [ 1137.127013] 4.8.0-rc4+ #538 Not tainted
    [ 1137.127013] -------------------------------------------------------
    [ 1137.127013] mdadm/16675 is trying to acquire lock:
    [ 1137.127013] (&bdev->bd_mutex){+.+.+.}, at: [] __blkdev_get+0x63/0x450
    [ 1137.127013]
    but task is already holding lock:
    [ 1137.127013] (detected_devices_mutex){+.+.+.}, at: [] md_ioctl+0x2ac/0x1f50
    [ 1137.127013]
    which lock already depends on the new lock.

    [ 1137.127013]
    the existing dependency chain (in reverse order) is:
    [ 1137.127013]
    -> #1 (detected_devices_mutex){+.+.+.}:
    [ 1137.127013] [] lock_acquire+0xb9/0x220
    [ 1137.127013] [] mutex_lock_nested+0x67/0x3d0
    [ 1137.127013] [] md_autodetect_dev+0x3f/0x90
    [ 1137.127013] [] rescan_partitions+0x1a8/0x2c0
    [ 1137.127013] [] __blkdev_reread_part+0x71/0xb0
    [ 1137.127013] [] blkdev_reread_part+0x25/0x40
    [ 1137.127013] [] blkdev_ioctl+0x51b/0xa30
    [ 1137.127013] [] block_ioctl+0x41/0x50
    [ 1137.127013] [] do_vfs_ioctl+0x96/0x6e0
    [ 1137.127013] [] SyS_ioctl+0x41/0x70
    [ 1137.127013] [] entry_SYSCALL_64_fastpath+0x18/0xa8
    [ 1137.127013]
    -> #0 (&bdev->bd_mutex){+.+.+.}:
    [ 1137.127013] [] __lock_acquire+0x1662/0x1690
    [ 1137.127013] [] lock_acquire+0xb9/0x220
    [ 1137.127013] [] mutex_lock_nested+0x67/0x3d0
    [ 1137.127013] [] __blkdev_get+0x63/0x450
    [ 1137.127013] [] blkdev_get+0x227/0x350
    [ 1137.127013] [] blkdev_get_by_dev+0x36/0x50
    [ 1137.127013] [] lock_rdev+0x35/0x80
    [ 1137.127013] [] md_import_device+0xb4/0x1b0
    [ 1137.127013] [] md_ioctl+0x2f6/0x1f50
    [ 1137.127013] [] blkdev_ioctl+0x283/0xa30
    [ 1137.127013] [] block_ioctl+0x41/0x50
    [ 1137.127013] [] do_vfs_ioctl+0x96/0x6e0
    [ 1137.127013] [] SyS_ioctl+0x41/0x70
    [ 1137.127013] [] entry_SYSCALL_64_fastpath+0x18/0xa8
    [ 1137.127013]
    other info that might help us debug this:

    [ 1137.127013] Possible unsafe locking scenario:

    [ 1137.127013]        CPU0                    CPU1
    [ 1137.127013]        ----                    ----
    [ 1137.127013]   lock(detected_devices_mutex);
    [ 1137.127013]                                lock(&bdev->bd_mutex);
    [ 1137.127013]                                lock(detected_devices_mutex);
    [ 1137.127013]   lock(&bdev->bd_mutex);
    [ 1137.127013]
    *** DEADLOCK ***

    Cc: Cong Wang
    Signed-off-by: Shaohua Li

    Shaohua Li
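
    The fix pattern, dropping the outer mutex before calling into code
    that takes another lock, can be sketched as a userspace analogue
    (all names invented; atomic flags stand in for the mutexes):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* md_ioctl held the detected-devices mutex while md_import_device
 * internally took the bdev mutex, inverting the order used by the
 * partition-rescan path. The fix drops the first lock before calling
 * into the second. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;
static atomic_flag dev_lock  = ATOMIC_FLAG_INIT;
static int imported;

static bool try_lock(atomic_flag *l) { return !atomic_flag_test_and_set(l); }
static void unlock(atomic_flag *l)   { atomic_flag_clear(l); }

static bool import_device(void)      /* takes dev_lock internally */
{
    if (!try_lock(&dev_lock))
        return false;
    imported++;
    unlock(&dev_lock);
    return true;
}

static bool autodetect_one(void)
{
    if (!try_lock(&list_lock))
        return false;
    /* ... pop the next detected device off the list ... */
    unlock(&list_lock);              /* drop before taking dev_lock */
    return import_device();
}
```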
     
  • If bitmap_create fails, the bitmap is already cleaned up and the
    returned value is an error number; we can't do the cleanup again.

    Reported-by: Christophe JAILLET
    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • raid5 splits bios to the proper size internally, so there is no
    point in using the underlying disks' max_hw_sectors. In my qemu
    system, without this change raid5 only receives 128k-size bios,
    which reduces the chance of bio merging when sending to the
    underlying disks.

    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • While one node performs resync or recovery, other nodes can't take
    the resync lock and may block for a while before acquiring it, so
    the array can't be stopped immediately in this scenario.

    To let the array be stopped quickly, we check MD_CLOSING in
    dlm_lock_sync_interruptible so that the lock request can be
    interrupted.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
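
    The interruptible wait can be sketched as a simplified userspace
    model (the poll counters stand in for wait-queue wakeups, and all
    names here are invented):

```c
#include <assert.h>

/* A lock wait that also watches a "closing" condition: a pending
 * resync-lock request is abandoned with -EINTR when the array is being
 * stopped, instead of blocking indefinitely. */
static int lock_sync_interruptible(int polls_until_granted,
                                   int polls_until_closing)
{
    for (int poll = 0; ; poll++) {
        if (poll >= polls_until_granted)
            return 0;                    /* lock granted */
        if (poll >= polls_until_closing)
            return -4;                   /* MD_CLOSING seen: -EINTR */
        /* real code sleeps on a wait queue here until woken */
    }
}
```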
     
  • When a node leaves the cluster, its bitmap needs to be synced by
    another node, so the "md*_recover" thread is triggered for that
    purpose. However, with the steps below we can observe tasks hanging
    on either B or C.

    1. Node A creates a resyncing cluster raid1 and it is assembled on
    the other two nodes (B and C).
    2. Stop the array on B and C.
    3. Stop the array on A.

    linux44:~ # ps aux|grep md|grep D
    root 5938 0.0 0.1 19852 1964 pts/0 D+ 14:52 0:00 mdadm -S md0
    root 5939 0.0 0.0 0 0 ? D 14:52 0:00 [md0_recover]

    linux44:~ # cat /proc/5939/stack
    [] dlm_lock_sync+0x71/0x90 [md_cluster]
    [] recover_bitmaps+0x125/0x220 [md_cluster]
    [] md_thread+0x16d/0x180 [md_mod]
    [] kthread+0xb4/0xc0
    [] ret_from_fork+0x58/0x90

    linux44:~ # cat /proc/5938/stack
    [] kthread_stop+0x6e/0x120
    [] md_unregister_thread+0x40/0x80 [md_mod]
    [] leave+0x70/0x120 [md_cluster]
    [] md_cluster_stop+0x14/0x30 [md_mod]
    [] bitmap_free+0x14b/0x150 [md_mod]
    [] do_md_stop+0x35b/0x5a0 [md_mod]
    [] md_ioctl+0x873/0x1590 [md_mod]
    [] blkdev_ioctl+0x214/0x7d0
    [] block_ioctl+0x3d/0x40
    [] do_vfs_ioctl+0x2d4/0x4b0
    [] SyS_ioctl+0x88/0xa0
    [] system_call_fastpath+0x16/0x1b

    The problem is that recover_bitmaps can't reliably abort when its
    thread is unregistered, so dlm_lock_sync_interruptible is
    introduced to detect the thread's state and fix the problem.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • Previously, we used a completion to synchronize between requesting
    the dlm lock and sync_ast. However, we would have to expose
    completion.wait and completion.done in dlm_lock_sync_interruptible
    (introduced later), which is not a common usage of completion, so
    convert the related code to a wait queue.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • We need to use rcu_read_lock/unlock to avoid a potential race.

    Reported-by: Shaohua Li
    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • cluster_info and bitmap_info.nodes also need to be cleared when
    the array is stopped.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • When stopping a clustered raid while it is pending on resync, the
    MD_STILL_CLOSED flag can be cleared because a udev rule is
    triggered to open the mddev, so the array can't be stopped promptly
    and returns EBUSY.

    mdadm -Ss                      md-raid-arrays.rules
    set MD_STILL_CLOSED            md_open()
                                   ... clear MD_STILL_CLOSED
    do_md_stop

    We make the changes below to resolve this issue:

    1. Rename MD_STILL_CLOSED to MD_CLOSING, since it is set when
    stopping the array and means we are stopping it.
    2. Let md_open return early if MD_CLOSING is set, so no other
    thread will open the array while one thread is trying to close it.
    3. There is no need to clear the MD_CLOSING bit in md_open because
    step 1 ensures the bit is cleared, and then we also don't need to
    test the bit in do_md_stop.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
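
    The resulting open/stop semantics can be sketched as a userspace
    model (invented names):

```c
#include <assert.h>
#include <stdatomic.h>

/* While a stop is in progress the CLOSING flag is set and any open
 * attempt fails early with -EBUSY; the open path no longer clears the
 * flag. */
static atomic_int md_closing;

static int md_open_sketch(void)
{
    if (atomic_load(&md_closing))
        return -16;                      /* -EBUSY: stop in progress */
    return 0;
}

static void begin_stop(void)  { atomic_store(&md_closing, 1); }
static void finish_stop(void) { atomic_store(&md_closing, 0); }
```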
     
  • Since DLM_LKF_FORCEUNLOCK is used in lockres_free, we don't need
    to call dlm_unlock_sync before freeing the lock resource.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • For dlm_unlock, we need to pass the flag as the third parameter to
    dlm_unlock instead of setting res->flags.

    Also, DLM_LKF_FORCEUNLOCK is more suitable for dlm_unlock since it
    works even when the lock is on the waiting or convert queue.

    Acked-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • new_disk_ack could return failure if WAITING_FOR_NEWDISK is not
    set, so we need to kick the device from the array in case a failure
    happens.

    We also missed checking err before calling new_disk_ack; otherwise
    we could kick an rdev which isn't in the array. Thanks to Shaohua
    for the reminder.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     

14 Sep, 2016

1 commit

  • Pull MD fixes from Shaohua Li:
    "A few bug fixes for MD:

    - Guoqing fixed a bug compiling md-cluster in kernel

    - I fixed a potential deadlock in raid5-cache superblock write, a
    hang in raid5 reshape resume and a race condition introduced in
    rc4"

    * tag 'md/4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    raid5: fix a small race condition
    md-cluster: make md-cluster also can work when compiled into kernel
    raid5: guarantee enough stripes to avoid reshape hang
    raid5-cache: fix a deadlock in superblock write

    Linus Torvalds
     

10 Sep, 2016

1 commit

  • commit 5f9d1fde7d54a5 ("raid5: fix memory leak of bio integrity
    data") moves bio_reset to bio_endio, but it introduces a small race
    condition: it does bio_reset after raid5_release_stripe, which
    could make the stripe reusable and hence reuse the bio just before
    bio_reset. Moving bio_reset before raid5_release_stripe is called
    should fix the race.

    Reported-and-tested-by: Stefan Priebe - Profihost AG
    Signed-off-by: Shaohua Li

    Shaohua Li
     

09 Sep, 2016

1 commit

  • md-cluster is compiled as a module by default; if it is built into
    the kernel instead, md-cluster doesn't work.

    [64782.630008] md/raid1:md127: active with 2 out of 2 mirrors
    [64782.630528] md-cluster module not found.
    [64782.630530] md127: Could not setup cluster service (-2)

    Fixes: edb39c9 ("Introduce md_cluster_operations to handle cluster functions")
    Cc: stable@vger.kernel.org (v4.1+)
    Reported-by: Marc Smith
    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     

07 Sep, 2016

1 commit

  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Peter Zijlstra
    Cc: Neil Brown
    Cc: linux-raid@vger.kernel.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160818125731.27256-10-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     

04 Sep, 2016

1 commit

  • Pull device mapper fixes from Mike Snitzer:

    - a stable fix in both DM crypt and DM log-writes for too large bios
    (as generated by bcache)

    - two other stable fixes for DM log-writes

    - a stable fix for a DM crypt bug that could result in freeing pointers
    from uninitialized memory in the tfm allocation error path

    - a DM bufio cleanup to discontinue using create_singlethread_workqueue()

    * tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm bufio: remove use of deprecated create_singlethread_workqueue()
    dm crypt: fix free of bad values after tfm allocation failure
    dm crypt: fix error with too large bios
    dm log writes: fix check of kthread_run() return value
    dm log writes: fix bug with too large bios
    dm log writes: move IO accounting earlier to fix error path

    Linus Torvalds
     

01 Sep, 2016

2 commits

  • If there aren't enough stripes, reshape will hang. We have a check
    for this when starting a new reshape, but it is missing for reshape
    resume, so we could see a hang when a reshape resumes. This patch
    ensures enough stripes exist when a reshape resumes.

    Reviewed-by: NeilBrown
    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • There is a potential deadlock in superblock write. Discard could
    zero data, so before a discard we must make sure the superblock is
    updated to the new log tail. Updating the superblock (either by
    calling md_update_sb() directly or by depending on the md thread)
    must hold the reconfig mutex. On the other hand, raid5_quiesce() is
    called with reconfig_mutex held. The first step of raid5_quiesce()
    is waiting for all IO to finish, hence waiting for the reclaim
    thread, while the reclaim thread is calling this function and
    waiting for the reconfig mutex. So there is a deadlock. We work
    around this issue with a trylock. The downside of the solution is
    that we could miss a discard if we can't take the reconfig mutex,
    but this should happen rarely (mainly on raid array stop), so a
    missed discard shouldn't be a big problem.

    Cc: NeilBrown
    Signed-off-by: Shaohua Li

    Shaohua Li
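
    The trylock workaround can be sketched as a userspace analogue (an
    atomic flag stands in for the reconfig mutex; names are
    illustrative):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Before issuing a discard, try to take the lock without blocking; on
 * failure, skip this discard instead of deadlocking. */
static atomic_flag reconfig_lock = ATOMIC_FLAG_INIT;

static bool try_reconfig_lock(void)
{
    return !atomic_flag_test_and_set(&reconfig_lock);
}

static void reconfig_unlock(void) { atomic_flag_clear(&reconfig_lock); }

static bool discard_with_trylock(void)
{
    if (!try_reconfig_lock())
        return false;     /* lock busy: miss this discard, no deadlock */
    /* ... update superblock to the new log tail, then discard ... */
    reconfig_unlock();
    return true;
}
```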
     

31 Aug, 2016

7 commits

  • The workqueue "dm_bufio_wq" queues a single work item &dm_bufio_work so
    it doesn't require execution ordering. Hence, alloc_workqueue() has
    been used to replace the deprecated create_singlethread_workqueue().

    The WQ_MEM_RECLAIM flag has been set since DM requires forward progress
    under memory pressure.

    Since there is a fixed number of work items, an explicit
    concurrency limit is unnecessary here.

    Signed-off-by: Bhaktipriya Shridhar
    Signed-off-by: Mike Snitzer

    Bhaktipriya Shridhar
     
  • If crypt_alloc_tfms() had to allocate multiple tfms and it failed before
    the last allocation, then it would call crypt_free_tfms() and could free
    pointers from uninitialized memory -- due to the crypt_free_tfms() check
    for non-zero cc->tfms[i]. Fix by allocating zeroed memory.

    Signed-off-by: Eric Biggers
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Eric Biggers
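
    The failure mode and the calloc fix can be sketched generically
    (this is not the dm-crypt code itself; names are invented):

```c
#include <assert.h>
#include <stdlib.h>

/* Teardown frees every non-NULL slot, so the slot array must start
 * zeroed (calloc); with plain malloc a partial-allocation failure
 * would free garbage pointers from uninitialized slots. */
static void free_slots(void **slots, int n)
{
    for (int i = 0; i < n; i++)
        if (slots[i])                    /* safe only if unused slots are NULL */
            free(slots[i]);
    free(slots);
}

static void **alloc_slots(int n, int fail_at)
{
    void **slots = calloc(n, sizeof(*slots));   /* zeroed: the fix */
    if (!slots)
        return NULL;
    for (int i = 0; i < n; i++) {
        if (i == fail_at) {              /* simulate mid-way failure */
            free_slots(slots, n);
            return NULL;
        }
        slots[i] = malloc(16);
    }
    return slots;
}
```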
     
  • When dm-crypt processes writes, it allocates a new bio in
    crypt_alloc_buffer(). The bio is allocated from a bio set and it can
    have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
    larger (e.g. if it was allocated by bcache). If the incoming bio is
    larger, bio_alloc_bioset() fails and an error is returned.

    To avoid the error, we test for a too large bio in the function
    crypt_map() and use dm_accept_partial_bio() to split the bio.
    dm_accept_partial_bio() trims the current bio to the desired size and
    asks DM core to send another bio with the rest of the data.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # v3.16+

    Mikulas Patocka
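
    The trim-and-resubmit idea can be sketched at a high level
    (illustrative only; this is not the real dm_accept_partial_bio
    signature, and MAX_PAGES merely mirrors BIO_MAX_PAGES):

```c
#include <assert.h>

/* Accept at most what a single bio can carry and report how much
 * remains for DM core to resubmit in a follow-up bio. */
#define MAX_PAGES 256

static int accept_partial(int requested_pages, int *accepted_pages)
{
    if (requested_pages <= MAX_PAGES) {
        *accepted_pages = requested_pages;   /* whole request fits */
        return 0;
    }
    *accepted_pages = MAX_PAGES;             /* trim to the bio limit */
    return requested_pages - MAX_PAGES;      /* remainder to resubmit */
}
```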
     
  • The kthread_run() function returns either a valid task_struct or
    an ERR_PTR() value; a check for NULL is invalid. This change fixes
    a potential oops, e.g. in an OOM situation.

    Signed-off-by: Vladimir Zapolskiy
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Vladimir Zapolskiy
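
    The distinction can be shown with minimal userspace copies of the
    kernel's ERR_PTR/IS_ERR helpers (simplified re-implementations;
    MAX_ERRNO as in the kernel):

```c
#include <assert.h>
#include <stdint.h>

/* Failure from kthread_run() comes back as an encoded error pointer,
 * which is never NULL, so a NULL check can never catch it. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;      /* encode errno in the pointer */
}

static int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```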
     
  • bio_alloc() can allocate a bio with at most BIO_MAX_PAGES (256) vector
    entries. However, the incoming bio may have more vector entries if it
    was allocated by other means. For example, bcache submits bios with
    more than BIO_MAX_PAGES entries. This results in bio_alloc() failure.

    To avoid the failure, change the code so that it allocates bio with at
    most BIO_MAX_PAGES entries. If the incoming bio has more entries,
    bio_add_page() will fail and a new bio will be allocated - the code that
    handles bio_add_page() failure already exists in the dm-log-writes
    target.

    Signed-off-by: Mikulas Patocka
    Reviewed-by: Josef Bacik
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # v4.1+

    Mikulas Patocka
     
  • Move log_one_block()'s atomic_inc(&lc->io_blocks) before bio_alloc() to
    fix a bug that the target hangs if bio_alloc() fails. The error path
    does put_io_block(lc), so atomic_inc(&lc->io_blocks) must occur before
    invoking the error path to avoid underflow of lc->io_blocks.

    Signed-off-by: Mikulas Patocka
    Reviewed-by: Josef Bacik
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Mikulas Patocka
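
    The ordering can be sketched as (illustrative names, not the
    dm-log-writes code):

```c
#include <assert.h>

/* The error path always calls put_io_block(), so the matching
 * increment must happen before the step that can fail, or the counter
 * underflows. */
static int io_blocks;

static void put_io_block(void) { io_blocks--; }

static int log_one_block(int alloc_fails)
{
    io_blocks++;                         /* moved before the allocation */
    if (alloc_fails) {
        put_io_block();                  /* error path pairs with the inc */
        return -12;                      /* -ENOMEM */
    }
    /* ...submit the bio; put_io_block() would run at end_io... */
    put_io_block();
    return 0;
}
```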
     
  • Pull MD fixes from Shaohua Li:
    "This includes several bug fixes:

    - Alexey Obitotskiy fixed a hang for faulty raid5 array with external
    management

    - Song Liu fixed two raid5 journal related bugs

    - Tomasz Majchrzak fixed a bad block recording issue and an
    accounting issue for raid10

    - ZhengYuan Liu fixed an accounting issue for raid5

    - I fixed a potential race condition and memory leak with DIF/DIX
    enabled

    - other trivial fixes"

    * tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    raid5: avoid unnecessary bio data set
    raid5: fix memory leak of bio integrity data
    raid10: record correct address of bad block
    md-cluster: fix error return code in join()
    r5cache: set MD_JOURNAL_CLEAN correctly
    md: don't print the same repeated messages about delayed sync operation
    md: remove obsolete ret in md_start_sync
    md: do not count journal as spare in GET_ARRAY_INFO
    md: Prevent IO hold during accessing to faulty raid5 array
    MD: hold mddev lock to change bitmap location
    raid5: fix incorrectly counter of conf->empty_inactive_list_nr
    raid10: increment write counter after bio is split

    Linus Torvalds
     

27 Aug, 2016

2 commits

  • Pull device mapper fixes from Mike Snitzer:

    - another stable fix for DM flakey (tweaking the previous fix,
    which didn't factor in the expected 'drop_writes' behavior for read
    IO).

    - a dm-log bio operation flags fix for the broader block changes that
    were merged during the 4.8 merge window.

    * tag 'dm-4.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm log: fix unitialized bio operation flags
    dm flakey: fix reads to be issued if drop_writes configured

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Here's a set of block fixes for the current 4.8-rc release. This
    contains:

    - a fix for a secure erase regression, from Adrian.

    - a fix for an mmc use-after-free bug regression, also from Adrian.

    - a potential NULL pointer dereference in bdev freezing, from Andrey.

    - a race fix for blk_set_queue_dying() from Bart.

    - a set of xen blkfront fixes from Bob Liu.

    - three small fixes for bcache, from Eric and Kent.

    - a fix for a potential invalid NVMe state transition, from Gabriel.

    - blk-mq CPU offline fix, preventing us from issuing and completing a
    request on the wrong queue. From me.

    - revert two previous floppy changes, since they caused a user
    visible regression. A better fix is in the works.

    - ensure that we don't send down bios that have more than 256
    elements in them. Fixes a crash with bcache, for example. From
    Ming.

    - a fix for dereferencing an error pointer with cgroup writeback.
    Fixes a regression. From Vegard"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    mmc: fix use-after-free of struct request
    Revert "floppy: refactor open() flags handling"
    Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"
    fs/block_dev: fix potential NULL ptr deref in freeze_bdev()
    blk-mq: improve warning for running a queue on the wrong CPU
    blk-mq: don't overwrite rq->mq_ctx
    block: make sure a big bio is split into at most 256 bvecs
    nvme: Fix nvme_get/set_features() with a NULL result pointer
    bdev: fix NULL pointer dereference
    xen-blkfront: free resources if xlvbd_alloc_gendisk fails
    xen-blkfront: introduce blkif_set_queue_limits()
    xen-blkfront: fix places not updated after introducing 64KB page granularity
    bcache: pr_err: more meaningful error message when nr_stripes is invalid
    bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two.
    bcache: register_bcache(): call blkdev_put() when cache_alloc() fails
    block: Fix race triggered by blk_set_queue_dying()
    block: Fix secure erase
    nvme: Prevent controller state invalid transition

    Linus Torvalds
     

25 Aug, 2016

7 commits

  • Commit e6047149db ("dm: use bio op accessors") switched DM over to
    using bio_set_op_attrs() but didn't take care to initialize
    lc->io_req.bi_op_flags in dm-log.c:rw_header(). This caused
    rw_header()'s call to dm_io() to make bio->bi_op_flags be uninitialized
    in dm-io.c:do_region(), which ultimately resulted in a SCSI BUG() in
    sd_init_command().

    Also, adjust rw_header() and its callers to use REQ_OP_{READ|WRITE}.

    Fixes: e6047149db ("dm: use bio op accessors")
    Signed-off-by: Heinz Mauelshagen
    Reviewed-by: Shaun Tancheff
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     
  • v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the
    down_interval") overlooked the 'drop_writes' feature, which is meant to
    allow reads to be issued rather than errored, during the down_interval.

    Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval")
    Reported-by: Qu Wenruo
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Mike Snitzer
     
  • bio_reset doesn't change bi_io_vec and bi_max_vecs, so we don't need to
    set them every time. bi_private will be set before the bio is
    dispatched.

    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • Yi reported a memory leak in raid5 with DIF/DIX-enabled disks.
    raid5 doesn't alloc/free bios; instead it reuses them. There are
    two issues in the current code:

    1. The code calls bio_init (from
    init_stripe->raid5_build_block->bio_init) and then bio_reset
    (ops_run_io). The bio is reused, so integrity data is likely
    attached; bio_init clears the pointer to the integrity data, which
    prevents bio_reset from releasing it.
    2. bio_reset is called before dispatching the bio. After the bio is
    finished, it's possible we don't free the bio's integrity data
    (e.g., we don't call bio_reset again).

    Both issues cause memory leaks. The patch moves bio_init to stripe
    creation and bio_reset to bio end io, which fixes both issues.

    Reported-by: Yi Zhang
    Signed-off-by: Shaohua Li

    Shaohua Li
     
  • For a failed write request, record the block address on the
    device, not the block address in the array.

    Signed-off-by: Tomasz Majchrzak
    Signed-off-by: Shaohua Li

    Tomasz Majchrzak
     
  • Fix to return error code -ENOMEM from the lockres_init() error
    handling case instead of 0, as done elsewhere in this function.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Shaohua Li

    Wei Yongjun
     
  • Currently, the code sets MD_JOURNAL_CLEAN when the array has
    MD_FEATURE_JOURNAL and the recovery_cp is MaxSector. The array
    will be MD_JOURNAL_CLEAN even if the journal device is missing.

    With this patch, MD_JOURNAL_CLEAN is only set when the journal
    device is present.

    Signed-off-by: Song Liu
    Signed-off-by: Shaohua Li

    Song Liu