10 Jan, 2021

5 commits

  • If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in the incompat feature
    set, the cache device was created with the obsoleted layout that uses
    obso_bucket_size_hi. bcache no longer supports this feature bit; a new
    BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit has been
    added for a better layout that supports large bucket sizes.

    For legacy compatibility, if a cache device was created with the
    obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache
    devices attached to this cache set should be set to read-only. The
    dirty data can then be written back to the backing device before the
    cache device is re-created with the
    BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit by the latest
    bcache-tools.

    This patch checks the BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit
    when running a cache set and when attaching a bcache device to the cache
    set. If this bit is set,
    - When running a cache set, print an error kernel message to indicate
    that all subsequently attached bcache devices will be read-only.
    - When attaching a bcache device, print an error kernel message to
    indicate that the attached bcache device will be read-only, and ask
    users to update to the latest bcache-tools.

    This change only affects cache devices whose bucket size is >= 32MB.
    Such sizes are intended for zoned SSDs, and almost nobody uses such a
    large bucket size at this moment. If you don't explicitly set a large
    bucket size for a zoned SSD, this change is totally transparent to your
    bcache device.

    Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • When the large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
    was introduced into the incompat feature set. It used bucket_size_hi
    (which was added at the tail of struct cache_sb_disk) to extend the
    current 16-bit bucket size to 32 bits, together with the existing
    bucket_size in struct cache_sb_disk.

    This is not a good idea; there are two obvious problems,
    - Bucket size is always a power of 2. If log2(bucket size) is stored in
    the existing bucket_size of struct cache_sb_disk, adding bucket_size_hi
    is unnecessary.
    - Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
    struct cache_sb_disk. bucket_size_hi was added after d[], which makes
    csum_set() calculate an unexpected super block checksum.

    To fix the above problems, this patch introduces a new incompat feature
    bit, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE. When this bit is set,
    bucket_size in struct cache_sb_disk stores the order of the power-of-2
    bucket size value. When the user specifies a bucket size larger than
    32768 sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE is set in the
    incompat feature set, and bucket_size stores log2(bucket size) rather
    than the real bucket size value.
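
    The encoding described above can be sketched as follows. This is an
    illustrative model only, not the bcache code: the helper names and the
    flag out-parameter are hypothetical, and the 32768-sector threshold is
    taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Threshold quoted in the commit message; larger sizes are stored as log2. */
#define MAX_RAW_BUCKET_SECTORS 32768u

/* Integer log2 of a power of two (hypothetical helper). */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int order = 0;

	assert(v != 0 && (v & (v - 1)) == 0); /* bucket sizes are powers of 2 */
	while (v >>= 1)
		order++;
	return order;
}

/*
 * Encode a bucket size (in sectors) into a 16-bit field. *log_feature is set
 * to 1 when the field holds log2(size), which stands in for setting the
 * LOG_LARGE_BUCKET_SIZE incompat bit, and to 0 when it holds the raw size.
 */
static uint16_t encode_bucket_size(uint32_t sectors, int *log_feature)
{
	if (sectors > MAX_RAW_BUCKET_SECTORS) {
		*log_feature = 1;
		return (uint16_t)ilog2_u32(sectors);
	}
	*log_feature = 0;
	return (uint16_t)sectors;
}

/* Decode the 16-bit field back into a bucket size in sectors. */
static uint32_t decode_bucket_size(uint16_t field, int log_feature)
{
	return log_feature ? (uint32_t)1 << field : field;
}
```

    In this model a 65536-sector bucket is stored as 16 with the flag set,
    while a 1024-sector bucket is stored unchanged with the flag clear, so
    no extra field is needed beyond the existing 16 bits.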

    The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore.
    It is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and is still
    recognized only by the kernel driver, for legacy compatibility. The
    previous bucket_size_hi is renamed to obso_bucket_size_hi in struct
    cache_sb_disk and is not used in bcache-tools anymore.

    For cache devices created with the BCH_FEATURE_INCOMPAT_LARGE_BUCKET
    feature, bcache-tools and the kernel driver still recognize the feature
    string and display it as "obso_large_bucket".

    With this change, the unnecessary extension of the bcache on-disk
    super block is avoided, and csum_set() generates the expected checksum
    as well.

    Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
    Signed-off-by: Coly Li
    Cc: stable@vger.kernel.org # 5.9+
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch adds a check for features that are incompatible with the
    currently supported feature sets.

    Now, if a bcache device created by bcache-tools has features that the
    current kernel doesn't support, read_super() fails with an error
    message. E.g. if an unsupported incompatible feature is detected,
    bcache registration fails with dmesg "bcache: register_bcache() error :
    Unsupported incompatible feature found".
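
    A check of this kind usually reduces to masking the on-disk feature
    set against the supported set. The sketch below is an assumption-laden
    illustration, not the read_super() code: the bit value, macro names and
    function name are hypothetical, and only the error string comes from
    the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignment, for illustration only. */
#define TOY_INCOMPAT_LOG_LARGE_BUCKET_SIZE (1ull << 1)
/* Everything this toy "kernel" understands in the incompat set. */
#define TOY_INCOMPAT_SUPP TOY_INCOMPAT_LOG_LARGE_BUCKET_SIZE

/*
 * Return 0 if every bit in feature_incompat is supported. Otherwise return
 * -1 and point *err at a message, the way a superblock reader might refuse
 * to register the device.
 */
static int toy_check_incompat(uint64_t feature_incompat, const char **err)
{
	assert(err != NULL);
	*err = NULL;
	if (feature_incompat & ~TOY_INCOMPAT_SUPP) {
		*err = "Unsupported incompatible feature found";
		return -1;
	}
	return 0;
}
```

    Any bit outside the supported mask fails the check, including bits
    that a newer bcache-tools might define later.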

    Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device")
    Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
    Signed-off-by: Coly Li
    Cc: stable@vger.kernel.org # 5.9+
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch fixes the following typos,
    from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP
    from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP
    from BCH_FEATURE_RO_COMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP

    Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device")
    Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
    Signed-off-by: Coly Li
    Cc: stable@vger.kernel.org # 5.9+
    Signed-off-by: Jens Axboe

    Coly Li
     
  • There is no need to reassign pdev_set_uuid in every iteration of the
    second loop, so move the assignment before the second loop.

    Signed-off-by: Yi Li
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Yi Li
     

08 Jan, 2021

9 commits

  • Showing the hctx flags for when BLK_MQ_F_TAG_HCTX_SHARED is set gives
    something like:

    root@debian:/home/john# more /sys/kernel/debug/block/sda/hctx0/flags
    alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED|3

    Add the decoding for that flag.
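
    The stray "3" in the output above is what a table-driven decoder
    prints for a set bit that has no name. A minimal sketch of that
    behaviour follows; it is not the blk-mq debugfs code. The function
    name and table layout are hypothetical, and placing the shared-tags
    flag at bit 3 is inferred from the output shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Decode the set bits of flags into buf, separated by '|'. names[bit] gives
 * the name for a bit; bits past n_names or with a NULL entry are printed as
 * their bit number.
 */
static void toy_decode_flags(unsigned long flags, const char *const *names,
			     size_t n_names, char *buf, size_t len)
{
	size_t off = 0;
	unsigned int bit;

	assert(len > 0);
	buf[0] = '\0';
	for (bit = 0; bit < 8 * sizeof(flags); bit++) {
		const char *sep = off ? "|" : "";
		int n;

		if (!(flags & (1ul << bit)))
			continue;
		if (bit < n_names && names[bit])
			n = snprintf(buf + off, len - off, "%s%s", sep, names[bit]);
		else
			n = snprintf(buf + off, len - off, "%s%u", sep, bit);
		if (n < 0 || (size_t)n >= len - off)
			break; /* output truncated, stop decoding */
		off += (size_t)n;
	}
}
```

    With a table that names only bits 0 and 1, the value 0xb decodes to
    "SHOULD_MERGE|TAG_QUEUE_SHARED|3". Adding an entry for bit 3 makes the
    last field print as a name instead.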

    Fixes: 32bc15afed04b ("blk-mq: Facilitate a shared sbitmap per tagset")
    Signed-off-by: John Garry
    Signed-off-by: Jens Axboe

    John Garry
     
  • We had a kernel panic caused by a module unload racing with the last
    close confirmation.

    call trace:
    [1196029.743127] free_sess+0x15/0x50 [rtrs_client]
    [1196029.743128] rtrs_clt_close+0x4c/0x70 [rtrs_client]
    [1196029.743129] ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client]
    [1196029.743130] close_rtrs+0x25/0x50 [rnbd_client]
    [1196029.743131] rnbd_client_exit+0x93/0xb99 [rnbd_client]
    [1196029.743132] __x64_sys_delete_module+0x190/0x260

    In the crashdump, the confirmation kworker is also running:
    PID: 6943 TASK: ffff9e2ac8098000 CPU: 4 COMMAND: "kworker/4:2"
    #0 [ffffb206cf337c30] __schedule at ffffffff9f93f891
    #1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98
    #2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938
    #3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7
    #4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e
    #5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client]
    #6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client]
    #7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client]
    #8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client]
    #9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client]

    The problem is that both code paths try to close the same session,
    which leads to a panic.

    To fix it, just skip the sess if its refcount has already dropped to 0.
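
    The "skip if the refcount already dropped to 0" idea is commonly
    expressed as a get-unless-zero operation. The sketch below models it
    in userspace with C11 atomics; it is not the rnbd code, and the struct
    and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy session: a reference count plus a counter of how often it was closed. */
struct toy_session {
	atomic_int refcount;
	int closes;
};

/* Take a reference only if the session is not already on its way out. */
static bool toy_session_get_unless_zero(struct toy_session *s)
{
	int old = atomic_load(&s->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the caller that drops the last one closes the session. */
static void toy_session_put(struct toy_session *s)
{
	int old = atomic_fetch_sub(&s->refcount, 1);

	assert(old > 0);
	if (old == 1)
		s->closes++;
}
```

    A teardown path that first calls toy_session_get_unless_zero() and
    skips the session on failure can no longer close a session that
    another path has already released.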

    Fixes: f7a7a5c228d4 ("block/rnbd: client: main functionality")
    Signed-off-by: Jack Wang
    Reviewed-by: Gioh Kim
    Signed-off-by: Jens Axboe

    Jack Wang
     
  • Adding name to the Contributors List

    Signed-off-by: Swapnil Ingle
    Acked-by: Jack Wang
    Acked-by: Danil Kipnis
    Signed-off-by: Jack Wang
    Signed-off-by: Jens Axboe

    Swapnil Ingle
     
  • Since a dynamically allocated sglist is used for rnbd_iu, we can't free
    the sg table right after send_usr_msg, because the callback function
    (cqe.done) could still access the sglist.

    Otherwise KASAN reports a UAF issue:

    [ 4856.600257] BUG: KASAN: use-after-free in dma_direct_unmap_sg+0x53/0x290
    [ 4856.600772] Read of size 4 at addr ffff888206af3a98 by task swapper/1/0

    [ 4856.601729] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G W 5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
    [ 4856.601748] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
    [ 4856.601766] Call Trace:
    [ 4856.601785]
    [ 4856.601822] dump_stack+0x99/0xcb
    [ 4856.601856] ? dma_direct_unmap_sg+0x53/0x290
    [ 4856.601888] print_address_description.constprop.7+0x1e/0x230
    [ 4856.601913] ? freeze_kernel_threads+0x73/0x73
    [ 4856.601965] ? mark_held_locks+0x29/0xa0
    [ 4856.602019] ? dma_direct_unmap_sg+0x53/0x290
    [ 4856.602039] ? dma_direct_unmap_sg+0x53/0x290
    [ 4856.602079] kasan_report.cold.9+0x37/0x7c
    [ 4856.602188] ? mlx5_ib_post_recv+0x430/0x520 [mlx5_ib]
    [ 4856.602209] ? dma_direct_unmap_sg+0x53/0x290
    [ 4856.602256] dma_direct_unmap_sg+0x53/0x290
    [ 4856.602366] complete_rdma_req+0x188/0x4b0 [rtrs_client]
    [ 4856.602451] ? rtrs_clt_close+0x80/0x80 [rtrs_client]
    [ 4856.602535] ? mlx5_ib_poll_cq+0x48b/0x16e0 [mlx5_ib]
    [ 4856.602589] ? radix_tree_insert+0x3a0/0x3a0
    [ 4856.602610] ? do_raw_spin_lock+0x119/0x1d0
    [ 4856.602647] ? rwlock_bug.part.1+0x60/0x60
    [ 4856.602740] rtrs_clt_rdma_done+0x3f7/0x670 [rtrs_client]
    [ 4856.602804] ? rtrs_clt_rdma_cm_handler+0xda0/0xda0 [rtrs_client]
    [ 4856.602857] ? check_flags.part.31+0x6c/0x1f0
    [ 4856.602927] ? rcu_read_lock_sched_held+0xaf/0xe0
    [ 4856.602963] ? rcu_read_lock_bh_held+0xc0/0xc0
    [ 4856.603137] __ib_process_cq+0x10a/0x350 [ib_core]
    [ 4856.603309] ib_poll_handler+0x41/0x1c0 [ib_core]
    [ 4856.603358] irq_poll_softirq+0xe6/0x280
    [ 4856.603392] ? lockdep_hardirqs_on_prepare+0x111/0x210
    [ 4856.603446] __do_softirq+0x10d/0x646
    [ 4856.603540] asm_call_irq_on_stack+0x12/0x20
    [ 4856.603563]

    [ 4856.605096] Allocated by task 8914:
    [ 4856.605510] kasan_save_stack+0x19/0x40
    [ 4856.605532] __kasan_kmalloc.constprop.7+0xc1/0xd0
    [ 4856.605552] __kmalloc+0x155/0x320
    [ 4856.605574] __sg_alloc_table+0x155/0x1c0
    [ 4856.605594] sg_alloc_table+0x1f/0x50
    [ 4856.605620] send_msg_sess_info+0x119/0x2e0 [rnbd_client]
    [ 4856.605646] remap_devs+0x71/0x210 [rnbd_client]
    [ 4856.605676] init_sess+0xad8/0xe10 [rtrs_client]
    [ 4856.605706] rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
    [ 4856.605728] process_one_work+0x521/0xa90
    [ 4856.605748] worker_thread+0x65/0x5b0
    [ 4856.605769] kthread+0x1f2/0x210
    [ 4856.605789] ret_from_fork+0x22/0x30

    [ 4856.606159] Freed by task 8914:
    [ 4856.606559] kasan_save_stack+0x19/0x40
    [ 4856.606580] kasan_set_track+0x1c/0x30
    [ 4856.606601] kasan_set_free_info+0x1b/0x30
    [ 4856.606622] __kasan_slab_free+0x108/0x150
    [ 4856.606642] slab_free_freelist_hook+0x64/0x190
    [ 4856.606661] kfree+0xe2/0x650
    [ 4856.606681] __sg_free_table+0xa4/0x100
    [ 4856.606707] send_msg_sess_info+0x1d6/0x2e0 [rnbd_client]
    [ 4856.606733] remap_devs+0x71/0x210 [rnbd_client]
    [ 4856.606763] init_sess+0xad8/0xe10 [rtrs_client]
    [ 4856.606792] rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
    [ 4856.606813] process_one_work+0x521/0xa90
    [ 4856.606833] worker_thread+0x65/0x5b0
    [ 4856.606853] kthread+0x1f2/0x210
    [ 4856.606872] ret_from_fork+0x22/0x30

    The solution is to free the iu's sgtable only after the iu is no longer
    used, and to move sg_alloc_table into rnbd_get_iu accordingly.

    Fixes: 5a1328d0c3a7 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Jack Wang
    Signed-off-by: Jens Axboe

    Guoqing Jiang
     
  • KASAN detected the following BUG:
    [ 778.215311] ==================================================================
    [ 778.216696] BUG: KASAN: use-after-free in rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
    [ 778.219037] Read of size 8 at addr ffff88b1d6516c28 by task tee/8842

    [ 778.220500] CPU: 37 PID: 8842 Comm: tee Kdump: loaded Not tainted 5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
    [ 778.220529] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
    [ 778.220555] Call Trace:
    [ 778.220609] dump_stack+0x99/0xcb
    [ 778.220667] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
    [ 778.220715] print_address_description.constprop.7+0x1e/0x230
    [ 778.220750] ? freeze_kernel_threads+0x73/0x73
    [ 778.220896] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
    [ 778.220932] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
    [ 778.220994] kasan_report.cold.9+0x37/0x7c
    [ 778.221066] ? kobject_put+0x80/0x270
    [ 778.221102] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
    [ 778.221184] rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
    [ 778.221240] rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
    [ 778.221304] ? sysfs_file_ops+0x90/0x90
    [ 778.221353] kernfs_fop_write+0x141/0x240
    [ 778.221451] vfs_write+0x142/0x4d0
    [ 778.221553] ksys_write+0xc0/0x160
    [ 778.221602] ? __ia32_sys_read+0x50/0x50
    [ 778.221684] ? lockdep_hardirqs_on_prepare+0x13d/0x210
    [ 778.221718] ? syscall_enter_from_user_mode+0x1c/0x50
    [ 778.221821] do_syscall_64+0x33/0x40
    [ 778.221862] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 778.221896] RIP: 0033:0x7f4affdd9504
    [ 778.221928] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
    [ 778.221956] RSP: 002b:00007fffebb36b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    [ 778.222011] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f4affdd9504
    [ 778.222038] RDX: 0000000000000002 RSI: 00007fffebb36c50 RDI: 0000000000000003
    [ 778.222066] RBP: 00007fffebb36c50 R08: 0000556a151aa600 R09: 00007f4affeb1540
    [ 778.222094] R10: fffffffffffffc19 R11: 0000000000000246 R12: 0000556a151aa520
    [ 778.222121] R13: 0000000000000002 R14: 00007f4affea6760 R15: 0000000000000002

    [ 778.222764] Allocated by task 3212:
    [ 778.223285] kasan_save_stack+0x19/0x40
    [ 778.223316] __kasan_kmalloc.constprop.7+0xc1/0xd0
    [ 778.223347] kmem_cache_alloc_trace+0x186/0x350
    [ 778.223382] rnbd_srv_rdma_ev+0xf16/0x1690 [rnbd_server]
    [ 778.223422] process_io_req+0x4d1/0x670 [rtrs_server]
    [ 778.223573] __ib_process_cq+0x10a/0x350 [ib_core]
    [ 778.223709] ib_cq_poll_work+0x31/0xb0 [ib_core]
    [ 778.223743] process_one_work+0x521/0xa90
    [ 778.223773] worker_thread+0x65/0x5b0
    [ 778.223802] kthread+0x1f2/0x210
    [ 778.223833] ret_from_fork+0x22/0x30

    [ 778.224296] Freed by task 8842:
    [ 778.224800] kasan_save_stack+0x19/0x40
    [ 778.224829] kasan_set_track+0x1c/0x30
    [ 778.224860] kasan_set_free_info+0x1b/0x30
    [ 778.224889] __kasan_slab_free+0x108/0x150
    [ 778.224919] slab_free_freelist_hook+0x64/0x190
    [ 778.224947] kfree+0xe2/0x650
    [ 778.224982] rnbd_destroy_sess_dev+0x2fa/0x3b0 [rnbd_server]
    [ 778.225011] kobject_put+0xda/0x270
    [ 778.225046] rnbd_srv_sess_dev_force_close+0x30/0x60 [rnbd_server]
    [ 778.225081] rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
    [ 778.225111] kernfs_fop_write+0x141/0x240
    [ 778.225140] vfs_write+0x142/0x4d0
    [ 778.225169] ksys_write+0xc0/0x160
    [ 778.225198] do_syscall_64+0x33/0x40
    [ 778.225227] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [ 778.226506] The buggy address belongs to the object at ffff88b1d6516c00
    which belongs to the cache kmalloc-512 of size 512
    [ 778.227464] The buggy address is located 40 bytes inside of
    512-byte region [ffff88b1d6516c00, ffff88b1d6516e00)

    The problem is that the sess_dev release function calls
    rnbd_destroy_sess_dev, which may already have freed the sess_dev, but we
    still set keep_id in rnbd_srv_sess_dev_force_close afterwards, which
    leads to a use-after-free.

    To fix it, set keep_id before the sysfs removal, and cache the
    rnbd_srv_session for the lock access.

    Fixes: 786998050cbc ("block/rnbd-srv: close a mapped device from server side.")
    Signed-off-by: Jack Wang
    Reviewed-by: Guoqing Jiang
    Signed-off-by: Jens Axboe

    Jack Wang
     
  • The lkp robot reported the following build error:
    drivers/block/rnbd/rnbd-clt.c: In function 'rnbd_softirq_done_fn':
    >> drivers/block/rnbd/rnbd-clt.c:387:2: error: implicit declaration of function 'sg_free_table_chained' [-Werror=implicit-function-declaration]
    387 | sg_free_table_chained(&iu->sgt, RNBD_INLINE_SG_CNT);
    | ^~~~~~~~~~~~~~~~~~~~~

    The reason is that CONFIG_SG_POOL is not enabled in the config. To
    avoid such failures, select SG_POOL in Kconfig for RNBD_CLIENT.

    Fixes: 5a1328d0c3a7 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
    Reported-by: kernel test robot
    Signed-off-by: Jack Wang
    Signed-off-by: Jens Axboe

    Jack Wang
     
  • bdev_evict_inode and bdev_free_inode are also called for the root inode
    of bdevfs, for which bdev_alloc is never called. Move the zeroing of
    struct block_device and the initialization of the bd_bdi field into
    bdev_alloc_inode to make sure they are initialized for the root inode
    as well.

    Fixes: e6cb53827ed6 ("block: initialize struct block_device in bdev_alloc")
    Reported-by: Alexey Kardashevskiy
    Tested-by: Alexey Kardashevskiy
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Pull NVMe updates from Christoph:

    "nvme updates for 5.11:

    - fix a race in the nvme-tcp send code (Sagi Grimberg)
    - fix a list corruption in an nvme-rdma error path (Israel Rukshin)
    - avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
    - add the subsystem NQN quirk for a Samsung drive (Gopal Tiwari)
    - fix two compiler warnings in nvme-fcloop (James Smart)
    - don't call sleeping functions from irq context in nvme-fc (James Smart)
    - remove an unused argument (Max Gurtovoy)
    - remove unused exports (Minwoo Im)"

    * tag 'nvme-5.11-2021-01-07' of git://git.infradead.org/nvme:
    nvme: remove the unused status argument from nvme_trace_bio_complete
    nvmet-rdma: Fix list_del corruption on queue establishment failure
    nvme: unexport functions with no external caller
    nvme: avoid possible double fetch in handling CQE
    nvme-tcp: Fix possible race of io_work and direct send
    nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
    nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
    nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context

    Jens Axboe
     
  • freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
    whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
    bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
    bd_fsfreeze_sb.

    But this means a freeze_bdev() call followed by a thaw_bdev() call can
    leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
    zero. If freeze_bdev() is called again, and this time
    get_active_super() returns NULL (e.g. because the FS is unmounted),
    we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
    *untouched* - it stays the same (now garbage) value. A subsequent
    thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
    (since bd_fsfreeze_count > 0), and attempt to use it.

    Fix this by always setting bd_fsfreeze_sb to NULL when
    bd_fsfreeze_count is successfully decremented to 0 in thaw_sb().
    Alternatively, we could set bd_fsfreeze_sb to whatever
    get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
    is successfully incremented to 1 from 0 (which can be achieved cleanly
    by moving the line currently setting bd_fsfreeze_sb to immediately
    after the "sync:" label, but it might be a little too subtle/easily
    overlooked in future).
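
    The stale-pointer sequence and the chosen fix can be modelled in a few
    lines. This is a toy userspace model, not the fs/block_dev.c code: the
    struct and function names are hypothetical, and the superblock lookup
    is replaced by a parameter that is NULL when nothing is mounted.

```c
#include <assert.h>
#include <stddef.h>

struct toy_super {
	int frozen;
};

struct toy_bdev {
	int fsfreeze_count;
	struct toy_super *fsfreeze_sb;
};

/* sb models the result of the superblock lookup; NULL means "not mounted". */
static void toy_freeze(struct toy_bdev *b, struct toy_super *sb)
{
	assert(b != NULL);
	if (++b->fsfreeze_count > 1)
		return; /* already frozen, only the count changes */
	if (!sb)
		return; /* nothing mounted: fsfreeze_sb is left untouched */
	sb->frozen = 1;
	b->fsfreeze_sb = sb;
}

static void toy_thaw(struct toy_bdev *b)
{
	assert(b != NULL);
	if (b->fsfreeze_count == 0 || --b->fsfreeze_count > 0)
		return;
	if (b->fsfreeze_sb)
		b->fsfreeze_sb->frozen = 0;
	b->fsfreeze_sb = NULL; /* the fix: drop the pointer with the last thaw */
}
```

    With the final assignment in place, a later freeze that finds no
    mounted filesystem leaves fsfreeze_sb as NULL rather than a pointer
    from an earlier freeze, so the next thaw has nothing stale to
    dereference.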

    This fixes the currently panicking xfstests generic/085.

    Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev")
    Signed-off-by: Satya Tangirala
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Jens Axboe

    Satya Tangirala
     

06 Jan, 2021

11 commits

  • The only used argument in this function is the "req".

    Signed-off-by: Max Gurtovoy
    Reviewed-by: Minwoo Im
    Signed-off-by: Christoph Hellwig

    Max Gurtovoy
     
  • When a queue is in the NVMET_RDMA_Q_CONNECTING state, it may have some
    requests on rsp_wait_list. If a disconnect occurs in this state, no
    one empties this list and returns the requests to the free_rsps list.
    Normally nvmet_rdma_queue_established() frees those requests after
    moving the queue to the NVMET_RDMA_Q_LIVE state, but in this case
    __nvmet_rdma_queue_disconnect() is called first. The crash happens in
    nvmet_rdma_free_rsps() when calling list_del(&rsp->free_list), because
    the request exists only on the wait list. To fix the issue, simply
    clear rsp_wait_list when destroying the queue.

    Signed-off-by: Israel Rukshin
    Reviewed-by: Max Gurtovoy
    Signed-off-by: Christoph Hellwig

    Israel Rukshin
     
  • There are no external callers of nvme_reset_ctrl_sync() and
    nvme_alloc_request_qid(), so there is no need to keep the symbols
    exported.

    Unexport those functions, mark them static and update the header file
    accordingly.

    Signed-off-by: Minwoo Im
    Signed-off-by: Christoph Hellwig

    Minwoo Im
     
  • While handling the completion queue, keep a local copy of the command id
    from the DMA-accessible completion entry. This silences a time-of-check
    to time-of-use (TOCTOU) warning from KF/x[1], with respect to a
    Thunderclap[2] vulnerability analysis. The double-read impact appears
    benign.

    There may be a theoretical window for @command_id to be used as an
    adversary-controlled array-index-value for mounting a speculative
    execution attack, but that mitigation is saved for a potential follow-on.
    A man-in-the-middle attack on the data payload is out of scope for this
    analysis and is hopefully mitigated by filesystem integrity mechanisms.
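
    The single-fetch pattern described above can be sketched as follows.
    This is a generic illustration, not the nvme-pci code: the types and
    the lookup function are hypothetical, and the volatile field merely
    stands in for memory that a device can rewrite at any time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy completion entry living in memory the device may rewrite. */
struct toy_cqe {
	volatile uint16_t command_id;
};

struct toy_request {
	int tag;
};

/*
 * Look up the request for a completion. The shared field is read exactly
 * once; the bounds check and the array index both use that local copy, so
 * the value cannot change between the check and the use.
 */
static struct toy_request *toy_find_request(const struct toy_cqe *cqe,
					    struct toy_request *reqs,
					    size_t depth)
{
	uint16_t id = cqe->command_id; /* the only read of the shared field */

	assert(reqs != NULL);
	if (id >= depth)
		return NULL;
	return &reqs[id];
}
```

    Reading cqe->command_id a second time for the index would reopen the
    check-then-use window that the local copy closes.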

    [1] https://github.com/intel/kernel-fuzzer-for-xen-project
    [2] http://thunderclap.io/thunderclap-paper-ndss2019.pdf
    Signed-off-by: Lalithambika Krishna Kumar
    Signed-off-by: Christoph Hellwig

    Lalithambika Krishnakumar
     
  • We may send a request (with or without its data) from two paths:

    1. From our I/O context nvme_tcp_io_work which is triggered from:
    - queue_rq
    - r2t reception
    - socket data_ready and write_space callbacks
    2. Directly from queue_rq if the send_list is empty (because we want to
    save the context switch associated with scheduling our io_work).

    However, now that we have the send_mutex, we may run into a race
    condition where none of these contexts sends the pending payload to
    the controller. Both the io_work send path and the queue_rq send path
    opportunistically attempt to acquire the send_mutex. However, queue_rq
    only attempts to send a single request, and if the io_work context fails
    to acquire the send_mutex it completes without rescheduling itself.

    The race can trigger with the following sequence:

    1. queue_rq sends request (no in-capsule data) and blocks
    2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU
    to the send_list and schedules io_work
    3. io_work triggers and cannot acquire the send_mutex - because of (1),
    ends without self rescheduling
    4. queue_rq completes the send, and completes

    ==> no context will send the h2cdata - timeout.

    Fix this by having queue_rq send as much as it can from the send_list,
    such that if anything is still left, it's because the socket buffer is
    full and the socket write_space callback will trigger, thus guaranteeing
    that a context will be scheduled to send the h2cdata PDU.

    Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
    Reported-by: Potnuri Bharat Teja
    Reported-by: Samuel Jones
    Signed-off-by: Sagi Grimberg
    Tested-by: Potnuri Bharat Teja
    Signed-off-by: Christoph Hellwig

    Sagi Grimberg
     
  • A system with more than one of these SSDs will only have one usable,
    because the kernel fails to detect the other nvme devices due to
    duplicate cntlids.

    [ 6.274554] nvme nvme1: Duplicate cntlid 33 with nvme0, rejecting
    [ 6.274566] nvme nvme1: Removing after probe failure status: -22

    Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk resolves the issue.

    Signed-off-by: Gopal Tiwari
    Signed-off-by: Christoph Hellwig

    Gopal Tiwari
     
  • Kernel robot had the following warnings:

    >> fcloop.c:1506:6: warning: %x in format string (no. 1) requires
    >> 'unsigned int *' but the argument type is 'signed int *'.
    >> [invalidScanfArgType_int]
    >> if (sscanf(buf, "%x:%d:%d", &opcode, &starting, &amount) != 3)
    >> ^

    Resolve by changing opcode from an int to an unsigned int

    and

    >> fcloop.c:1632:32: warning: Uninitialized variable: lport [uninitvar]
    >> ret = __wait_localport_unreg(lport);
    >> ^

    >> fcloop.c:1615:28: warning: Uninitialized variable: nport [uninitvar]
    >> ret = __remoteport_unreg(nport, rport);
    >> ^

    These aren't actual issues as the values are assigned prior to use.
    It appears the tool doesn't understand list_first_entry_or_null().
    Regardless, quiet the tool by initializing the pointers to NULL at
    declaration.
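
    For the first warning, the point is that the %x conversion stores
    through an 'unsigned int *'. A minimal sketch using the format string
    quoted above; the wrapper function and its name are hypothetical, not
    the fcloop code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse "<hex opcode>:<starting>:<amount>". opcode is an unsigned int
 * because %x expects an 'unsigned int *'; the two %d fields stay int.
 * Returns 0 on success and -1 if fewer than three fields were converted.
 */
static int toy_parse_cmd(const char *buf, unsigned int *opcode,
			 int *starting, int *amount)
{
	assert(buf != NULL && opcode != NULL);
	assert(starting != NULL && amount != NULL);
	if (sscanf(buf, "%x:%d:%d", opcode, starting, amount) != 3)
		return -1;
	return 0;
}
```

    Passing the address of a plain int for the first field is what the
    checker flagged; matching the variable's type to the conversion
    removes the warning without changing the input format.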

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • Recent patches changed calling sequences. nvme_fc_abort_outstanding_ios
    used to be called from a timeout or work context. Now it is being called
    in an io completion context, which can be an interrupt handler.
    Unfortunately, the abort outstanding ios routine attempts to stop nvme
    queues and nested routines that may try to sleep, which is in conflict
    with the interrupt handler.

    Correct this by replacing the direct call with the scheduling of a work
    element, so that the abort outstanding ios routine is called from the
    work element.

    Fixes: 95ced8a2c72d ("nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery")
    Signed-off-by: James Smart
    Reported-by: Daniel Wagner
    Tested-by: Daniel Wagner
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • Make sure that bdgrab() is done on the 'block_device' instance before
    referring to it, to avoid a use-after-free.

    Cc:
    Reported-by: syzbot+825f0f9657d4e528046e@syzkaller.appspotmail.com
    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • BFQ computes the number of tags it allows to be allocated for each
    request type based on the tag bitmap. However, it uses
    1 << bitmap.shift as the number of available tags, which is wrong.
    'shift' is just an internal bitmap value containing the logarithm of
    how many bits the bitmap uses in each bitmap word. Thus the number of
    tags allowed for some request types can be far too low. Use the proper
    bitmap.depth, which holds the number of tags, instead.
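
    A small numeric sketch shows how far apart the two values can be. The
    struct below is a toy stand-in with only the two fields discussed
    above, and the "allow half of the tags" limit is a hypothetical
    policy chosen for illustration, not BFQ's actual formula.

```c
#include <assert.h>

/* Toy stand-in for the tag bitmap; only the two fields discussed above. */
struct toy_bitmap {
	unsigned int depth; /* total number of tags (bits) */
	unsigned int shift; /* log2 of the bits used per bitmap word */
};

/* Hypothetical limit: allow half of the given tag count, but at least one. */
static unsigned int toy_half_limit(unsigned int tags)
{
	unsigned int half = tags >> 1;

	return half ? half : 1u;
}

/* Limit derived from the per-word bit count: the mistaken tag count. */
static unsigned int toy_limit_from_shift(const struct toy_bitmap *bm)
{
	return toy_half_limit(1u << bm->shift);
}

/* Limit derived from the real number of tags. */
static unsigned int toy_limit_from_depth(const struct toy_bitmap *bm)
{
	assert(bm->depth > 0);
	return toy_half_limit(bm->depth);
}
```

    For a bitmap with 256 tags spread over 64-bit words (shift 6), the
    shift-based limit is 32 tags while the depth-based limit is 128.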

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • When initializing iocost for a queue, its rqos should be registered before
    the blkcg policy is activated to allow policy data initialization to look
    up the associated ioc. This unfortunately means that the rqos methods can
    be called on bios before iocgs are attached to all existing blkgs.

    While the race is theoretically possible on ioc_rqos_throttle(), it mostly
    happened in ioc_rqos_merge() due to the difference in how they look up ioc.
    The former determines it from the passed-in @rqos and then bails before
    dereferencing iocg if the looked-up ioc is disabled, which most likely is
    the case if initialization is still in progress. The latter looked up ioc
    by dereferencing the possibly NULL iocg, making it a lot more prone to
    actually triggering the bug.

    * Make ioc_rqos_merge() use the same method as ioc_rqos_throttle() to look
    up ioc for consistency.

    * Make ioc_rqos_throttle() and ioc_rqos_merge() test for NULL iocg before
    dereferencing it.

    * Explain the danger of NULL iocgs in blk_iocost_init().

    Signed-off-by: Tejun Heo
    Reported-by: Jonathan Lemon
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Jens Axboe

    Tejun Heo
     

04 Jan, 2021

2 commits

  • Without CRC32 support, this fails to link:

    arm-linux-gnueabi-ld: drivers/lightnvm/pblk-init.o: in function `pblk_init':
    pblk-init.c:(.text+0x2654): undefined reference to `crc32_le'
    arm-linux-gnueabi-ld: drivers/lightnvm/pblk-init.o: in function `pblk_exit':
    pblk-init.c:(.text+0x2a7c): undefined reference to `crc32_le'

    Fixes: a4bd217b4326 ("lightnvm: physical block device (pblk) target")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • Without crc32, the driver fails to link:

    arm-linux-gnueabi-ld: drivers/block/rsxx/config.o: in function `rsxx_load_config':
    config.c:(.text+0x124): undefined reference to `crc32_le'

    Fixes: 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     

30 Dec, 2020

2 commits

  • This was missed in 021a24460dc2, which leads to the numeric value of
    QUEUE_FLAG_NOWAIT (i.e. 29) showing up in
    /sys/kernel/debug/block/*/state.

    Fixes: 021a24460dc28e7412aecfae89f60e1847e685c0
    Cc: Konstantin Khlebnikov
    Cc: Mike Snitzer
    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Signed-off-by: Andres Freund
    Signed-off-by: Jens Axboe

    Andres Freund
     
  • Fix new kernel-doc warnings in fs/block_dev.c:

    ../fs/block_dev.c:1066: warning: Excess function parameter 'whole' description in 'bd_abort_claiming'
    ../fs/block_dev.c:1837: warning: Function parameter or member 'dev' not described in 'lookup_bdev'

    Fixes: 4e7b5671c6a8 ("block: remove i_bdev")
    Fixes: 37c3fc9abb25 ("block: simplify the block device claiming interface")
    Signed-off-by: Randy Dunlap
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Alexander Viro
    Signed-off-by: Jens Axboe

    Randy Dunlap
     

28 Dec, 2020

8 commits


27 Dec, 2020

1 commit

  • Commit c9a3c4e637ac ("mfd: ab8500-debugfs: Remove extraneous curly
    brace") removed a left-over curly brace that caused build failures, but
    Joe Perches points out that the subsequent 'seq_putc()' should also be
    removed, because the commit that caused all these problems already added
    the final '\n' to the seq_printf() above it.

    Reported-by: Joe Perches
    Fixes: 886c8121659d ("mfd: ab8500-debugfs: Remove the racy fiddling with irq_desc")
    Cc: Thomas Gleixner
    Cc: Nathan Chancellor
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

26 Dec, 2020

2 commits

  • Pull PCI fixes from Bjorn Helgaas:

    - Fix a tegra enumeration regression (Rob Herring)

    - Fix a designware-host check that warned on *success*, not failure
    (Alexander Lobakin)

    * tag 'pci-v5.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
    PCI: dwc: Fix inverted condition of DMA mask setup warning
    PCI: tegra: Fix host link initialization

    Linus Torvalds
     
  • Clang errors:

    drivers/mfd/ab8500-debugfs.c:1526:2: error: non-void function does not return a value [-Werror,-Wreturn-type]
    }
    ^
    drivers/mfd/ab8500-debugfs.c:1528:2: error: expected identifier or '('
    return 0;
    ^
    drivers/mfd/ab8500-debugfs.c:1529:1: error: extraneous closing brace ('}')
    }
    ^
    3 errors generated.

    The cleanup in ab8500_interrupts_show left a stray curly brace behind;
    remove it to fix the error.

    Fixes: 886c8121659d ("mfd: ab8500-debugfs: Remove the racy fiddling with irq_desc")
    Signed-off-by: Nathan Chancellor
    Signed-off-by: Linus Torvalds

    Nathan Chancellor