17 Sep, 2020

1 commit

  • commit f44d04e696feaf13d192d942c4f14ad2e117065a upstream.

    It turns out that currently we rely only on sysfs attribute
    permissions:

    $ ll /sys/bus/rbd/{add*,remove*}
    --w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add
    --w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add_single_major
    --w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/remove
    --w------- 1 root root 4096 Sep 3 20:38 /sys/bus/rbd/remove_single_major

    This means that images can be mapped and unmapped (i.e. block devices
    can be created and deleted) by a UID 0 process even after it drops all
    privileges or by any process with CAP_DAC_OVERRIDE in its user namespace
    as long as UID 0 is mapped into that user namespace.

    Be consistent with other virtual block devices (loop, nbd, dm, md, etc)
    and require CAP_SYS_ADMIN in the initial user namespace for mapping and
    unmapping, and also for dumping the configuration string and refreshing
    the image header.

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     

10 Sep, 2020

1 commit

  • [ Upstream commit acb19e17c5134dd78668c429ecba5b481f038e6a ]

    Suppose we configure the io timeout of nbd0 to 100s. Later, after we
    have finished using it, we configure nbd0 again and set the io
    timeout to 0. We expect it to time out after 30 seconds
    and keep retrying. But in fact we cannot change the timeout
    when we set it to 0; the timeout is still the original 100s.

    So change the timeout to the default 30s when it is set to zero.
    This matches the behavior of commit 2da22da57348 ("nbd: fix zero
    cmd timeout handling v2").

    This becomes more important if we reconfigure an nbd device
    and the io timeout is set to zero, because it then takes 30s
    to detect the new socket, so io can be completed more
    quickly than with 100s.

    Signed-off-by: Hou Pu
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Hou Pu
     

03 Sep, 2020

3 commits

  • commit bcb21c8cc9947286211327d663ace69f07d37a76 upstream.

    In case of a block device backend, if the backend supports write zeroes, the
    loop device will set the queue flag QUEUE_FLAG_DISCARD. However,
    limits.discard_granularity isn't set up, which is wrong;
    see the following description in Documentation/ABI/testing/sysfs-block:

    A discard_granularity of 0 means that the device does not support
    discard functionality.

    In particular, 9b15d109a6b2 ("block: improve discard bio alignment in
    __blkdev_issue_discard()") starts to use q->limits.discard_granularity
    for computing max discard sectors. A zero discard granularity may cause
    a kernel oops, or fail the discard request even though the loop queue claims
    discard support via QUEUE_FLAG_DISCARD.

    Fix the issue by setting up discard granularity and alignment.

    Fixes: c52abf563049 ("loop: Better discard support for block devices")
    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Acked-by: Coly Li
    Cc: Hannes Reinecke
    Cc: Xiao Ni
    Cc: Martin K. Petersen
    Cc: Evan Green
    Cc: Gwendal Grignou
    Cc: Chaitanya Kulkarni
    Cc: Andrzej Pietrasiewicz
    Cc: Christoph Hellwig
    Cc:
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Ming Lei
     
  • [ Upstream commit 2d62e6b038e729c3e4bfbfcfbd44800ef0883680 ]

    REQ_FUA should be checked using rq->cmd_flags instead of req_op().
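
    Why the old test can never be true is easy to see in a small model:
    req_op() keeps only the low operation bits, while REQ_FUA is a flag
    bit above them. The sketch below uses hypothetical *_MODEL names, and
    the bit position chosen for the FUA flag is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REQ_OP_BITS_MODEL  8u
#define REQ_OP_MASK_MODEL  ((1u << REQ_OP_BITS_MODEL) - 1u)
#define REQ_OP_WRITE_MODEL 1u
#define REQ_FUA_MODEL      (1u << 17) /* illustrative flag bit above the op bits */

/* Models req_op(): strips everything except the operation bits. */
uint32_t req_op_model(uint32_t cmd_flags)
{
    return cmd_flags & REQ_OP_MASK_MODEL;
}

/* Buggy form: the flag was masked away before it is tested. */
bool is_fua_buggy(uint32_t cmd_flags)
{
    return (req_op_model(cmd_flags) & REQ_FUA_MODEL) != 0;
}

/* Fixed form: test the flag on the full cmd_flags word. */
bool is_fua_fixed(uint32_t cmd_flags)
{
    return (cmd_flags & REQ_FUA_MODEL) != 0;
}
```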

    Fixes: deb78b419dfda ("nullb: emulate cache")
    Signed-off-by: Hou Pu
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Hou Pu
     
  • [ Upstream commit af822aa68fbdf0a480a17462ed70232998127453 ]

    1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") starts
    to support multi-range discard for virtio-blk. However, the virtio-blk
    disk may report max discard segment as 1, at least that is exactly what
    qemu is doing.

    So far, the block layer switches to normal request merging if the max discard
    segment limit is 1, and multiple bios can be merged into a single segment. This may
    cause memory corruption in virtblk_setup_discard_write_zeroes().

    Fix the issue by handling a single max discard segment in a straightforward
    way.

    Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Changpeng Liu
    Cc: Daniel Verkamp
    Cc: Michael S. Tsirkin
    Cc: Stefan Hajnoczi
    Cc: Stefano Garzarella
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Ming Lei
     

19 Aug, 2020

1 commit


22 Jul, 2020

1 commit

  • commit 853eab68afc80f59f36bbdeb715e5c88c501e680 upstream.

    Turns out that the permissions for 0400 really are what we want here,
    otherwise any user can read from this file.

    [fixed formatting, added changelog, and made attribute static - gregkh]

    Reported-by: Wade Mealing
    Cc: stable
    Fixes: f40609d1591f ("zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()")
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=1847832
    Reviewed-by: Steffen Maier
    Acked-by: Minchan Kim
    Link: https://lore.kernel.org/r/20200617114946.GA2131650@kroah.com
    Signed-off-by: Greg Kroah-Hartman

    Wade Mealing
     

16 Jul, 2020

1 commit

  • [ Upstream commit 579dd91ab3a5446b148e7f179b6596b270dace46 ]

    When adding first socket to nbd, if nsock's allocation failed, the data
    structure member "config->socks" was reallocated, but the data structure
    member "config->num_connections" was not updated. A memory leak will occur
    then because the function "nbd_config_put" will free "config->socks" only
    when "config->num_connections" is not zero.

    Fixes: 03bf73c315ed ("nbd: prevent memory leak")
    Reported-by: syzbot+934037347002901b8d2a@syzkaller.appspotmail.com
    Signed-off-by: Zheng Bin
    Reviewed-by: Eric Biggers
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Zheng Bin
     

09 Jul, 2020

1 commit

  • [ Upstream commit e7eea44eefbdd5f0345a0a8b80a3ca1c21030d06 ]

    Otherwise there will be a memory leak if alloc_disk() fails.

    Fixes: 6a27b656fc02 ("block: virtio-blk: support multi virt queues per virtio-blk device")
    Signed-off-by: Hou Tao
    Reviewed-by: Stefano Garzarella
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Hou Tao
     

01 Jul, 2020

1 commit

  • commit f4bd34b139a3fa2808c4205f12714c65e1548c6c upstream.

    When a filesystem is mounted on a loop device and the loop ioctl
    LOOP_SET_STATUS64 is issued, the buffer_head mappings are destroyed
    because of kill_bdev:
    kill_bdev
    truncate_inode_pages
    truncate_inode_pages_range
    do_invalidatepage
    block_invalidatepage
    discard_buffer -->clear BH_Mapped flag

    sb_bread
    __bread_gfp
    bh = __getblk_gfp
    -->discard_buffer clear BH_Mapped flag
    __bread_slow
    submit_bh
    submit_bh_wbc
    BUG_ON(!buffer_mapped(bh)) --> hit this BUG_ON

    Fixes: 5db470e229e2 ("loop: drop caches if offset or block_size are changed")
    Signed-off-by: Zheng Bin
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Zheng Bin
     

24 Jun, 2020

1 commit

  • [ Upstream commit 720bc316690bd27dea9d71510b50f0cd698ffc32 ]

    Since commit dcebd755926b ("block: use bio_for_each_bvec() to compute
    multi-page bvec count"), the kernel will bug_on on the PS3 because
    bio_split() is called with sectors == 0:

    kernel BUG at block/bio.c:1853!
    Oops: Exception in kernel mode, sig: 5 [#1]
    BE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=8 NUMA PS3
    Modules linked in: firewire_sbp2 rtc_ps3(+) soundcore ps3_gelic(+) \
    ps3rom(+) firewire_core ps3vram(+) usb_common crc_itu_t
    CPU: 0 PID: 97 Comm: blkid Not tainted 5.3.0-rc4 #1
    NIP: c00000000027d0d0 LR: c00000000027d0b0 CTR: 0000000000000000
    REGS: c00000000135ae90 TRAP: 0700 Not tainted (5.3.0-rc4)
    MSR: 8000000000028032 CR: 44008240 XER: 20000000
    IRQMASK: 0
    GPR00: c000000000289368 c00000000135b120 c00000000084a500 c000000004ff8300
    GPR04: 0000000000000c00 c000000004c905e0 c000000004c905e0 000000000000ffff
    GPR08: 0000000000000000 0000000000000001 0000000000000000 000000000000ffff
    GPR12: 0000000000000000 c0000000008ef000 000000000000003e 0000000000080001
    GPR16: 0000000000000100 000000000000ffff 0000000000000000 0000000000000004
    GPR20: c00000000062fd7e 0000000000000001 000000000000ffff 0000000000000080
    GPR24: c000000000781788 c00000000135b350 0000000000000080 c000000004c905e0
    GPR28: c00000000135b348 c000000004ff8300 0000000000000000 c000000004c90000
    NIP [c00000000027d0d0] .bio_split+0x28/0xac
    LR [c00000000027d0b0] .bio_split+0x8/0xac
    Call Trace:
    [c00000000135b120] [c00000000027d130] .bio_split+0x88/0xac (unreliable)
    [c00000000135b1b0] [c000000000289368] .__blk_queue_split+0x11c/0x53c
    [c00000000135b2d0] [c00000000028f614] .blk_mq_make_request+0x80/0x7d4
    [c00000000135b3d0] [c000000000283a8c] .generic_make_request+0x118/0x294
    [c00000000135b4b0] [c000000000283d34] .submit_bio+0x12c/0x174
    [c00000000135b580] [c000000000205a44] .mpage_bio_submit+0x3c/0x4c
    [c00000000135b600] [c000000000206184] .mpage_readpages+0xa4/0x184
    [c00000000135b750] [c0000000001ff8fc] .blkdev_readpages+0x24/0x38
    [c00000000135b7c0] [c0000000001589f0] .read_pages+0x6c/0x1a8
    [c00000000135b8b0] [c000000000158c74] .__do_page_cache_readahead+0x118/0x184
    [c00000000135b9b0] [c0000000001591a8] .force_page_cache_readahead+0xe4/0xe8
    [c00000000135ba50] [c00000000014fc24] .generic_file_read_iter+0x1d8/0x830
    [c00000000135bb50] [c0000000001ffadc] .blkdev_read_iter+0x40/0x5c
    [c00000000135bbc0] [c0000000001b9e00] .new_sync_read+0x144/0x1a0
    [c00000000135bcd0] [c0000000001bc454] .vfs_read+0xa0/0x124
    [c00000000135bd70] [c0000000001bc7a4] .ksys_read+0x70/0xd8
    [c00000000135be20] [c00000000000a524] system_call+0x5c/0x70
    Instruction dump:
    7fe3fb78 482e30dc 7c0802a6 482e3085 7c9e2378 f821ff71 7ca42b78 7d3e00d0
    7c7d1b78 79290fe0 7cc53378 69290001 81230028 7bca0020 7929ba62
    ---[ end trace 313fec760f30aa1f ]---

    The problem originates from setting the segment boundary of the
    request queue to -1UL. This makes get_max_segment_size() return zero
    when offset is zero, whatever the max segment size. The test with
    BLK_SEG_BOUNDARY_MASK fails and 'mask - (mask & offset) + 1' overflows
    to zero in the return statement.

    Not setting the segment boundary and using the default
    value (BLK_SEG_BOUNDARY_MASK) fixes the problem.
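
    The overflow described above can be reproduced with plain unsigned
    arithmetic. The sketch below is a userspace model with hypothetical
    names, using 64-bit values to mirror a 64-bit unsigned long; it is
    not the block layer's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define BLK_SEG_BOUNDARY_MASK_MODEL 0xFFFFFFFFull /* models the default mask */

/* Models the computation described above for a given boundary mask and offset. */
uint64_t max_segment_size_model(uint64_t boundary_mask, uint64_t offset,
                                uint64_t queue_max_segment_size)
{
    uint64_t to_boundary;

    /* The default mask means "no boundary limit". */
    if (boundary_mask == BLK_SEG_BOUNDARY_MASK_MODEL)
        return queue_max_segment_size;

    /* With an all-ones mask and offset 0 this is UINT64_MAX + 1, which wraps to 0. */
    to_boundary = boundary_mask - (boundary_mask & offset) + 1;

    return to_boundary < queue_max_segment_size ? to_boundary
                                                : queue_max_segment_size;
}
```

    With boundary_mask set to all ones (the -1UL case) and offset 0 the
    model returns 0, which is how bio_split() ends up being called with
    sectors == 0.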

    Signed-off-by: Emmanuel Nicolet
    Signed-off-by: Geoff Levand
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/060a416c43138f45105c0540eff1a45539f7e2fc.1589049250.git.geoff@infradead.org
    Signed-off-by: Sasha Levin

    Emmanuel Nicolet
     

17 Jun, 2020

1 commit

  • commit 263c61581a38d0a5ad1f5f4a9143b27d68caeffd upstream.

    Since the switch of floppy driver to blk-mq, the contended (fdc_busy) case
    in floppy_queue_rq() is not handled correctly.

    In case we reach floppy_queue_rq() with fdc_busy set (i.e. with the floppy
    locked due to another request still being in-flight), we put the request
    on the list of requests and return BLK_STS_OK to the block core, without
    actually scheduling delayed work / doing further processing of the
    request. This means that processing of this request is postponed until
    another request comes and passes uncontended.

    Which in some cases might actually never happen and we keep waiting
    indefinitely. The simple testcase is

    for i in `seq 1 2000`; do echo -en $i '\r'; blkid --info /dev/fd0 2> /dev/null; done

    run in qemu. That reliably causes blkid to eventually hang indefinitely
    in __floppy_read_block_0() waiting for completion, as the BIO callback
    never happens, and no further IO is ever submitted on the (non-existent)
    floppy device. This was observed reliably on a qemu-emulated device.

    Fix that by not queuing the request in the contended case, and return
    BLK_STS_RESOURCE instead, so that blk core handles the request
    rescheduling and let it pass properly non-contended later.

    Fixes: a9f38e1dec107a ("floppy: convert to blk-mq")
    Cc: stable@vger.kernel.org
    Tested-by: Libor Pechacek
    Signed-off-by: Jiri Kosina
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jiri Kosina
     

07 Jun, 2020

1 commit

  • [ Upstream commit e274832590211c4b1b1e807ca66fad8b5bb8b328 ]

    In null_init_zoned_dev(), check whether the zone size is larger than the
    device capacity and return an error if it is.

    This also fixes the following oops :-

    null_blk: changed the number of conventional zones to 4294967295
    BUG: kernel NULL pointer dereference, address: 0000000000000010
    PGD 7d76c5067 P4D 7d76c5067 PUD 7d240c067 PMD 0
    Oops: 0002 [#1] SMP NOPTI
    CPU: 4 PID: 5508 Comm: nullbtests.sh Tainted: G OE 5.7.0-rc4lblk-fnext0
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e4
    RIP: 0010:null_init_zoned_dev+0x17a/0x27f [null_blk]
    RSP: 0018:ffffc90007007e00 EFLAGS: 00010246
    RAX: 0000000000000020 RBX: ffff8887fb3f3c00 RCX: 0000000000000007
    RDX: 0000000000000000 RSI: ffff8887ca09d688 RDI: ffff888810fea510
    RBP: 0000000000000010 R08: ffff8887ca09d688 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8887c26e8000
    R13: ffffffffa05e9390 R14: 0000000000000000 R15: 0000000000000001
    FS: 00007fcb5256f740(0000) GS:ffff888810e00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000010 CR3: 000000081e8fe000 CR4: 00000000003406e0
    Call Trace:
    null_add_dev+0x534/0x71b [null_blk]
    nullb_device_power_store.cold.41+0x8/0x2e [null_blk]
    configfs_write_file+0xe6/0x150
    vfs_write+0xba/0x1e0
    ksys_write+0x5f/0xe0
    do_syscall_64+0x60/0x250
    entry_SYSCALL_64_after_hwframe+0x49/0xb3
    RIP: 0033:0x7fcb51c71840
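
    The validation described at the top of this commit can be sketched
    as a standalone check. The function name is hypothetical, both
    arguments are assumed to be in the same unit, and -EINVAL as the
    error code is an assumption.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the new check: reject a zone size that exceeds
 * the device capacity. Both values must be expressed in the same unit.
 */
int null_check_zone_size_model(unsigned long zone_size, unsigned long dev_size)
{
    if (zone_size > dev_size)
        return -EINVAL; /* assumed error code */
    return 0;
}
```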

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Chaitanya Kulkarni
     

20 May, 2020

1 commit

  • [ Upstream commit 90b5feb8c4bebc76c27fcaf3e1a0e5ca2d319e9e ]

    A userspace process holding a file descriptor to a virtio_blk device can
    still invoke block_device_operations after hot unplug. This leads to a
    use-after-free accessing vblk->vdev in virtblk_getgeo() when
    ioctl(HDIO_GETGEO) is invoked:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
    IP: [] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
    PGD 800000003a92f067 PUD 3a930067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    task: ffff9be5fbfb8000 ti: ffff9be5fa890000 task.ti: ffff9be5fa890000
    RIP: 0010:[] [] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
    RSP: 0018:ffff9be5fa893dc8 EFLAGS: 00010246
    RAX: ffff9be5fc3f3400 RBX: ffff9be5fa893e30 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9be5fbc10b40
    RBP: ffff9be5fa893dc8 R08: 0000000000000301 R09: 0000000000000301
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff9be5fdc24680
    R13: ffff9be5fbc10b40 R14: ffff9be5fbc10480 R15: 0000000000000000
    FS: 00007f1bfb968740(0000) GS:ffff9be5ffc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000090 CR3: 000000003a894000 CR4: 0000000000360ff0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    [] virtblk_getgeo+0x47/0x110 [virtio_blk]
    [] ? handle_mm_fault+0x39d/0x9b0
    [] blkdev_ioctl+0x1f5/0xa20
    [] block_ioctl+0x41/0x50
    [] do_vfs_ioctl+0x3a0/0x5a0
    [] SyS_ioctl+0xa1/0xc0

    A related problem is that virtblk_remove() leaks the vd_index_ida index
    when something still holds a reference to vblk->disk during hot unplug.
    This causes virtio-blk device names to be lost (vda, vdb, etc).

    Fix these issues by protecting vblk->vdev with a mutex and reference
    counting vblk so the vd_index_ida index can be removed in all cases.

    Fixes: 48e4043d4529 ("virtio: add virtio disk geometry feature")
    Reported-by: Lance Digby
    Signed-off-by: Stefan Hajnoczi
    Link: https://lore.kernel.org/r/20200430140442.171016-1-stefanha@redhat.com
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Stefano Garzarella
    Signed-off-by: Sasha Levin

    Stefan Hajnoczi
     

29 Apr, 2020

2 commits

  • [ Upstream commit 3d973b2e9a625996ee997c7303cd793b9d197c65 ]

    Let's change the mapping from virtqueue_add errors to BLK_STS
    statuses, so that -ENOSPC, which indicates a full virtqueue, is still
    mapped to BLK_STS_DEV_RESOURCE, but -ENOMEM, which indicates a non-device
    specific resource outage, is mapped to BLK_STS_RESOURCE.
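
    The mapping can be written down as a small switch. This is a
    userspace sketch with a hypothetical enum and function name; the
    fallback to an I/O error for any other code is an assumption and is
    not stated in the description above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the BLK_STS_* statuses named above. */
enum blk_status_model {
    BLK_STS_OK_MODEL,
    BLK_STS_RESOURCE_MODEL,
    BLK_STS_DEV_RESOURCE_MODEL,
    BLK_STS_IOERR_MODEL
};

/* Maps a virtqueue_add-style error code to a block-layer status. */
enum blk_status_model virtblk_map_add_error_model(int err)
{
    switch (err) {
    case 0:
        return BLK_STS_OK_MODEL;
    case -ENOSPC:                 /* virtqueue full: a device resource */
        return BLK_STS_DEV_RESOURCE_MODEL;
    case -ENOMEM:                 /* non-device specific resource outage */
        return BLK_STS_RESOURCE_MODEL;
    default:                      /* assumption: everything else is an I/O error */
        return BLK_STS_IOERR_MODEL;
    }
}
```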

    Signed-off-by: Halil Pasic
    Link: https://lore.kernel.org/r/20200213123728.61216-3-pasic@linux.ibm.com
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Sasha Levin

    Halil Pasic
     
  • [ Upstream commit c52abf563049e787c1341cdf15c7dbe1bfbc951b ]

    If the backing device for a loop device is itself a block device,
    then mirror the "write zeroes" capabilities of the underlying
    block device into the loop device. Copy this capability into both
    max_write_zeroes_sectors and max_discard_sectors of the loop device.

    The reason for this is that REQ_OP_DISCARD on a loop device translates
    into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This
    presents a consistent interface for loop devices (that discarded data
    is zeroed), regardless of the backing device type of the loop device.
    There should be no behavior change for loop devices backed by regular
    files.

    This change fixes blktest block/003, and removes an extraneous
    error print in block/013 when testing on a loop device backed
    by a block device that does not support discard.

    Signed-off-by: Evan Green
    Reviewed-by: Gwendal Grignou
    Reviewed-by: Chaitanya Kulkarni
    [used updated version of Evan's comment in loop_config_discard()]
    [moved backingq to local scope, removed redundant braces]
    Signed-off-by: Andrzej Pietrasiewicz
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Evan Green
     

23 Apr, 2020

2 commits

  • [ Upstream commit 952c48b0ed18919bff7528501e9a3fff8a24f8cd ]

    rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(),
    including rbd_dev_header_info(), which means that rbd_dev_header_info()
    isn't supposed to be called after rbd_dev_unprobe().

    However, rbd_dev_image_release() calls rbd_dev_unprobe() before
    rbd_unregister_watch(). This is racy because a header update notify
    can sneak in:

    "rbd unmap" thread ceph-watch-notify worker

    rbd_dev_image_release()
    rbd_dev_unprobe()
    free and zero out header
    rbd_watch_cb()
    rbd_dev_refresh()
    rbd_dev_header_info()
    read in header

    The same goes for "rbd map" because rbd_dev_image_probe() calls
    rbd_dev_unprobe() on errors. In both cases this results in a memory
    leak.

    Fixes: fd22aef8b47c ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()")
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jason Dillaman
    Signed-off-by: Sasha Levin

    Ilya Dryomov
     
  • [ Upstream commit 0e4e1de5b63fa423b13593337a27fd2d2b0bcf77 ]

    rbd_unregister_watch() flushes notifies and therefore cannot be called
    under header_rwsem because a header update notify takes header_rwsem to
    synchronize with "rbd map". If mapping an image fails after the watch
    is established and a header update notify sneaks in, we deadlock when
    erroring out from rbd_dev_image_probe().

    Move watch registration and unregistration out of the critical section.
    The only reason they were put there was to make header_rwsem management
    slightly more obvious.

    Fixes: 811c66887746 ("rbd: fix rbd map vs notify races")
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jason Dillaman
    Signed-off-by: Sasha Levin

    Ilya Dryomov
     

17 Apr, 2020

4 commits

  • commit 3a169c0be75b59dd85d159493634870cdec6d3c4 upstream.

    Commit 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for
    large array allocation") didn't fix the issue it was meant to, as the
    flags for allocating the memory are GFP_NOIO, which will lead to the
    memory allocation falling back to kmalloc().

    So instead of GFP_NOIO use GFP_KERNEL and do all the memory allocation
    in blkfront_setup_indirect() in a memalloc_noio_{save,restore} section.

    Fixes: 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation")
    Cc: stable@vger.kernel.org
    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Acked-by: Roger Pau Monné
    Link: https://lore.kernel.org/r/20200403090034.8753-1-jgross@suse.com
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Juergen Gross
     
  • [ Upstream commit ff77042296d0a54535ddf74412c5ae92cb4ec76a ]

    Steps to reproduce:

    BLKRESETZONE zone 0

    // force EIO
    pwrite(fd, buf, 4096, 4096);

    [issue more IO including zone ioctls]

    IO will start failing randomly, including IO to unrelated zones, because of
    ->error "reuse". The trigger can also be partition detection if the test is not
    run immediately, which is even more entertaining.

    The fix is of course to clear ->error where necessary.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alexey Dobriyan (SK hynix)
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Alexey Dobriyan
     
  • [ Upstream commit 9b03b713082a31a5b90e0a893c72aa620e255c26 ]

    If null_add_dev() fails then null_del_dev() is called with a NULL argument.
    Make null_del_dev() handle this scenario correctly. This patch fixes the
    following KASAN complaint:

    null-ptr-deref in null_del_dev+0x28/0x280 [null_blk]
    Read of size 8 at addr 0000000000000000 by task find/1062

    Call Trace:
    dump_stack+0xa5/0xe6
    __kasan_report.cold+0x65/0x99
    kasan_report+0x16/0x20
    __asan_load8+0x58/0x90
    null_del_dev+0x28/0x280 [null_blk]
    nullb_group_drop_item+0x7e/0xa0 [null_blk]
    client_drop_item+0x53/0x80 [configfs]
    configfs_rmdir+0x395/0x4e0 [configfs]
    vfs_rmdir+0xb6/0x220
    do_rmdir+0x238/0x2c0
    __x64_sys_unlinkat+0x75/0x90
    do_syscall_64+0x6f/0x2f0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Signed-off-by: Bart Van Assche
    Reviewed-by: Chaitanya Kulkarni
    Cc: Johannes Thumshirn
    Cc: Hannes Reinecke
    Cc: Ming Lei
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Bart Van Assche
     
  • [ Upstream commit 2004bfdef945fe55196db6b9cdf321fbc75bb0de ]

    If null_add_dev() fails, clear dev->nullb.

    This patch fixes the following KASAN complaint:

    BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
    Read of size 8 at addr ffff88803280fc30 by task check/8409

    Call Trace:
    dump_stack+0xa5/0xe6
    print_address_description.constprop.0+0x26/0x260
    __kasan_report.cold+0x7b/0x99
    kasan_report+0x16/0x20
    __asan_load8+0x58/0x90
    nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
    configfs_write_file+0x1c4/0x250 [configfs]
    __vfs_write+0x4c/0x90
    vfs_write+0x145/0x2c0
    ksys_write+0xd7/0x180
    __x64_sys_write+0x47/0x50
    do_syscall_64+0x6f/0x2f0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7ff370926317
    Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
    RSP: 002b:00007fff2dd2da48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff370926317
    RDX: 0000000000000002 RSI: 0000559437ef23f0 RDI: 0000000000000001
    RBP: 0000559437ef23f0 R08: 000000000000000a R09: 0000000000000001
    R10: 0000559436703471 R11: 0000000000000246 R12: 0000000000000002
    R13: 00007ff370a006a0 R14: 00007ff370a014a0 R15: 00007ff370a008a0

    Allocated by task 8409:
    save_stack+0x23/0x90
    __kasan_kmalloc.constprop.0+0xcf/0xe0
    kasan_kmalloc+0xd/0x10
    kmem_cache_alloc_node_trace+0x129/0x4c0
    null_add_dev+0x24a/0xe90 [null_blk]
    nullb_device_power_store+0x1b6/0x270 [null_blk]
    configfs_write_file+0x1c4/0x250 [configfs]
    __vfs_write+0x4c/0x90
    vfs_write+0x145/0x2c0
    ksys_write+0xd7/0x180
    __x64_sys_write+0x47/0x50
    do_syscall_64+0x6f/0x2f0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 8409:
    save_stack+0x23/0x90
    __kasan_slab_free+0x112/0x160
    kasan_slab_free+0x12/0x20
    kfree+0xdf/0x250
    null_add_dev+0xaf3/0xe90 [null_blk]
    nullb_device_power_store+0x1b6/0x270 [null_blk]
    configfs_write_file+0x1c4/0x250 [configfs]
    __vfs_write+0x4c/0x90
    vfs_write+0x145/0x2c0
    ksys_write+0xd7/0x180
    __x64_sys_write+0x47/0x50
    do_syscall_64+0x6f/0x2f0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
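
    The use-after-free above follows the usual dangling-pointer pattern:
    the object is freed on the error path while the owner keeps pointing
    at it. A minimal userspace sketch, with hypothetical struct and
    function names and a fail_late flag that exists only to force the
    error path:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's objects. */
struct nullb_model { int submit_queues; };
struct nullb_device_model { struct nullb_model *nullb; };

/* Models an add routine that can fail after publishing the pointer. */
int null_add_dev_model(struct nullb_device_model *dev, bool fail_late)
{
    struct nullb_model *nullb = calloc(1, sizeof(*nullb));

    assert(dev != NULL);
    if (!nullb)
        return -ENOMEM;

    dev->nullb = nullb;           /* pointer becomes visible to other code */

    if (fail_late) {
        free(nullb);
        dev->nullb = NULL;        /* the fix: do not leave a dangling pointer */
        return -EINVAL;           /* illustrative error code */
    }
    return 0;
}
```

    Later code that checks dev->nullb before dereferencing it then sees
    NULL instead of freed memory.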

    Fixes: 2984c8684f96 ("nullb: factor disk parameters")
    Signed-off-by: Bart Van Assche
    Reviewed-by: Chaitanya Kulkarni
    Cc: Johannes Thumshirn
    Cc: Hannes Reinecke
    Cc: Ming Lei
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Bart Van Assche
     

18 Mar, 2020

1 commit

  • commit f5f6b95c72f7f8bb46eace8c5306c752d0133daa upstream.

    Since nobody else is going to restart our hw_queue for us, the
    blk_mq_start_stopped_hw_queues() in virtblk_done() is not
    necessarily sufficient to ensure that the queue will get started again.
    In case of a global resource outage (-ENOMEM because of a mapping failure,
    because of swiotlb full) our virtqueue may be empty and we can get
    stuck with a stopped hw_queue.

    Let us not stop the queue on arbitrary errors, but only on -ENOSPC which
    indicates a full virtqueue, where the hw_queue is guaranteed to get
    started by virtblk_done() when it makes sense to carry on
    submitting requests. Let us also remove a stale comment.
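
    The resulting decision reduces to a single comparison. A minimal
    sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model: stop the hw_queue only when the virtqueue is full,
 * because only then is a later completion guaranteed to restart it.
 */
bool virtblk_should_stop_queue_model(int add_err)
{
    return add_err == -ENOSPC;
}
```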

    Signed-off-by: Halil Pasic
    Cc: Jens Axboe
    Fixes: f7728002c1c7 ("virtio_ring: fix return code on DMA mapping fails")
    Link: https://lore.kernel.org/r/20200213123728.61216-2-pasic@linux.ibm.com
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Greg Kroah-Hartman

    Halil Pasic
     

29 Feb, 2020

1 commit

  • commit 2e90ca68b0d2f5548804f22f0dd61145516171e3 upstream.

    Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in
    wait_til_ready().

    Which on the face of it can't happen, since as Willy Tarreau points out,
    the function does no particular memory access. Except through the FDCS
    macro, which just indexes a static allocation through the current fdc,
    which is always checked against N_FDC.

    Except the checking happens after we've already assigned the value.

    The floppy driver is a disgrace (a lot of it going back to my original
    horrid "design"), and has no real maintainer. Nobody has the hardware,
    and nobody really cares. But it still gets used in virtual environment
    because it's one of those things that everybody supports.

    The whole thing should be re-written, or at least parts of it should be
    seriously cleaned up. The 'current fdc' index, which is used by the
    FDCS macro, and which is often shadowed by a local 'fdc' variable, is a
    prime example of how not to write code.

    But because nobody has the hardware or the motivation, let's just fix up
    the immediate problem with a nasty band-aid: test the fdc index before
    actually assigning it to the static 'fdc' variable.
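
    The band-aid amounts to "validate, then assign" instead of "assign,
    then validate". A userspace sketch with hypothetical names; the
    value chosen for N_FDC_MODEL is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define N_FDC_MODEL 2 /* illustrative controller count */

/* Stands in for the driver's static 'fdc' index. */
static int current_fdc_model;

/* Only update the shared index once the candidate is known to be in range. */
bool set_fdc_checked_model(int candidate)
{
    if (candidate < 0 || candidate >= N_FDC_MODEL)
        return false;
    current_fdc_model = candidate;
    return true;
}

int get_current_fdc_model(void)
{
    return current_fdc_model;
}
```

    An out-of-range candidate leaves the previous index untouched, so
    nothing indexed through it can read out of bounds.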

    Reported-by: Jordy Zomer
    Cc: Willy Tarreau
    Cc: Dan Carpenter
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     

24 Feb, 2020

4 commits

  • [ Upstream commit c8ab422553c81a0eb070329c63725df1cd1425bc ]

    In brd_init(), rd_nr brd_devices are first allocated
    and added to brd_devices; then brd_devices is traversed to add each
    brd_device by calling add_disk(). When allocating a brd_device,
    disk->first_minor is set to i * max_part. If rd_nr * max_part
    is larger than MINORMASK, two different brd_devices may have the same
    devt, and only one of them can be added successfully.
    On rmmod brd.ko, this causes an oops when brd_exit() is called.

    Steps to reproduce:
    # modprobe brd rd_nr=3 rd_size=102400 max_part=1048576
    # rmmod brd
    The oops then appears.

    Oops log:
    [ 726.613722] Call trace:
    [ 726.614175] kernfs_find_ns+0x24/0x130
    [ 726.614852] kernfs_find_and_get_ns+0x44/0x68
    [ 726.615749] sysfs_remove_group+0x38/0xb0
    [ 726.616520] blk_trace_remove_sysfs+0x1c/0x28
    [ 726.617320] blk_unregister_queue+0x98/0x100
    [ 726.618105] del_gendisk+0x144/0x2b8
    [ 726.618759] brd_exit+0x68/0x560 [brd]
    [ 726.619501] __arm64_sys_delete_module+0x19c/0x2a0
    [ 726.620384] el0_svc_common+0x78/0x130
    [ 726.621057] el0_svc_handler+0x38/0x78
    [ 726.621738] el0_svc+0x8/0xc
    [ 726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260)

    Here, we add brd_check_and_reset_par() to check and limit the max_part parameter.
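
    The collision can be shown with the minor-number arithmetic alone.
    The sketch below is a userspace model with hypothetical names; it
    assumes a 20-bit minor field and a device number built as
    (major << 20) | minor, and uses major 1 purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MINORBITS_MODEL 20u
#define MINORMASK_MODEL ((1u << MINORBITS_MODEL) - 1u)

/* True if disk number 'index' gets a first_minor that fits the minor field. */
bool brd_first_minor_fits_model(uint64_t index, uint64_t max_part)
{
    return index * max_part <= MINORMASK_MODEL;
}

/* Models building a device number from major and minor. */
uint32_t mkdev_model(uint32_t major, uint32_t minor)
{
    return (major << MINORBITS_MODEL) | minor;
}
```

    With max_part=1048576 from the reproducer, disk 1 gets first_minor
    1048576, which no longer fits, and mkdev_model(1, 1048576) equals
    mkdev_model(1, 0): two disks end up with the same device number.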

    --
    V5->V6:
    - remove useless code

    V4->V5:(suggested by Ming Lei)
    - make sure max_part is not larger than DISK_MAX_PARTS

    V3->V4:(suggested by Ming Lei)
    - remove useless change
    - add one limit of max_part

    V2->V3: (suggested by Ming Lei)
    - clear .minors when running out of consecutive minor space in brd_alloc
    - remove limit of rd_nr

    V1->V2:
    - add more checks in brd_check_par_valid as suggested by Ming Lei.

    Signed-off-by: Zhiqiang Liu
    Reviewed-by: Bob Liu
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Zhiqiang Liu
     
  • [ Upstream commit a55e601b2f02df5db7070e9a37bd655c9c576a52 ]

    gcc -O3 warns about a dummy variable that is passed
    down into rbd_img_fill_nodata without being initialized:

    drivers/block/rbd.c: In function 'rbd_img_fill_nodata':
    drivers/block/rbd.c:2573:13: error: 'dummy' is used uninitialized in this function [-Werror=uninitialized]
    fctx->iter = *fctx->pos;

    Since this is a dummy, I assume the warning is harmless, but
    it's better to initialize it anyway and avoid the warning.

    Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Sasha Levin

    Arnd Bergmann
     
  • [ Upstream commit 3b82a051c10143639a378dcd12019f2353cc9054 ]

    Currently, when an error code -EIO or -ENOSPC occurs in the for-loop of
    writeback_store, the error code is overwritten by a ret = len
    assignment at the end of the function and the error codes are
    lost. Fix this by assigning ret = len at the start of the function and
    removing the assignment from the end, hence allowing ret to be preserved
    when error codes are assigned to it.
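
    The pattern is easy to reproduce in isolation. In the sketch below
    the function names and the page_errors array are hypothetical; the
    array simply supplies a per-iteration error code (0 for success).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Old shape: the trailing assignment discards any error seen in the loop. */
long writeback_store_old_model(const int *page_errors, size_t n, long len)
{
    long ret = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (page_errors[i]) {
            ret = page_errors[i];
            break;
        }
    }
    ret = len; /* bug: overwrites the error code */
    return ret;
}

/* New shape: ret starts as len and is only changed when an error occurs. */
long writeback_store_new_model(const int *page_errors, size_t n, long len)
{
    long ret = len;
    size_t i;

    for (i = 0; i < n; i++) {
        if (page_errors[i]) {
            ret = page_errors[i];
            break;
        }
    }
    return ret;
}
```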

    Addresses Coverity ("Unused value")

    Link: http://lkml.kernel.org/r/20191128122958.178290-1-colin.king@canonical.com
    Fixes: a939888ec38b ("zram: support idle/huge page writeback")
    Signed-off-by: Colin Ian King
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Colin Ian King
     
  • [ Upstream commit 5c0dd228b5fc30a3b732c7ae2657e0161ec7ed80 ]

    When kzalloc fails, the recv threads may try to destroy the
    workqueue from inside the workqueue.

    Suppose num_connections is m (m > 2), the first n kzalloc calls
    (1 < n < m) succeed, and call n + 1 fails. nbd_start_device then
    returns -ENOMEM to nbd_start_device_ioctl, which returns
    immediately without running flush_workqueue. However, n recv
    threads still exist. If nbd_release runs first, a recv thread
    may drop the last config_refs and try to destroy the workqueue
    from inside the workqueue.

    To fix it, add a flush_workqueue in nbd_start_device.

    Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
    Signed-off-by: Sun Ke
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Sun Ke
     

23 Jan, 2020

1 commit

  • commit 589b72894f53124a39d1bb3c0cecaf9dcabac417 upstream.

    Clang warns:

    ../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
    statement is not part of the previous 'if' [-Wmisleading-indentation]
    nr_parts = PARTS_PER_DISK;
    ^
    ../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
    if (err)
    ^

    This is because there is a space at the beginning of this line; remove
    it so that the indentation is consistent according to the Linux kernel
    coding style and clang no longer warns.

    While we are here, the previous line has some trailing whitespace; clean
    that up as well.

    Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD")
    Link: https://github.com/ClangBuiltLinux/linux/issues/791
    Signed-off-by: Nathan Chancellor
    Reviewed-by: Juergen Gross
    Acked-by: Roger Pau Monné
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Nathan Chancellor
     

09 Jan, 2020

2 commits

  • [ Upstream commit f9bd84a8a845d82f9b5a081a7ae68c98a11d2e84 ]

    For each I/O request, blkback first maps the foreign pages for the
    request to its local pages. If an allocation of a local page for the
    mapping fails, it should unmap every mapping already made for the
    request.

    However, blkback's handling mechanism for the allocation failure does
    not mark the remaining foreign pages as unmapped. Therefore, the unmap
    function merely tries to unmap every valid grant page for the request,
    including the pages not mapped due to the allocation failure. On a
    system where the allocation fails frequently, this leads to the
    following kernel crash:

    [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
    [ 372.012546] IP: [] gnttab_unmap_refs.part.7+0x1c/0x40
    [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
    [ 372.012562] Oops: 0002 [#1] SMP
    [ 372.012566] Modules linked in: act_police sch_ingress cls_u32
    ...
    [ 372.012746] Call Trace:
    [ 372.012752] [] gnttab_unmap_refs+0x34/0x40
    [ 372.012759] [] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
    ...
    [ 372.012802] [] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
    ...
    Decompressing Linux... Parsing ELF... done.
    Booting the kernel.
    [ 0.000000] Initializing cgroup subsys cpuset

    This commit fixes the problem by marking the grant pages of the given
    request that were not mapped due to the allocation failure as invalid.
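
    A minimal userspace sketch of the idea, under stated assumptions: the
    names page_slot, INVALID_HANDLE, map_request and unmap_request are
    hypothetical and do not come from xen-blkback. It only shows why the
    slots left unmapped by a failure need a sentinel before the unmap pass
    runs.

```c
#include <stddef.h>

#define INVALID_HANDLE (-1) /* hypothetical sentinel for "not mapped" */

struct page_slot {
    int handle;
};

/*
 * Map slots in order. The allocation for slot fail_at is made to fail
 * (pass fail_at >= n for no failure). On failure, every slot that was
 * not mapped is marked invalid so a later unmap pass skips it.
 * Returns 0 on success, -1 on failure.
 */
int map_request(struct page_slot *slots, size_t n, size_t fail_at)
{
    for (size_t i = 0; i < n; i++) {
        if (i == fail_at) {
            for (size_t j = i; j < n; j++)
                slots[j].handle = INVALID_HANDLE;
            return -1;
        }
        slots[i].handle = (int)i + 100; /* pretend grant handle */
    }
    return 0;
}

/* Unmap only slots holding a valid handle; returns how many were unmapped. */
size_t unmap_request(struct page_slot *slots, size_t n)
{
    size_t unmapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (slots[i].handle == INVALID_HANDLE)
            continue;
        slots[i].handle = INVALID_HANDLE;
        unmapped++;
    }
    return unmapped;
}
```

    Without the inner marking loop, slots past the failure point would
    keep whatever stale handle they held and the unmap pass would try to
    unmap pages that were never mapped.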

    Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")

    Reviewed-by: David Woodhouse
    Reviewed-by: Maximilian Heyne
    Reviewed-by: Paul Durrant
    Reviewed-by: Roger Pau Monné
    Signed-off-by: SeongJae Park
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    SeongJae Park
     
  • [ Upstream commit fa2ac657f9783f0891b2935490afe9a7fd29d3fa ]

    Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem
    cache. This cache is destroyed when xen-blkif is unloaded so it is
    necessary to wait for the deferred free routine used for such objects to
    complete. This necessity was missed in commit 14855954f636 "xen-blkback:
    allow module to be cleanly unloaded". This patch fixes the problem by
    taking/releasing extra module references in xen_blkif_alloc/free()
    respectively.

    Signed-off-by: Paul Durrant
    Reviewed-by: Roger Pau Monné
    Signed-off-by: Juergen Gross
    Signed-off-by: Sasha Levin

    Paul Durrant
     

31 Dec, 2019

2 commits

  • commit 1c05839aa973cfae8c3db964a21f9c0eef8fcc21 upstream.

    This fixes a regression added with:

    commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
    Author: Mike Christie
    Date: Sun Aug 4 14:10:06 2019 -0500

    nbd: fix max number of supported devs

    where we can deadlock during device shutdown. The problem occurs if
    the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has
    returned and the userspace app has dropped its reference via closing
    the device and running nbd_release. The recv_work nbd_config_put call
    would then drop the refcount to zero and try to destroy the config which
    would try to do destroy_workqueue from the recv work.

    This patch just has nbd_start_device_ioctl do a flush_workqueue when it
    wakes so we know after the ioctl returns running works have exited. This
    also fixes a possible race where we could try to reuse the device while
    old recv_works are still running.

    Cc: stable@vger.kernel.org
    Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
    Signed-off-by: Mike Christie
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Mike Christie
     
  • [ Upstream commit efcfec579f6139528c9e6925eca2bc4a36da65c6 ]

    Currently, if the loop device receives a WRITE_ZEROES request, it asks
    the underlying filesystem to punch out the range. This behavior is
    correct if unmapping is allowed. However, a NOUNMAP request means that
    the caller doesn't want us to free the storage backing the range, so
    punching out the range is incorrect behavior.

    To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
    underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
    the fallocate documentation) required to ensure that the entire range is
    backed by real storage, which suffices for our purposes.
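
    The mode selection described above might look like the sketch below.
    It is not the loop driver code: write_zeroes_mode and its nounmap
    parameter are hypothetical, and OR-ing FALLOC_FL_KEEP_SIZE into the
    zero-range case is an assumption (the flag is mandatory only together
    with FALLOC_FL_PUNCH_HOLE).

```c
#include <linux/falloc.h>
#include <stdbool.h>

/*
 * Hypothetical helper: choose the fallocate() mode for a write-zeroes
 * request. When the caller forbids unmapping, zero the range in place;
 * otherwise punching a hole is acceptable.
 */
int write_zeroes_mode(bool nounmap)
{
    int mode = nounmap ? FALLOC_FL_ZERO_RANGE : FALLOC_FL_PUNCH_HOLE;

    /* Assumption: the backing file size must never change here. */
    return mode | FALLOC_FL_KEEP_SIZE;
}
```

    The returned value would be passed as the mode argument of
    fallocate() on the backing file.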

    Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES")
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Darrick J. Wong
     

29 Nov, 2019

1 commit

  • commit 03bf73c315edca28f47451913177e14cd040a216 upstream.

    In nbd_add_socket, when krealloc succeeds but nsock's allocation fails,
    the reallocated memory is leaked. The correct behaviour is to assign the
    reallocated memory to config->socks right after krealloc succeeds.
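
    A userspace analogue of the corrected ordering, using realloc and
    calloc in place of the kernel allocators. The names config, sock_entry
    and add_socket, and the fail_entry_alloc switch used to simulate an
    allocation failure, are hypothetical and for illustration only.

```c
#include <errno.h>
#include <stdlib.h>

struct sock_entry {
    int fd;
};

struct config {
    struct sock_entry **socks;
    size_t num_connections;
};

/*
 * Grow the array by one slot and add a new entry. fail_entry_alloc
 * simulates a failure of the second allocation.
 */
int add_socket(struct config *cfg, int fd, int fail_entry_alloc)
{
    struct sock_entry **socks;
    struct sock_entry *entry;

    socks = realloc(cfg->socks,
                    (cfg->num_connections + 1) * sizeof(*socks));
    if (!socks)
        return -ENOMEM;
    /*
     * Publish the grown array right away, so every later error path
     * still sees a valid pointer that the owner can free.
     */
    cfg->socks = socks;

    entry = fail_entry_alloc ? NULL : calloc(1, sizeof(*entry));
    if (!entry)
        return -ENOMEM;

    entry->fd = fd;
    cfg->socks[cfg->num_connections++] = entry;
    return 0;
}
```

    If the assignment to cfg->socks were deferred until after the second
    allocation, a failure there would leave cfg->socks pointing at the old
    (possibly freed) array while the new one became unreachable.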

    Reviewed-by: Josef Bacik
    Signed-off-by: Navid Emamdoost
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Navid Emamdoost
     

22 Nov, 2019

1 commit


20 Nov, 2019

1 commit

  • Before returning NULL, put the sock first.

    Cc: stable@vger.kernel.org
    Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup")
    Reviewed-by: Josef Bacik
    Reviewed-by: Mike Christie
    Signed-off-by: Sun Ke
    Signed-off-by: Jens Axboe

    Sun Ke
     

16 Nov, 2019

1 commit

  • Pull block fixes from Jens Axboe:
    "A few fixes that should make it into this release. This contains:

    - io_uring:
    - The timeout command assumes sequence == 0 means that we want
    one completion, but this kind of overloading is unfortunate as
    it prevents users from doing a pure time based wait. Since
    this operation was introduced in this cycle, let's correct it
    now, while we can. (me)
    - One-liner to fix an issue with dependent links and fixed
    buffer reads. The actual IO completed fine, but the link got
    severed since we stored the wrong expected value. (me)
    - Add TIMEOUT to list of opcodes that don't need a file. (Pavel)

    - rsxx missing workqueue destroy calls. Old bug. (Chuhong)

    - Fix blk-iocost active list check (Jiufei)

    - Fix impossible-to-hit overflow merge condition, that still hit some
    folks very rarely (Junichi)

    - Fix bfq hang issue from 5.3. This didn't get marked for stable, but
    will go into stable post this merge (Paolo)"

    * tag 'for-linus-20191115' of git://git.kernel.dk/linux-block:
    rsxx: add missed destroy_workqueue calls in remove
    iocost: check active_list of all the ancestors in iocg_activate()
    block, bfq: deschedule empty bfq_queues not referred by any process
    io_uring: ensure registered buffer import returns the IO length
    io_uring: Fix getting file for timeout
    block: check bi_size overflow before merge
    io_uring: make timeout sequence == 0 mean no sequence

    Linus Torvalds
     

15 Nov, 2019

2 commits

    The driver does not call destroy_workqueue in remove, unlike the
    probe failure path.
    Add the missing calls to fix it.

    Signed-off-by: Chuhong Yuan
    Signed-off-by: Jens Axboe

    Chuhong Yuan
     
  • Some versions of gcc (so far 6.3 and 7.4) throw a warning:

    drivers/block/rbd.c: In function 'rbd_object_map_callback':
    drivers/block/rbd.c:2124:21: warning: 'current_state' may be used uninitialized in this function [-Wmaybe-uninitialized]
    (current_state == OBJECT_EXISTS && state == OBJECT_EXISTS_CLEAN))
    drivers/block/rbd.c:2092:23: note: 'current_state' was declared here
    u8 state, new_state, current_state;
    ^~~~~~~~~~~~~

    It's bogus because all current_state accesses are guarded by
    has_current_state.

    Reported-by: kbuild test robot
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Dongsheng Yang

    Ilya Dryomov
     

08 Nov, 2019

1 commit