17 Sep, 2020
1 commit
-
commit f44d04e696feaf13d192d942c4f14ad2e117065a upstream.
It turns out that currently we rely only on sysfs attribute
permissions:$ ll /sys/bus/rbd/{add*,remove*}
--w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add
--w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add_single_major
--w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/remove
--w------- 1 root root 4096 Sep 3 20:38 /sys/bus/rbd/remove_single_majorThis means that images can be mapped and unmapped (i.e. block devices
can be created and deleted) by a UID 0 process even after it drops all
privileges or by any process with CAP_DAC_OVERRIDE in its user namespace
as long as UID 0 is mapped into that user namespace.Be consistent with other virtual block devices (loop, nbd, dm, md, etc)
and require CAP_SYS_ADMIN in the initial user namespace for mapping and
unmapping, and also for dumping the configuration string and refreshing
the image header.Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov
Reviewed-by: Jeff Layton
Signed-off-by: Greg Kroah-Hartman
10 Sep, 2020
1 commit
-
[ Upstream commit acb19e17c5134dd78668c429ecba5b481f038e6a ]
If we configured io timeout of nbd0 to 100s. Later after we
finished using it, we configured nbd0 again and set the io
timeout to 0. We expect it would timeout after 30 seconds
and keep retry. But in fact we could not change the timeout
when we set it to 0. the timeout is still the original 100s.So change the timeout to default 30s when we set it to zero.
It also behaves same as commit 2da22da57348 ("nbd: fix zero
cmd timeout handling v2").It becomes more important if we were reconfigure a nbd device
and the io timeout it set to zero. Because it could take 30s
to detect the new socket and thus io could be completed more
quickly compared to 100s.Signed-off-by: Hou Pu
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
03 Sep, 2020
3 commits
-
commit bcb21c8cc9947286211327d663ace69f07d37a76 upstream.
In case of block device backend, if the backend supports write zeros, the
loop device will set queue flag of QUEUE_FLAG_DISCARD. However,
limits.discard_granularity isn't setup, and this way is wrong,
see the following description in Documentation/ABI/testing/sysfs-block:A discard_granularity of 0 means that the device does not support
discard functionality.Especially 9b15d109a6b2 ("block: improve discard bio alignment in
__blkdev_issue_discard()") starts to take q->limits.discard_granularity
for computing max discard sectors. And zero discard granularity may cause
kernel oops, or fail discard request even though the loop queue claims
discard support via QUEUE_FLAG_DISCARD.Fix the issue by setup discard granularity and alignment.
Fixes: c52abf563049 ("loop: Better discard support for block devices")
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Acked-by: Coly Li
Cc: Hannes Reinecke
Cc: Xiao Ni
Cc: Martin K. Petersen
Cc: Evan Green
Cc: Gwendal Grignou
Cc: Chaitanya Kulkarni
Cc: Andrzej Pietrasiewicz
Cc: Christoph Hellwig
Cc:
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 2d62e6b038e729c3e4bfbfcfbd44800ef0883680 ]
REQ_FUA should be checked using rq->cmd_flags instead of req_op().
Fixes: deb78b419dfda ("nullb: emulate cache")
Signed-off-by: Hou Pu
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
[ Upstream commit af822aa68fbdf0a480a17462ed70232998127453 ]
1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") starts
to support multi-range discard for virtio-blk. However, the virtio-blk
disk may report max discard segment as 1, at least that is exactly what
qemu is doing.So far, block layer switches to normal request merge if max discard segment
limit is 1, and multiple bios can be merged to single segment. This way may
cause memory corruption in virtblk_setup_discard_write_zeroes().Fix the issue by handling single max discard segment in straightforward
way.Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Cc: Changpeng Liu
Cc: Daniel Verkamp
Cc: Michael S. Tsirkin
Cc: Stefan Hajnoczi
Cc: Stefano Garzarella
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
19 Aug, 2020
1 commit
-
[ Upstream commit 200f93377220504c5e56754823e7adfea6037f1a ]
Be pedantic on removal as well and hold the mutex.
This should prevent uses of addition while we exit.Signed-off-by: Luis Chamberlain
Reviewed-by: Ming Lei
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
22 Jul, 2020
1 commit
-
commit 853eab68afc80f59f36bbdeb715e5c88c501e680 upstream.
Turns out that the permissions for 0400 really are what we want here,
otherwise any user can read from this file.[fixed formatting, added changelog, and made attribute static - gregkh]
Reported-by: Wade Mealing
Cc: stable
Fixes: f40609d1591f ("zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1847832
Reviewed-by: Steffen Maier
Acked-by: Minchan Kim
Link: https://lore.kernel.org/r/20200617114946.GA2131650@kroah.com
Signed-off-by: Greg Kroah-Hartman
16 Jul, 2020
1 commit
-
[ Upstream commit 579dd91ab3a5446b148e7f179b6596b270dace46 ]
When adding first socket to nbd, if nsock's allocation failed, the data
structure member "config->socks" was reallocated, but the data structure
member "config->num_connections" was not updated. A memory leak will occur
then because the function "nbd_config_put" will free "config->socks" only
when "config->num_connections" is not zero.Fixes: 03bf73c315ed ("nbd: prevent memory leak")
Reported-by: syzbot+934037347002901b8d2a@syzkaller.appspotmail.com
Signed-off-by: Zheng Bin
Reviewed-by: Eric Biggers
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
09 Jul, 2020
1 commit
-
[ Upstream commit e7eea44eefbdd5f0345a0a8b80a3ca1c21030d06 ]
Else there will be memory leak if alloc_disk() fails.
Fixes: 6a27b656fc02 ("block: virtio-blk: support multi virt queues per virtio-blk device")
Signed-off-by: Hou Tao
Reviewed-by: Stefano Garzarella
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
01 Jul, 2020
1 commit
-
commit f4bd34b139a3fa2808c4205f12714c65e1548c6c upstream.
When a filesystem is mounted on a loop device and on a loop ioctl
LOOP_SET_STATUS64, because of kill_bdev, buffer_head mappings are getting
destroyed.
kill_bdev
truncate_inode_pages
truncate_inode_pages_range
do_invalidatepage
block_invalidatepage
discard_buffer -->clear BH_Mapped flagsb_bread
__bread_gfp
bh = __getblk_gfp
-->discard_buffer clear BH_Mapped flag
__bread_slow
submit_bh
submit_bh_wbc
BUG_ON(!buffer_mapped(bh)) --> hit this BUG_ONFixes: 5db470e229e2 ("loop: drop caches if offset or block_size are changed")
Signed-off-by: Zheng Bin
Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
24 Jun, 2020
1 commit
-
[ Upstream commit 720bc316690bd27dea9d71510b50f0cd698ffc32 ]
Since commit dcebd755926b ("block: use bio_for_each_bvec() to compute
multi-page bvec count"), the kernel will bug_on on the PS3 because
bio_split() is called with sectors == 0:kernel BUG at block/bio.c:1853!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=8 NUMA PS3
Modules linked in: firewire_sbp2 rtc_ps3(+) soundcore ps3_gelic(+) \
ps3rom(+) firewire_core ps3vram(+) usb_common crc_itu_t
CPU: 0 PID: 97 Comm: blkid Not tainted 5.3.0-rc4 #1
NIP: c00000000027d0d0 LR: c00000000027d0b0 CTR: 0000000000000000
REGS: c00000000135ae90 TRAP: 0700 Not tainted (5.3.0-rc4)
MSR: 8000000000028032 CR: 44008240 XER: 20000000
IRQMASK: 0
GPR00: c000000000289368 c00000000135b120 c00000000084a500 c000000004ff8300
GPR04: 0000000000000c00 c000000004c905e0 c000000004c905e0 000000000000ffff
GPR08: 0000000000000000 0000000000000001 0000000000000000 000000000000ffff
GPR12: 0000000000000000 c0000000008ef000 000000000000003e 0000000000080001
GPR16: 0000000000000100 000000000000ffff 0000000000000000 0000000000000004
GPR20: c00000000062fd7e 0000000000000001 000000000000ffff 0000000000000080
GPR24: c000000000781788 c00000000135b350 0000000000000080 c000000004c905e0
GPR28: c00000000135b348 c000000004ff8300 0000000000000000 c000000004c90000
NIP [c00000000027d0d0] .bio_split+0x28/0xac
LR [c00000000027d0b0] .bio_split+0x8/0xac
Call Trace:
[c00000000135b120] [c00000000027d130] .bio_split+0x88/0xac (unreliable)
[c00000000135b1b0] [c000000000289368] .__blk_queue_split+0x11c/0x53c
[c00000000135b2d0] [c00000000028f614] .blk_mq_make_request+0x80/0x7d4
[c00000000135b3d0] [c000000000283a8c] .generic_make_request+0x118/0x294
[c00000000135b4b0] [c000000000283d34] .submit_bio+0x12c/0x174
[c00000000135b580] [c000000000205a44] .mpage_bio_submit+0x3c/0x4c
[c00000000135b600] [c000000000206184] .mpage_readpages+0xa4/0x184
[c00000000135b750] [c0000000001ff8fc] .blkdev_readpages+0x24/0x38
[c00000000135b7c0] [c0000000001589f0] .read_pages+0x6c/0x1a8
[c00000000135b8b0] [c000000000158c74] .__do_page_cache_readahead+0x118/0x184
[c00000000135b9b0] [c0000000001591a8] .force_page_cache_readahead+0xe4/0xe8
[c00000000135ba50] [c00000000014fc24] .generic_file_read_iter+0x1d8/0x830
[c00000000135bb50] [c0000000001ffadc] .blkdev_read_iter+0x40/0x5c
[c00000000135bbc0] [c0000000001b9e00] .new_sync_read+0x144/0x1a0
[c00000000135bcd0] [c0000000001bc454] .vfs_read+0xa0/0x124
[c00000000135bd70] [c0000000001bc7a4] .ksys_read+0x70/0xd8
[c00000000135be20] [c00000000000a524] system_call+0x5c/0x70
Instruction dump:
7fe3fb78 482e30dc 7c0802a6 482e3085 7c9e2378 f821ff71 7ca42b78 7d3e00d0
7c7d1b78 79290fe0 7cc53378 69290001 81230028 7bca0020 7929ba62
[ end trace 313fec760f30aa1f ]---The problem originates from setting the segment boundary of the
request queue to -1UL. This makes get_max_segment_size() return zero
when offset is zero, whatever the max segment size. The test with
BLK_SEG_BOUNDARY_MASK fails and 'mask - (mask & offset) + 1' overflows
to zero in the return statement.Not setting the segment boundary and using the default
value (BLK_SEG_BOUNDARY_MASK) fixes the problem.Signed-off-by: Emmanuel Nicolet
Signed-off-by: Geoff Levand
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/060a416c43138f45105c0540eff1a45539f7e2fc.1589049250.git.geoff@infradead.org
Signed-off-by: Sasha Levin
17 Jun, 2020
1 commit
-
commit 263c61581a38d0a5ad1f5f4a9143b27d68caeffd upstream.
Since the switch of floppy driver to blk-mq, the contended (fdc_busy) case
in floppy_queue_rq() is not handled correctly.In case we reach floppy_queue_rq() with fdc_busy set (i.e. with the floppy
locked due to another request still being in-flight), we put the request
on the list of requests and return BLK_STS_OK to the block core, without
actually scheduling delayed work / doing further processing of the
request. This means that processing of this request is postponed until
another request comes and passess uncontended.Which in some cases might actually never happen and we keep waiting
indefinitely. The simple testcase isfor i in `seq 1 2000`; do echo -en $i '\r'; blkid --info /dev/fd0 2> /dev/null; done
run in quemu. That reliably causes blkid eventually indefinitely hanging
in __floppy_read_block_0() waiting for completion, as the BIO callback
never happens, and no further IO is ever submitted on the (non-existent)
floppy device. This was observed reliably on qemu-emulated device.Fix that by not queuing the request in the contended case, and return
BLK_STS_RESOURCE instead, so that blk core handles the request
rescheduling and let it pass properly non-contended later.Fixes: a9f38e1dec107a ("floppy: convert to blk-mq")
Cc: stable@vger.kernel.org
Tested-by: Libor Pechacek
Signed-off-by: Jiri Kosina
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
07 Jun, 2020
1 commit
-
[ Upstream commit e274832590211c4b1b1e807ca66fad8b5bb8b328 ]
In null_init_zone_dev() check if the zone size is larger than device
capacity, return error if needed.This also fixes the following oops :-
null_blk: changed the number of conventional zones to 4294967295
BUG: kernel NULL pointer dereference, address: 0000000000000010
PGD 7d76c5067 P4D 7d76c5067 PUD 7d240c067 PMD 0
Oops: 0002 [#1] SMP NOPTI
CPU: 4 PID: 5508 Comm: nullbtests.sh Tainted: G OE 5.7.0-rc4lblk-fnext0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e4
RIP: 0010:null_init_zoned_dev+0x17a/0x27f [null_blk]
RSP: 0018:ffffc90007007e00 EFLAGS: 00010246
RAX: 0000000000000020 RBX: ffff8887fb3f3c00 RCX: 0000000000000007
RDX: 0000000000000000 RSI: ffff8887ca09d688 RDI: ffff888810fea510
RBP: 0000000000000010 R08: ffff8887ca09d688 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8887c26e8000
R13: ffffffffa05e9390 R14: 0000000000000000 R15: 0000000000000001
FS: 00007fcb5256f740(0000) GS:ffff888810e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000081e8fe000 CR4: 00000000003406e0
Call Trace:
null_add_dev+0x534/0x71b [null_blk]
nullb_device_power_store.cold.41+0x8/0x2e [null_blk]
configfs_write_file+0xe6/0x150
vfs_write+0xba/0x1e0
ksys_write+0x5f/0xe0
do_syscall_64+0x60/0x250
entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x7fcb51c71840Signed-off-by: Chaitanya Kulkarni
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
20 May, 2020
1 commit
-
[ Upstream commit 90b5feb8c4bebc76c27fcaf3e1a0e5ca2d319e9e ]
A userspace process holding a file descriptor to a virtio_blk device can
still invoke block_device_operations after hot unplug. This leads to a
use-after-free accessing vblk->vdev in virtblk_getgeo() when
ioctl(HDIO_GETGEO) is invoked:BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
IP: [] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
PGD 800000003a92f067 PUD 3a930067 PMD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
task: ffff9be5fbfb8000 ti: ffff9be5fa890000 task.ti: ffff9be5fa890000
RIP: 0010:[] [] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
RSP: 0018:ffff9be5fa893dc8 EFLAGS: 00010246
RAX: ffff9be5fc3f3400 RBX: ffff9be5fa893e30 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9be5fbc10b40
RBP: ffff9be5fa893dc8 R08: 0000000000000301 R09: 0000000000000301
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9be5fdc24680
R13: ffff9be5fbc10b40 R14: ffff9be5fbc10480 R15: 0000000000000000
FS: 00007f1bfb968740(0000) GS:ffff9be5ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000090 CR3: 000000003a894000 CR4: 0000000000360ff0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[] virtblk_getgeo+0x47/0x110 [virtio_blk]
[] ? handle_mm_fault+0x39d/0x9b0
[] blkdev_ioctl+0x1f5/0xa20
[] block_ioctl+0x41/0x50
[] do_vfs_ioctl+0x3a0/0x5a0
[] SyS_ioctl+0xa1/0xc0A related problem is that virtblk_remove() leaks the vd_index_ida index
when something still holds a reference to vblk->disk during hot unplug.
This causes virtio-blk device names to be lost (vda, vdb, etc).Fix these issues by protecting vblk->vdev with a mutex and reference
counting vblk so the vd_index_ida index can be removed in all cases.Fixes: 48e4043d4529 ("virtio: add virtio disk geometry feature")
Reported-by: Lance Digby
Signed-off-by: Stefan Hajnoczi
Link: https://lore.kernel.org/r/20200430140442.171016-1-stefanha@redhat.com
Signed-off-by: Michael S. Tsirkin
Reviewed-by: Stefano Garzarella
Signed-off-by: Sasha Levin
29 Apr, 2020
2 commits
-
[ Upstream commit 3d973b2e9a625996ee997c7303cd793b9d197c65 ]
Let's change the mapping between virtqueue_add errors to BLK_STS
statuses, so that -ENOSPC, which indicates virtqueue full is still
mapped to BLK_STS_DEV_RESOURCE, but -ENOMEM which indicates non-device
specific resource outage is mapped to BLK_STS_RESOURCE.Signed-off-by: Halil Pasic
Link: https://lore.kernel.org/r/20200213123728.61216-3-pasic@linux.ibm.com
Signed-off-by: Michael S. Tsirkin
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Sasha Levin -
[ Upstream commit c52abf563049e787c1341cdf15c7dbe1bfbc951b ]
If the backing device for a loop device is itself a block device,
then mirror the "write zeroes" capabilities of the underlying
block device into the loop device. Copy this capability into both
max_write_zeroes_sectors and max_discard_sectors of the loop device.The reason for this is that REQ_OP_DISCARD on a loop device translates
into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This
presents a consistent interface for loop devices (that discarded data
is zeroed), regardless of the backing device type of the loop device.
There should be no behavior change for loop devices backed by regular
files.This change fixes blktest block/003, and removes an extraneous
error print in block/013 when testing on a loop device backed
by a block device that does not support discard.Signed-off-by: Evan Green
Reviewed-by: Gwendal Grignou
Reviewed-by: Chaitanya Kulkarni
[used updated version of Evan's comment in loop_config_discard()]
[moved backingq to local scope, removed redundant braces]
Signed-off-by: Andrzej Pietrasiewicz
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
23 Apr, 2020
2 commits
-
[ Upstream commit 952c48b0ed18919bff7528501e9a3fff8a24f8cd ]
rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(),
including rbd_dev_header_info(), which means that rbd_dev_header_info()
isn't supposed to be called after rbd_dev_unprobe().However, rbd_dev_image_release() calls rbd_dev_unprobe() before
rbd_unregister_watch(). This is racy because a header update notify
can sneak in:"rbd unmap" thread ceph-watch-notify worker
rbd_dev_image_release()
rbd_dev_unprobe()
free and zero out header
rbd_watch_cb()
rbd_dev_refresh()
rbd_dev_header_info()
read in headerThe same goes for "rbd map" because rbd_dev_image_probe() calls
rbd_dev_unprobe() on errors. In both cases this results in a memory
leak.Fixes: fd22aef8b47c ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()")
Signed-off-by: Ilya Dryomov
Reviewed-by: Jason Dillaman
Signed-off-by: Sasha Levin -
[ Upstream commit 0e4e1de5b63fa423b13593337a27fd2d2b0bcf77 ]
rbd_unregister_watch() flushes notifies and therefore cannot be called
under header_rwsem because a header update notify takes header_rwsem to
synchronize with "rbd map". If mapping an image fails after the watch
is established and a header update notify sneaks in, we deadlock when
erroring out from rbd_dev_image_probe().Move watch registration and unregistration out of the critical section.
The only reason they were put there was to make header_rwsem management
slightly more obvious.Fixes: 811c66887746 ("rbd: fix rbd map vs notify races")
Signed-off-by: Ilya Dryomov
Reviewed-by: Jason Dillaman
Signed-off-by: Sasha Levin
17 Apr, 2020
4 commits
-
commit 3a169c0be75b59dd85d159493634870cdec6d3c4 upstream.
Commit 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for
large array allocation") didn't fix the issue it was meant to, as the
flags for allocating the memory are GFP_NOIO, which will lead the
memory allocation falling back to kmalloc().So instead of GFP_NOIO use GFP_KERNEL and do all the memory allocation
in blkfront_setup_indirect() in a memalloc_noio_{save,restore} section.Fixes: 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation")
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Acked-by: Roger Pau Monné
Link: https://lore.kernel.org/r/20200403090034.8753-1-jgross@suse.com
Signed-off-by: Juergen Gross
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit ff77042296d0a54535ddf74412c5ae92cb4ec76a ]
Steps to reproduce:
BLKRESETZONE zone 0
// force EIO
pwrite(fd, buf, 4096, 4096);[issue more IO including zone ioctls]
It will start failing randomly including IO to unrelated zones because of
->error "reuse". Trigger can be partition detection as well if test is not
run immediately which is even more entertaining.The fix is of course to clear ->error where necessary.
Reviewed-by: Christoph Hellwig
Signed-off-by: Alexey Dobriyan (SK hynix)
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
[ Upstream commit 9b03b713082a31a5b90e0a893c72aa620e255c26 ]
If null_add_dev() fails then null_del_dev() is called with a NULL argument.
Make null_del_dev() handle this scenario correctly. This patch fixes the
following KASAN complaint:null-ptr-deref in null_del_dev+0x28/0x280 [null_blk]
Read of size 8 at addr 0000000000000000 by task find/1062Call Trace:
dump_stack+0xa5/0xe6
__kasan_report.cold+0x65/0x99
kasan_report+0x16/0x20
__asan_load8+0x58/0x90
null_del_dev+0x28/0x280 [null_blk]
nullb_group_drop_item+0x7e/0xa0 [null_blk]
client_drop_item+0x53/0x80 [configfs]
configfs_rmdir+0x395/0x4e0 [configfs]
vfs_rmdir+0xb6/0x220
do_rmdir+0x238/0x2c0
__x64_sys_unlinkat+0x75/0x90
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbeSigned-off-by: Bart Van Assche
Reviewed-by: Chaitanya Kulkarni
Cc: Johannes Thumshirn
Cc: Hannes Reinecke
Cc: Ming Lei
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
[ Upstream commit 2004bfdef945fe55196db6b9cdf321fbc75bb0de ]
If null_add_dev() fails, clear dev->nullb.
This patch fixes the following KASAN complaint:
BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
Read of size 8 at addr ffff88803280fc30 by task check/8409Call Trace:
dump_stack+0xa5/0xe6
print_address_description.constprop.0+0x26/0x260
__kasan_report.cold+0x7b/0x99
kasan_report+0x16/0x20
__asan_load8+0x58/0x90
nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x145/0x2c0
ksys_write+0xd7/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7ff370926317
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff2dd2da48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff370926317
RDX: 0000000000000002 RSI: 0000559437ef23f0 RDI: 0000000000000001
RBP: 0000559437ef23f0 R08: 000000000000000a R09: 0000000000000001
R10: 0000559436703471 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ff370a006a0 R14: 00007ff370a014a0 R15: 00007ff370a008a0Allocated by task 8409:
save_stack+0x23/0x90
__kasan_kmalloc.constprop.0+0xcf/0xe0
kasan_kmalloc+0xd/0x10
kmem_cache_alloc_node_trace+0x129/0x4c0
null_add_dev+0x24a/0xe90 [null_blk]
nullb_device_power_store+0x1b6/0x270 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x145/0x2c0
ksys_write+0xd7/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbeFreed by task 8409:
save_stack+0x23/0x90
__kasan_slab_free+0x112/0x160
kasan_slab_free+0x12/0x20
kfree+0xdf/0x250
null_add_dev+0xaf3/0xe90 [null_blk]
nullb_device_power_store+0x1b6/0x270 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x145/0x2c0
ksys_write+0xd7/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbeFixes: 2984c8684f96 ("nullb: factor disk parameters")
Signed-off-by: Bart Van Assche
Reviewed-by: Chaitanya Kulkarni
Cc: Johannes Thumshirn
Cc: Hannes Reinecke
Cc: Ming Lei
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
18 Mar, 2020
1 commit
-
commit f5f6b95c72f7f8bb46eace8c5306c752d0133daa upstream.
Since nobody else is going to restart our hw_queue for us, the
blk_mq_start_stopped_hw_queues() is in virtblk_done() is not sufficient
necessarily sufficient to ensure that the queue will get started again.
In case of global resource outage (-ENOMEM because mapping failure,
because of swiotlb full) our virtqueue may be empty and we can get
stuck with a stopped hw_queue.Let us not stop the queue on arbitrary errors, but only on -EONSPC which
indicates a full virtqueue, where the hw_queue is guaranteed to get
started by virtblk_done() before when it makes sense to carry on
submitting requests. Let us also remove a stale comment.Signed-off-by: Halil Pasic
Cc: Jens Axboe
Fixes: f7728002c1c7 ("virtio_ring: fix return code on DMA mapping fails")
Link: https://lore.kernel.org/r/20200213123728.61216-2-pasic@linux.ibm.com
Signed-off-by: Michael S. Tsirkin
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Greg Kroah-Hartman
29 Feb, 2020
1 commit
-
commit 2e90ca68b0d2f5548804f22f0dd61145516171e3 upstream.
Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in
wait_til_ready().Which on the face of it can't happen, since as Willy Tarreau points out,
the function does no particular memory access. Except through the FDCS
macro, which just indexes a static allocation through teh current fdc,
which is always checked against N_FDC.Except the checking happens after we've already assigned the value.
The floppy driver is a disgrace (a lot of it going back to my original
horrd "design"), and has no real maintainer. Nobody has the hardware,
and nobody really cares. But it still gets used in virtual environment
because it's one of those things that everybody supports.The whole thing should be re-written, or at least parts of it should be
seriously cleaned up. The 'current fdc' index, which is used by the
FDCS macro, and which is often shadowed by a local 'fdc' variable, is a
prime example of how not to write code.But because nobody has the hardware or the motivation, let's just fix up
the immediate problem with a nasty band-aid: test the fdc index before
actually assigning it to the static 'fdc' variable.Reported-by: Jordy Zomer
Cc: Willy Tarreau
Cc: Dan Carpenter
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
24 Feb, 2020
4 commits
-
[ Upstream commit c8ab422553c81a0eb070329c63725df1cd1425bc ]
In brd_init func, rd_nr num of brd_device are firstly allocated
and add in brd_devices, then brd_devices are traversed to add each
brd_device by calling add_disk func. When allocating brd_device,
the disk->first_minor is set to i * max_part, if rd_nr * max_part
is larger than MINORMASK, two different brd_device may have the same
devt, then only one of them can be successfully added.
when rmmod brd.ko, it will cause oops when calling brd_exit.Follow those steps:
# modprobe brd rd_nr=3 rd_size=102400 max_part=1048576
# rmmod brd
then, the oops will appear.Oops log:
[ 726.613722] Call trace:
[ 726.614175] kernfs_find_ns+0x24/0x130
[ 726.614852] kernfs_find_and_get_ns+0x44/0x68
[ 726.615749] sysfs_remove_group+0x38/0xb0
[ 726.616520] blk_trace_remove_sysfs+0x1c/0x28
[ 726.617320] blk_unregister_queue+0x98/0x100
[ 726.618105] del_gendisk+0x144/0x2b8
[ 726.618759] brd_exit+0x68/0x560 [brd]
[ 726.619501] __arm64_sys_delete_module+0x19c/0x2a0
[ 726.620384] el0_svc_common+0x78/0x130
[ 726.621057] el0_svc_handler+0x38/0x78
[ 726.621738] el0_svc+0x8/0xc
[ 726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260)Here, we add brd_check_and_reset_par func to check and limit max_part par.
--
V5->V6:
- remove useless codeV4->V5:(suggested by Ming Lei)
- make sure max_part is not larger than DISK_MAX_PARTSV3->V4:(suggested by Ming Lei)
- remove useless change
- add one limit of max_partV2->V3: (suggested by Ming Lei)
- clear .minors when running out of consecutive minor space in brd_alloc
- remove limit of rd_nrV1->V2:
- add more checks in brd_check_par_valid as suggested by Ming Lei.Signed-off-by: Zhiqiang Liu
Reviewed-by: Bob Liu
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
[ Upstream commit a55e601b2f02df5db7070e9a37bd655c9c576a52 ]
gcc -O3 warns about a dummy variable that is passed
down into rbd_img_fill_nodata without being initialized:drivers/block/rbd.c: In function 'rbd_img_fill_nodata':
drivers/block/rbd.c:2573:13: error: 'dummy' is used uninitialized in this function [-Werror=uninitialized]
fctx->iter = *fctx->pos;Since this is a dummy, I assume the warning is harmless, but
it's better to initialize it anyway and avoid the warning.Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
Signed-off-by: Arnd Bergmann
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin -
[ Upstream commit 3b82a051c10143639a378dcd12019f2353cc9054 ]
Currently when an error code -EIO or -ENOSPC in the for-loop of
writeback_store the error code is being overwritten by a ret = len
assignment at the end of the function and the error codes are being
lost. Fix this by assigning ret = len at the start of the function and
remove the assignment from the end, hence allowing ret to be preserved
when error codes are assigned to it.Addresses Coverity ("Unused value")
Link: http://lkml.kernel.org/r/20191128122958.178290-1-colin.king@canonical.com
Fixes: a939888ec38b ("zram: support idle/huge page writeback")
Signed-off-by: Colin Ian King
Acked-by: Minchan Kim
Cc: Sergey Senozhatsky
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin -
[ Upstream commit 5c0dd228b5fc30a3b732c7ae2657e0161ec7ed80 ]
When kzalloc fail, may cause trying to destroy the
workqueue from inside the workqueue.If num_connections is m (2 < m), and NO.1 ~ NO.n
(1 < n < m) kzalloc are successful. The NO.(n + 1)
failed. Then, nbd_start_device will return ENOMEM
to nbd_start_device_ioctl, and nbd_start_device_ioctl
will return immediately without running flush_workqueue.
However, we still have n recv threads. If nbd_release
run first, recv threads may have to drop the last
config_refs and try to destroy the workqueue from
inside the workqueue.To fix it, add a flush_workqueue in nbd_start_device.
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Signed-off-by: Sun Ke
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
23 Jan, 2020
1 commit
-
commit 589b72894f53124a39d1bb3c0cecaf9dcabac417 upstream.
Clang warns:
../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
nr_parts = PARTS_PER_DISK;
^
../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
if (err)
^This is because there is a space at the beginning of this line; remove
it so that the indentation is consistent according to the Linux kernel
coding style and clang no longer warns.While we are here, the previous line has some trailing whitespace; clean
that up as well.Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD")
Link: https://github.com/ClangBuiltLinux/linux/issues/791
Signed-off-by: Nathan Chancellor
Reviewed-by: Juergen Gross
Acked-by: Roger Pau Monné
Signed-off-by: Juergen Gross
Signed-off-by: Greg Kroah-Hartman
09 Jan, 2020
2 commits
-
[ Upstream commit f9bd84a8a845d82f9b5a081a7ae68c98a11d2e84 ]
For each I/O request, blkback first maps the foreign pages for the
request to its local pages. If an allocation of a local page for the
mapping fails, it should unmap every mapping already made for the
request.However, blkback's handling mechanism for the allocation failure does
not mark the remaining foreign pages as unmapped. Therefore, the unmap
function merely tries to unmap every valid grant page for the request,
including the pages not mapped due to the allocation failure. On a
system that fails the allocation frequently, this problem leads to
following kernel crash.[ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[ 372.012546] IP: [] gnttab_unmap_refs.part.7+0x1c/0x40
[ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
[ 372.012562] Oops: 0002 [#1] SMP
[ 372.012566] Modules linked in: act_police sch_ingress cls_u32
...
[ 372.012746] Call Trace:
[ 372.012752] [] gnttab_unmap_refs+0x34/0x40
[ 372.012759] [] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
...
[ 372.012802] [] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
...
Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.000000] Initializing cgroup subsys cpusetThis commit fixes this problem by marking the grant pages of the given
request that didn't mapped due to the allocation failure as invalid.Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")
Reviewed-by: David Woodhouse
Reviewed-by: Maximilian Heyne
Reviewed-by: Paul Durrant
Reviewed-by: Roger Pau Monné
Signed-off-by: SeongJae Park
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
[ Upstream commit fa2ac657f9783f0891b2935490afe9a7fd29d3fa ]
Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem
cache. This cache is destoyed when xen-blkif is unloaded so it is
necessary to wait for the deferred free routine used for such objects to
complete. This necessity was missed in commit 14855954f636 "xen-blkback:
allow module to be cleanly unloaded". This patch fixes the problem by
taking/releasing extra module references in xen_blkif_alloc/free()
respectively.Signed-off-by: Paul Durrant
Reviewed-by: Roger Pau Monné
Signed-off-by: Juergen Gross
Signed-off-by: Sasha Levin
31 Dec, 2019
2 commits
-
commit 1c05839aa973cfae8c3db964a21f9c0eef8fcc21 upstream.
This fixes a regression added with:
commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie
Date: Sun Aug 4 14:10:06 2019 -0500nbd: fix max number of supported devs
where we can deadlock during device shutdown. The problem occurs if
the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has
returned and the userspace app has droppped its reference via closing
the device and running nbd_release. The recv_work nbd_config_put call
would then drop the refcount to zero and try to destroy the config which
would try to do destroy_workqueue from the recv work.This patch just has nbd_start_device_ioctl do a flush_workqueue when it
wakes so we know after the ioctl returns running works have exited. This
also fixes a possible race where we could try to reuse the device while
old recv_works are still running.Cc: stable@vger.kernel.org
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Signed-off-by: Mike Christie
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit efcfec579f6139528c9e6925eca2bc4a36da65c6 ]
Currently, if the loop device receives a WRITE_ZEROES request, it asks
the underlying filesystem to punch out the range. This behavior is
correct if unmapping is allowed. However, a NOUNMAP request means that
the caller doesn't want us to free the storage backing the range, so
punching out the range is incorrect behavior.To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
the fallocate documentation) required to ensure that the entire range is
backed by real storage, which suffices for our purposes.Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
29 Nov, 2019
1 commit
-
commit 03bf73c315edca28f47451913177e14cd040a216 upstream.
In nbd_add_socket when krealloc succeeds, if nsock's allocation fail the
reallocted memory is leak. The correct behaviour should be assigning the
reallocted memory to config->socks right after success.Reviewed-by: Josef Bacik
Signed-off-by: Navid Emamdoost
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
22 Nov, 2019
1 commit
-
Pull block fix from Jens Axboe:
"Just a single fix for an issue in nbd introduced in this cycle"* tag 'for-linus-20191121' of git://git.kernel.dk/linux-block:
nbd:fix memory leak in nbd_get_socket()
20 Nov, 2019
1 commit
-
Before returning NULL, put the sock first.
Cc: stable@vger.kernel.org
Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup")
Reviewed-by: Josef Bacik
Reviewed-by: Mike Christie
Signed-off-by: Sun Ke
Signed-off-by: Jens Axboe
16 Nov, 2019
1 commit
-
Pull block fixes from Jens Axboe:
"A few fixes that should make it into this release. This contains:- io_uring:
- The timeout command assumes sequence == 0 means that we want
one completion, but this kind of overloading is unfortunate as
it prevents users from doing a pure time based wait. Since
this operation was introduced in this cycle, let's correct it
now, while we can. (me)
- One-liner to fix an issue with dependent links and fixed
buffer reads. The actual IO completed fine, but the link got
severed since we stored the wrong expected value. (me)
- Add TIMEOUT to list of opcodes that don't need a file. (Pavel)- rsxx missing workqueue destry calls. Old bug. (Chuhong)
- Fix blk-iocost active list check (Jiufei)
- Fix impossible-to-hit overflow merge condition, that still hit some
folks very rarely (Junichi)- Fix bfq hang issue from 5.3. This didn't get marked for stable, but
will go into stable post this merge (Paolo)"* tag 'for-linus-20191115' of git://git.kernel.dk/linux-block:
rsxx: add missed destroy_workqueue calls in remove
iocost: check active_list of all the ancestors in iocg_activate()
block, bfq: deschedule empty bfq_queues not referred by any process
io_uring: ensure registered buffer import returns the IO length
io_uring: Fix getting file for timeout
block: check bi_size overflow before merge
io_uring: make timeout sequence == 0 mean no sequence
15 Nov, 2019
2 commits
-
The driver misses calling destroy_workqueue in remove like what is done
when probe fails.
Add the missed calls to fix it.Signed-off-by: Chuhong Yuan
Signed-off-by: Jens Axboe -
Some versions of gcc (so far 6.3 and 7.4) throw a warning:
drivers/block/rbd.c: In function 'rbd_object_map_callback':
drivers/block/rbd.c:2124:21: warning: 'current_state' may be used uninitialized in this function [-Wmaybe-uninitialized]
(current_state == OBJECT_EXISTS && state == OBJECT_EXISTS_CLEAN))
drivers/block/rbd.c:2092:23: note: 'current_state' was declared here
u8 state, new_state, current_state;
^~~~~~~~~~~~~It's bogus because all current_state accesses are guarded by
has_current_state.Reported-by: kbuild test robot
Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
08 Nov, 2019
1 commit
-
There are two callers of this function and they both unlock the mutex so
this ends up being a double unlock.Fixes: 44ed167da748 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf")
Signed-off-by: Dan Carpenter
Signed-off-by: Jens Axboe