12 Feb, 2019

1 commit

  • Fix a typo in pkt_start_recovery.

    Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jiufei Xue
    Signed-off-by: Jens Axboe
    (cherry picked from commit 158e61865a31ef7abf39629c37285810504d60b5)

    Jiufei Xue
     

23 Jan, 2019

7 commits

  • commit c8a83a6b54d0ca078de036aafb3f6af58c1dc5eb upstream.

    NBD can update block device block size implicitly through
    bd_set_size(). Make it explicitly set blocksize with set_blocksize() as
    this behavior of bd_set_size() is going away.

    CC: Josef Bacik
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 5db470e229e22b7eda6e23b5566e532c96fb5bc3 upstream.

    If we don't drop caches used in old offset or block_size, we can get old data
    from new offset/block_size, which gives unexpected data to user.

    For example, Martijn found a loopback bug in the scenario below:
    1) LOOP_SET_FD loads the first two pages of the loop file
    2) LOOP_SET_STATUS64 changes the offset on the loop file
    3) mount fails because the cached pages hold the wrong superblock

    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Reported-by: Martijn Coenen
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jaegeuk Kim
     
  • commit 628bd85947091830a8c4872adfd5ed1d515a9cf2 upstream.

    Commit 0a42e99b58a20883 ("loop: Get rid of loop_index_mutex") forgot to
    remove mutex_unlock(&loop_ctl_mutex) from loop_control_ioctl() when
    replacing loop_index_mutex with loop_ctl_mutex.

    Fixes: 0a42e99b58a20883 ("loop: Get rid of loop_index_mutex")
    Reported-by: syzbot
    Reviewed-by: Ming Lei
    Reviewed-by: Jan Kara
    Signed-off-by: Tetsuo Handa
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Tetsuo Handa
     
  • commit 0a42e99b58a208839626465af194cfe640ef9493 upstream.

    Now that loop_ctl_mutex is global, just get rid of loop_index_mutex as
    there is no good reason to keep these two separate and it just
    complicates the locking.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 967d1dc144b50ad005e5eecdfadfbcfb399ffff6 upstream.

    __loop_release() has a single call site. Fold it there. This is
    currently not a huge win but it will make following replacement of
    loop_index_mutex more obvious.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 310ca162d779efee8a2dc3731439680f3e9c1e86 upstream.

    syzbot is reporting a NULL pointer dereference [1] which is caused by
    a race condition between ioctl(loop_fd, LOOP_CLR_FD, 0) and
    ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other
    loop devices at loop_validate_file() without holding the corresponding
    lo->lo_ctl_mutex locks.

    Since an ioctl() request on loop devices is not a frequent operation, we
    don't need fine-grained locking. Let's use a global lock in order to allow
    safe traversal at loop_validate_file().

    Note that syzbot is also reporting circular locking dependency between
    bdev->bd_mutex and lo->lo_ctl_mutex [2] which is caused by calling
    blkdev_reread_part() with lock held. This patch does not address it.

    [1] https://syzkaller.appspot.com/bug?id=f3cfe26e785d85f9ee259f385515291d21bd80a3
    [2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889

    Signed-off-by: Tetsuo Handa
    Reported-by: syzbot
    Reviewed-by: Jan Kara
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Tetsuo Handa
     
  • commit b1ab5fa309e6c49e4e06270ec67dd7b3e9971d04 upstream.

    vfs_getattr() needs "struct path" rather than "struct file".
    Let's use path_get()/path_put() rather than get_file()/fput().

    Signed-off-by: Tetsuo Handa
    Reviewed-by: Jan Kara
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Tetsuo Handa
     

17 Jan, 2019

1 commit

  • commit 85f5a4d666fd9be73856ed16bb36c5af5b406b29 upstream.

    There is a window between when RBD_DEV_FLAG_REMOVING is set and when
    the device is removed from rbd_dev_list. During this window, we set
    "already" and return 0.

    Returning 0 from write(2) can confuse userspace tools because
    0 indicates that nothing was written. In particular, "rbd unmap"
    will retry the write multiple times a second:

    10:28:05.463299 write(4, "0", 1) = 0
    10:28:05.463509 write(4, "0", 1) = 0
    10:28:05.463720 write(4, "0", 1) = 0
    10:28:05.463942 write(4, "0", 1) = 0
    10:28:05.464155 write(4, "0", 1) = 0

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Tested-by: Dongsheng Yang
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     

13 Jan, 2019

1 commit

  • commit 5547932dc67a48713eece4fa4703bfdf0cfcb818 upstream.

    If blkdev_get fails, we shouldn't do blkdev_put. Otherwise, the kernel
    emits the warning below. This patch fixes it.

    WARNING: CPU: 0 PID: 1893 at fs/block_dev.c:1828 blkdev_put+0x105/0x120
    Modules linked in:
    CPU: 0 PID: 1893 Comm: swapoff Not tainted 4.19.0+ #453
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    RIP: 0010:blkdev_put+0x105/0x120
    Call Trace:
    __x64_sys_swapoff+0x46d/0x490
    do_syscall_64+0x5a/0x190
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    irq event stamp: 4466
    hardirqs last enabled at (4465): __free_pages_ok+0x1e3/0x490
    hardirqs last disabled at (4466): trace_hardirqs_off_thunk+0x1a/0x1c
    softirqs last enabled at (3420): __do_softirq+0x333/0x446
    softirqs last disabled at (3407): irq_exit+0xd1/0xe0

    Link: http://lkml.kernel.org/r/20181127055429.251614-3-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Reviewed-by: Joey Pabalinas
    Cc: [4.14+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Minchan Kim
     

01 Dec, 2018

1 commit

  • [ Upstream commit de7b75d82f70c5469675b99ad632983c50b6f7e7 ]

    LKP recently reported a hang at bootup in the floppy code:

    [ 245.678853] INFO: task mount:580 blocked for more than 120 seconds.
    [ 245.679906] Tainted: G T 4.19.0-rc6-00172-ga9f38e1 #1
    [ 245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 245.682181] mount D 6372 580 1 0x00000004
    [ 245.683023] Call Trace:
    [ 245.683425] __schedule+0x2df/0x570
    [ 245.683975] schedule+0x2d/0x80
    [ 245.684476] schedule_timeout+0x19d/0x330
    [ 245.685090] ? wait_for_common+0xa5/0x170
    [ 245.685735] wait_for_common+0xac/0x170
    [ 245.686339] ? do_sched_yield+0x90/0x90
    [ 245.686935] wait_for_completion+0x12/0x20
    [ 245.687571] __floppy_read_block_0+0xfb/0x150
    [ 245.688244] ? floppy_resume+0x40/0x40
    [ 245.688844] floppy_revalidate+0x20f/0x240
    [ 245.689486] check_disk_change+0x43/0x60
    [ 245.690087] floppy_open+0x1ea/0x360
    [ 245.690653] __blkdev_get+0xb4/0x4d0
    [ 245.691212] ? blkdev_get+0x1db/0x370
    [ 245.691777] blkdev_get+0x1f3/0x370
    [ 245.692351] ? path_put+0x15/0x20
    [ 245.692871] ? lookup_bdev+0x4b/0x90
    [ 245.693539] blkdev_get_by_path+0x3d/0x80
    [ 245.694165] mount_bdev+0x2a/0x190
    [ 245.694695] squashfs_mount+0x10/0x20
    [ 245.695271] ? squashfs_alloc_inode+0x30/0x30
    [ 245.695960] mount_fs+0xf/0x90
    [ 245.696451] vfs_kern_mount+0x43/0x130
    [ 245.697036] do_mount+0x187/0xc40
    [ 245.697563] ? memdup_user+0x28/0x50
    [ 245.698124] ksys_mount+0x60/0xc0
    [ 245.698639] sys_mount+0x19/0x20
    [ 245.699167] do_int80_syscall_32+0x61/0x130
    [ 245.699813] entry_INT80_32+0xc7/0xc7

    showing that we never complete that read request. The reason is that
    the completion setup is racy - it initializes the completion event
    AFTER submitting the IO, which means that the IO could complete
    before/during the init. If it does, we are passing garbage to
    complete() and we may sleep forever waiting for the event to
    occur.

    Fixes: 7b7b68bba5ef ("floppy: bail out in open() if drive is not responding to block0 read")
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Jens Axboe
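
The ordering problem described in this commit can be modeled in a few lines of userspace C. This is a toy, single-threaded sketch and not the floppy driver's code: every `toy_` name is hypothetical, and the "IO" is assumed to complete synchronously inside the submit call, which is the worst case of the race.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a completion event; not the kernel's struct completion. */
struct toy_completion {
    bool done;
};

void toy_init_completion(struct toy_completion *c)
{
    assert(c);
    c->done = false;
}

void toy_complete(struct toy_completion *c)
{
    assert(c);
    c->done = true;
}

/* Hypothetical IO submission that finishes before it returns. */
void toy_submit_io(struct toy_completion *c)
{
    toy_complete(c);
}

/* Racy order: submit first, initialize afterwards. */
bool toy_read_racy(void)
{
    struct toy_completion c = { false };

    toy_submit_io(&c);
    toy_init_completion(&c);    /* wipes out the completion signal */
    return c.done;              /* false: a waiter would sleep forever */
}

/* Fixed order: initialize first, then submit. */
bool toy_read_fixed(void)
{
    struct toy_completion c;

    toy_init_completion(&c);
    toy_submit_io(&c);
    return c.done;              /* true: the signal survives */
}
```

In this model the racy variant always loses the signal, whereas in a real driver the loss depends on timing, which is why such hangs show up only occasionally.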
     

27 Nov, 2018

1 commit

  • commit fef912bf860e upstream.
    commit 98af4d4df889 upstream.

    I got a report from Howard Chen that he saw a race between zram and
    sysfs (i.e., the zram block device file is created but its sysfs entry
    isn't yet) when he tried to create new zram devices via the hotadd knob.

    The v4.20 kernel fixes it with [1, 2], but those are too large to merge
    into -stable, so this patch fixes the problem by registering the default
    group using Greg KH's approach [3].

    This patch should be applied to every stable tree [3.16+] currently
    existing on kernel.org because the problem was introduced in 2.6.37
    by [4].

    [1] fef912bf860e, block: genhd: add 'groups' argument to device_add_disk
    [2] 98af4d4df889, zram: register default groups with device_add_disk()
    [3] http://kroah.com/log/blog/2013/06/26/how-to-create-a-sysfs-file-correctly/
    [4] 33863c21e69e9, Staging: zram: Replace ioctls with sysfs interface

    Cc: Sergey Senozhatsky
    Cc: Hannes Reinecke
    Tested-by: Howard Chen
    Signed-off-by: Minchan Kim
    Signed-off-by: Sasha Levin

    Minchan Kim
     

14 Nov, 2018

4 commits

  • commit 6cc4a0863c9709c512280c64e698d68443ac8053 upstream.

    info->nr_rings isn't adjusted in case of ENOMEM error from
    negotiate_mq(). This leads to kernel panic in error path.

    Typical call stack involving panic -
    #8 page_fault at ffffffff8175936f
    [exception RIP: blkif_free_ring+33]
    RIP: ffffffffa0149491 RSP: ffff8804f7673c08 RFLAGS: 00010292
    ...
    #9 blkif_free at ffffffffa0149aaa [xen_blkfront]
    #10 talk_to_blkback at ffffffffa014c8cd [xen_blkfront]
    #11 blkback_changed at ffffffffa014ea8b [xen_blkfront]
    #12 xenbus_otherend_changed at ffffffff81424670
    #13 backend_changed at ffffffff81426dc3
    #14 xenwatch_thread at ffffffff81422f29
    #15 kthread at ffffffff810abe6a
    #16 ret_from_fork at ffffffff81754078

    Cc: stable@vger.kernel.org
    Fixes: 7ed8ce1c5fc7 ("xen-blkfront: move negotiate_mq to cover all cases of new VBDs")
    Signed-off-by: Manjunath Patil
    Acked-by: Roger Pau Monné
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Manjunath Patil
     
  • commit f92898e7f32e3533bfd95be174044bc349d416ca upstream.

    If a block device is hot-added when we are out of grants,
    gnttab_grant_foreign_access fails with -ENOSPC (log message "28
    granting access to ring page") in this code path:

    talk_to_blkback ->
    setup_blkring ->
    xenbus_grant_ring ->
    gnttab_grant_foreign_access

    and the failing path in talk_to_blkback sets the driver_data to NULL:

    destroy_blkring:
    blkif_free(info, 0);

    mutex_lock(&blkfront_mutex);
    free_info(info);
    mutex_unlock(&blkfront_mutex);

    dev_set_drvdata(&dev->dev, NULL);

    This results in a NULL pointer BUG when blkfront_remove and blkif_free
    try to access the failing device's NULL struct blkfront_info.

    Cc: stable@vger.kernel.org # 4.5 and later
    Signed-off-by: Vasilis Liaskovitis
    Reviewed-by: Roger Pau Monné
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Vasilis Liaskovitis
     
  • [ Upstream commit 1448a2a5360ae06f25e2edc61ae070dff5c0beb4 ]

    If we fail to allocate the request queue for a disk, we still need to
    free that disk, not just the previous ones. Additionally, we need to
    cleanup the previous request queues.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Omar Sandoval
     
  • [ Upstream commit 71327f547ee3a46ec5c39fdbbd268401b2578d0e ]

    Move queue allocation next to disk allocation to fix a couple of issues:

    - If add_disk() hasn't been called, we should clear disk->queue before
    calling put_disk().
    - If we fail to allocate a request queue, we still need to put all of
    the disks, not just the ones that we allocated queues for.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Omar Sandoval
     

04 Oct, 2018

1 commit

  • commit 65eea8edc315589d6c993cf12dbb5d0e9ef1fe4e upstream.

    The final field of a floppy_struct is the field "name", which is a pointer
    to a string in kernel memory. The kernel pointer should not be copied to
    user memory. The FDGETPRM ioctl copies a floppy_struct to user memory,
    including this "name" field. This pointer cannot be used by the user
    and it will leak a kernel address to user-space, which will reveal the
    location of kernel code and data and undermine KASLR protection.

    Model this code after the compat ioctl which copies the returned data
    to a previously cleared temporary structure on the stack (excluding the
    name pointer) and copy out to userspace from there. As we already have
    an inparam union with an appropriate member and that memory is already
    cleared even for read only calls make use of that as a temporary store.

    Based on an initial patch by Brian Belleville.

    CVE-2018-7755
    Signed-off-by: Andy Whitcroft
    Broke up long line.
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Andy Whitcroft
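
The "copy through a cleared temporary" idea from this commit can be sketched in userspace C as follows. The struct and function names are hypothetical stand-ins (this is not `floppy_struct` or the driver's ioctl code), and the field list is an assumption chosen for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a geometry record with a private pointer. */
struct toy_geometry {
    unsigned int size;
    unsigned int sect;
    unsigned int head;
    unsigned int track;
    const char *name;   /* internal pointer that must not be exported */
};

/* Fill 'out' with the public fields only; 'name' is left as NULL. */
void toy_export_geometry(const struct toy_geometry *in,
                         struct toy_geometry *out)
{
    assert(in && out);
    memset(out, 0, sizeof(*out));   /* clear first: name and padding are zero */
    out->size = in->size;
    out->sect = in->sect;
    out->head = in->head;
    out->track = in->track;
    /* out->name is deliberately not copied */
}
```

Clearing the whole temporary before filling it also keeps struct padding bytes from carrying stale data to the caller.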
     

20 Sep, 2018

2 commits

  • [ Upstream commit 55690c07b44a82cc3359ce0c233f4ba7d80ba145 ]

    The user controls @dev_minor, which is used as an index into pkt_devs,
    so it can be exploited via a Spectre-like attack (speculative execution).

    This kind of attack leaks the address of pkt_devs [1], which lets an
    attacker bypass security mechanisms such as KASLR.

    So sanitize @dev_minor before using it to prevent the attack.

    [1] https://github.com/jinb-park/linux-exploit/tree/master/exploit-remaining-spectre-gadget/leak_pkt_devs.c

    Signed-off-by: Jinbum Park
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jinbum Park
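
The kernel offers array_index_nospec() for sanitizing user-controlled indices; the commit text above does not say which helper the fix uses. The sketch below only illustrates the branch-free masking arithmetic behind that kind of helper in userspace C. The `toy_` names are hypothetical, and an optimizing userspace compiler gives no speculation guarantees, so treat it as a demonstration of the math rather than a mitigation.

```c
#include <assert.h>
#include <limits.h>

/* All-ones when index < size, zero otherwise, computed without a branch. */
unsigned long toy_index_mask(unsigned long index, unsigned long size)
{
    /* The top bit of x is set exactly when index >= size (size <= LONG_MAX). */
    unsigned long x = index | (size - 1UL - index);
    unsigned long in_range = (~x) >> (sizeof(unsigned long) * CHAR_BIT - 1);

    return 0UL - in_range;      /* 1 -> all ones, 0 -> zero */
}

/* Clamp an out-of-range index to 0 instead of letting it through. */
unsigned long toy_clamp_index(unsigned long index, unsigned long size)
{
    assert(size > 0 && size <= (unsigned long)LONG_MAX);
    return index & toy_index_mask(index, size);
}
```

A caller would still perform the normal bounds check first and then use the clamped value for the array access, so that a mispredicted branch cannot reach memory outside the table.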
     
  • commit bc811f05d77f47059c197a98b6ad242eb03999cb upstream.

    syzbot reports a divide-by-zero off the NBD_SET_BLKSIZE ioctl.
    We need proper validation of the input here. Not just if it's
    zero, but also if the value is a power-of-2 and in a valid
    range. Add that.

    Cc: stable@vger.kernel.org
    Reported-by: syzbot
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
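
The three checks named in this commit (non-zero, power of two, within a valid range) can be sketched as below. This is a userspace illustration with hypothetical `toy_` names; the 512 and 4096 byte limits are assumptions made for the example, since the commit text does not state the actual bounds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed limits for this sketch only. */
#define TOY_MIN_BLKSIZE 512UL
#define TOY_MAX_BLKSIZE 4096UL

/* Hypothetical device record holding the current block size. */
struct toy_nbd_dev {
    unsigned long blksize;
};

bool toy_is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

bool toy_blksize_is_valid(unsigned long blksize)
{
    return toy_is_power_of_2(blksize) &&
           blksize >= TOY_MIN_BLKSIZE &&
           blksize <= TOY_MAX_BLKSIZE;
}

/* Reject bad input before it can reach any division by the block size. */
int toy_set_blksize(struct toy_nbd_dev *dev, unsigned long blksize)
{
    assert(dev);
    if (!toy_blksize_is_valid(blksize))
        return -EINVAL;
    dev->blksize = blksize;
    return 0;
}
```

Rejecting the value up front leaves the previous block size in place, so later size calculations never see a zero divisor.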
     

10 Sep, 2018

1 commit

  • commit c8bd134a4bddafe5917d163eea73873932c15e83 upstream.

    The call to strlcpy in backing_dev_store is incorrect. It should take
    the size of the destination buffer instead of the size of the source
    buffer. Additionally, ignore the newline character (\n) when reading
    the new file_name buffer. This makes it possible to set the backing_dev
    as follows:

    echo /dev/sdX > /sys/block/zram0/backing_dev

    The reason it worked before was the fact that strlcpy() copies 'len - 1'
    bytes, which is strlen(buf) - 1 in our case, so it accidentally didn't
    copy the trailing new line symbol. Which also means that "echo -n
    /dev/sdX" most likely was broken.

    Signed-off-by: Peter Kalauskas
    Link: http://lkml.kernel.org/r/20180813061623.GC64836@rodete-desktop-imager.corp.google.com
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: [4.14+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Peter Kalauskas
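
A bounded copy that is sized by the destination and drops one trailing newline might look like the sketch below. It is plain userspace C with a hypothetical helper name, it uses memcpy rather than strlcpy to stay portable, and it is not the zram code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy 'src' into 'dst' (capacity dst_size), dropping one trailing '\n'.
 * Returns the number of bytes stored, not counting the terminator.
 */
size_t toy_copy_path(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst && src && dst_size > 0);
    len = strlen(src);
    if (len > 0 && src[len - 1] == '\n')
        len--;                  /* ignore the newline added by echo */
    if (len >= dst_size)
        len = dst_size - 1;     /* bound by the destination, not the source */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With this shape, input written with or without a trailing newline yields the same stored path, and an over-long input is truncated to fit the destination.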
     

05 Sep, 2018

2 commits

  • [ Upstream commit 8f3ea35929a0806ad1397db99a89ffee0140822a ]

    If the server or network is misbehaving and we get an unexpected reply
    we can sometimes miss the request not being started and wait on a
    request and never get a response, or even double complete the same
    request. Fix this by replacing the send_complete completion with just a
    per command lock. Add a per command cookie as well so that we can know
    if we're getting a double completion for a previous event. Also check
    to make sure we don't have REQUEUED set as that means we raced with the
    timeout handler and need to just let the retry occur.

    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • [ Upstream commit d7d94d48a272fd7583dc3c83acb8f5ed4ef456a4 ]

    We can race with the snd timeout and the per-request timeout and end up
    requeuing the same request twice. We can't use the send_complete
    completion to tell if everything is ok because we hold the tx_lock
    during send, so the timeout stuff will block waiting to mark the socket
    dead, and we could be marked complete and still requeue. Instead add a
    flag to the socket so we know whether we've been requeued yet.

    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     

24 Aug, 2018

2 commits

  • [ Upstream commit fad2d4ef636654e926d374ef038f4cd4286661f6 ]

    Fix the test that verifies whether bio_op(bio) represents a discard
    or write zeroes operation. Compile-tested only.

    Cc: Philipp Reisner
    Cc: Lars Ellenberg
    Fixes: 7435e9018f91 ("drbd: zero-out partial unaligned discards on local backend")
    Signed-off-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     
  • [ Upstream commit 08ba91ee6e2c1c08d3f0648f978cbb5dbf3491d8 ]

    If NBD_DISCONNECT_ON_CLOSE is set on a device, then the driver will
    issue a disconnect from nbd_release if the device has no remaining
    bdev->bd_openers.

    Fix the return value so that a reconfigure that only sets the flag succeeds.

    Reviewed-by: Josef Bacik
    Signed-off-by: Doron Roberts-Kedes
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Doron Roberts-Kedes
     

17 Jul, 2018

2 commits

  • commit d3349b6b3c373ac1fbfb040b810fcee5e2adc7e0 upstream.

    syzbot is hitting WARN() triggered by memory allocation fault
    injection [1] because loop module is calling sysfs_remove_group()
    when sysfs_create_group() failed.
    Fix this by remembering whether sysfs_create_group() succeeded.

    [1] https://syzkaller.appspot.com/bug?id=3f86c0edf75c86d2633aeb9dd69eccc70bc7e90b

    Signed-off-by: Tetsuo Handa
    Reported-by: syzbot
    Reviewed-by: Greg Kroah-Hartman

    Renamed sysfs_ready -> sysfs_inited.

    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Tetsuo Handa
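
The "remember whether creation succeeded" pattern from this commit can be sketched in userspace C as follows. All names are hypothetical except the `sysfs_inited` flag, which is taken from the commit note; the `fail` parameter simulates the injected allocation failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device record for this sketch. */
struct toy_loop_dev {
    bool sysfs_inited;  /* set only when group creation succeeded */
    int groups_live;    /* number of groups currently registered */
};

/* Hypothetical creation step; 'fail' simulates an allocation failure. */
int toy_create_group(struct toy_loop_dev *dev, bool fail)
{
    if (fail)
        return -1;
    dev->groups_live++;
    return 0;
}

void toy_setup(struct toy_loop_dev *dev, bool fail)
{
    assert(dev);
    dev->sysfs_inited = (toy_create_group(dev, fail) == 0);
}

/* Remove the group only if it was actually created. */
void toy_teardown(struct toy_loop_dev *dev)
{
    assert(dev);
    if (dev->sysfs_inited) {
        dev->groups_live--;
        dev->sysfs_inited = false;
    }
}
```

Clearing the flag in teardown also makes a second teardown call harmless.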
     
  • commit d2ac838e4cd7e5e9891ecc094d626734b0245c99 upstream.

    Refactor the validation code used in LOOP_SET_FD so it is also used in
    LOOP_CHANGE_FD. Otherwise it is possible to construct a set of loop
    devices that all refer to each other. This can lead to an infinite
    loop starting with "while (is_loop_device(f)) .." in loop_set_fd().

    Fix this by refactoring out the validation code and using it for
    LOOP_CHANGE_FD as well as LOOP_SET_FD.

    Reported-by: syzbot+4349872271ece473a7c91190b68b4bac7c5dbc87@syzkaller.appspotmail.com
    Reported-by: syzbot+40bd32c4d9a3cc12a339@syzkaller.appspotmail.com
    Reported-by: syzbot+769c54e66f994b041be7@syzkaller.appspotmail.com
    Reported-by: syzbot+0a89a9ce473936c57065@syzkaller.appspotmail.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
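
A shared validation step that refuses to create a cycle of backing devices could be sketched like this. The data model is a deliberate simplification with hypothetical names: each device has at most one backing pointer, and NULL stands for a regular file. It is not the loop driver's actual validation code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical loop device: backed by another loop device or by a file. */
struct toy_loop {
    struct toy_loop *backing;   /* NULL: backed by a regular file */
};

/* Reject 'candidate' if following its backing chain leads back to 'dev'. */
int toy_validate_backing(const struct toy_loop *dev,
                         const struct toy_loop *candidate)
{
    const struct toy_loop *l;

    assert(dev);
    for (l = candidate; l != NULL; l = l->backing) {
        if (l == dev)
            return -EINVAL;
    }
    return 0;
}

/* One entry point used for both the "set" and the "change" operation. */
int toy_set_backing(struct toy_loop *dev, struct toy_loop *candidate)
{
    int err = toy_validate_backing(dev, candidate);

    if (err)
        return err;
    dev->backing = candidate;
    return 0;
}
```

The walk terminates as long as every change goes through the same validated entry point, which is the reason for sharing the check between both operations.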
     

11 Jul, 2018

1 commit

  • commit 64dafbc9530c10300acffc57fae3269d95fa8f93 upstream.

    We have
    struct drbd_requests { ... struct bio *private_bio; ... }
    to hold a bio clone for local submission.

    On local IO completion, we put that bio, and in case we want to use the
    result later, we overload that member to hold the ERR_PTR() of the
    completion result,

    Which, before v4.3, used to be the passed in "int error",
    so we could first bio_put(), then assign.

    v4.3-rc1~100^2~21 4246a0b63bd8 block: add a bi_error field to struct bio
    changed that:
    bio_put(req->private_bio);
    - req->private_bio = ERR_PTR(error);
    + req->private_bio = ERR_PTR(bio->bi_error);

    Which introduces an access after free,
    because it was non obvious that req->private_bio == bio.

    Impact of that was mostly unnoticeable, because we only use that value
    in a multiple-failure case, and even then map any "unexpected" error
    code to EIO, so worst case we could potentially mask a more specific
    error with EIO in a multiple failure case.

    Unless the pointed to memory region was unmapped, as is the case with
    CONFIG_DEBUG_PAGEALLOC, in which case this results in

    BUG: unable to handle kernel paging request

    v4.13-rc1~70^2~75 4e4cbee93d56 block: switch bios to blk_status_t
    changes it further to
    bio_put(req->private_bio);
    req->private_bio = ERR_PTR(blk_status_to_errno(bio->bi_status));

    And blk_status_to_errno() now contains a WARN_ON_ONCE() for unexpected
    values, which catches this "sometimes", if the memory has been reused
    quickly enough for other things.

    Should also go into stable since 4.3, with the trivial change around 4.13.

    Cc: stable@vger.kernel.org
    Fixes: 4246a0b63bd8 block: add a bi_error field to struct bio
    Reported-by: Sarah Newman
    Signed-off-by: Lars Ellenberg
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Lars Ellenberg
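
The fix pattern implied by this commit, reading the status before dropping the reference, can be sketched in userspace C. The types and names are hypothetical, and unlike the driver the sketch stores the result in a separate `saved_status` field instead of overloading the pointer with ERR_PTR().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted bio with a completion status. */
struct toy_bio {
    int refs;
    int status;
};

struct toy_request {
    struct toy_bio *private_bio;
    int saved_status;
};

struct toy_bio *toy_bio_alloc(int status)
{
    struct toy_bio *bio = malloc(sizeof(*bio));

    if (bio) {
        bio->refs = 1;
        bio->status = status;
    }
    return bio;
}

void toy_bio_put(struct toy_bio *bio)
{
    if (--bio->refs == 0)
        free(bio);
}

/* Completion handler: 'bio' and req->private_bio are the same object. */
void toy_complete_local(struct toy_request *req, struct toy_bio *bio)
{
    int status;

    assert(req && bio && req->private_bio == bio);
    status = bio->status;           /* read BEFORE dropping the reference */
    toy_bio_put(req->private_bio);  /* may free the object */
    req->private_bio = NULL;
    req->saved_status = status;
}
```

Copying the value into a local first means the aliasing between `bio` and `req->private_bio` no longer matters once the reference is dropped.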
     

03 Jul, 2018

1 commit

  • commit 23edca864951250af845a11da86bb3ea63522ed2 upstream.

    There is a problem if we are going to unmap a rbd device and the
    watch_dwork is going to queue delayed work for watch:

    unmap Thread                            watch Thread                    timer
    do_rbd_remove
      cancel_tasks_sync(rbd_dev)
                                            queue_delayed_work for watch
      destroy_workqueue(rbd_dev->task_wq)
        drain_workqueue(wq)
        destroy other resources in wq
                                                                            call_timer_fn
                                                                              __queue_work()

    Then the delayed work escapes cancel_tasks_sync() and
    destroy_workqueue(), and we get a use-after-free call trace:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP PTI
    Modules linked in:
    CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 4.17.0-rc6+ #13
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:__queue_work+0x6a/0x3b0
    RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
    RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
    RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
    R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
    R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
    FS: 0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
    Call Trace:

    ? __queue_work+0x3b0/0x3b0
    call_timer_fn+0x2d/0x130
    run_timer_softirq+0x16e/0x430
    ? tick_sched_timer+0x37/0x70
    __do_softirq+0xd2/0x280
    irq_exit+0xd5/0xe0
    smp_apic_timer_interrupt+0x6c/0x130
    apic_timer_interrupt+0xf/0x20

    [ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch()
    either bails out early because the watch is UNREGISTERED at that point
    or just gets cancelled. ]

    Cc: stable@vger.kernel.org
    Fixes: 99d1694310df ("rbd: retry watch re-registration periodically")
    Signed-off-by: Dongsheng Yang
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Dongsheng Yang
     

26 Jun, 2018

3 commits

  • commit 9e2b19675d1338d2a38e99194756f2db44a081df upstream.

    When we stopped relying on the bdev everywhere I broke updating the
    block device size on the fly, which ceph relies on. We can't just do
    set_capacity, we also have to do bd_set_size so things like parted will
    notice the device size change.

    Fixes: 29eaadc ("nbd: stop using the bdev everywhere")
    cc: stable@vger.kernel.org
    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit c3f7c9397609705ef848cc98a5fb429b3e90c3c4 upstream.

    I messed up changing the size of an NBD device while it was connected by
    not actually updating the device or doing the uevent. Fix this by
    updating everything if we're connected and we change the size.

    cc: stable@vger.kernel.org
    Fixes: 639812a ("nbd: don't set the device size until we're connected")
    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 8364da4751cf22201d74933d5e634176f44ed407 upstream.

    This fixes a use after free bug, we shouldn't be doing disk->queue right
    after we do del_gendisk(disk). Save the queue and do the cleanup after
    the del_gendisk.

    Fixes: c6a4759ea0c9 ("nbd: add device refcounting")
    cc: stable@vger.kernel.org
    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     

30 May, 2018

4 commits

  • [ Upstream commit 66231ad3e2886ba99fbf440cea44cab547e5163f ]

    On ARM64, the default page size has been 64K on some distributions, and
    we should allow ARM64 users to use null_blk.

    This patch fixes the issue by extending the page bitmap size to support
    PAGE_SIZE values other than 4KB.

    Cc: Bart Van Assche
    Cc: Shaohua Li
    Cc: Kyungchan Koh
    Cc: weiping zhang
    Cc: Yi Zhang
    Reported-by: Yi Zhang
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ming Lei
     
  • [ Upstream commit 2bbea6e117357d17842114c65e9a9cf2d13ae8a3 ]

    When mounting an ISO filesystem, the system sometimes (very rarely)
    hangs because of a race condition between two tasks.

    PID: 6766 TASK: ffff88007b2a6dd0 CPU: 0 COMMAND: "mount"
    #0 [ffff880078447ae0] __schedule at ffffffff8168d605
    #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
    #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
    #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
    #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
    #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
    #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
    #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
    #8 [ffff880078447da8] mount_bdev at ffffffff81202570
    #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
    #10 [ffff880078447e28] mount_fs at ffffffff81202d09
    #11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
    #12 [ffff880078447ea8] do_mount at ffffffff81220fee
    #13 [ffff880078447f28] sys_mount at ffffffff812218d6
    #14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a RSP: 00007ffd5d9bf648 RFLAGS: 00010246
    RAX: 00000000000000a5 RBX: ffffffff81698c49 RCX: 0000000000000010
    RDX: 00007fd9ec2bc210 RSI: 00007fd9ec2bc290 RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000010
    R10: 00000000c0ed0001 R11: 0000000000000206 R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380 R14: 00007fd9ec2bc210 R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b

    This task was trying to mount the cdrom. It allocated and configured a
    super_block struct and owned the write-lock for the super_block->s_umount
    rwsem. While exclusively owning the s_umount lock, it called
    sr_block_ioctl and waited to acquire the global sr_mutex lock.

    PID: 6785 TASK: ffff880078720fb0 CPU: 0 COMMAND: "systemd-udevd"
    #0 [ffff880078417898] __schedule at ffffffff8168d605
    #1 [ffff880078417900] schedule at ffffffff8168dc59
    #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
    #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
    #4 [ffff8800784179d0] down_read at ffffffff8168cde0
    #5 [ffff8800784179e8] get_super at ffffffff81201cc7
    #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
    #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
    #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
    #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
    #10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
    #11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
    #12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
    #13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
    #14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
    #15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
    #16 [ffff880078417d00] do_last at ffffffff8120d53d
    #17 [ffff880078417db0] path_openat at ffffffff8120e6b2
    #18 [ffff880078417e48] do_filp_open at ffffffff8121082b
    #19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
    #20 [ffff880078417f70] sys_open at ffffffff811fde4e
    #21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20 RSP: 00007ffc76624b78 RFLAGS: 00010246
    RAX: 0000000000000002 RBX: ffffffff81698c49 RCX: 0000000000000000
    RDX: 00007f2944a5fa70 RSI: 00000000000a0800 RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540 R8: 0000000000000000 R9: 0000000000000020
    R10: 00007f2943614c40 R11: 0000000000000246 R12: ffffffff811fde4e
    R13: ffff880078417f78 R14: 000000000000000c R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b

    This task tried to open the cdrom device, the sr_block_open function
    acquired the global sr_mutex lock. The call to check_disk_change()
    then saw an event flag indicating a possible media change and tried
    to flush any cached data for the device.
    As part of the flush, it tried to acquire the super_block->s_umount
    lock associated with the cdrom device.
    This was the same super_block as created and locked by the previous task.

    The first task acquires the s_umount lock and then the sr_mutex_lock;
    the second task acquires the sr_mutex_lock and then the s_umount lock.

    This patch fixes the issue by moving check_disk_change() out of
    cdrom_open() and letting the caller take care of it.

    Signed-off-by: Maurizio Lombardi
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Maurizio Lombardi
     
  • [ Upstream commit 7ed8ce1c5fc7cf25b3602c73bef897a3466a6645 ]

    negotiate_mq should happen in all cases of a new VBD being discovered by
    xen-blkfront, whether called through _probe() or a hot-attached new VBD
    from dom-0 via xenstore. Otherwise, hot-attached new VBDs are left
    configured without multi-queue.

    Signed-off-by: Bhavesh Davda
    Reviewed-by: Konrad Rzeszutek Wilk
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Bhavesh Davda
     
  • [ Upstream commit 0979962f5490abe75b3e2befb07a564fa0cf631b ]

    It seems that the proper value to return in this particular case is the
    one contained in the variable new_index instead of ret.

    Addresses-Coverity-ID: 1465148 ("Copy-paste error")
    Fixes: e46c7287b1c2 ("nbd: add a basic netlink interface")
    Reviewed-by: Omar Sandoval
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
     

25 May, 2018

2 commits

  • commit bdac616db9bbadb90b7d6a406144571015e138f7 upstream.

    Commit 2d1d4c1e591f made loop_get_status() drop lo_ctl_mutex before
    returning, but the loop_get_status_old(), loop_get_status64(), and
    loop_get_status_compat() wrappers don't call loop_get_status() if the
    passed argument is NULL. The callers expect that the lock is dropped, so
    make sure we drop it in that case, too.
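
    A minimal userspace sketch of the contract described above, using a
    flag in place of the mutex. struct dev, dev_lock(), dev_unlock(),
    get_status(), get_status_wrapper() and ioctl_get_status() are
    hypothetical names for illustration, not the loop driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a flag standing in for lo_ctl_mutex. */
struct dev {
    bool locked;
    int status;
};

static void dev_lock(struct dev *d)
{
    assert(!d->locked);
    d->locked = true;
}

static void dev_unlock(struct dev *d)
{
    assert(d->locked);
    d->locked = false;
}

/* Called with the lock held; drops it before returning. */
static int get_status(struct dev *d, int *out)
{
    *out = d->status;
    dev_unlock(d);
    return 0;
}

/* Wrapper: the early exit has to honour the same "lock is dropped" contract. */
static int get_status_wrapper(struct dev *d, int *arg)
{
    if (!arg) {
        dev_unlock(d);
        return -EINVAL;
    }
    return get_status(d, arg);
}

/* Caller: takes the lock and expects the callee to have released it. */
static int ioctl_get_status(struct dev *d, int *arg)
{
    dev_lock(d);
    return get_status_wrapper(d, arg);
}
```

    Without the dev_unlock() call in the NULL branch, the next
    ioctl_get_status() call on the same device would trip the assertion
    in dev_lock(), which is the userspace analogue of the stuck mutex.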

    Reported-by: syzbot+31e8daa8b3fc129e75f2@syzkaller.appspotmail.com
    Fixes: 2d1d4c1e591f ("loop: don't call into filesystem while holding lo_ctl_mutex")
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Omar Sandoval
     
  • commit 2d1d4c1e591fd40bd7dafd868a249d7d00e215d5 upstream.

    We hit an issue where a loop device on NFS was stuck in
    loop_get_status() doing vfs_getattr() after the NFS server died, which
    caused a pile-up of uninterruptible processes waiting on lo_ctl_mutex.
    There's no reason to hold this lock while we wait on the filesystem;
    let's drop it so that other processes can do their thing. We need to
    grab a reference on lo_backing_file while we use it, and we can get rid
    of the check on lo_device, which has been unnecessary since commit
    a34c0ae9ebd6 ("[PATCH] loop: remove the bio remapping capability") in
    the linux-history tree.
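
    The pattern described above (pin the backing file, drop the lock,
    then do the slow call) might look roughly like this in a userspace
    sketch. struct file_ref, struct loopdev, slow_getattr() and
    get_size() are hypothetical; a flag stands in for the mutex and a
    plain counter for the file reference count.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backing file with a plain reference counter. */
struct file_ref {
    int refcount;
    long size;
};

/* Hypothetical device; "locked" stands in for lo_ctl_mutex. */
struct loopdev {
    bool locked;
    struct file_ref *backing;
};

/* Stand-in for a filesystem call that may block for a long time. */
static long slow_getattr(const struct loopdev *d, const struct file_ref *f)
{
    assert(!d->locked);      /* the lock must not be held across this call */
    return f->size;
}

static int get_size(struct loopdev *d, long *out)
{
    struct file_ref *f;

    d->locked = true;        /* take the lock */
    f = d->backing;
    if (!f) {
        d->locked = false;
        return -ENXIO;
    }
    f->refcount++;           /* pin the file while we use it unlocked */
    d->locked = false;       /* drop the lock before the slow call */

    *out = slow_getattr(d, f);

    f->refcount--;           /* release our reference */
    return 0;
}
```

    The reference taken under the lock is what keeps the file valid
    once the lock is released, so other callers are not queued behind a
    filesystem that has stopped responding.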

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Omar Sandoval
     

29 Apr, 2018

3 commits

  • commit 5a13388d7aa1177b98d7168330ecbeeac52f844d upstream.

    Reading to the end of a 720K disk results in an IO error instead of EOF
    because the block layer thinks the disk has 2880 sectors. (Partly this
    is a result of inverted logic of the ONEMEG_MEDIA bit that's now fixed.)

    Initialize the density and head count in swim_add_floppy() to agree
    with the device size passed to set_capacity() during drive probe.

    Call set_capacity() again upon device open, after refreshing the density
    and head count values.
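
    The sector counts involved can be checked with a little arithmetic.
    The sketch below assumes the standard MFM geometries (2 heads, 80
    tracks, 512-byte sectors, 9 sectors per track for 720K and 18 for
    1440K); floppy_sectors() is a hypothetical helper, not a function
    from swim.c.

```c
#include <assert.h>

/* Number of 512-byte sectors on a disk with the given geometry. */
static unsigned int floppy_sectors(unsigned int heads,
                                   unsigned int tracks,
                                   unsigned int sectors_per_track)
{
    return heads * tracks * sectors_per_track;
}
```

    Under those assumptions floppy_sectors(2, 80, 9) gives 1440 sectors
    for a 720K disk, while floppy_sectors(2, 80, 18) gives the 2880
    sectors the block layer had been told about, so reads past sector
    1439 hit an IO error rather than EOF.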

    Cc: Laurent Vivier
    Cc: Jens Axboe
    Cc: stable@vger.kernel.org # v4.14+
    Tested-by: Stan Johnson
    Signed-off-by: Finn Thain
    Acked-by: Laurent Vivier
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Finn Thain
     
  • commit 7ae6a2b6cc058005ee3d0d2b9ce27688e51afa4b upstream.

    The floppy_find() function in swim.c calls
    get_disk(swd->unit[drive].disk). The argument to this call can be a
    NULL pointer when drive == swd->floppy_count, which causes an oops
    in get_disk().

    Data read fault at 0x00000198 in Super Data (pc=0x1be5b6)
    BAD KERNEL BUSERR
    Oops: 00000000
    Modules linked in: swim_mod ipv6 mac8390
    PC: [] get_disk+0xc/0x76
    SR: 2004 SP: 9a078bc1 a2: 0213ed90
    d0: 00000000 d1: 00000000 d2: 00000000 d3: 000000ff
    d4: 00000002 d5: 02983590 a0: 02332e00 a1: 022dfd64
    Process dd (pid: 285, task=020ab25b)
    Frame format=B ssw=074d isc=4a88 isb=6732 daddr=00000198 dobuf=00000000
    baddr=001be5bc dibuf=bfffffff ver=f
    Stack from 022dfca4:
    00000000 0203fc00 0213ed90 022dfcc0 02982936 00000000 00200000 022dfd08
    0020f85a 00200000 022dfd64 02332e00 004040fc 00000014 001be77e 022dfd64
    00334e4a 001be3f8 0800001d 022dfd64 01c04b60 01c04b70 022aba80 029828f8
    02332e00 022dfd2c 001be7ac 0203fc00 00200000 022dfd64 02103a00 01c04b60
    01c04b60 0200e400 022dfd68 000e191a 00200000 022dfd64 02103a00 0800001d
    00000000 00000003 000b89de 00500000 02103a00 01c04b60 02103a08 01c04c2e
    Call Trace: [] floppy_find+0x3e/0x4a [swim_mod]
    [] uart_remove_one_port+0x1a2/0x260
    [] kobj_lookup+0xde/0x132
    [] uart_remove_one_port+0x1a2/0x260
    [] get_gendisk+0x0/0x130
    [] mutex_lock+0x0/0x2e
    [] disk_block_events+0x0/0x6c
    [] floppy_find+0x0/0x4a [swim_mod]
    [] get_gendisk+0x2e/0x130
    [] uart_remove_one_port+0x1a2/0x260
    [] __blkdev_get+0x32/0x45a
    [] uart_remove_one_port+0x1a2/0x260
    [] complete_walk+0x0/0x8a
    [] blkdev_get+0xe0/0x29a
    [] blkdev_open+0x0/0xb0
    [] complete_walk+0x0/0x8a
    [] blkdev_open+0x0/0xb0
    [] bd_acquire+0x74/0x8a
    [] blkdev_open+0x80/0xb0
    [] blkdev_open+0x0/0xb0
    [] do_dentry_open+0x1a4/0x322
    [] __do_proc_douintvec+0x22/0x27e
    [] complete_walk+0x0/0x8a
    [] link_path_walk+0x0/0x48e
    [] inode_permission+0x20/0x54
    [] vfs_open+0x42/0x78
    [] path_openat+0x2b2/0xeaa
    [] path_openat+0x0/0xeaa
    [] __irq_wake_thread+0x0/0x4e
    [] task_tick_fair+0x18/0xc8
    [] do_filp_open+0xa0/0xea
    [] do_sys_open+0x11a/0x1ee
    [] __do_proc_douintvec+0x22/0x27e
    [] SyS_open+0x1e/0x22
    [] __do_proc_douintvec+0x22/0x27e
    [] syscall+0x8/0xc
    [] __do_proc_douintvec+0x22/0x27e
    [] dyadic+0x1/0x28
    Code: 4e5e 4e75 4e56 fffc 2f0b 2f02 266e 0008 0198 4a88 6732 2428 002c 661e 486b 0058 4eb9 0032 0b96 588f 4a88 672c 2008
    Disabling lock debugging due to kernel taint

    Fix the array index bounds check to avoid this.
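
    A minimal sketch of the kind of bounds check meant here, assuming
    the index must be rejected once it reaches floppy_count. struct
    swim, struct unit, MAX_UNITS and find_disk() are hypothetical
    simplifications, not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNITS 2

/* Hypothetical, simplified versions of the driver's structures. */
struct unit {
    void *disk;
};

struct swim {
    int floppy_count;
    struct unit unit[MAX_UNITS];
};

/* Return the disk for a drive index, or NULL if the index is out of range. */
static void *find_disk(struct swim *swd, int drive)
{
    /* ">=" rather than ">": index floppy_count is already past the end. */
    if (drive < 0 || drive >= swd->floppy_count)
        return NULL;
    return swd->unit[drive].disk;
}
```

    With one drive present, index 1 equals floppy_count and is rejected
    before its empty disk slot can be handed to a function that would
    dereference it.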

    Cc: Laurent Vivier
    Cc: Jens Axboe
    Cc: stable@vger.kernel.org # v4.14+
    Fixes: 8852ecd97488 ("[PATCH] m68k: mac - Add SWIM floppy support")
    Tested-by: Stan Johnson
    Signed-off-by: Finn Thain
    Acked-by: Laurent Vivier
    Reviewed-by: Geert Uytterhoeven
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Finn Thain
     
  • commit b3906535ccc6cd04c42f9b1c7e31d1947b3ebc74 upstream.

    The driver supports internal and external FDD units so the floppy_open
    function must not hard-code the drive location.

    Cc: Laurent Vivier
    Cc: Jens Axboe
    Cc: stable@vger.kernel.org # v4.14+
    Tested-by: Stan Johnson
    Signed-off-by: Finn Thain
    Acked-by: Laurent Vivier
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Finn Thain