29 Feb, 2020

1 commit

  • commit 2e90ca68b0d2f5548804f22f0dd61145516171e3 upstream.

    Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in
    wait_til_ready().

    Which on the face of it can't happen, since as Willy Tarreau points out,
    the function does no particular memory access. Except through the FDCS
    macro, which just indexes a static allocation through the current fdc,
    which is always checked against N_FDC.

    Except the checking happens after we've already assigned the value.

    The floppy driver is a disgrace (a lot of it going back to my original
    horrid "design"), and has no real maintainer. Nobody has the hardware,
    and nobody really cares. But it still gets used in virtual environments
    because it's one of those things that everybody supports.

    The whole thing should be re-written, or at least parts of it should be
    seriously cleaned up. The 'current fdc' index, which is used by the
    FDCS macro, and which is often shadowed by a local 'fdc' variable, is a
    prime example of how not to write code.

    But because nobody has the hardware or the motivation, let's just fix up
    the immediate problem with a nasty band-aid: test the fdc index before
    actually assigning it to the static 'fdc' variable.
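
    The band-aid described above can be sketched in user-space C as follows.
    This is a minimal illustration, not the driver's code: select_fdc,
    current_fdc and the value of N_FDC are assumptions made for the example.

```c
#include <stdbool.h>

#define N_FDC 2	/* assumed number of controllers for this sketch */

/* Stand-in for the driver's static "current fdc" index. */
static int current_fdc;

/*
 * Validate the candidate index first. Only a value known to be in range
 * is ever stored in the static variable that later indexes the
 * per-controller state.
 */
static bool select_fdc(int candidate)
{
	if (candidate < 0 || candidate >= N_FDC)
		return false;	/* the static index is left untouched */

	current_fdc = candidate;
	return true;
}
```

    The point of the ordering is that a rejected index never becomes visible
    to code that reads the static variable.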

    Reported-by: Jordy Zomer
    Cc: Willy Tarreau
    Cc: Dan Carpenter
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     

24 Feb, 2020

4 commits

  • [ Upstream commit c8ab422553c81a0eb070329c63725df1cd1425bc ]

    In brd_init, rd_nr brd_device structures are first allocated and
    added to brd_devices; brd_devices is then traversed to register each
    brd_device by calling add_disk. When a brd_device is allocated,
    disk->first_minor is set to i * max_part. If rd_nr * max_part
    is larger than MINORMASK, two different brd_devices may end up with the
    same devt, and only one of them can be added successfully.
    Removing the module (rmmod brd.ko) then causes an oops in brd_exit.

    Follow those steps:
    # modprobe brd rd_nr=3 rd_size=102400 max_part=1048576
    # rmmod brd
    then, the oops will appear.

    Oops log:
    [ 726.613722] Call trace:
    [ 726.614175] kernfs_find_ns+0x24/0x130
    [ 726.614852] kernfs_find_and_get_ns+0x44/0x68
    [ 726.615749] sysfs_remove_group+0x38/0xb0
    [ 726.616520] blk_trace_remove_sysfs+0x1c/0x28
    [ 726.617320] blk_unregister_queue+0x98/0x100
    [ 726.618105] del_gendisk+0x144/0x2b8
    [ 726.618759] brd_exit+0x68/0x560 [brd]
    [ 726.619501] __arm64_sys_delete_module+0x19c/0x2a0
    [ 726.620384] el0_svc_common+0x78/0x130
    [ 726.621057] el0_svc_handler+0x38/0x78
    [ 726.621738] el0_svc+0x8/0xc
    [ 726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260)

    Here, we add the brd_check_and_reset_par function to check and limit the
    max_part parameter.
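
    The collision can be reproduced with a little arithmetic. The sketch
    below assumes the device number is formed by OR-ing the minor into the
    low 20 bits below the major, with the ramdisk major equal to 1 and a cap
    of 256 partitions; all SKETCH_ names and both helpers are hypothetical
    and only model the idea, not the code added by the patch.

```c
#define SKETCH_MINORBITS	20	/* assumed width of the minor field */
#define SKETCH_RAMDISK_MAJOR	1u	/* assumed major number of brd */
#define SKETCH_DISK_MAX_PARTS	256u	/* assumed upper bound for max_part */

/* Device number the i-th ramdisk would get for a given max_part. */
static unsigned int sketch_devt(unsigned int i, unsigned int max_part)
{
	unsigned int first_minor = i * max_part;

	/* An oversized first_minor spills into the bits of the major. */
	return (SKETCH_RAMDISK_MAJOR << SKETCH_MINORBITS) | first_minor;
}

/* Limit max_part so consecutive devices keep distinct minor ranges. */
static unsigned int sketch_limit_max_part(unsigned int max_part)
{
	if (max_part > SKETCH_DISK_MAX_PARTS)
		return SKETCH_DISK_MAX_PARTS;
	return max_part;
}
```

    With max_part=1048576 from the reproducer, devices 0 and 1 map to the
    same number in this model; after limiting max_part they no longer do.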

    --
    V5->V6:
    - remove useless code

    V4->V5:(suggested by Ming Lei)
    - make sure max_part is not larger than DISK_MAX_PARTS

    V3->V4:(suggested by Ming Lei)
    - remove useless change
    - add one limit of max_part

    V2->V3: (suggested by Ming Lei)
    - clear .minors when running out of consecutive minor space in brd_alloc
    - remove limit of rd_nr

    V1->V2:
    - add more checks in brd_check_par_valid as suggested by Ming Lei.

    Signed-off-by: Zhiqiang Liu
    Reviewed-by: Bob Liu
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Zhiqiang Liu
     
  • [ Upstream commit a55e601b2f02df5db7070e9a37bd655c9c576a52 ]

    gcc -O3 warns about a dummy variable that is passed
    down into rbd_img_fill_nodata without being initialized:

    drivers/block/rbd.c: In function 'rbd_img_fill_nodata':
    drivers/block/rbd.c:2573:13: error: 'dummy' is used uninitialized in this function [-Werror=uninitialized]
    fctx->iter = *fctx->pos;

    Since this is a dummy, I assume the warning is harmless, but
    it's better to initialize it anyway and avoid the warning.

    Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Sasha Levin

    Arnd Bergmann
     
  • [ Upstream commit 3b82a051c10143639a378dcd12019f2353cc9054 ]

    Currently, when an error code -EIO or -ENOSPC is set in the for-loop of
    writeback_store, it is overwritten by a ret = len assignment at the end
    of the function and the error code is lost. Fix this by assigning
    ret = len at the start of the function and removing the assignment from
    the end, which allows ret to be preserved when an error code is
    assigned to it.
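
    The pattern is easy to show in isolation. The following is a simplified
    sketch, assuming a loop that stops at the first failure; the function
    writeback_sketch and its results array are hypothetical stand-ins, not
    the zram code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * results[i] is the outcome of loop iteration i: 0 on success or a
 * negative errno on failure. len is the value returned on full success.
 */
static long writeback_sketch(const int *results, size_t n, long len)
{
	long ret = len;	/* success value assigned up front */
	size_t i;

	for (i = 0; i < n; i++) {
		if (results[i] < 0) {
			ret = results[i];	/* remember the error */
			break;
		}
	}

	/* No "ret = len" here, so a recorded error survives. */
	return ret;
}
```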

    Addresses Coverity ("Unused value")

    Link: http://lkml.kernel.org/r/20191128122958.178290-1-colin.king@canonical.com
    Fixes: a939888ec38b ("zram: support idle/huge page writeback")
    Signed-off-by: Colin Ian King
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Colin Ian King
     
  • [ Upstream commit 5c0dd228b5fc30a3b732c7ae2657e0161ec7ed80 ]

    When kzalloc fails, we may end up trying to destroy the
    workqueue from inside the workqueue.

    Suppose num_connections is m (2 < m), the kzalloc calls No. 1 to
    No. n (1 < n < m) succeed, and the No. (n + 1) call fails.
    Then nbd_start_device returns ENOMEM
    to nbd_start_device_ioctl, and nbd_start_device_ioctl
    returns immediately without running flush_workqueue.
    However, we still have n recv threads. If nbd_release
    runs first, the recv threads may have to drop the last
    config_refs and try to destroy the workqueue from
    inside the workqueue.

    To fix it, add a flush_workqueue in nbd_start_device.

    Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
    Signed-off-by: Sun Ke
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Sun Ke
     

23 Jan, 2020

1 commit

  • commit 589b72894f53124a39d1bb3c0cecaf9dcabac417 upstream.

    Clang warns:

    ../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
    statement is not part of the previous 'if' [-Wmisleading-indentation]
    nr_parts = PARTS_PER_DISK;
    ^
    ../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
    if (err)
    ^

    This is because there is a space at the beginning of this line; remove
    it so that the indentation is consistent according to the Linux kernel
    coding style and clang no longer warns.

    While we are here, the previous line has some trailing whitespace; clean
    that up as well.

    Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD")
    Link: https://github.com/ClangBuiltLinux/linux/issues/791
    Signed-off-by: Nathan Chancellor
    Reviewed-by: Juergen Gross
    Acked-by: Roger Pau Monné
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Nathan Chancellor
     

09 Jan, 2020

2 commits

  • [ Upstream commit f9bd84a8a845d82f9b5a081a7ae68c98a11d2e84 ]

    For each I/O request, blkback first maps the foreign pages for the
    request to its local pages. If an allocation of a local page for the
    mapping fails, it should unmap every mapping already made for the
    request.

    However, blkback's handling mechanism for the allocation failure does
    not mark the remaining foreign pages as unmapped. Therefore, the unmap
    function merely tries to unmap every valid grant page for the request,
    including the pages not mapped due to the allocation failure. On a
    system that fails the allocation frequently, this problem leads to the
    following kernel crash.

    [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
    [ 372.012546] IP: [] gnttab_unmap_refs.part.7+0x1c/0x40
    [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
    [ 372.012562] Oops: 0002 [#1] SMP
    [ 372.012566] Modules linked in: act_police sch_ingress cls_u32
    ...
    [ 372.012746] Call Trace:
    [ 372.012752] [] gnttab_unmap_refs+0x34/0x40
    [ 372.012759] [] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
    ...
    [ 372.012802] [] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
    ...
    Decompressing Linux... Parsing ELF... done.
    Booting the kernel.
    [ 0.000000] Initializing cgroup subsys cpuset

    This commit fixes this problem by marking the grant pages of the given
    request that were not mapped due to the allocation failure as invalid.
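
    As a rough user-space model of that fix: when mapping stops early, every
    page that was never mapped gets an explicit invalid marker, and the
    unmap path skips such pages. The structure, the marker value and both
    helpers below are assumptions for illustration only.

```c
#include <stddef.h>

#define SKETCH_INVALID_HANDLE (~0u)	/* assumed "not mapped" marker */

struct sketch_page {
	unsigned int handle;
};

/*
 * Map n pages. The allocation "fails" at index fail_at (pass n for no
 * failure). On failure, the failing page and all remaining pages are
 * marked invalid before returning.
 */
static int sketch_map(struct sketch_page *pages, size_t n, size_t fail_at)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == fail_at) {
			for (; i < n; i++)
				pages[i].handle = SKETCH_INVALID_HANDLE;
			return -1;
		}
		pages[i].handle = (unsigned int)i + 1;	/* pretend grant handle */
	}
	return 0;
}

/* Unmap only pages with a valid handle; return how many were unmapped. */
static size_t sketch_unmap(struct sketch_page *pages, size_t n)
{
	size_t i, unmapped = 0;

	for (i = 0; i < n; i++) {
		if (pages[i].handle == SKETCH_INVALID_HANDLE)
			continue;
		pages[i].handle = SKETCH_INVALID_HANDLE;
		unmapped++;
	}
	return unmapped;
}
```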

    Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")

    Reviewed-by: David Woodhouse
    Reviewed-by: Maximilian Heyne
    Reviewed-by: Paul Durrant
    Reviewed-by: Roger Pau Monné
    Signed-off-by: SeongJae Park
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    SeongJae Park
     
  • [ Upstream commit fa2ac657f9783f0891b2935490afe9a7fd29d3fa ]

    Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem
    cache. This cache is destroyed when xen-blkif is unloaded so it is
    necessary to wait for the deferred free routine used for such objects to
    complete. This necessity was missed in commit 14855954f636 "xen-blkback:
    allow module to be cleanly unloaded". This patch fixes the problem by
    taking/releasing extra module references in xen_blkif_alloc/free()
    respectively.

    Signed-off-by: Paul Durrant
    Reviewed-by: Roger Pau Monné
    Signed-off-by: Juergen Gross
    Signed-off-by: Sasha Levin

    Paul Durrant
     

31 Dec, 2019

2 commits

  • commit 1c05839aa973cfae8c3db964a21f9c0eef8fcc21 upstream.

    This fixes a regression added with:

    commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
    Author: Mike Christie
    Date: Sun Aug 4 14:10:06 2019 -0500

    nbd: fix max number of supported devs

    where we can deadlock during device shutdown. The problem occurs if
    the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has
    returned and the userspace app has dropped its reference via closing
    the device and running nbd_release. The recv_work nbd_config_put call
    would then drop the refcount to zero and try to destroy the config which
    would try to do destroy_workqueue from the recv work.

    This patch just has nbd_start_device_ioctl do a flush_workqueue when it
    wakes so we know after the ioctl returns running works have exited. This
    also fixes a possible race where we could try to reuse the device while
    old recv_works are still running.

    Cc: stable@vger.kernel.org
    Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
    Signed-off-by: Mike Christie
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Mike Christie
     
  • [ Upstream commit efcfec579f6139528c9e6925eca2bc4a36da65c6 ]

    Currently, if the loop device receives a WRITE_ZEROES request, it asks
    the underlying filesystem to punch out the range. This behavior is
    correct if unmapping is allowed. However, a NOUNMAP request means that
    the caller doesn't want us to free the storage backing the range, so
    punching out the range is incorrect behavior.

    To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
    underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
    the fallocate documentation) required to ensure that the entire range is
    backed by real storage, which suffices for our purposes.
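
    The decision described above boils down to choosing a fallocate mode
    from the request flags. A minimal sketch, assuming the flag values match
    the Linux FALLOC_FL_* constants and that the file size is always kept;
    the SKETCH_ names and the helper are hypothetical:

```c
#include <stdbool.h>

/* Local copies of the fallocate mode bits, assumed to match Linux. */
#define SKETCH_FALLOC_FL_KEEP_SIZE	0x01
#define SKETCH_FALLOC_FL_PUNCH_HOLE	0x02
#define SKETCH_FALLOC_FL_ZERO_RANGE	0x10

/*
 * Pick the mode for a WRITE_ZEROES request. With NOUNMAP the backing
 * storage must stay allocated, so zero the range instead of punching it.
 */
static int sketch_write_zeroes_mode(bool nounmap)
{
	int mode = nounmap ? SKETCH_FALLOC_FL_ZERO_RANGE
			   : SKETCH_FALLOC_FL_PUNCH_HOLE;

	return mode | SKETCH_FALLOC_FL_KEEP_SIZE;
}
```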

    Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES")
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Darrick J. Wong
     

29 Nov, 2019

1 commit

  • commit 03bf73c315edca28f47451913177e14cd040a216 upstream.

    In nbd_add_socket, when krealloc succeeds but the allocation of nsock
    fails, the reallocated memory is leaked. The correct behaviour is to
    assign the reallocated memory to config->socks right after krealloc
    succeeds.
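
    A user-space analogue using realloc shows why the ordering matters. The
    types, the function name and the fail_nsock switch (which simulates the
    second allocation failing) are assumptions made for this sketch.

```c
#include <stdlib.h>

struct sketch_sock {
	int fd;
};

struct sketch_config {
	struct sketch_sock **socks;
	size_t num;
};

static int sketch_add_socket(struct sketch_config *cfg, int fd, int fail_nsock)
{
	struct sketch_sock **socks, *nsock;

	socks = realloc(cfg->socks, (cfg->num + 1) * sizeof(*socks));
	if (!socks)
		return -1;
	/* Hand the grown array to its owner before anything else can fail. */
	cfg->socks = socks;

	nsock = fail_nsock ? NULL : malloc(sizeof(*nsock));
	if (!nsock)
		return -1;	/* no leak: cfg->socks still owns the array */

	nsock->fd = fd;
	cfg->socks[cfg->num++] = nsock;
	return 0;
}
```

    If the assignment to cfg->socks were deferred until after the second
    allocation, the early return would drop the only pointer to the grown
    array.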

    Reviewed-by: Josef Bacik
    Signed-off-by: Navid Emamdoost
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Navid Emamdoost
     

22 Nov, 2019

1 commit


20 Nov, 2019

1 commit

  • Before returning NULL, put the sock first.

    Cc: stable@vger.kernel.org
    Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup")
    Reviewed-by: Josef Bacik
    Reviewed-by: Mike Christie
    Signed-off-by: Sun Ke
    Signed-off-by: Jens Axboe

    Sun Ke
     

16 Nov, 2019

1 commit

  • Pull block fixes from Jens Axboe:
    "A few fixes that should make it into this release. This contains:

    - io_uring:
    - The timeout command assumes sequence == 0 means that we want
    one completion, but this kind of overloading is unfortunate as
    it prevents users from doing a pure time based wait. Since
    this operation was introduced in this cycle, let's correct it
    now, while we can. (me)
    - One-liner to fix an issue with dependent links and fixed
    buffer reads. The actual IO completed fine, but the link got
    severed since we stored the wrong expected value. (me)
    - Add TIMEOUT to list of opcodes that don't need a file. (Pavel)

    - rsxx missing workqueue destroy calls. Old bug. (Chuhong)

    - Fix blk-iocost active list check (Jiufei)

    - Fix impossible-to-hit overflow merge condition, that still hit some
    folks very rarely (Junichi)

    - Fix bfq hang issue from 5.3. This didn't get marked for stable, but
    will go into stable post this merge (Paolo)"

    * tag 'for-linus-20191115' of git://git.kernel.dk/linux-block:
    rsxx: add missed destroy_workqueue calls in remove
    iocost: check active_list of all the ancestors in iocg_activate()
    block, bfq: deschedule empty bfq_queues not referred by any process
    io_uring: ensure registered buffer import returns the IO length
    io_uring: Fix getting file for timeout
    block: check bi_size overflow before merge
    io_uring: make timeout sequence == 0 mean no sequence

    Linus Torvalds
     

15 Nov, 2019

2 commits

  • The driver misses calling destroy_workqueue in remove like what is done
    when probe fails.
    Add the missed calls to fix it.

    Signed-off-by: Chuhong Yuan
    Signed-off-by: Jens Axboe

    Chuhong Yuan
     
  • Some versions of gcc (so far 6.3 and 7.4) throw a warning:

    drivers/block/rbd.c: In function 'rbd_object_map_callback':
    drivers/block/rbd.c:2124:21: warning: 'current_state' may be used uninitialized in this function [-Wmaybe-uninitialized]
    (current_state == OBJECT_EXISTS && state == OBJECT_EXISTS_CLEAN))
    drivers/block/rbd.c:2092:23: note: 'current_state' was declared here
    u8 state, new_state, current_state;
    ^~~~~~~~~~~~~

    It's bogus because all current_state accesses are guarded by
    has_current_state.

    Reported-by: kbuild test robot
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Dongsheng Yang

    Ilya Dryomov
     

08 Nov, 2019

1 commit


26 Oct, 2019

3 commits

  • nbd requires socket families to support the shutdown method so the nbd
    recv workqueue can be woken up from its sock_recvmsg call. If the socket
    does not support the callout we will leave recv works running or get hangs
    later when the device or module is removed.

    This adds a check during socket connection/reconnection to make sure the
    socket being passed in supports the needed callout.

    Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
    Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
    Tested-by: Richard W.M. Jones
    Signed-off-by: Mike Christie
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • We hit the following warning in production

    print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
    ------------[ cut here ]------------
    refcount_t: underflow; use-after-free.
    WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
    Workqueue: knbd-recv recv_work [nbd]
    RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
    Call Trace:
    blk_mq_free_request+0xb7/0xf0
    blk_mq_complete_request+0x62/0xf0
    recv_work+0x29/0xa1 [nbd]
    process_one_work+0x1f5/0x3f0
    worker_thread+0x2d/0x3d0
    ? rescuer_thread+0x340/0x340
    kthread+0x111/0x130
    ? kthread_create_on_node+0x60/0x60
    ret_from_fork+0x1f/0x30
    ---[ end trace b079c3c67f98bb7c ]---

    This was preceded by us timing out everything and shutting down the
    sockets for the device. The problem is we had a request in the queue at
    the same time, so we completed the request twice. This can actually
    happen in a lot of cases, we fail to get a ref on our config, we only
    have one connection and just error out the command, etc.

    Fix this by checking cmd->status in nbd_read_stat. We only change this
    under the cmd->lock, so we are safe to check this here and see if we've
    already error'ed this command out, which would indicate that we've
    completed it as well.
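
    The idea of the check can be modelled without any block-layer types. In
    this sketch the locking is omitted (the real code holds cmd->lock), and
    the structure, the completions counter and both helpers are hypothetical.

```c
struct sketch_cmd {
	int status;		/* 0 means "no error recorded yet" */
	int completions;	/* how many times the request was completed */
};

/* Timeout or shutdown path: record the error and complete the request. */
static void sketch_error_out(struct sketch_cmd *cmd, int err)
{
	cmd->status = err;
	cmd->completions++;
}

/*
 * Receive path: a non-zero status means the command was already errored
 * out and completed elsewhere, so it must not be completed again.
 * Returns 1 if the request was completed here, 0 if it was skipped.
 */
static int sketch_complete_from_reply(struct sketch_cmd *cmd)
{
	if (cmd->status != 0)
		return 0;

	cmd->completions++;
	return 1;
}
```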

    Reviewed-by: Mike Christie
    Signed-off-by: Josef Bacik

    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • We already do this for the most part, except in timeout and clear_req.
    For the timeout case we take the lock after we grab a ref on the config,
    but that isn't really necessary because we're safe to touch the cmd at
    this point, so just move the order around.

    For the clear_req case this is initiated by the user, so again is safe.

    Reviewed-by: Mike Christie
    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Josef Bacik
     

19 Oct, 2019

1 commit

  • CPU0:                               CPU1:
    backing_dev_show                    backing_dev_store
    ......                              ......
    file = zram->backing_dev;
    down_read(&zram->init_lock);        down_read(&zram->init_lock)
    file_path(file, ...);               zram->backing_dev = backing_dev;
    up_read(&zram->init_lock);          up_read(&zram->init_lock);

    backing_dev_show reads the value of zram->backing_dev too early, before
    taking init_lock, so the value it uses can be NULL at first even though
    it is no longer NULL later.

    backtrace:
    d_path+0xcc/0x174
    file_path+0x10/0x18
    backing_dev_show+0x40/0xb4
    dev_attr_show+0x20/0x54
    sysfs_kf_seq_show+0x9c/0x10c
    kernfs_seq_show+0x28/0x30
    seq_read+0x184/0x488
    kernfs_fop_read+0x5c/0x1a4
    __vfs_read+0x44/0x128
    vfs_read+0xa0/0x138
    SyS_read+0x54/0xb4

    Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com
    Signed-off-by: Chenwandun
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Jens Axboe
    Cc: [4.14+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chenwandun
     

15 Oct, 2019

1 commit

  • There is a warning message in my test with below steps:

    # rbd bench --io-type write --io-size 4K --io-threads 1 --io-pattern rand test &
    # sleep 5
    # pkill -9 rbd
    # rbd map test &
    # sleep 5
    # pkill rbd

    The reason is that rbd_add_acquire_lock() is interruptible,
    which means that when we kill the waiter on ->acquire_wait, the
    lock_dwork could still be running.

    1. do_rbd_add() 2. lock_dwork
    rbd_add_acquire_lock()
    - queue_delayed_work()
    lock_dwork queued
    - wait_for_completion_killable_timeout() lock_dwork)

    Then when we reach the rbd_dev_free(), WARN_ON is triggered because
    lock_state is not RBD_LOCK_STATE_UNLOCKED.

    To fix it, this commit makes sure the lock_dwork has finished before
    calling rbd_dev_image_unlock().

    On the other hand, this would not happen in do_rbd_remove(), because
    after the rbd is mapped, lock_dwork will only be queued for an IO
    request, and the request will not continue until lock_dwork has
    finished. When we call rbd_dev_image_unlock() in do_rbd_remove(), all
    requests are done. That means lock_state should not be locked again
    after rbd_dev_image_unlock().

    [ Cancel lock_dwork in rbd_add_acquire_lock(), only if the wait is
    interrupted. ]

    Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
    Signed-off-by: Dongsheng Yang
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Dongsheng Yang
     

11 Oct, 2019

1 commit

  • Pull block fixes from Jens Axboe:

    - Fix wbt performance regression introduced with the blk-rq-qos
    refactoring (Harshad)

    - Fix io_uring fileset removal inadvertently killing the workqueue (me)

    - Fix io_uring typo in linked command nonblock submission (Pavel)

    - Remove spurious io_uring wakeups on request free (Pavel)

    - Fix null_blk zoned command error return (Keith)

    - Don't use freezable workqueues for backing_dev, also means we can
    revert a previous libata hack (Mika)

    - Fix nbd sysfs mutex dropped too soon at removal time (Xiubo)

    * tag 'for-linus-20191010' of git://git.kernel.dk/linux-block:
    nbd: fix possible sysfs duplicate warning
    null_blk: Fix zoned command return code
    io_uring: only flush workqueues on fileset removal
    io_uring: remove wait loop spurious wakeups
    blk-wbt: fix performance regression in wbt scale_up/scale_down
    Revert "libata, freezer: avoid block device removal while system is frozen"
    bdi: Do not use freezable workqueue
    io_uring: fix reversed nonblock flag for link submission

    Linus Torvalds
     

10 Oct, 2019

2 commits

  • 1. nbd_put takes the mutex and drops nbd->ref to 0. It then does
    idr_remove and drops the mutex.

    2. nbd_genl_connect takes the mutex. idr_find/idr_for_each fails
    to find an existing device, so it does nbd_dev_add.

    3. If nbd_put has not yet called nbd_dev_remove, or nbd_dev_remove has
    not completely finished, and nbd_dev_add tries to add_disk, we can hit:

    debugfs: Directory 'nbd1' with parent 'block' already present!

    This patch makes sure all the disk add/remove work is done
    while holding the nbd_index_mutex lock.

    Reported-by: Mike Christie
    Reviewed-by: Josef Bacik
    Signed-off-by: Xiubo Li
    Signed-off-by: Jens Axboe

    Xiubo Li
     
  • The return code from null_handle_zoned() sets the cmd->error value.
    Returning OK status when an error occurred overwrites the intended
    cmd->error. Return the appropriate error code instead of setting the
    error in the cmd.
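
    The difference between the two conventions can be shown with a small
    stand-alone model. Everything here is hypothetical: the helpers only
    imitate a caller that stores a handler's return value into cmd->error.

```c
#include <errno.h>

struct sketch_cmd {
	int error;
};

/* Old style: set cmd->error directly, then report success to the caller. */
static int sketch_handle_zoned_old(struct sketch_cmd *cmd, int fail)
{
	if (fail)
		cmd->error = -EIO;
	return 0;
}

/* New style: just return the error code and let the caller store it. */
static int sketch_handle_zoned_new(int fail)
{
	return fail ? -EIO : 0;
}

/* The caller stores whatever the handler returned into cmd->error. */
static void sketch_finish(struct sketch_cmd *cmd, int ret)
{
	cmd->error = ret;
}
```

    With the old helper the caller's store replaces the error with 0; with
    the new helper the error code reaches cmd->error intact.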

    Fixes: fceb5d1b19cbe626 ("null_blk: create a helper for zoned devices")
    Cc: Chaitanya Kulkarni
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Keith Busch
     

05 Oct, 2019

1 commit

  • Pull block fixes from Jens Axboe:

    - Mandate timespec64 for the io_uring timeout ABI (Arnd)

    - Set of NVMe changes via Sagi:
    - controller removal race fix from Balbir
    - quirk additions from Gabriel and Jian-Hong
    - nvme-pci power state save fix from Mario
    - Add 64bit user commands (for 64bit registers) from Marta
    - nvme-rdma/nvme-tcp fixes from Max, Mark and Me
    - Minor cleanups and nits from James, Dan and John

    - Two s390 dasd fixes (Jan, Stefan)

    - Have loop change block size in DIO mode (Martijn)

    - paride pg header ifdef guard (Masahiro)

    - Two blk-mq queue scheduler tweaks, fixing an ordering issue on zoned
    devices and suboptimal performance on others (Ming)

    * tag 'for-linus-2019-10-03' of git://git.kernel.dk/linux-block: (22 commits)
    block: sed-opal: fix sparse warning: convert __be64 data
    block: sed-opal: fix sparse warning: obsolete array init.
    block: pg: add header include guard
    Revert "s390/dasd: Add discard support for ESE volumes"
    s390/dasd: Fix error handling during online processing
    io_uring: use __kernel_timespec in timeout ABI
    loop: change queue block size to match when using DIO
    blk-mq: apply normal plugging for HDD
    blk-mq: honor IO scheduler for multiqueue devices
    nvme-rdma: fix possible use-after-free in connect timeout
    nvme: Move ctrl sqsize to generic space
    nvme: Add ctrl attributes for queue_count and sqsize
    nvme: allow 64-bit results in passthru commands
    nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T
    nvmet-tcp: remove superflous check on request sgl
    Added QUIRKs for ADATA XPG SX8200 Pro 512GB
    nvme-rdma: Fix max_hw_sectors calculation
    nvme: fix an error code in nvme_init_subsystem()
    nvme-pci: Save PCI state before putting drive into deepest state
    nvme-tcp: fix wrong stop condition in io_work
    ...

    Linus Torvalds
     

01 Oct, 2019

1 commit

  • The loop driver assumes that if the passed in fd is opened with
    O_DIRECT, the caller wants to use direct I/O on the loop device.
    However, if the underlying block device has a different block size than
    the loop block queue, direct I/O can't be enabled. Instead of requiring
    userspace to manually change the blocksize and re-enable direct I/O,
    just change the queue block sizes to match, as well as the io_min size.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Martijn Coenen
    Signed-off-by: Jens Axboe

    Martijn Coenen
     

27 Sep, 2019

1 commit

  • Merge more updates from Andrew Morton:

    - almost all of the rest of -mm

    - various other subsystems

    Subsystems affected by this patch series:
    memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork,
    cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise,
    cleanups, pagemap

    * emailed patches from Andrew Morton: (77 commits)
    arch/sparc/include/asm/pgtable_64.h: fix build
    mm: treewide: clarify pgtable_page_{ctor,dtor}() naming
    ntfs: remove (un)?likely() from IS_ERR() conditions
    IB/hfi1: remove unlikely() from IS_ERR*() condition
    xfs: remove unlikely() from WARN_ON() condition
    wimax/i2400m: remove unlikely() from WARN*() condition
    fs: remove unlikely() from WARN_ON() condition
    xen/events: remove unlikely() from WARN() condition
    checkpatch: check for nested (un)?likely() calls
    hexagon: drop empty and unused free_initrd_mem
    mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
    mm: introduce MADV_PAGEOUT
    mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
    mm: introduce MADV_COLD
    mm: untag user pointers in mmap/munmap/mremap/brk
    vfio/type1: untag user pointers in vaddr_get_pfn
    tee/shm: untag user pointers in tee_shm_register
    media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
    drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
    drm/amdgpu: untag user pointers
    ...

    Linus Torvalds
     

26 Sep, 2019

2 commits

  • Add RB_DECLARE_CALLBACKS_MAX, which generates augmented rbtree callbacks
    for the case where the augmented value is a scalar whose definition
    follows a max(f(node)) pattern. This actually covers all present uses of
    RB_DECLARE_CALLBACKS, and saves some (source) code duplication in the
    various RBCOMPUTE function definitions.
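
    To illustrate the max(f(node)) pattern, here is a hand-written version
    of what such a generated callback might compute for one node. It is a
    sketch under assumptions: the node layout, sketch_value (playing the
    role of f) and sketch_compute_max are invented for the example and are
    not the macro's actual expansion.

```c
#include <stddef.h>

struct sketch_node {
	unsigned long value;		/* per-node scalar */
	unsigned long subtree_max;	/* augmented value for the subtree */
	struct sketch_node *left, *right;
};

/* f(node): the scalar contributed by this node alone. */
static unsigned long sketch_value(const struct sketch_node *n)
{
	return n->value;
}

/*
 * Recompute n->subtree_max from f(n) and the children's cached maxima.
 * Returns 1 if the cached value was already correct (upward propagation
 * can stop), 0 if it had to be updated.
 */
static int sketch_compute_max(struct sketch_node *n)
{
	unsigned long max = sketch_value(n);

	if (n->left && n->left->subtree_max > max)
		max = n->left->subtree_max;
	if (n->right && n->right->subtree_max > max)
		max = n->right->subtree_max;

	if (n->subtree_max == max)
		return 1;
	n->subtree_max = max;
	return 0;
}
```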

    [walken@google.com: fix mm/vmalloc.c]
    Link: http://lkml.kernel.org/r/CANN689FXgK13wDYNh1zKxdipeTuALG4eKvKpsdZqKFJ-rvtGiQ@mail.gmail.com
    [walken@google.com: re-add check to check_augmented()]
    Link: http://lkml.kernel.org/r/20190727022027.GA86863@google.com
    Link: http://lkml.kernel.org/r/20190703040156.56953-3-walken@google.com
    Signed-off-by: Michel Lespinasse
    Acked-by: Peter Zijlstra (Intel)
    Cc: David Howells
    Cc: Davidlohr Bueso
    Cc: Uladzislau Rezki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Pull ceph updates from Ilya Dryomov:
    "The highlights are:

    - automatic recovery of a blacklisted filesystem session (Zheng Yan).
    This is disabled by default and can be enabled by mounting with the
    new "recover_session=clean" option.

    - serialize buffered reads and O_DIRECT writes (Jeff Layton). Care is
    taken to avoid serializing O_DIRECT reads and writes with each
    other, this is based on the exclusion scheme from NFS.

    - handle large osdmaps better in the face of fragmented memory
    (myself)

    - don't limit what security.* xattrs can be get or set (Jeff Layton).
    We were overly restrictive here, unnecessarily preventing things
    like file capability sets stored in security.capability from
    working.

    - allow copy_file_range() within the same inode and across different
    filesystems within the same cluster (Luis Henriques)"

    * tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client: (41 commits)
    ceph: call ceph_mdsc_destroy from destroy_fs_client
    libceph: use ceph_kvmalloc() for osdmap arrays
    libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc()
    ceph: allow object copies across different filesystems in the same cluster
    ceph: include ceph_debug.h in cache.c
    ceph: move static keyword to the front of declarations
    rbd: pull rbd_img_request_create() dout out into the callers
    ceph: reconnect connection if session hang in opening state
    libceph: drop unused con parameter of calc_target()
    ceph: use release_pages() directly
    rbd: fix response length parameter for encoded strings
    ceph: allow arbitrary security.* xattrs
    ceph: only set CEPH_I_SEC_INITED if we got a MAC label
    ceph: turn ceph_security_invalidate_secctx into static inline
    ceph: add buffered/direct exclusionary locking for reads and writes
    libceph: handle OSD op ceph_pagelist_append() errors
    ceph: don't return a value from void function
    ceph: don't freeze during write page faults
    ceph: update the mtime when truncating up
    ceph: fix indentation in __get_snap_name()
    ...

    Linus Torvalds
     

23 Sep, 2019

1 commit

  • Anatoly reports that he gets the below warning when booting -git on
    a sparc64 box on debian unstable:

    ...
    [ 13.352975] aes_sparc64: Using sparc64 aes opcodes optimized AES
    implementation
    [ 13.428002] ------------[ cut here ]------------
    [ 13.428081] WARNING: CPU: 21 PID: 586 at
    drivers/block/pktcdvd.c:2597 pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
    [ 13.428147] Attempt to register a non-SCSI queue
    [ 13.428184] Modules linked in: pktcdvd libdes cdrom aes_sparc64
    n2_rng md5_sparc64 sha512_sparc64 rng_core sha256_sparc64 flash
    sha1_sparc64 ip_tables x_tables ipv6 crc_ccitt nf_defrag_ipv6 autofs4
    ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy
    async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear
    md_mod crc32c_sparc64
    [ 13.428452] CPU: 21 PID: 586 Comm: pktsetup Not tainted
    5.3.0-10169-g574cc4539762 #1234
    [ 13.428507] Call Trace:
    [ 13.428542] [00000000004635c0] __warn+0xc0/0x100
    [ 13.428582] [0000000000463634] warn_slowpath_fmt+0x34/0x60
    [ 13.428626] [000000001045b244] pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
    [ 13.428674] [000000001045ccf4] pkt_ctl_ioctl+0x94/0x220 [pktcdvd]
    [ 13.428724] [00000000006b95c8] do_vfs_ioctl+0x628/0x6e0
    [ 13.428764] [00000000006b96c8] ksys_ioctl+0x48/0x80
    [ 13.428803] [00000000006b9714] sys_ioctl+0x14/0x40
    [ 13.428847] [0000000000406294] linux_sparc_syscall+0x34/0x44
    [ 13.428890] irq event stamp: 4181
    [ 13.428924] hardirqs last enabled at (4189): []
    console_unlock+0x634/0x6c0
    [ 13.428984] hardirqs last disabled at (4196): []
    console_unlock+0x100/0x6c0
    [ 13.429048] softirqs last enabled at (3978): []
    __do_softirq+0x498/0x520
    [ 13.429110] softirqs last disabled at (3967): []
    do_softirq_own_stack+0x34/0x60
    [ 13.429172] ---[ end trace 2220ca468f32967d ]---
    [ 13.430018] pktcdvd: setup of pktcdvd device failed
    [ 13.455589] des_sparc64: Using sparc64 des opcodes optimized DES
    implementation
    [ 13.515334] camellia_sparc64: Using sparc64 camellia opcodes
    optimized CAMELLIA implementation
    [ 13.522856] pktcdvd: setup of pktcdvd device failed
    [ 13.529327] pktcdvd: setup of pktcdvd device failed
    [ 13.532932] pktcdvd: setup of pktcdvd device failed
    [ 13.536165] pktcdvd: setup of pktcdvd device failed
    [ 13.539372] pktcdvd: setup of pktcdvd device failed
    [ 13.542834] pktcdvd: setup of pktcdvd device failed
    [ 13.546536] pktcdvd: setup of pktcdvd device failed
    [ 15.431071] XFS (dm-0): Mounting V5 Filesystem
    ...

    Apparently Debian auto-attaches any cdrom-like device to pktcdvd, which
    can lead to the above warning. There's really no reason to warn in this
    situation, so kill the warning.

    Reported-by: Anatoly Pugachev
    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 Sep, 2019

3 commits

  • When the NBD_CFLAG_DESTROY_ON_DISCONNECT flag is set and the socket is
    closed because the server daemon was restarted, starting a new
    connection with the old nbd_index just before the last DISCONNECT has
    fully completed causes random crashes, like:

    [ 110.151949] block nbd1: Receive control failed (result -32)
    [ 110.152024] BUG: unable to handle page fault for address: 0000058000000840
    [ 110.152063] #PF: supervisor read access in kernel mode
    [ 110.152083] #PF: error_code(0x0000) - not-present page
    [ 110.152094] PGD 0 P4D 0
    [ 110.152106] Oops: 0000 [#1] SMP PTI
    [ 110.152120] CPU: 0 PID: 6698 Comm: kworker/u5:1 Kdump: loaded Not tainted 5.3.0-rc4+ #2
    [ 110.152136] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [ 110.152166] Workqueue: knbd-recv recv_work [nbd]
    [ 110.152187] RIP: 0010:__dev_printk+0xd/0x67
    [ 110.152206] Code: 10 e8 c5 fd ff ff 48 8b 4c 24 18 65 48 33 0c 25 28 00 [...]
    [ 110.152244] RSP: 0018:ffffa41581f13d18 EFLAGS: 00010206
    [ 110.152256] RAX: ffffa41581f13d30 RBX: ffff96dd7374e900 RCX: 0000000000000000
    [ 110.152271] RDX: ffffa41581f13d20 RSI: 00000580000007f0 RDI: ffffffff970ec24f
    [ 110.152285] RBP: ffffa41581f13d80 R08: ffff96dd7fc17908 R09: 0000000000002e56
    [ 110.152299] R10: ffffffff970ec24f R11: 0000000000000003 R12: ffff96dd7374e900
    [ 110.152313] R13: 0000000000000000 R14: ffff96dd7374e9d8 R15: ffff96dd6e3b02c8
    [ 110.152329] FS: 0000000000000000(0000) GS:ffff96dd7fc00000(0000) knlGS:0000000000000000
    [ 110.152362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 110.152383] CR2: 0000058000000840 CR3: 0000000067cc6002 CR4: 00000000001606f0
    [ 110.152401] Call Trace:
    [ 110.152422] _dev_err+0x6c/0x83
    [ 110.152435] nbd_read_stat.cold+0xda/0x578 [nbd]
    [ 110.152448] ? __switch_to_asm+0x34/0x70
    [ 110.152468] ? __switch_to_asm+0x40/0x70
    [ 110.152478] ? __switch_to_asm+0x34/0x70
    [ 110.152491] ? __switch_to_asm+0x40/0x70
    [ 110.152501] ? __switch_to_asm+0x34/0x70
    [ 110.152511] ? __switch_to_asm+0x40/0x70
    [ 110.152522] ? __switch_to_asm+0x34/0x70
    [ 110.152533] recv_work+0x35/0x9e [nbd]
    [ 110.152547] process_one_work+0x19d/0x340
    [ 110.152558] worker_thread+0x50/0x3b0
    [ 110.152568] kthread+0xfb/0x130
    [ 110.152577] ? process_one_work+0x340/0x340
    [ 110.152609] ? kthread_park+0x80/0x80
    [ 110.152637] ret_from_fork+0x35/0x40

    This is very easy to reproduce by running the nbd-runner.

    Reviewed-by: Josef Bacik
    Signed-off-by: Xiubo Li
    Signed-off-by: Jens Axboe

    Xiubo Li
     
  • Preparation for fixing the destroy-on-disconnect crash.

    Reviewed-by: Josef Bacik
    Signed-off-by: Xiubo Li
    Signed-off-by: Jens Axboe

    Xiubo Li
     
  • Pull block updates from Jens Axboe:

    - Two NVMe pull requests:
        - ana log parse fix from Anton
        - nvme quirks support for Apple devices from Ben
        - fix missing bio completion tracing for multipath stack devices
          from Hannes and Mikhail
        - IP TOS settings for nvme rdma and tcp transports from Israel
        - rq_dma_dir cleanups from Israel
        - tracing for Get LBA Status command from Minwoo
        - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
        - Some consolidation between the fabrics transports for handling
          the CAP register
        - reset race with ns scanning fix for fabrics (move fabrics
          commands to a dedicated request queue with a different lifetime
          from the admin request queue)
        - controller reset and namespace scan races fixes
        - nvme discovery log change uevent support
        - naming improvements from Keith
        - multiple discovery controllers reject fix from James
        - some regular cleanups from various people

    - Series fixing (and re-fixing) null_blk debug printing and nr_devices
    checks (André)

    - A few pull requests from Song, with fixes from Andy, Guoqing,
    Guilherme, Neil, Nigel, and Yufen.

    - REQ_OP_ZONE_RESET_ALL support (Chaitanya)

    - Bio merge handling unification (Christoph)

    - Pick default elevator correctly for devices with special needs
    (Damien)

    - Block stats fixes (Hou)

    - Timeout and support devices nbd fixes (Mike)

    - Series fixing races around elevator switching and device add/remove
    (Ming)

    - sed-opal cleanups (Revanth)

    - Per device weight support for BFQ (Fam)

    - Support for blk-iocost, a new model that can properly account for
    the cost of IO workloads. (Tejun)

    - blk-cgroup writeback fixes (Tejun)

    - paride queue init fixes (zhengbin)

    - blk_set_runtime_active() cleanup (Stanley)

    - Block segment mapping optimizations (Bart)

    - lightnvm fixes (Hans/Minwoo/YueHaibing)

    - Various little fixes and cleanups

    * tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
    null_blk: format pr_* logs with pr_fmt
    null_blk: match the type of parameter nr_devices
    null_blk: do not fail the module load with zero devices
    block: also check RQF_STATS in blk_mq_need_time_stamp()
    block: make rq sector size accessible for block stats
    bfq: Fix bfq linkage error
    raid5: use bio_end_sector in r5_next_bio
    raid5: remove STRIPE_OPS_REQ_PENDING
    md: add feature flag MD_FEATURE_RAID0_LAYOUT
    md/raid0: avoid RAID0 data corruption due to layout confusion.
    raid5: don't set STRIPE_HANDLE to stripe which is in batch list
    raid5: don't increment read_errors on EILSEQ return
    nvmet: fix a wrong error status returned in error log page
    nvme: send discovery log page change events to userspace
    nvme: add uevent variables for controller devices
    nvme: enable aen regardless of the presence of I/O queues
    nvme-fabrics: allow discovery subsystems accept a kato
    nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
    nvme: Remove redundant assignment of cq vector
    nvme: Assign subsys instance from first ctrl
    ...

    Linus Torvalds
     

16 Sep, 2019

5 commits

  • Instead of writing "null_blk: " at the beginning of each
    pr_err/info/warn log message, format messages using pr_fmt() macro.
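
    The idea can be sketched in userspace as follows. This is only an
    illustration, not the null_blk code: pr_info() and pr_err() here are
    stand-ins built on printf(), and format_created() is a hypothetical
    helper that exists only to make the prefixing visible.

```c
#include <stdio.h>

/* In the kernel, pr_fmt() has to be defined before <linux/printk.h> is
 * included so that every pr_*() call in the file picks up the prefix. */
#define pr_fmt(fmt) "null_blk: " fmt

/* Userspace stand-ins for the kernel logging macros (assumption: plain
 * printf()/fprintf() instead of printk()). */
#define pr_info(fmt, ...) printf(pr_fmt(fmt), ##__VA_ARGS__)
#define pr_err(fmt, ...)  fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)

/* Hypothetical helper: formats one message through the same prefixing
 * path, so the caller never writes "null_blk: " by hand. */
static int format_created(char *buf, size_t len, unsigned int nr)
{
    return snprintf(buf, len, pr_fmt("created %u devices\n"), nr);
}
```

    With this in place a call site only carries the message itself, e.g.
    pr_err("invalid queue mode\n"), and the prefix is added at compile
    time by string literal concatenation.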

    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: André Almeida
    Signed-off-by: Jens Axboe

    André Almeida
     
  • Since the variable nr_devices is an unsigned int, the module_param()
    should also use this type. Change the type so they can match.

    Fixes: f7c4ce890dd2 ("null_blk: validate the number of devices")
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: André Almeida
    Signed-off-by: Jens Axboe

    André Almeida
     
  • The module load should fail only if there is something wrong with the
    configuration or if an error prevents it from working properly. The
    module should be loadable with (nr_devices == 0), since that will not
    trigger errors or leave it in a malfunctioning state. Preventing
    loading with zero devices also breaks applications that configure this
    module using the configfs API. Remove the nr_devices check to fix this.

    Fixes: f7c4ce890dd2 ("null_blk: validate the number of devices")
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: André Almeida
    Signed-off-by: Jens Axboe

    André Almeida
     
  • Make it more informative: log op_type, offset and length for block
    layer requests and initiating obj_req for child requests.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • rbd_dev_image_id() allocates space for length but passes a smaller
    value to rbd_obj_method_sync(). rbd_dev_v2_object_prefix() doesn't
    allocate space for length. Fix both to be consistent.
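
    A minimal userspace sketch of the pattern being fixed, assuming a
    reply that carries a 32-bit length followed by the string bytes. The
    names fetch_reply(), get_name() and MAX_NAME_LEN are hypothetical and
    are not the rbd functions; the point is that one size value is used
    both for the allocation and for the call that fills the buffer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME_LEN 64 /* hypothetical upper bound on the string */

/* Hypothetical stand-in for a remote method call: writes a 32-bit length
 * followed by the string bytes. Returns bytes written, or -1 if the
 * caller's buffer is too small. */
static long fetch_reply(void *buf, size_t buf_len, const char *name)
{
    uint32_t len = (uint32_t)strlen(name);

    if (buf_len < sizeof(len) + len)
        return -1;
    memcpy(buf, &len, sizeof(len));
    memcpy((char *)buf + sizeof(len), name, len);
    return (long)(sizeof(len) + len);
}

/* Computes the reply size once, including room for the length prefix,
 * and passes that same value to both calloc() and fetch_reply(). */
static long get_name(const char *name, char *out, size_t out_len)
{
    size_t size = sizeof(uint32_t) + MAX_NAME_LEN;
    void *reply = calloc(1, size);
    uint32_t len;
    long ret;

    if (!reply)
        return -1;
    ret = fetch_reply(reply, size, name);
    if (ret < (long)sizeof(len)) {
        free(reply);
        return -1;
    }
    memcpy(&len, reply, sizeof(len));
    if ((size_t)len + 1 > out_len) {
        free(reply);
        return -1;
    }
    memcpy(out, (char *)reply + sizeof(len), len);
    out[len] = '\0';
    free(reply);
    return (long)len;
}
```

    If the allocation left out the length prefix, or the call were given
    a smaller size than was allocated, a name of exactly MAX_NAME_LEN
    bytes would either overflow the buffer or be rejected needlessly.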

    Signed-off-by: Dongsheng Yang
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Dongsheng Yang
     

12 Sep, 2019

1 commit

  • A negative number of devices is nonsensical, so change the type to
    unsigned. If the number of devices is 0, it is impossible for userspace
    to interact with the module, so refuse loading the driver for that case.
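
    The check described here can be sketched in userspace as below.
    parse_nr_devices() is a hypothetical helper, not the driver code; in
    the kernel the unsigned type would come from the module parameter
    declaration. Note that a later commit in this log drops the zero
    check again.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a device count into an unsigned int and
 * rejects negative, malformed, out-of-range and zero values. */
static int parse_nr_devices(const char *s, unsigned int *out)
{
    char *end;
    unsigned long v;

    /* strtoul() accepts a leading '-' and wraps the value, so a
     * negative input has to be rejected by hand. */
    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '-')
        return -EINVAL;

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno || end == s || *end != '\0' || v > UINT_MAX)
        return -EINVAL;

    /* Zero devices: refuse, as this commit does. */
    if (v == 0)
        return -EINVAL;

    *out = (unsigned int)v;
    return 0;
}
```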

    Signed-off-by: André Almeida
    Signed-off-by: Jens Axboe

    André Almeida