09 Dec, 2016

3 commits


06 Dec, 2016

22 commits


05 Dec, 2016

1 commit

  • Since commit e73c23ff736e ("block: add async variant of
    blkdev_issue_zeroout") messages like the following show up:

    EXT4-fs (dm-1): Delayed block allocation failed for inode 2368848 at
    logical offset 0 with max blocks 1 with error 95
    EXT4-fs (dm-1): This should not happen!! Data will be lost

    Due to the following fallthrough introduced with
    commit 2d253440b5af ("block: Define zoned block device operations"),
    generic_make_request_checks() would accept a REQ_OP_WRITE_SAME bio only
    if the block device supports "write same" *and* is a zoned one:

    switch (bio_op(bio)) {
    [...]
    case REQ_OP_WRITE_SAME:
            if (!bdev_write_same(bio->bi_bdev))
                    goto not_supported;
    case REQ_OP_ZONE_REPORT:
    case REQ_OP_ZONE_RESET:
            if (!bdev_is_zoned(bio->bi_bdev))
                    goto not_supported;
            break;
    [...]
    }

    Thus, although the bio setup as done by __blkdev_issue_write_same() from
    commit e73c23ff736e ("block: add async variant of blkdev_issue_zeroout")
    would succeed, its actual submission would not, resulting in the
    EOPNOTSUPP == 95.

    Fix this by removing the fallthrough which, due to the lack of an explicit
    comment, seems to be unintended anyway.

    Fixes: e73c23ff736e ("block: add async variant of blkdev_issue_zeroout")
    Fixes: 2d253440b5af ("block: Define zoned block device operations")
    Signed-off-by: Nicolai Stange
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Nicolai Stange
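
The effect of the missing break can be reproduced in a small userspace sketch. Everything named sketch_* or OP_* below is a hypothetical stand-in for the kernel's bio operations and device queries, not the kernel code itself; it only mirrors the control flow quoted in the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's request operations. */
enum sketch_op { OP_WRITE, OP_WRITE_SAME, OP_ZONE_REPORT, OP_ZONE_RESET };

/* Hypothetical device capabilities (the kernel queries the bdev instead). */
struct sketch_dev {
	bool write_same;
	bool zoned;
};

/* Buggy variant: WRITE_SAME falls through into the zoned-device check. */
static int check_op_buggy(enum sketch_op op, const struct sketch_dev *dev)
{
	switch (op) {
	case OP_WRITE_SAME:
		if (!dev->write_same)
			return -EOPNOTSUPP;
		/* fall through - unintended, this is the bug */
	case OP_ZONE_REPORT:
	case OP_ZONE_RESET:
		if (!dev->zoned)
			return -EOPNOTSUPP;
		break;
	default:
		break;
	}
	return 0;
}

/* Fixed variant: WRITE_SAME ends with its own break. */
static int check_op_fixed(enum sketch_op op, const struct sketch_dev *dev)
{
	switch (op) {
	case OP_WRITE_SAME:
		if (!dev->write_same)
			return -EOPNOTSUPP;
		break;
	case OP_ZONE_REPORT:
	case OP_ZONE_RESET:
		if (!dev->zoned)
			return -EOPNOTSUPP;
		break;
	default:
		break;
	}
	return 0;
}
```

With a device that supports write same but is not zoned, the buggy variant rejects OP_WRITE_SAME with -EOPNOTSUPP while the fixed variant accepts it.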
     

04 Dec, 2016

1 commit

  • 32-bit builds currently fail with:

    ERROR: "__aeabi_ldivmod" [drivers/block/nbd.ko] undefined!
    ERROR: "__divdi3" [drivers/block/nbd.ko] undefined!
    nbd.c:(.text+0x247c72): undefined reference to `__divdi3'

    because a recent commit introduced a plain 64-bit division. Use the
    proper division helper so that 32-bit builds don't break.

    Fixes: ef77b515243b ("nbd: use loff_t for blocksize and nbd_set_size args")
    Signed-off-by: Jens Axboe

    Jens Axboe
     

03 Dec, 2016

2 commits

  • If we have large devices (say like the 40t drive I was trying to test with) we
    will end up overflowing the int arguments to nbd_set_size and not get the right
    size for our device. Fix this by using loff_t everywhere so I don't have to
    think about this again. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Josef Bacik
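
A minimal userspace sketch of the truncation problem follows. The helper names are hypothetical, and unsigned 32-bit arithmetic is used in place of int so that the wraparound is well defined in C; the kernel change itself switches the arguments to loff_t.

```c
#include <stdint.h>

/* Hypothetical helper: size computed in 32 bits, as with narrow arguments. */
static uint32_t size_bytes_32(uint32_t blocksize, uint32_t nr_blocks)
{
	return blocksize * nr_blocks;	/* wraps modulo 2^32 */
}

/* Hypothetical helper: 64-bit arguments, so the product is computed in 64 bits. */
static int64_t size_bytes_64(int64_t blocksize, int64_t nr_blocks)
{
	return blocksize * nr_blocks;
}
```

For example, 2097152 blocks of 4096 bytes (8 GiB) wrap to 0 in the 32-bit helper, while the 64-bit helper can represent a 40 TiB device (10737418240 blocks of 4096 bytes) without loss.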
     
  • Signed-off-by: Shaohua Li
    Fixes: cf43e6be865a ("block: add scalable completion tracking of requests")
    Signed-off-by: Jens Axboe

    Shaohua Li
     

01 Dec, 2016

9 commits

  • Factor out the common code that sets the REQ_NOMERGE flag, which is
    duplicated in several places, into a helper, req_set_nomerge().

    Signed-off-by: Ritesh Harjani

    Get rid of the inline.

    Signed-off-by: Jens Axboe

    Ritesh Harjani
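
A rough sketch of what such a helper might look like, using simplified hypothetical types. The sketch_* structures and the flag value are assumptions for illustration, and clearing the queue's cached merge candidate is likewise an assumption about what the shared code does beyond setting the flag.

```c
#include <stddef.h>

#define SKETCH_REQ_NOMERGE (1u << 0)	/* hypothetical flag value */

/* Hypothetical, heavily simplified request. */
struct sketch_request {
	unsigned int cmd_flags;
};

/* Hypothetical, heavily simplified queue. */
struct sketch_queue {
	struct sketch_request *last_merge;	/* cached merge candidate */
};

/* Mark a request as non-mergeable and drop it as the merge candidate. */
static void sketch_req_set_nomerge(struct sketch_queue *q,
				   struct sketch_request *req)
{
	req->cmd_flags |= SKETCH_REQ_NOMERGE;
	if (req == q->last_merge)
		q->last_merge = NULL;
}
```

Callers that previously open-coded the flag update would call the helper instead, which keeps the flag and the cached candidate consistent in one place.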
     
  • If a block device is closed while iterate_bdevs() is handling it, the
    following NULL pointer dereference occurs because bdev->b_disk is NULL
    in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
    turn called by the mapping_cap_writeback_dirty() call in
    __filemap_fdatawrite_range()):

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
    IP: [] blk_get_backing_dev_info+0x10/0x20
    PGD 9e62067 PUD 9ee8067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in:
    CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
    task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
    RIP: 0010:[] [] blk_get_backing_dev_info+0x10/0x20
    RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
    RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
    RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
    R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
    FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
    Stack:
    ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
    0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
    ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
    Call Trace:
    [] __filemap_fdatawrite_range+0x85/0x90
    [] filemap_fdatawrite+0x1f/0x30
    [] fdatawrite_one_bdev+0x16/0x20
    [] iterate_bdevs+0xf2/0x130
    [] sys_sync+0x63/0x90
    [] entry_SYSCALL_64_fastpath+0x12/0x76
    Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 8b 80 08 05 00 00 5d
    RIP [] blk_get_backing_dev_info+0x10/0x20
    RSP
    CR2: 0000000000000508
    ---[ end trace 2487336ceb3de62d ]---

    The crash is easily reproducible by running the following command, if an
    msleep(100) is inserted before the call to func() in iterate_bdevs():

    while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done

    Fix it by holding the bd_mutex across the func() call and only calling
    func() if the bdev is opened.

    Cc: stable@vger.kernel.org
    Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices")
    Reported-and-tested-by: Wei Fang
    Signed-off-by: Rabin Vincent
    Signed-off-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Rabin Vincent
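
The shape of the fix can be sketched in userspace as follows. The sketch_* types are hypothetical: the lock is modelled as a plain flag rather than a real mutex, and the openers counter is an assumed stand-in for how the kernel knows whether a bdev is opened. Only the ordering matters here: take the lock, check that the device is open, call func(), release the lock.

```c
#include <stddef.h>

/* Hypothetical stand-in for a block device. */
struct sketch_bdev {
	int lock_held;	/* models the per-device mutex */
	int openers;	/* 0 means the device is closed */
};

static void sketch_lock(struct sketch_bdev *b)   { b->lock_held = 1; }
static void sketch_unlock(struct sketch_bdev *b) { b->lock_held = 0; }

/* Example callback: counts how often it runs with the lock held. */
static void sketch_count_locked(struct sketch_bdev *b, void *arg)
{
	if (b->lock_held)
		++*(int *)arg;
}

/* Call func() only on opened devices, holding the lock across the call. */
static int sketch_iterate(struct sketch_bdev *devs, size_t n,
			  void (*func)(struct sketch_bdev *, void *),
			  void *arg)
{
	int called = 0;

	for (size_t i = 0; i < n; i++) {
		sketch_lock(&devs[i]);
		if (devs[i].openers > 0) {
			func(&devs[i], arg);
			called++;
		}
		sketch_unlock(&devs[i]);
	}
	return called;
}
```

Because the open check and the call happen under the same lock, a concurrent close cannot tear down the device between the check and func().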
     
  • Fix bug https://bugzilla.kernel.org/show_bug.cgi?id=188531. In function
    mtip_block_initialize(), variable rv holds the return value, which
    should be negative on errors. rv is initialized to 0 and is not
    updated when the call to ida_pre_get() fails, so 0 may be returned.
    A return value of 0 indicates success, which is inconsistent with
    the actual execution status. This patch fixes the bug by explicitly
    assigning -ENOMEM to rv on the branch where ida_pre_get() fails.

    Signed-off-by: Pan Bian
    Signed-off-by: Jens Axboe

    Pan Bian
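
The error-path pattern described above can be sketched as follows. sketch_pre_get() is a hypothetical stand-in for ida_pre_get(), and sketch_block_initialize() only mirrors the relevant control flow, not the driver's real setup code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for ida_pre_get(): false means allocation failed. */
static bool sketch_pre_get(bool simulate_failure)
{
	return !simulate_failure;
}

/* Returns 0 on success and a negative errno value on failure. */
static int sketch_block_initialize(bool simulate_failure)
{
	int rv = 0;

	if (!sketch_pre_get(simulate_failure)) {
		rv = -ENOMEM;	/* without this assignment, 0 would be returned */
		goto out;
	}

	/* ... remaining setup would go here ... */
out:
	return rv;
}
```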
     
  • Add support for handling the Write Zeroes command on the target.
    Call into __blkdev_issue_zeroout, which the block layer expands into the
    most suitable variant of zeroing the LBAs. Allow the write zeroes
    operation to deallocate the LBAs when calling __blkdev_issue_zeroout.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • Allow write zeroes operations (REQ_OP_WRITE_ZEROES) on the block
    device if the device sets the optional command support bit for Write
    Zeroes. Add support for setting up the Write Zeroes command. Set the
    maximum number of sectors for one Write Zeroes command according to
    the NVMe Write Zeroes command definition.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
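
As a rough illustration of how such a limit could be derived, the sketch below assumes that the Write Zeroes command carries its block count in a 16-bit, zero-based field, so a single command covers at most 65536 logical blocks. That assumption and the helper name are not taken from the commit message; the result is expressed in 512-byte sectors.

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest range one command can cover, in 512-byte
 * sectors, assuming a 16-bit zero-based block count (up to 65536 blocks).
 */
static uint32_t sketch_max_write_zeroes_sectors(uint32_t lba_size_bytes)
{
	uint32_t max_blocks = (uint32_t)UINT16_MAX + 1;

	return (uint32_t)(((uint64_t)max_blocks * lba_size_bytes) >> 9);
}
```

Under these assumptions a namespace with 512-byte logical blocks gets a limit of 65536 sectors, and one with 4096-byte logical blocks gets 524288 sectors.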
     
  • Add the command structure, optional command set support (ONCS) bit and
    a new error code for the Write Zeroes command.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • This adds a new block layer operation to zero out a range of
    LBAs. This allows implementing zeroing for devices that don't use
    either discard with a predictable zero pattern or WRITE SAME of zeroes.
    The prominent example of that is NVMe with the Write Zeroes command,
    but in the future, this should also help with improving the way
    zeroing discards work. For this operation, a suitable entry is exported
    in sysfs which indicates the maximum number of bytes allowed in one
    write zeroes operation by the device.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • Similar to __blkdev_issue_discard this variant allows submitting
    the final bio asynchronously and chaining multiple ranges
    into a single completion.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • Both blkdev_report_zones and blkdev_reset_zones can operate on a partition of
    a zoned block device. However, the first and last zones reported for a
    partition make sense only if the partition start sector and size are aligned
    on the device zone size. The same applies for zone reset. Resetting the first
    or the last zone of a partition straddling zones may impact neighboring
    partitions. Finally, if a partition start sector is not at the beginning of a
    sequential zone, it will be impossible to write to the first sectors of the
    partition on a host-managed device.
    Avoid all these problems and inconsistencies by ignoring partitions that are
    not zone aligned.

    Note: Even with CONFIG_BLK_DEV_ZONED disabled, bdev_is_zoned() will report the
    correct disk zoning type (host-aware, host-managed or none) but
    bdev_zone_size() will always return 0 for zoned block devices (i.e. the zone
    size is unknown). So test this as a way to ensure that a zoned block device is
    being handled as such. As a result, for host-aware devices, partitions that
    are not zone aligned will be accepted with CONFIG_BLK_DEV_ZONED disabled. That is, the
    disk will be treated as a regular block device (as it should). If zoned block
    device support is enabled, only aligned partitions will be accepted.

    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Damien Le Moal
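
The alignment rule described above can be sketched as a small userspace check. The function name and the use of plain 64-bit sector counts are assumptions for illustration; a zone size of 0 is treated as "unknown", in which case the partition is accepted as on a regular block device.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a partition is accepted if both its start and its end
 * fall on zone boundaries. All values are in sectors.
 */
static bool sketch_part_zone_aligned(uint64_t start, uint64_t size,
				     uint64_t zone_sectors)
{
	if (zone_sectors == 0)
		return true;	/* zone size unknown: treat as a regular device */

	return start % zone_sectors == 0 &&
	       (start + size) % zone_sectors == 0;
}
```

With 524288-sector zones, for instance, a partition starting at sector 524288 and spanning 1048576 sectors passes, while one starting at sector 2048 is ignored.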
     

30 Nov, 2016

2 commits