11 Apr, 2020

1 commit

  • Pull block fixes from Jens Axboe:
    "Here's a set of fixes that should go into this merge window. This
    contains:

    - NVMe pull request from Christoph with various fixes

    - Better discard support for loop (Evan)

    - Only call ->commit_rqs() if we have queued IO (Keith)

    - blkcg offlining fixes (Tejun)

    - fix (and fix the fix) for busy partitions"

    * tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block:
    block: fix busy device checking in blk_drop_partitions again
    block: fix busy device checking in blk_drop_partitions
    nvmet-rdma: fix double free of rdma queue
    blk-mq: don't commit_rqs() if none were queued
    nvme-fc: Revert "add module to ops template to allow module references"
    nvme: fix deadlock caused by ANA update wrong locking
    nvmet-rdma: fix bonding failover possible NULL deref
    loop: Better discard support for block devices
    loop: Report EOPNOTSUPP properly
    nvmet: fix NULL dereference when removing a referral
    nvme: inherit stable pages constraint in the mpath stack device
    blkcg: don't offline parent blkcg first
    blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it
    nvme-tcp: fix possible crash in recv error flow
    nvme-tcp: don't poll a non-live queue
    nvme-tcp: fix possible crash in write_zeroes processing
    nvmet-fc: fix typo in comment
    nvme-rdma: Replace comma with a semicolon
    nvme-fcloop: fix deallocation of working context
    nvme: fix compat address handling in several ioctls

    Linus Torvalds
     

09 Apr, 2020

1 commit

  • Pull ceph updates from Ilya Dryomov:
    "The main items are:

    - support for asynchronous create and unlink (Jeff Layton).

    Creates and unlinks are satisfied locally, without waiting for a
    reply from the MDS, provided the client has been granted
    appropriate caps (new in v15.y.z ("Octopus") release). This can be
    a big help for metadata heavy workloads such as tar and rsync.
    Opt-in with the new nowsync mount option.

    - multiple blk-mq queues for rbd (Hannes Reinecke and myself).

    When the driver was converted to blk-mq, we settled on a single
    blk-mq queue because of a global lock in libceph and some other
    technical debt. These have since been addressed, so allocate a
    queue per CPU to enhance parallelism.

    - don't hold onto caps that aren't actually needed (Zheng Yan).

    This has been our long-standing behavior, but it causes issues with
    some active/standby applications (synchronous I/O, stalls if the
    standby goes down, etc).

    - .snap directory timestamps consistent with ceph-fuse (Luis
    Henriques)"

    * tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client: (49 commits)
    ceph: fix snapshot directory timestamps
    ceph: wait for async creating inode before requesting new max size
    ceph: don't skip updating wanted caps when cap is stale
    ceph: request new max size only when there is auth cap
    ceph: cleanup return error of try_get_cap_refs()
    ceph: return ceph_mdsc_do_request() errors from __get_parent()
    ceph: check all mds' caps after page writeback
    ceph: update i_requested_max_size only when sending cap msg to auth mds
    ceph: simplify calling of ceph_get_fmode()
    ceph: remove delay check logic from ceph_check_caps()
    ceph: consider inode's last read/write when calculating wanted caps
    ceph: always renew caps if mds_wanted is insufficient
    ceph: update dentry lease for async create
    ceph: attempt to do async create when possible
    ceph: cache layout in parent dir on first sync create
    ceph: add new MDS req field to hold delegated inode number
    ceph: decode interval_sets for delegated inos
    ceph: make ceph_fill_inode non-static
    ceph: perform asynchronous unlink if we have sufficient caps
    ceph: don't take refs to want mask unless we have all bits
    ...

    Linus Torvalds
     

04 Apr, 2020

2 commits

  • If the backing device for a loop device is itself a block device,
    then mirror the "write zeroes" capabilities of the underlying
    block device into the loop device. Copy this capability into both
    max_write_zeroes_sectors and max_discard_sectors of the loop device.

    The reason for this is that REQ_OP_DISCARD on a loop device translates
    into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This
    presents a consistent interface for loop devices (that discarded data
    is zeroed), regardless of the backing device type of the loop device.
    There should be no behavior change for loop devices backed by regular
    files.

    This change fixes blktest block/003, and removes an extraneous
    error print in block/013 when testing on a loop device backed
    by a block device that does not support discard.

    Signed-off-by: Evan Green
    Reviewed-by: Gwendal Grignou
    Reviewed-by: Chaitanya Kulkarni
    [used updated version of Evan's comment in loop_config_discard()]
    [moved backingq to local scope, removed redundant braces]
    Signed-off-by: Andrzej Pietrasiewicz
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Evan Green
     
  • Properly plumb out EOPNOTSUPP from loop driver operations, which may
    get returned when for instance a discard operation is attempted but not
    supported by the underlying block device. Before this change, everything
    was reported in the log as an I/O error, which is scary and not
    helpful in debugging.

    Signed-off-by: Evan Green
    Reviewed-by: Gwendal Grignou
    Reviewed-by: Bart Van Assche
    Signed-off-by: Andrzej Pietrasiewicz
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Evan Green
     

31 Mar, 2020

2 commits

  • Pull block driver updates from Jens Axboe:

    - floppy driver cleanup series from Willy

    - NVMe updates and fixes (Various)

    - null_blk trace improvements (Chaitanya)

    - bcache fixes (Coly)

    - md fixes (via Song)

    - loop block size change optimizations (Martijn)

    - scnprintf() use (Takashi)

    * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits)
    null_blk: add trace in null_blk_zoned.c
    null_blk: add tracepoint helpers for zoned mode
    block: add a zone condition debug helper
    nvme: cleanup namespace identifier reporting in nvme_init_ns_head
    nvme: rename __nvme_find_ns_head to nvme_find_ns_head
    nvme: refactor nvme_identify_ns_descs error handling
    nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl
    nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl
    nvme: Fix controller creation races with teardown flow
    nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl
    nvme: Fix ctrl use-after-free during sysfs deletion
    nvme-pci: Re-order nvme_pci_free_ctrl
    nvme: Remove unused return code from nvme_delete_ctrl_sync
    nvme: Use nvme_state_terminal helper
    nvme: release ida resources
    nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO
    nvmet-tcp: optimize tcp stack TX when data digest is used
    nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow
    nvme-multipath: do not reset on unknown status
    nvmet-rdma: allocate RW ctxs according to mdts
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:

    - Online capacity resizing (Balbir)

    - Number of hardware queue change fixes (Bart)

    - null_blk fault injection addition (Bart)

    - Cleanup of queue allocation, unifying the node/no-node API
    (Christoph)

    - Cleanup of genhd, moving code to where it makes sense (Christoph)

    - Cleanup of the partition handling code (Christoph)

    - disk stat fixes/improvements (Konstantin)

    - BFQ improvements (Paolo)

    - Various fixes and improvements

    * tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits)
    block: return NULL in blk_alloc_queue() on error
    block: move bio_map_* to blk-map.c
    Revert "blkdev: check for valid request queue before issuing flush"
    block: simplify queue allocation
    bcache: pass the make_request methods to blk_queue_make_request
    null_blk: use blk_mq_init_queue_data
    block: add a blk_mq_init_queue_data helper
    block: move the ->devnode callback to struct block_device_operations
    block: move the part_stat* helpers from genhd.h to a new header
    block: move block layer internals out of include/linux/genhd.h
    block: move guard_bio_eod to bio.c
    block: unexport get_gendisk
    block: unexport disk_map_sector_rcu
    block: unexport disk_get_part
    block: mark part_in_flight and part_in_flight_rw static
    block: mark block_depr static
    block: factor out requeue handling from dispatch code
    block/diskstats: replace time_in_queue with sum of request times
    block/diskstats: accumulate all per-cpu counters in one pass
    block/diskstats: more accurate approximation of io_ticks for slow disks
    ...

    Linus Torvalds
     

30 Mar, 2020

6 commits


28 Mar, 2020

4 commits

  • With the help of previously added tracepoints we can now trace
    report-zones, zone-write and zone-mgmt ops in null_blk_zoned.c.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • This patch adds two new tracepoints for null_blk_zoned.c that allow us
    to trace report-zones, zone-mgmt-op and zone-write operations, which have
    a direct effect on the zone condition state machine.

    Also, we update drivers/block/Makefile so that new null_blk related
    tracefiles can be compiled.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • Current make_request based drivers use either blk_alloc_queue_node or
    blk_alloc_queue to allocate a queue, and then set up the make_request_fn
    function pointer and a few parameters using the blk_queue_make_request
    helper. Simplify this by passing the make_request pointer to
    blk_alloc_queue, and while at it merge the _node variant into the main
    helper by always passing a node_id, and remove the superfluous gfp_mask
    parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Use the new blk_mq_init_queue_data instead of open coding the queue
    allocation and initialization.

    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

27 Mar, 2020

1 commit


25 Mar, 2020

1 commit


20 Mar, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kinds of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Jens Axboe

    Gustavo A. R. Silva
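
As a rough user-space illustration of the flexible array member pattern described in the entry above, the sketch below allocates a struct foo with room for a caller-chosen number of trailing elements. The contents of struct boo and the helper name foo_alloc() are assumptions made up for this example; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type; the commit message does not define it. */
struct boo {
	int value;
};

/* The flexible array member must be the last member of the structure. */
struct foo {
	int stuff;
	struct boo array[];
};

/*
 * Hypothetical helper: allocate a struct foo with room for n elements.
 * sizeof(*f) covers only the fixed part of the structure, so the space
 * for the trailing array has to be added explicitly.
 */
static struct foo *foo_alloc(size_t n)
{
	struct foo *f;

	assert(n > 0);
	f = calloc(1, sizeof(*f) + n * sizeof(f->array[0]));
	if (f)
		f->stuff = (int)n;
	return f;
}
```

The size computation is the same one that code using a zero-length array would already perform, which is consistent with the statement above that dynamic allocations are unaffected by the conversion.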
     

19 Mar, 2020

2 commits


16 Mar, 2020

13 commits

  • This is done in order to remove the confusion that arises at some places
    in the code where local variables or arguments shadow the global variable.
    It is already visible that some places are a bit awkward and iterate over
    the global variable, for the sole reason that they used to rely on it being
    named "fdc" in order to get the correct address when using FD_DOR. These
    ones are easy to spot by searching for "for (current_fdc...".

    Some more cleanup is definitely possible. For example,
    "fdc_state[current_fdc].somefield" is used all over the code and would
    probably be better as "fdc_state->somefield", with fdc_state being set
    when current_fdc is assigned. This would require passing the pointer to
    the current state instead of current_fdc to the I/O functions.

    Link: https://lore.kernel.org/r/20200301195555.11154-7-w@1wt.eu
    Cc: Linus Torvalds
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • FDC registers FD_STATUS, FD_DATA, FD_DOR, FD_DIR and FD_DCR used to be
    defined relative to FD_IOPORT, which is the FDC's base address, itself
    a macro depending on the "fdc" local or global variable.

    This patch changes this so that the register macros above now only
    reference the address offset, and that the FDC's address is explicitly
    passed in each call to fd_inb() and fd_outb(), thus removing the macro.
    With this change there is no more implicit usage of the local/global
    "fdc" variable.

    One place in the ARM code used to check if the port was equal to FD_DOR,
    this was changed to testing the register by applying a mask to the port,
    as was already done in the sparc code.

    There are still occurrences of fd_inb() and fd_outb() in the PARISC
    code and these ones remain unaffected since they already used to work
    with a base address and a register offset.

    The sparc, m68k and parisc code could now be slightly cleaned up to
    benefit from the macro definitions above instead of the equivalent
    hard-coded values.

    Link: https://lore.kernel.org/r/20200301195555.11154-6-w@1wt.eu
    Cc: Ian Molton
    Cc: Russell King
    Cc: Linus Torvalds
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • These two functions replace fd_inb() and fd_outb() in that they take
    the FDC as an argument. This will ease the separation of the base address
    and the port everywhere the code is used.

    Link: https://lore.kernel.org/r/20200301195555.11154-5-w@1wt.eu
    Cc: Linus Torvalds
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • Several macros were used to access reply_buffer[] at discrete positions
    without making it obvious they were relying on this. These ones have
    been replaced by their offset in the reply buffer to make these accesses
    more obvious.

    Link: https://lore.kernel.org/r/20200224212352.8640-11-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • Various macros were used to access raw_cmd for R/W or format commands
    without making it obvious that raw_cmd->cmd[] was used. Let's expand
    the macros to make this more obvious.

    Link: https://lore.kernel.org/r/20200224212352.8640-10-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • This macro doesn't bring much value and only slightly obfuscates the
    code by silently using global variable "current_drive", let's expand it.

    Link: https://lore.kernel.org/r/20200224212352.8640-9-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • This macro doesn't bring much value and only slightly obfuscates the
    code by silently using global variable "current_drive", let's expand it.

    Link: https://lore.kernel.org/r/20200224212352.8640-8-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • This macro doesn't bring much value and only slightly obfuscates the
    code by silently using global variable "current_drive", let's expand it.

    Link: https://lore.kernel.org/r/20200224212352.8640-7-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • This macro doesn't bring much value and only slightly obfuscates the
    code by silently using local variable "drive", let's expand it.

    Link: https://lore.kernel.org/r/20200224212352.8640-6-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • This macro doesn't bring much value and only slightly obfuscates the
    code by silently using local variable "drive", let's expand it.

    Link: https://lore.kernel.org/r/20200224212352.8640-5-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • This macro doesn't bring much value and only slightly obfuscates the
    code by silently using local variable "drive", let's expand it.

    Link: https://lore.kernel.org/r/20200224212352.8640-4-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • This macro doesn't bring much value and only slightly obfuscates the
    code by silently using local variable "drive", let's expand it.

    Link: https://lore.kernel.org/r/20200224212352.8640-3-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
     
  • Macro FDCS silently uses identifier "fdc" which may be either the
    global one or a local one. Let's expand the macro to make this more
    obvious.

    Link: https://lore.kernel.org/r/20200224212352.8640-2-w@1wt.eu
    Signed-off-by: Willy Tarreau
    Signed-off-by: Denis Efremov
    Signed-off-by: Jens Axboe

    Willy Tarreau
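
The hazard described in the entry above can be shown with a small stand-alone model. This is a simplified sketch, not the floppy driver's code: the structure layout, the field name "track" and the three function names are assumptions chosen for illustration. The macro resolves "fdc" at the point of use, so the same text reads the global in one function and a shadowing parameter in another.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for the per-controller state. */
struct fdc_state_model {
	int track;
};

static struct fdc_state_model fdc_state[2] = { { 10 }, { 20 } };
static int fdc;	/* global "current controller" index */

/* Old style: silently uses whichever "fdc" happens to be in scope. */
#define FDCS (&fdc_state[fdc])

static int track_via_macro_global(void)
{
	return FDCS->track;	/* resolves to the global fdc */
}

static int track_via_macro_local(int fdc)
{
	return FDCS->track;	/* the parameter shadows the global here */
}

/* Expanded form: the index being used is visible at the call site. */
static int track_expanded(int which)
{
	assert(which >= 0 && which < 2);
	return fdc_state[which].track;
}
```

With the macro expanded, a reader no longer has to check the enclosing scope to know which controller an access refers to.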
     

13 Mar, 2020

1 commit

  • As null_blk is a very good starting point for testing the block layer,
    this patch adds descriptions and comments to 'timeout', 'requeue' and
    'init_hctx' to explain how to use fault injection with null_blk.

    The nvme driver has something similar for nvme_core.fail_request, in the
    form of a comment.

    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Dongli Zhang
    Signed-off-by: Jens Axboe

    Dongli Zhang
     

12 Mar, 2020

5 commits

  • Steps to reproduce:

    BLKRESETZONE zone 0

    // force EIO
    pwrite(fd, buf, 4096, 4096);

    [issue more IO including zone ioctls]

    I/O will start failing randomly, including I/O to unrelated zones,
    because of ->error "reuse". The trigger can also be partition detection
    if the test is not run immediately, which is even more entertaining.

    The fix is of course to clear ->error where necessary.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alexey Dobriyan (SK hynix)
    Signed-off-by: Jens Axboe

    Alexey Dobriyan
     
  • Commit 2da22da5734 (nbd: fix zero cmd timeout handling v2) allows
    the timer to be reset when it fires if tag_set.timeout is set to
    zero. If the server is shut down and a new socket is reconfigured,
    the request should be requeued to be processed by the new server
    instead of waiting for a response from the old one.

    Reviewed-by: Josef Bacik
    Signed-off-by: Hou Pu
    Signed-off-by: Jens Axboe

    Hou Pu
     
  • An nbd server with multiple connections can be upgraded since
    560bc4b (nbd: handle dead connections). But if only one connection
    is configured, after we take down the nbd server, all inflight IO
    would eventually time out and return an error. We could requeue it,
    as we do with multiple connections, and wait for a new socket
    in the submit path.

    Reviewed-by: Josef Bacik
    Signed-off-by: Hou Pu
    Signed-off-by: Jens Axboe

    Hou Pu
     
  • We deleted last_md_mark_dirty long ago, so this function no longer needs
    to exist. Delete it; otherwise a compilation error occurs when DEBUG is
    enabled.

    Fixes: ac0acb9e39ac ("drbd: use drbd_device_post_work() in more place")
    Signed-off-by: Jackie Liu
    Signed-off-by: Jens Axboe

    Jackie Liu
     
  • Since snprintf() returns the would-be-output size instead of the
    actual output size, the succeeding calls may go beyond the given
    buffer limit. Fix it by replacing with scnprintf().

    Signed-off-by: Takashi Iwai
    Signed-off-by: Jens Axboe

    Takashi Iwai
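
To make the difference described in the entry above concrete, here is a user-space sketch. The kernel's scnprintf() is not available outside the kernel, so my_scnprintf() below is a hypothetical stand-in built on the standard vsnprintf(); append_twice() is likewise an invented helper that shows the "accumulate the return value" pattern the commit message refers to.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for the kernel's scnprintf():
 * returns the number of characters actually stored in buf,
 * not counting the terminating NUL.
 */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	/* vsnprintf() reports the would-be length; clamp it to what fit. */
	return (size_t)n >= size ? (int)(size - 1) : n;
}

/* Append two strings, advancing by the length actually written. */
static int append_twice(char *buf, size_t size, const char *a, const char *b)
{
	int len = 0;

	assert(size > 0);
	len += my_scnprintf(buf + len, size - len, "%s", a);
	len += my_scnprintf(buf + len, size - len, "%s", b);
	return len;
}
```

If plain snprintf() were used in append_twice(), a truncated first call could push len past size, and the next call would then be given a pointer and a remaining size that lie outside the buffer. Clamping the return value keeps len no larger than size - 1.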