31 May, 2018

2 commits

  • We already check for started commands in all callbacks, but we should
    also protect against already completed commands. Do this by moving
    the checks into common code.

    Acked-by: Josef Bacik
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • When a userspace client requests a NBD device be disconnected, the
    DISCONNECT_REQUESTED flag is set. While this flag is set, the driver
    will not inform userspace when a connection is closed.

    Unfortunately the flag was never cleared, so once a disconnect was
    requested the driver would thereafter never tell userspace about a
    closed connection. Thus when connections failed due to timeout, no
    attempt to reconnect was made and eventually the device would fail.

    Fix by clearing the DISCONNECT_REQUESTED flag (and setting the
    DISCONNECTED flag) once all connections are closed.

    Reviewed-by: Josef Bacik
    Signed-off-by: Kevin Vigor
    Signed-off-by: Jens Axboe

    Kevin Vigor
     

30 May, 2018

2 commits

  • Bsg holding a reference to the parent device may result in a crash if a
    bsg file handle is closed after the parent device driver has unloaded.

    Holding a reference is not really needed: the parent device must exist
    between bsg_register_queue and bsg_unregister_queue. Before the device
    goes away the caller does blk_cleanup_queue so that all in-flight
    requests to the device are gone and no new requests can pass beyond
    the queue. The queue itself is a refcounted object and will stay
    alive as long as a bsg file handle does.

    Based on analysis, previous patch and changelog from Anatoliy Glagolev.

    Reported-by: Anatoliy Glagolev
    Reviewed-by: James E.J. Bottomley
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Pull NVMe changes from Christoph:

    "Here is the current batch of nvme updates for 4.18, we have a few more
    patches in the queue, but I'd like to get this pile into your tree
    and linux-next ASAP.

    The biggest item is support for file-backed namespaces in the NVMe
    target from Chaitanya, in addition to that we mostly small fixes from
    all the usual suspects."

    * 'nvme-4.18-2' of git://git.infradead.org/nvme:
    nvme: fixup memory leak in nvme_init_identify()
    nvme: fix KASAN warning when parsing host nqn
    nvmet-loop: use nr_phys_segments when map rq to sgl
    nvmet-fc: increase LS buffer count per fc port
    nvmet: add simple file backed ns support
    nvmet: remove duplicate NULL initialization for req->ns
    nvmet: make a few error messages more generic
    nvme-fabrics: allow duplicate connections to the discovery controller
    nvme-fabrics: centralize discovery controller defaults
    nvme-fabrics: remove unnecessary controller subnqn validation
    nvme-fc: remove setting DNR on exception conditions
    nvme-rdma: stop admin queue before freeing it
    nvme-pci: Fix AER reset handling
    nvme-pci: set nvmeq->cq_vector after alloc cq/sq
    nvme: host: core: fix precedence of ternary operator
    nvme: fix lockdep warning in nvme_mpath_clear_current_path

    Jens Axboe
     

29 May, 2018

13 commits

  • libiscsi is the only SCSI code that returns BLK_EH_HANDLED, thus trying to
    bypass the normal SCSI EH code. We are going to remove this return value
    at the block layer, and at least from a quick look it doesn't look too
    harmful to try to send an abort for these cases, especially as the first
    one should not actually be possible. If this doesn't work out iscsi
    will probably need its own eh_strategy_handler instead to just do the
    right thing.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • By completing the request entirely in the driver we can remove the
    BLK_EH_HANDLED return value and thus the split responsibility between the
    driver and the block layer that has been causing trouble.

    [While this keeps existing behavior it seems to mismatch the comment,
    maintainers please chime in!]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • By completing the request entirely in the driver we can remove the
    BLK_EH_HANDLED return value and thus the split responsibility between the
    driver and the block layer that has been causing trouble.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • By completing the request entirely in the driver we can remove the
    BLK_EH_HANDLED return value and thus the split responsibility between the
    driver and the block layer that has been causing trouble.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • By completing the request entirely in the driver we can remove the
    BLK_EH_HANDLED return value and thus the split responsibility between the
    driver and the block layer that has been causing trouble.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • By completing the request entirely in the driver we can remove the
    BLK_EH_HANDLED return value and thus the split responsibility between the
    driver and the block layer that has been causing trouble.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • NVMe always completes the request before returning from ->timeout, either
    by polling for it, or by disabling the controller. Return BLK_EH_DONE so
    that the block layer doesn't even try to complete it again.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The name BLK_EH_NOT_HANDLED implies nothing happened, but very often that
    is not what is happening: instead the driver has already completed the
    command. Fix the symbolic name to reflect that a little better.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • As far as I can tell this function can't even be called any more, given
    that ATA implements its own eh_strategy_handler with ata_scsi_error, which
    never calls ->eh_timed_out.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Kernel library has a common function to match user input from sysfs
    against an array of strings. Thus, replace bch_read_string_list() by
    __sysfs_match_string().

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Andy Shevchenko
     
  • There are a couple of functions that are used exclusively in sysfs.c.
    Move them there and make them static.

    Besides above, it will allow further clean up.

    No functional change intended.

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Andy Shevchenko
     
  • There are a couple of string arrays that are used exclusively in sysfs.c.
    Move them there and make them static.

    Besides above, it will allow further clean up.

    No functional change intended.

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Andy Shevchenko
     
  • Currently bcache does not handle backing device failure: if the backing
    device is offline and disconnected from the system, its bcache device can
    still be accessible. If the bcache device is in writeback mode, I/O
    requests can even succeed if they hit the cache device. That is to say,
    when and how bcache handles an offline backing device is undefined.

    This patch tries to handle backing device offline in a rather simple way:
    - Add a cached_dev->status_update_thread kernel thread to update the
    backing device status every second.
    - Add cached_dev->offline_seconds to record how many seconds the backing
    device has been observed to be offline. If the backing device is offline
    for BACKING_DEV_OFFLINE_TIMEOUT (30) seconds, set dc->io_disable to 1 and
    call bcache_device_stop() to stop the bcache device linked to the
    offline backing device.

    Now if a backing device is offline for BACKING_DEV_OFFLINE_TIMEOUT seconds,
    its bcache device will be removed; user space applications writing to it
    will get an error immediately and can handle the device failure in time.

    This patch is quite simple and does not handle more complicated
    situations. Once the bcache device is stopped, users need to recover the
    backing device, then register and attach it manually.

    Changelog:
    v3: call wait_for_kthread_stop() before exits kernel thread.
    v2: remove "bcache: " prefix when calling pr_warn().
    v1: initial version.

    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Cc: Michael Lyle
    Cc: Junhui Tang
    Signed-off-by: Jens Axboe

    Coly Li
     

25 May, 2018

15 commits

  • If nvme_get_effects_log() failed the 'id' buffer from the previous
    nvme_identify_ctrl() call will never be freed.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Christoph Hellwig

    Hannes Reinecke
     
  • The host NQN is actually smaller than the space reserved for it,
    so we should be using strlcpy to keep KASAN happy.

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    Hannes Reinecke
     
  • Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check
    if a command contains data to be mapped. This fixes the case where a
    struct request contains LBAs, but no data will actually be sent,
    e.g. the pending Write Zeroes support.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
     
  • Today's limit on concurrent LSes is very small: 4 buffers. With large
    subsystem counts or large numbers of initiators connecting, the limit
    may be exceeded.

    Raise the LS buffer count to 256.

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • This patch adds simple file backed namespace support for the NVMeOF target.

    The new file io-cmd-file.c is responsible for handling I/O commands when
    the ns is file backed. Also, we introduce a mempool-based slow path
    using sync I/Os for file backed ns to ensure forward progress under
    reclaim.

    The old block device based implementation is moved to io-cmd-bdev.c and
    uses a "nvmet_bdev_" symbol prefix. The enable/disable calls are also
    moved into the respective files.

    Signed-off-by: Chaitanya Kulkarni
    [hch: updated changelog, fixed double req->ns lookup in bdev case]
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
     
  • Remove the duplicate NULL initialization for req->ns. req->ns is always
    initialized to NULL in nvmet_req_init(), so there is no need to reset
    it later on failures unless we have previously assigned a value to it.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
     
  • "nvmet_check_ctrl_status()" is called from admin-cmd.c as well as
    io-cmd.c, so make the error message more generic.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
     
  • The whole point of the discovery controller is that it can accept
    multiple connections. Additionally the cmic field is not even defined for
    the discovery controller identify page.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: James Smart
    Signed-off-by: Christoph Hellwig

    Hannes Reinecke
     
  • When connecting to the discovery controller we have certain defaults
    to observe, so centralize them to avoid inconsistencies due to argument
    ordering.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: James Smart
    Signed-off-by: Christoph Hellwig

    Hannes Reinecke
     
  • After creating the nvme controller, nvmf_create_ctrl() validates
    the newly created subsysnqn vs the one specified by the options.

    In general, this is an unnecessary check as the Connect message
    should implicitly ensure this value matches.

    With the change to the FC transport to do an asynchronous connect
    for the first association create, the transport will return to
    nvmf_create_ctrl() before that first association has been established,
    thus the subnqn will not yet be set.

    Remove the unnecessary validation.

    Signed-off-by: James Smart
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • Current code will set DNR if the controller is deleting or there is
    an error during controller init. None of this is necessary.

    Remove the code that sets DNR.

    Signed-off-by: James Smart
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • For any failure after nvme_rdma_start_queue in
    nvme_rdma_configure_admin_queue, the admin queue will be freed with the
    NVME_RDMA_Q_LIVE flag still set. Once nvme_rdma_stop_queue is invoked,
    that will cause a use-after-free.
    BUG: KASAN: use-after-free in rdma_disconnect+0x1f/0xe0 [rdma_cm]

    To fix it, call nvme_rdma_stop_queue for all the failed cases after
    nvme_rdma_start_queue.

    Signed-off-by: Jianchao Wang
    Suggested-by: Sagi Grimberg
    Reviewed-by: Max Gurtovoy
    Signed-off-by: Christoph Hellwig

    Jianchao Wang
     
  • The nvme timeout handling doesn't do anything if the pci channel is
    offline, which is the case when recovering from PCI error event, so it
    was a bad idea to sync the controller reset in this state. This patch
    flushes the reset work in the error_resume callback instead when the
    channel is back to online. This keeps AER handling serialized and
    can recover from timeouts.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=199757
    Fixes: cc1d5e749a2e ("nvme/pci: Sync controller reset for AER slot_reset")
    Reported-by: Alex Gagniuc
    Tested-by: Alex Gagniuc
    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig

    Keith Busch
     
  • Set cq_vector after alloc cq/sq, otherwise nvme_suspend_queue will invoke
    free_irq for it and cause a 'Trying to free already-free IRQ xxx'
    warning if the create CQ/SQ command times out.

    Signed-off-by: Jianchao Wang
    Reviewed-by: Keith Busch
    [hch: fixed to pass a s16 and clean up the comment]
    Signed-off-by: Christoph Hellwig

    Jianchao Wang
     
  • Convert the S_ symbolic permissions to their octal equivalents, as octal
    permissions are widely considered more readable than the symbolic ones.

    see: https://lkml.org/lkml/2016/8/2/1945

    Done with automated conversion via:
    $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace

    Miscellanea:

    o Wrapped modified multi-line calls to a single line where appropriate
    o Realign modified multi-line calls to open parenthesis

    Signed-off-by: Joe Perches
    Signed-off-by: Jens Axboe

    Joe Perches
     

24 May, 2018

1 commit

  • For some reason we had the discard granularity always set to 512, even
    when discards were disabled. Fix this by having the default be 0, and
    setting the discard granularity to the blocksize when discards are
    enabled.

    Signed-off-by: Josef Bacik
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Josef Bacik
     

23 May, 2018

3 commits

  • The ternary operator has lower precedence than bitwise OR, so 'cdw10'
    was calculated incorrectly.

    Signed-off-by: Ivan Bornyakov
    Reviewed-by: Max Gurtovoy
    Signed-off-by: Keith Busch

    Ivan Bornyakov
     
  • When running blktest's nvme/005 with a lockdep enabled kernel the test
    case fails due to the following lockdep splat in dmesg:

    =============================
    WARNING: suspicious RCU usage
    4.17.0-rc5 #881 Not tainted
    -----------------------------
    drivers/nvme/host/nvme.h:457 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    3 locks held by kworker/u32:5/1102:
    #0: (ptrval) ((wq_completion)"nvme-wq"){+.+.}, at: process_one_work+0x152/0x5c0
    #1: (ptrval) ((work_completion)(&ctrl->scan_work)){+.+.}, at: process_one_work+0x152/0x5c0
    #2: (ptrval) (&subsys->lock#2){+.+.}, at: nvme_ns_remove+0x43/0x1c0 [nvme_core]

    The only caller of nvme_mpath_clear_current_path() is nvme_ns_remove(),
    which holds the subsys lock, so it's likely a false positive. But when
    using rcu_access_pointer(), we're telling RCU and lockdep that we're
    only after the pointer value.

    Fixes: 32acab3181c7 ("nvme: implement multipath access to nvme subsystems")
    Signed-off-by: Johannes Thumshirn
    Suggested-by: Paul E. McKenney
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Keith Busch

    Johannes Thumshirn
     
  • Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound
    to a single CPU that is selected at device creation time.

    Signed-off-by: Dan Melnic
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Dan Melnic
     

21 May, 2018

2 commits

  • If polling completions are racing with the IRQ triggered by a
    completion, the IRQ handler will find no work and return IRQ_NONE.
    This can trigger complaints about spurious interrupts:

    [ 560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
    [ 560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
    [ 560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
    [ 560.175991] Call Trace:
    [ 560.175994]
    [ 560.176005] dump_stack+0x5c/0x7b
    [ 560.176010] __report_bad_irq+0x30/0xc0
    [ 560.176013] note_interrupt+0x235/0x280
    [ 560.176020] handle_irq_event_percpu+0x51/0x70
    [ 560.176023] handle_irq_event+0x27/0x50
    [ 560.176026] handle_edge_irq+0x6d/0x180
    [ 560.176031] handle_irq+0xa5/0x110
    [ 560.176036] do_IRQ+0x41/0xc0
    [ 560.176042] common_interrupt+0xf/0xf
    [ 560.176043]
    [ 560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
    [ 560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
    [ 560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
    [ 560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
    [ 560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
    [ 560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
    [ 560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
    [ 560.176065] ? cpuidle_enter_state+0x8b/0x2b0
    [ 560.176071] do_idle+0x1f4/0x260
    [ 560.176075] cpu_startup_entry+0x6f/0x80
    [ 560.176080] start_secondary+0x184/0x1d0
    [ 560.176085] secondary_startup_64+0xa5/0xb0
    [ 560.176088] handlers:
    [ 560.178387] [] nvme_irq [nvme]
    [ 560.183019] Disabling IRQ #630

    A previous commit removed ->cqe_seen that was handling this case,
    but we need to handle this a bit differently due to completions
    now running outside the queue lock. Return IRQ_HANDLED from the
    IRQ handler, if the completion ring head was moved since we last
    saw it.

    Fixes: 5cb525c8315f ("nvme-pci: handle completions outside of the queue lock")
    Reported-by: Keith Busch
    Reviewed-by: Keith Busch
    Tested-by: Keith Busch
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Pull NVMe changes from Keith:

    "This is just the first nvme pull request for 4.18. There are several
    fabrics and target patches that I missed, so there will be more to
    come."

    * 'nvme-4.18' of git://git.infradead.org/nvme:
    nvme-pci: drop IRQ disabling on submission queue lock
    nvme-pci: split the nvme queue lock into submission and completion locks
    nvme-pci: handle completions outside of the queue lock
    nvme-pci: move ->cq_vector == -1 check outside of ->q_lock
    nvme-pci: remove cq check after submission
    nvme-pci: simplify nvme_cqe_valid
    nvme: mark the result argument to nvme_complete_async_event volatile
    nvme/pci: Sync controller reset for AER slot_reset
    nvme/pci: Hold controller reference during async probe
    nvme: only reconfigure discard if necessary
    nvme/pci: Use async_schedule for initial reset work
    nvme: lightnvm: add granby support
    NVMe: Add Quirk Delay before CHK RDY for Seagate Nytro Flash Storage
    nvme: change order of qid and cmdid in completion trace
    nvme: fc: provide a descriptive error

    Jens Axboe
     

19 May, 2018

2 commits