14 Mar, 2019

3 commits

  • [ Upstream commit 4726bcf30fad37cc555cd9dcd6c73f2b2668c879 ]

    The reset work holds a mutex to prevent races with removal modifying the
    same resources, but was unlocking only on success. Unlock on failure
    too.
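    The pattern of the fix can be sketched in plain C, with a pthread
    mutex standing in for the kernel mutex (names here are illustrative,
    not the driver's):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t shutdown_lock = PTHREAD_MUTEX_INITIALIZER;

/* Sketch: release the lock on every exit path, not just on success. */
static int reset_work(bool fail_early)
{
    pthread_mutex_lock(&shutdown_lock);
    if (fail_early) {
        /* the unlock that was missing on the failure path */
        pthread_mutex_unlock(&shutdown_lock);
        return -1;
    }
    /* ... bring the controller back up ... */
    pthread_mutex_unlock(&shutdown_lock);
    return 0;
}
```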

    Fixes: 5c959d73dba64 ("nvme-pci: fix rapid add remove sequence")
    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Keith Busch
     
  • [ Upstream commit 5c959d73dba6495ec01d04c206ee679d61ccb2b0 ]

    A surprise removal may fail to tear down request queues if it is racing
    with the initial asynchronous probe. If that happens, the remove path
    won't see the queue resources to tear down, and the controller reset
    path may create a new request queue on a removed device, but will not
    be able to make forward progress, deadlocking the pci removal.

    Protect setting up non-blocking resources from a shutdown by holding the
    same mutex, and transition to the CONNECTING state after these resources
    are initialized so the probe path may see the dead controller state
    before dispatching new IO.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=202081
    Reported-by: Alex Gagniuc
    Signed-off-by: Keith Busch
    Tested-by: Alex Gagniuc
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Keith Busch
     
  • [ Upstream commit e7ad43c3eda6a1690c4c3c341f95dc1c6898da83 ]

    If a controller supports the NS Change Notification, the namespace
    scan_work is automatically triggered after attaching a new namespace.

    Occasionally the namespace scan_work may append the new namespace to the
    list before the admin command effects handling is completed. The effects
    handling unfreezes namespaces, but if it unfreezes the newly attached
    namespace, its request_queue freeze depth will be off and we'll hit the
    warning in blk_mq_unfreeze_queue().

    On the next namespace add, we will fail to freeze that queue due to the
    previous bad accounting and deadlock waiting for frozen.

    Fix that by preventing scan work from altering the namespace list while
    command effects handling needs to pair freeze with unfreeze.
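    The accounting hazard can be sketched with a bare freeze-depth
    counter (a heavy simplification of blk-mq's request_queue freezing;
    names are illustrative):

```c
/* Simplified freeze-depth accounting: every unfreeze must pair with a
 * prior freeze, or the depth underflows and later freeze/unfreeze
 * cycles are unbalanced. */
struct queue { int freeze_depth; };

static void freeze(struct queue *q)
{
    q->freeze_depth++;
}

static int unfreeze(struct queue *q)
{
    if (q->freeze_depth <= 0)
        return -1; /* would trip the WARN in blk_mq_unfreeze_queue() */
    q->freeze_depth--;
    return 0;
}
```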

    Reported-by: Wen Xiong
    Tested-by: Wen Xiong
    Signed-off-by: Keith Busch
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Keith Busch
     

06 Mar, 2019

2 commits

  • [ Upstream commit 78a61cd42a64f3587862b372a79e1d6aaf131fd7 ]

    Bit 6 in the ANACAP field is used to indicate that the ANA group ID
    doesn't change while the namespace is attached to the controller.
    There is an optimisation in the code to only allocate space
    for the ANA group header, as the namespace list won't change and
    hence would not need to be refreshed.
    However, this optimisation was never carried over to the actual
    workflow, which always assumes that the buffer is large enough
    to hold the ANA header _and_ the namespace list.
    So drop this optimisation and always allocate enough space.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Hannes Reinecke
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Hannes Reinecke
     
  • [ Upstream commit 4c174e6366746ae8d49f9cc409f728eebb7a9ac9 ]

    Currently, we have several problems with the timeout
    handler:
    1. If we time out on the controller establishment flow, we will hang
    because we don't execute the error recovery (and we shouldn't,
    because the create_ctrl flow needs to fail and clean up on its own).
    2. We might also hang if we get a disconnect on a queue while the
    controller is already deleting. This racy flow can cause the
    controller disable/shutdown admin command to hang.

    We cannot complete a timed out request from the timeout handler without
    mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work).
    So we serialize it in the timeout handler and teardown io and admin
    queues to guarantee that no one races with us from completing the
    request.

    Reported-by: Jaesoo Lee
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Sagi Grimberg
     

20 Feb, 2019

4 commits

  • [ Upstream commit 3da584f57133e51aeb84aaefae5e3d69531a1e4f ]

    We need to preserve the leading zeros in the vid and ssvid when generating
    a unique NQN. Truncating these may lead to naming collisions.
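    A minimal sketch of why the padding matters; the format string here
    only illustrates the vid/ssvid portion of the generated NQN, not the
    full string the driver builds:

```c
#include <stdio.h>

/* Sketch: zero-padded %04x keeps distinct (vid, ssvid) pairs distinct
 * in the generated NQN; unpadded %x lets leading zeros vanish and
 * different pairs collapse to the same string. */
static void format_ids(char *buf, size_t len, unsigned vid, unsigned ssvid)
{
    snprintf(buf, len, "nqn.2014.08.org.nvmexpress:%04x%04x", vid, ssvid);
}
```

With `%x`, the pairs (0x0a1, 0x1b2) and (0xa11, 0xb2) would both
produce `...:a11b2`; with `%04x` they stay distinct.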

    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Keith Busch
     
  • [ Upstream commit c7055fd15ff46d92eb0dd1c16a4fe010d58224c8 ]

    When nvme_init_identify() fails, the ANA log buffer is deallocated
    but _not_ set to NULL. This can cause a double-free oops when this
    controller is deleted without ever being reconnected.
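    The shape of the fix, sketched in userspace C (free() standing in
    for the kernel's kfree(); the structure name is illustrative):

```c
#include <stdlib.h>

struct ctrl { void *ana_log_buf; };

/* Sketch: NULL the pointer after freeing, so a later teardown cannot
 * free the same buffer twice. */
static void free_ana_log(struct ctrl *c)
{
    free(c->ana_log_buf);
    c->ana_log_buf = NULL; /* the assignment the bug was missing */
}
```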

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Hannes Reinecke
     
  • [ Upstream commit dcca1662727220d18fa351097ddff33f95f516c5 ]

    There is an out-of-bounds array access in nvme_cqe_pending().

    When irq_thread is enabled for the nvme interrupt, there is a race
    between updating and reading nvmeq->cq_head.

    nvmeq->cq_head is updated in nvme_update_cq_head(). If nvmeq->cq_head
    equals nvmeq->q_depth, then in the window before its value is reset to
    zero, nvme_cqe_pending() may use it as an array index, and that index
    will be out of bounds.
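    One way to close that window is to compute the wrapped value first
    and publish it with a single store, roughly like this sketch (not
    the driver's exact code):

```c
struct nvme_queue { unsigned q_depth; unsigned cq_head; };

/* Sketch: never store an intermediate cq_head == q_depth that a
 * concurrent nvme_cqe_pending() could use as an array index. */
static void update_cq_head(struct nvme_queue *q)
{
    unsigned next = q->cq_head + 1;

    if (next == q->q_depth)
        next = 0;
    q->cq_head = next; /* single store, always within [0, q_depth) */
}
```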

    Signed-off-by: Hongbo Yao
    [hch: slight coding style update]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Hongbo Yao
     
  • [ Upstream commit cc667f6d5de023ee131e96bb88e5cddca23272bd ]

    When using HMB the PCIe host driver allocates host_mem_desc_bufs using
    dma_alloc_attrs() but frees them using dma_free_coherent(). Use the
    correct dma_free_attrs() function to free the buffers.

    Signed-off-by: Liviu Dudau
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Liviu Dudau
     

31 Jan, 2019

2 commits

  • commit 5cbab6303b4791a3e6713dfe2c5fda6a867f9adc upstream.

    Under heavy load, if we don't have any pre-allocated rsps left, we
    dynamically allocate a rsp, but we are not actually allocating memory
    for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
    fields (req->rsp->status) in nvmet_req_init() will result in a crash.

    To fix this, allocate the memory for nvme_completion by calling
    nvmet_rdma_alloc_rsp().

    Fixes: 8407879c ("nvmet-rdma: fix possible bogus dereference under heavy load")

    Cc:
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Raju Rangoju
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Raju Rangoju
     
  • commit ad1f824948e4ed886529219cf7cd717d078c630d upstream.

    Signed-off-by: Israel Rukshin
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Max Gurtovoy
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Cc: Raju Rangoju
    Signed-off-by: Greg Kroah-Hartman

    Israel Rukshin
     

21 Dec, 2018

2 commits

  • [ Upstream commit d7dcdf9d4e15189ecfda24cc87339a3425448d5c ]

    nvmet_rdma_release_rsp() may free the response before it is used on
    the error path.

    Fixes: 8407879 ("nvmet-rdma: fix possible bogus dereference under heavy load")
    Signed-off-by: Israel Rukshin
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Max Gurtovoy
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Israel Rukshin
     
  • [ Upstream commit 86880d646122240596d6719b642fee3213239994 ]

    Delete operations are seeing NULL pointer references in call_timer_fn.
    Tracking these back, the timer appears to be the keep alive timer.

    nvme_keep_alive_work(), which is tied to the timer that is cancelled
    by nvme_stop_keep_alive(), simply starts the keep alive io but doesn't
    wait for its completion. So nvme_stop_keep_alive() only stops the timer
    when it is pending; when a keep alive is in flight, there is no timer
    running and nvme_stop_keep_alive() has no effect on the keep alive io.
    Thus, if the io completes successfully, the keep alive timer will be
    rescheduled. In the failure case, delete is called, the controller
    state is changed, nvme_stop_keep_alive() is called while the io is
    outstanding, and the delete path continues on. The keep alive happens
    to complete successfully before the delete paths mark it as aborted as
    part of the queue termination, so the timer is restarted. The delete
    paths then tear down the controller, and later on the timer code fires
    and the timer entry is now corrupt.

    Fix by validating the controller state before rescheduling the keep
    alive. Testing with the fix has confirmed the condition above was hit.
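    The guard can be sketched as follows (states and names are
    simplified stand-ins for the nvme core's controller states and
    delayed work):

```c
#include <stdbool.h>

enum ctrl_state { CTRL_LIVE, CTRL_DELETING, CTRL_DEAD };

struct ctrl {
    enum ctrl_state state;
    bool ka_timer_armed; /* stand-in for the delayed-work timer */
};

/* Sketch: the keep-alive completion re-arms the timer only while the
 * controller is still live, so a concurrent delete cannot be left with
 * a rescheduled (and soon dangling) timer. */
static void keep_alive_done(struct ctrl *c)
{
    if (c->state != CTRL_LIVE)
        return; /* delete in progress: leave the timer dead */
    c->ka_timer_armed = true;
}
```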

    Signed-off-by: James Smart
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    James Smart
     

17 Dec, 2018

3 commits

  • [ Upstream commit 6344d02dc8f886b6bbcd922ae1a17e4a41500f2d ]

    Some error paths in the configuration of the admin queue free the
    data buffer associated with the async request SQE without resetting
    the buffer pointer to NULL. This buffer is then freed again if the
    controller is shut down or reset.

    Signed-off-by: Prabhath Sajeepa
    Reviewed-by: Roland Dreier
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Prabhath Sajeepa
     
  • [ Upstream commit f6c8e432cb0479255322c5d0335b9f1699a0270c ]

    nvme_stop_ctrl can also be called for the reset flow, where there is
    no need to flush the scan_work as namespaces are not being removed.
    Doing so can deadlock the rdma, fc and loop drivers, since
    nvme_stop_ctrl barriers before controller teardown (and specifically
    before I/O cancellation of the scan_work itself) takes place; the
    scan_work will be blocked anyway, so there is no need to flush it.

    Instead, move scan_work flush to nvme_remove_namespaces() where it really
    needs to flush.

    Reported-by: Ming Lei
    Signed-off-by: Sagi Grimberg
    Reviewed-by: Keith Busch
    Reviewed-by: James Smart
    Tested-by: Ewan D. Milne
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    Sagi Grimberg
     
  • [ Upstream commit 14a1336e6fff47dd1028b484d6c802105c58e2ee ]

    Without CONFIG_NVME_MULTIPATH enabled, a multi-port subsystem might
    show up as individual devices and cause problems; warn about it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Sasha Levin

    Christoph Hellwig
     

13 Dec, 2018

1 commit

  • [ Upstream commit 4cff280a5fccf6513ed9e895bb3a4e7ad8b0cedc ]

    If an io error occurs on an io issued while connecting, recovery
    of the io falls flat as the state checking ends up nooping the error
    handler.

    Create an err_work work item that is scheduled upon an io error while
    connecting. The work thread terminates all io on all queues and marks
    the queues as not connected. The termination of the io will return
    back to the caller, which will then back out of the connection attempt
    and, if possible, reschedule it.

    The changes:
    - in case there are several commands hitting the error handler, a
    state flag is kept so that the error work is only scheduled once,
    on the first error. The subsequent errors can be ignored.
    - The calling sequence to stop keep alive and terminate the queues
    and their io is lifted from the reset routine into a small service
    routine used by both reset and err_work.
    - During debugging, found that the teardown path can reference
    an uninitialized pointer, resulting in a NULL pointer oops.
    The aen_ops weren't initialized yet. Add validation on their
    initialization before calling the teardown routine.

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin

    James Smart
     

27 Nov, 2018

1 commit

  • [ Upstream commit 8f676b8508c250bbe255096522fdefb73f1ea0b9 ]

    Whenever we update ns_head info, we need to make sure it is still
    compatible with all underlying backing devices: although nvme
    multipath doesn't make any explicit use of these limits, other
    devices can still be stacked on top of it and may rely on the
    underlying limits. Start with unlimited stacking limits, and on
    every info update iterate over the siblings and adjust the queue
    limits.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Sagi Grimberg
     

14 Nov, 2018

1 commit

  • [ Upstream commit 783f4a4408e1251d17f333ad56abac24dde988b9 ]

    When an io is rejected by nvmf_check_ready() due to validation of the
    controller state, the nvmf_fail_nonready_command() will normally return
    BLK_STS_RESOURCE to requeue and retry. However, if the controller is
    dying or the I/O is marked for NVMe multipath, the I/O is failed so that
    the controller can terminate or so that the io can be issued on a
    different path. Unfortunately, as this reject point is before the
    transport has accepted the command, blk-mq ends up completing the I/O
    and never calls nvme_complete_rq(), which is where multipath may preserve
    or re-route the I/O. The end result is that the device user sees an
    EIO error.

    Example: single path connectivity, controller is under load, and a reset
    is induced. An I/O is received:

    a) while the reset state has been set but the queues have yet to be
    stopped; or
    b) after queues are started (at end of reset) but before the reconnect
    has completed.

    The I/O finishes with an EIO status.

    This patch makes the following changes:

    - Adds the HOST_PATH_ERROR pathing status from TP4028
    - Modifies the reject point such that it appears to queue successfully,
    but actually completes the io with the new pathing status and calls
    nvme_complete_rq().
    - nvme_complete_rq() recognizes the new status, avoids resetting the
    controller (likely was already done in order to get this new status),
    and calls the multipather to clear the current path that errored.
    This allows the next command (retry or new command) to select a new
    path if there is one.

    Signed-off-by: James Smart
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    James Smart
     

08 Oct, 2018

1 commit

  • The code had been clearing a namespace being deleted as the current
    path while that namespace was still in the path siblings list. It is
    possible a new IO could set that namespace back to the current path
    since it appeared to be an eligible path to select, which may result in
    a use-after-free error.

    This patch ensures a namespace being removed is not eligible to be reset
    as a current path prior to clearing it as the current path.

    Signed-off-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Keith Busch
     

06 Sep, 2018

1 commit

    Currently we always repost the recv buffer before we send a response
    capsule back to the host. Since ordering is not guaranteed for send
    and recv completions, it is possible that we will receive a new
    request from the host before we get a send completion for the
    response capsule.

    Today, we pre-allocate twice as many rsps as the queue depth, but in
    reality, under heavy load there is nothing really preventing the gap
    from expanding until we exhaust all our rsps.

    To fix this, if we don't have any pre-allocated rsps left, we
    dynamically allocate an rsp and make sure to free it when we are
    done. If, under memory pressure, we fail to allocate an rsp, we
    silently drop the command and wait for the host to retry.
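    The allocation strategy can be sketched as a fixed pool with a heap
    fallback (illustrative names; the real code also has to return
    pooled rsps to the pool, which this sketch omits):

```c
#include <stdlib.h>
#include <stdbool.h>

struct rsp { bool dynamic; /* ... command/response state ... */ };

struct rsp_pool {
    struct rsp *slots;
    int nused, nslots;
};

/* Sketch: prefer the pre-allocated pool; fall back to the heap; a NULL
 * return under memory pressure means "drop and let the host retry". */
static struct rsp *get_rsp(struct rsp_pool *p)
{
    if (p->nused < p->nslots) {
        struct rsp *r = &p->slots[p->nused++];
        r->dynamic = false;
        return r;
    }
    struct rsp *r = malloc(sizeof(*r));
    if (r)
        r->dynamic = true;
    return r;
}

static void put_rsp(struct rsp *r)
{
    if (r && r->dynamic)
        free(r); /* pooled rsps are recycled elsewhere in this sketch */
}
```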

    Reported-by: Steve Wise
    Tested-by: Steve Wise
    Signed-off-by: Sagi Grimberg
    [hch: dropped a superfluous assignment]
    Signed-off-by: Christoph Hellwig

    Sagi Grimberg
     

28 Aug, 2018

3 commits

  • Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
     
  • When a targetport is removed from the config, fcloop will avoid calling
    the LS done() routine thinking the targetport is gone. This leaves the
    initiator reset/reconnect hanging as it waits for a status on the
    Create_Association LS for the reconnect.

    Change the filter in the LS callback path: if tport is NULL (set when
    validation failed before "sending to remote port"), be sure to call
    done. This was the main bug. But continue the logic that only calls
    done if tport was set but there is no remoteport (e.g. the case where
    the remoteport has been removed, and thus the host doesn't expect a
    completion).

    Signed-off-by: James Smart
    Signed-off-by: Christoph Hellwig

    James Smart
     
  • In many architectures loads may be reordered with older stores to
    different locations. In the nvme driver the following two operations
    could be reordered:

    - Write shadow doorbell (dbbuf_db) into memory.
    - Read EventIdx (dbbuf_ei) from memory.

    This can result in a potential race condition between the driver and
    the VM host processing requests (if the given virtual NVMe controller
    has support for the shadow doorbell). If that occurs, the NVMe
    controller may decide to wait for an MMIO doorbell from the guest
    operating system, and the guest driver may decide not to issue an
    MMIO doorbell on any subsequent commands.
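    The ordering requirement can be sketched with C11 atomics;
    dbbuf_need_event below mirrors the driver's nvme_dbbuf_need_event()
    comparison, while the fence stands in for the mb() the fix adds:

```c
#include <stdatomic.h>
#include <stdint.h>

/* EventIdx test used by the shadow-doorbell protocol: ring the MMIO
 * doorbell iff the controller's EventIdx falls within (old, new]. */
static int dbbuf_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/* Sketch of the fix: a full barrier between the shadow-doorbell store
 * and the EventIdx load, so the CPU cannot reorder the load before the
 * store and miss the controller's EventIdx update. */
static int update_and_check(_Atomic uint16_t *db, _Atomic uint16_t *ei,
                            uint16_t new_idx)
{
    uint16_t old = atomic_exchange_explicit(db, new_idx,
                                            memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* the added mb() */
    return dbbuf_need_event(atomic_load_explicit(ei, memory_order_relaxed),
                            new_idx, old);
}
```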

    This issue is purely a timing-dependent one, so there is no easy way to
    reproduce it. Currently the easiest known approach is to run "Oracle IO
    Numbers" (orion) that is shipped with Oracle DB:

    orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
    concat -write 40 -duration 120 -matrix row -testname nvme_test

    Where nvme_test is a .lun file that contains a list of NVMe block
    devices to run the test against. Limiting the number of vCPUs
    assigned to a given VM instance seems to increase the chances of
    this bug occurring. On a test environment with a VM that had 4 NVMe
    drives and 1 vCPU assigned, the virtual NVMe controller hang could
    be observed within 10-20 minutes. That corresponds to about 400-500k
    IO operations processed (or about
    100GB of IO read/writes).

    The Orion tool was used as validation and set to run in a loop for
    36 hours (equivalent to pushing 550M IO operations). No issues were
    observed, which suggests that the patch fixes the issue.

    Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices")
    Signed-off-by: Michal Wnukowski
    Reviewed-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    [hch: updated changelog and comment a bit]
    Signed-off-by: Christoph Hellwig

    Michal Wnukowski
     

17 Aug, 2018

2 commits

  • rdma.git merge resolution for the 4.19 merge window

    Conflicts:
    drivers/infiniband/core/rdma_core.c
    - Use the rdma code and revise with the new spelling for
    atomic_fetch_add_unless
    drivers/nvme/host/rdma.c
    - Replace max_sge with max_send_sge in new blk code
    drivers/nvme/target/rdma.c
    - Use the blk code and revise to use NULL for ib_post_recv when
    appropriate
    - Replace max_sge with max_recv_sge in new blk code
    net/rds/ib_send.c
    - Use the net code and revise to use NULL for ib_post_recv when
    appropriate

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Resolve merge conflicts from the -rc cycle against the rdma.git tree:

    Conflicts:
    drivers/infiniband/core/uverbs_cmd.c
    - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
    - Merge removal of file->ucontext in for-next with new code in -rc
    drivers/infiniband/core/uverbs_main.c
    - for-next removed code from ib_uverbs_write() that was modified
    in for-rc

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

15 Aug, 2018

1 commit

  • Pull block updates from Jens Axboe:
    "First pull request for this merge window, there will also be a
    followup request with some stragglers.

    This pull request contains:

    - Fix for a thundering herd issue in the wbt block code (Anchal
    Agarwal)

    - A few NVMe pull requests:
    * Improved tracepoints (Keith)
    * Larger inline data support for RDMA (Steve Wise)
    * RDMA setup/teardown fixes (Sagi)
    * Effects log support for NVMe target (Chaitanya Kulkarni)
    * Buffered IO support for NVMe target (Chaitanya Kulkarni)
    * TP4004 (ANA) support (Christoph)
    * Various NVMe fixes

    - Block io-latency controller support. Much needed support for
    properly containing block devices. (Josef)

    - Series improving how we handle sense information on the stack
    (Kees)

    - Lightnvm fixes and updates/improvements (Mathias/Javier et al)

    - Zoned device support for null_blk (Matias)

    - AIX partition fixes (Mauricio Faria de Oliveira)

    - DIF checksum code made generic (Max Gurtovoy)

    - Add support for discard in iostats (Michael Callahan / Tejun)

    - Set of updates for BFQ (Paolo)

    - Removal of async write support for bsg (Christoph)

    - Bio page dirtying and clone fixups (Christoph)

    - Set of bcache fix/changes (via Coly)

    - Series improving blk-mq queue setup/teardown speed (Ming)

    - Series improving merging performance on blk-mq (Ming)

    - Lots of other fixes and cleanups from a slew of folks"

    * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
    blkcg: Make blkg_root_lookup() work for queues in bypass mode
    bcache: fix error setting writeback_rate through sysfs interface
    null_blk: add lock drop/acquire annotation
    Blk-throttle: reduce tail io latency when iops limit is enforced
    block: paride: pd: mark expected switch fall-throughs
    block: Ensure that a request queue is dissociated from the cgroup controller
    block: Introduce blk_exit_queue()
    blkcg: Introduce blkg_root_lookup()
    block: Remove two superfluous #include directives
    blk-mq: count the hctx as active before allocating tag
    block: bvec_nr_vecs() returns value for wrong slab
    bcache: trivial - remove tailing backslash in macro BTREE_FLAG
    bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
    bcache: set max writeback rate when I/O request is idle
    bcache: add code comments for bset.c
    bcache: fix mistaken comments in request.c
    bcache: fix mistaken code comments in bcache.h
    bcache: add a comment in super.c
    bcache: avoid unncessary cache prefetch bch_btree_node_get()
    bcache: display rate debug parameters to 0 when writeback is not running
    ...

    Linus Torvalds
     

08 Aug, 2018

3 commits

  • When the user supplies a ctrl_loss_tmo < 0, we warn them that this will
    cause the fabrics layer to attempt reconnection forever. However, in
    reality the fabrics layer never attempts to reconnect because the
    condition to test whether we should reconnect is backwards in this case.
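    The intended semantics can be sketched as follows (parameter names
    are illustrative; the driver expresses this via its reconnect
    bookkeeping rather than raw seconds):

```c
#include <stdbool.h>

/* Sketch: a negative ctrl_loss_tmo means "reconnect forever", so it
 * must short-circuit the elapsed-time comparison rather than make it
 * trivially false. */
static bool should_reconnect(int ctrl_loss_tmo, int elapsed)
{
    return ctrl_loss_tmo < 0 || elapsed < ctrl_loss_tmo;
}
```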

    Signed-off-by: Tal Shorer
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Tal Shorer
     
  • This patch implements the Namespace Write Protect feature described in
    "NVMe TP 4005a Namespace Write Protect". In this version, we implement
    No Write Protect and Write Protect states for target ns which can be
    toggled by set-features commands from the host side.

    For write-protect state transition, we need to flush the ns specified
    as a part of command so we also add helpers for carrying out synchronous
    flush operations.

    Signed-off-by: Chaitanya Kulkarni
    [hch: fixed an incorrect endianess conversion, minor cleanups]
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
     
    NVMe 1.3 TP 4005 introduces a new field (NSATTR) that indicates
    whether a given namespace is write protected. This patch sets the
    gendisk associated with the namespace to read-only based on the
    identify namespace nsattr field.
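    The mapping can be sketched as below; NVME_NS_ATTR_RO is bit 0 of
    the identify-namespace NSATTR field, and the flag variable stands in
    for set_disk_ro():

```c
#include <stdint.h>
#include <stdbool.h>

#define NVME_NS_ATTR_RO (1 << 0) /* namespace is write protected */

static bool disk_ro;

/* Sketch: mirror NSATTR bit 0 onto the gendisk read-only flag. */
static void nvme_update_disk_ro(uint8_t nsattr)
{
    disk_ro = (nsattr & NVME_NS_ATTR_RO) != 0;
}
```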

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni
     

06 Aug, 2018

3 commits

  • A minor version number increase should not break backwards
    compatibility.

    Fixes: 3cb98f84d368b ("lightnvm: add minor version to generic geometry")
    Reviewed-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Pull NVMe changes from Christoph:

    "This contains the support for TP4004, Asymmetric Namespace Access,
    which makes NVMe multipathing usable in practice."

    * 'nvme-4.19' of git://git.infradead.org/nvme:
    nvmet: use Retain Async Event bit to clear AEN
    nvmet: support configuring ANA groups
    nvmet: add minimal ANA support
    nvmet: track and limit the number of namespaces per subsystem
    nvmet: keep a port pointer in nvmet_ctrl
    nvme: add ANA support
    nvme: remove nvme_req_needs_failover
    nvme: simplify the API for getting log pages
    nvme.h: add ANA definitions
    nvme.h: add support for the log specific field

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a
    merge conflict down the line.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

30 Jul, 2018

2 commits

    Also move the remapping logic to the nvme core driver instead of
    implementing it in the nvme pci driver. This way all the other nvme
    transport drivers will benefit from it (in case they implement
    metadata support).

    Suggested-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Acked-by: Keith Busch
    Signed-off-by: Max Gurtovoy
    Signed-off-by: Jens Axboe

    Max Gurtovoy
     
    Currently this function is implemented in the scsi layer, but its
    proper place is the block layer, since T10-PI is a general data
    integrity feature that is used in the nvme protocol as well.

    Suggested-by: Christoph Hellwig
    Cc: Martin K. Petersen
    Signed-off-by: Max Gurtovoy
    Signed-off-by: Jens Axboe

    Max Gurtovoy
     

28 Jul, 2018

2 commits

  • Pull block fixes from Jens Axboe:
    "Bigger than usual at this time, mostly due to the O_DIRECT corruption
    issue and the fact that I was on vacation last week. This contains:

    - NVMe pull request with two fixes for the FC code, and two target
    fixes (Christoph)

    - a DIF bio reset iteration fix (Greg Edwards)

    - two nbd reply and requeue fixes (Josef)

    - SCSI timeout fixup (Keith)

    - a small series that fixes an issue with bio_iov_iter_get_pages(),
    which ended up causing corruption for larger sized O_DIRECT writes
    that ended up racing with buffered writes (Martin Wilck)"

    * tag 'for-linus-20180727' of git://git.kernel.dk/linux-block:
    block: reset bi_iter.bi_done after splitting bio
    block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
    blkdev: __blkdev_direct_IO_simple: fix leak in error case
    block: bio_iov_iter_get_pages: fix size of last iovec
    nvmet: only check for filebacking on -ENOTBLK
    nvmet: fixup crash on NULL device path
    scsi: set timed out out mq requests to complete
    blk-mq: export setting request completion state
    nvme: if_ready checks to fail io to deleting controller
    nvmet-fc: fix target sgl list on large transfers
    nbd: handle unexpected replies better
    nbd: don't requeue the same request twice.

    Linus Torvalds
     
    In the current implementation, we clear the AEN bit when we get the
    "get log page" command if the given log page is associated with an
    AEN. This patch optionally retains the AEN for the ctrl under
    consideration when the Retain Asynchronous Event (RAE) bit is set as
    part of the "get log page" command.

    This allows the host to read the log page and optionally retain the
    AEN associated with it when using userspace tools like nvme-cli.

    Signed-off-by: Chaitanya Kulkarni
    [hch: also use the new helper in the just merged ANA code]
    Signed-off-by: Christoph Hellwig

    Chaitanya Kulkarni