13 May, 2017

1 commit

  • Pull SCSI target updates from Nicholas Bellinger:
    "Things were a lot more calm than previously expected. It's primarily
    fixes in various areas, with most of the new functionality centering
    around TCMU backend driver work that Xiubo Li has been driving.

    Here's the summary on the feature side:

    - Make T10-PI verify configurable for emulated (FILEIO + RD) backends
    (Dmitry Monakhov)
    - Allow target-core/TCMU pass-through to use in-kernel SPC-PR logic
    (Bryant Ly + MNC)
    - Add TCMU support for growing ring buffer size (Xiubo Li + MNC)
    - Add TCMU support for global block data pool (Xiubo Li + MNC)

    and on the bug-fix side:

    - Fix COMPARE_AND_WRITE non GOOD status handling for READ phase
    failures (Gary Guo + nab)
    - Fix iscsi-target hang with explicitly changing per NodeACL
    CmdSN number depth with concurrent login driven session
    reinstatement. (Gary Guo + nab)
    - Fix ibmvscsis fabric driver ABORT task handling (Bryant Ly)
    - Fix target-core/FILEIO zero length handling (Bart Van Assche)

    Also, there was an OOPs introduced with the WRITE_VERIFY changes that
    I ended up reverting at the last minute because, as is not unusual, Bart
    and I could not agree on the fix in time for -rc1. Since it's specific
    to a conformance test, it's been reverted for now.

    There is a separate patch in the queue to address the underlying
    control CDB write overflow regression in >= v4.3 separate from the
    WRITE_VERIFY revert here, that will be pushed post -rc1"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (30 commits)
    Revert "target: Fix VERIFY and WRITE VERIFY command parsing"
    IB/srpt: Avoid that aborting a command triggers a kernel warning
    IB/srpt: Fix abort handling
    target/fileio: Fix zero-length READ and WRITE handling
    ibmvscsis: Do not send aborted task response
    tcmu: fix module removal due to stuck thread
    target: Don't force session reset if queue_depth does not change
    iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
    target: Fix compare_and_write_callback handling for non GOOD status
    tcmu: Recalculate the tcmu_cmd size to save cmd area memories
    tcmu: Add global data block pool support
    tcmu: Add dynamic growing data area feature support
    target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store()
    target: fixup error message in target_tg_pt_gp_alua_access_type_store()
    target/user: PGR Support
    target: Add WRITE_VERIFY_16
    Documentation/target: add an example script to configure an iSCSI target
    target: Use kmalloc_array() in transport_kmap_data_sg()
    target: Use kmalloc_array() in compare_and_write_callback()
    target: Improve size determinations in two functions
    ...

    Linus Torvalds
     

11 May, 2017

1 commit

  • This reverts commit 0e2eb7d12eaa8e391bf5615d4271bb87a649caaa

    Author: Bart Van Assche
    Date: Thu Mar 30 10:12:39 2017 -0700

    target: Fix VERIFY and WRITE VERIFY command parsing

    This patch broke existing behaviour for WRITE_VERIFY because
    it dropped the original SCF_SCSI_DATA_CDB assignment for
    bytchk = 0 so target_cmd_size_check() no longer rejected
    this case, allowing an overflow case to trigger an OOPs
    in iscsi-target.

    Since the short term and long term fixes are still being
    discussed, revert it for now since it's late in the merge
    window and try again in v4.13-rc1.

    Conflicts:
    drivers/target/target_core_sbc.c

    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     

08 May, 2017

1 commit

  • This patch fixes zero-length READ and WRITE handling in target/FILEIO,
    which was broken a long time back by:

    commit d81cb44726f050d7cf1be4afd9cb45d153b52066
    Author: Paolo Bonzini
    Date: Mon Sep 17 16:36:11 2012 -0700

    target: go through normal processing for all zero-length commands

    which moved zero-length READ and WRITE completion out of target-core
    and instead submitted these commands to backend driver code.

    To address this, go ahead and invoke target_complete_cmd() for any
    non-negative return value in fd_do_rw().
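
    The completion rule can be modeled in a few lines of user-space C. This
    is only a sketch: the enum and the helper name are hypothetical and are
    not the kernel's definitions. It assumes a backend helper that returns
    the number of bytes transferred or a negative error value.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum completion { COMPLETE_GOOD, COMPLETE_FAILURE };

/*
 * Model of the rule described above: the backend read/write helper returns
 * the number of bytes transferred (>= 0) or a negative error value.
 * A zero-length READ or WRITE transfers 0 bytes and must still count as
 * success, so the check is "ret >= 0" rather than "ret > 0".
 */
static enum completion classify_rw_result(long ret)
{
    return ret >= 0 ? COMPLETE_GOOD : COMPLETE_FAILURE;
}
```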

    Signed-off-by: Bart Van Assche
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Cc: Andy Grover
    Cc: David Disseldorp
    Cc: # v3.7+
    Signed-off-by: Nicholas Bellinger

    Bart Van Assche
     

05 May, 2017

4 commits

  • We need to do a kthread_should_stop to check when kthread_stop has been
    called.

    This was a regression added in

    b6df4b79a5514a9c6c53533436704129ef45bf76
    tcmu: Add global data block pool support

    so not sure if you wanted to merge it in with that patch or what.

    Signed-off-by: Mike Christie
    Signed-off-by: Nicholas Bellinger

    Mike Christie
     
  • In keeping with the idempotent nature of target_core_fabric_configfs.c,
    if a queue_depth value is set and it's the same as the existing
    value, don't attempt to force session reinstatement.
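
    A minimal sketch of the idempotent check, assuming a much simplified
    node ACL. The struct, its fields and the helper name are hypothetical;
    the counter merely stands in for forcing session reinstatement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node ACL; the real se_node_acl has many more fields. */
struct node_acl {
    unsigned int queue_depth;
    unsigned int reinstatements; /* counts forced session reinstatements */
};

/*
 * Set a new queue depth. Returns true if a session reinstatement was
 * forced, false if the value was unchanged and nothing needed to happen.
 */
static bool set_queue_depth(struct node_acl *acl, unsigned int depth)
{
    assert(acl != NULL);
    if (acl->queue_depth == depth)
        return false;            /* same value: idempotent no-op */
    acl->queue_depth = depth;
    acl->reinstatements++;       /* stand-in for forcing reinstatement */
    return true;
}
```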

    Reported-by: Raghu Krishnamurthy
    Cc: Raghu Krishnamurthy
    Tested-by: Gary Guo
    Cc: Gary Guo
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • While testing modification of per se_node_acl queue_depth forcing
    session reinstatement via lio_target_nacl_cmdsn_depth_store() ->
    core_tpg_set_initiator_node_queue_depth(), a hung task bug triggered
    when changing cmdsn_depth invoked session reinstatement while an iscsi
    login was already waiting for session reinstatement to complete.

    This can happen when an outstanding se_cmd descriptor is taking a
    long time to complete, and session reinstatement from iscsi login
    or cmdsn_depth change occurs concurrently.

    To address this bug, explicitly set session_fall_back_to_erl0 = 1
    when forcing session reinstatement, so session reinstatement is
    not attempted if an active session is already being shutdown.

    This patch has been tested with two scenarios. The first is when
    an iscsi login is blocked waiting for iscsi session reinstatement
    to complete, followed by a queue_depth change via configfs. The
    second is when a queue_depth change via configfs is blocked, followed
    by an iscsi login driven session reinstatement.

    Note this patch depends on commit d36ad77f702 to handle multiple
    sessions per se_node_acl when changing cmdsn_depth, and for
    pre v4.5 kernels will need to be included for stable as well.

    Reported-by: Gary Guo
    Tested-by: Gary Guo
    Cc: Gary Guo
    Cc: # v4.1+
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • Following the bugfix for handling non SAM_STAT_GOOD COMPARE_AND_WRITE
    status during COMMIT phase in commit 9b2792c3da1, the same bug exists
    for the READ phase as well.

    This would manifest first as a lost SCSI response, and eventual
    hung task during fabric driver logout or re-login, as existing
    shutdown logic waited for the COMPARE_AND_WRITE se_cmd->cmd_kref
    to reach zero.

    To address this bug, compare_and_write_callback() has been changed
    to set post_ret = 1 and return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE
    as necessary to signal failure status.

    Reported-by: Bill Borsari
    Cc: Bill Borsari
    Tested-by: Gary Guo
    Cc: Gary Guo
    Cc: # v4.1+
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     

03 May, 2017

1 commit

  • For the "struct tcmu_cmd_entry" in the cmd area, the minimum size
    is sizeof(struct tcmu_cmd_entry) == 112 bytes. By default it has
    room for about (sizeof(struct rsp) - sizeof(struct req)) /
    sizeof(struct iovec) == 68 / 16 ~= 4 data regions (iov[4]).

    For most tcmu_cmds, the data block indexes allocated from the
    data area are contiguous, and contiguous blocks are merged into
    the same region using only one iovec. The current code always
    allocates as many iovecs as there are blocks for each tcmu_cmd,
    which wastes a lot of memory.

    For example, when the block size is 4K and the DATA_OUT buffer
    size is 64K, fewer than 5 regions are needed in most cases (almost
    99.7% in my environment). The current code will allocate
    about 16 iovecs, so (16 - 4) * sizeof(struct iovec) = 192 bytes
    of cmd area memory are wasted.

    This patch adds two helpers to calculate the base size and full size
    of the tcmu_cmd, and recalculates them once it is known how many
    iovs are needed, before inserting the command into the cmd area.
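
    The arithmetic in the example above can be checked with a short
    sketch. The constants are the figures quoted in this message; the two
    helper names are hypothetical and are not the helpers added by the
    patch.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the commit message above; treated here as constants. */
#define ENTRY_BASE_SIZE   112u  /* sizeof(struct tcmu_cmd_entry) */
#define IOVEC_SIZE         16u  /* sizeof(struct iovec) */
#define INLINE_IOVECS       4u  /* iovecs that fit within the base size */

/* Hypothetical helper: entry size when room for `iovs` iovecs is reserved. */
static size_t entry_size(size_t iovs)
{
    if (iovs <= INLINE_IOVECS)
        return ENTRY_BASE_SIZE;
    return ENTRY_BASE_SIZE + (iovs - INLINE_IOVECS) * IOVEC_SIZE;
}

/* Cmd area bytes wasted by reserving `reserved` iovecs when `used` suffice. */
static size_t wasted_bytes(size_t reserved, size_t used)
{
    assert(used <= reserved);
    return entry_size(reserved) - entry_size(used);
}
```

    With these assumptions, wasted_bytes(16, 4) gives the 192 bytes
    mentioned for a 64K buffer of 4K blocks.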

    Signed-off-by: Xiubo Li
    Acked-by: Mike Christie
    Signed-off-by: Nicholas Bellinger

    Xiubo Li
     

02 May, 2017

20 commits

  • There is one ring for each target, so as the number of targets
    grows, the rings could eventually exhaust system memory.

    With this patch, for each target ring the cmd area size is fixed
    at 8MB and the data area size grows from 0 to a maximum of
    256K * PAGE_SIZE (1G for 4K page size).

    The data areas of all targets get empty blocks from the "global
    data block pool", which is limited to 512K * PAGE_SIZE (2G for 4K
    page size) for now.

    When the "global data block pool" has been used up, any target
    can wake up the unmap thread routine to shrink other targets'
    data area memory. The unmap thread routine always tries to
    truncate the ring vma from the offset of the last block in use.

    When user space touches data blocks outside of the tcmu_cmd
    iov[], tcmu_page_fault() will try to return one zeroed block.

    Here we move the timeout's tcmu_handle_completions() into the unmap
    thread routine. That is, when the timeout fires, it only does
    tcmu_check_expired_cmd() and then wakes up the unmap thread to
    handle the completions and try to shrink its idle memory. The
    cmdr_lock can then be a mutex, which simplifies this patch because
    unmap_mapping_range() or zap_* may sleep.
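
    The two limits described above can be modeled as a pair of counters.
    This is a single-threaded sketch with hypothetical type and function
    names; it does not reproduce the driver's locking, radix tree or unmap
    thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits quoted above, in blocks of PAGE_SIZE; names are hypothetical. */
#define PER_TARGET_MAX_BLOCKS (256u * 1024u)
#define GLOBAL_MAX_BLOCKS     (512u * 1024u)

struct block_pool { size_t used; };                     /* shared by all targets */
struct target     { size_t used; struct block_pool *pool; };

/* Try to take one block; fails when either limit has been reached. */
static bool target_get_block(struct target *t)
{
    assert(t != NULL && t->pool != NULL);
    if (t->used >= PER_TARGET_MAX_BLOCKS)
        return false;   /* this target's data area is at its maximum */
    if (t->pool->used >= GLOBAL_MAX_BLOCKS)
        return false;   /* pool exhausted: the driver would wake the unmap thread */
    t->used++;
    t->pool->used++;
    return true;
}
```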

    Signed-off-by: Xiubo Li
    Signed-off-by: Jianfei Hu
    Acked-by: Mike Christie
    Signed-off-by: Nicholas Bellinger

    Xiubo Li
     
  • Currently for TCMU, the ring buffer size is fixed at 64K cmd
    area + 1M data area, which will be a bottleneck for high iops.

    The struct tcmu_cmd_entry {} size is fixed at about 112 bytes with
    iovec[N] and N <= 4. For N > 4, the sizeof(cmd entry) will be
    [(N - 4) * 16 + 112] bytes, and its corresponding data size will be
    [N * 4096], so the ratio of sizeof(cmd entry) : sizeof(datas) ==
    [(N - 4) * 16 + 112] bytes : (N * 4096) bytes ==
    4/1024 + 12/(N * 1024), which approaches about 4 : 1024 as N grows.

    When N is bigger, the ratio will be smaller.
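
    A quick numeric check of this ratio for N > 4, using the sizes quoted
    in this message (112-byte base entry, 16 bytes per extra iovec, 4096
    bytes of data per iovec). The helper name is hypothetical.

```c
#include <assert.h>

/* Values quoted in the commit message; the helper name is hypothetical. */
#define ENTRY_BASE_SIZE  112.0   /* bytes, covers up to 4 iovecs */
#define IOVEC_SIZE        16.0   /* bytes per additional iovec */
#define BLOCK_SIZE      4096.0   /* bytes of data per iovec in this model */

/* Cmd entry bytes needed per 1024 bytes of data, for N > 4 iovecs. */
static double cmd_bytes_per_1024_data(unsigned int n)
{
    assert(n > 4);
    double entry = (n - 4) * IOVEC_SIZE + ENTRY_BASE_SIZE;
    double data = n * BLOCK_SIZE;
    return entry / data * 1024.0;
}
```

    Under these assumptions the value is 5.5 for N = 8 and 4.75 for
    N = 16, and it keeps falling toward 4 as N increases.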

    As the initial patch, we will set the cmd area size to 2M, and
    the data area size to 32M. TCMU will dynamically grow the data
    area from 0 to a maximum of 32M as needed.

    The cmd area memory will be allocated through vmalloc(), and the
    data area's blocks will be allocated individually later when needed.

    The allocated data area block memory will be managed via radix tree.
    For now the bitmap is still the most efficient way to search and
    manage the block index; this could be updated later.

    Signed-off-by: Xiubo Li
    Signed-off-by: Jianfei Hu
    Acked-by: Mike Christie
    Signed-off-by: Nicholas Bellinger

    Xiubo Li
     
  • When setting up an ALUA target port group with an invalid ID the
    error message

    kstrtoul() returned -22 for tg_pt_gp_id

    is displayed, which is not really helpful.
    Convert it to something sane.
    And while we're at it, join the messages onto a single line.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Bart van Assche
    Signed-off-by: Nicholas Bellinger

    Hannes Reinecke
     
  • When setting up a target the error message:

    Unable to do set ##_name ALUA state on non valid tg_pt_gp ID: 0

    is displayed.
    Apparently concatenation doesn't work in a string; one should be using
    implicit string concatenation here.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Bart van Assche
    Signed-off-by: Nicholas Bellinger

    Hannes Reinecke
     
  • This adds initial PGR support for just TCMU, since tcmu doesn't
    have the necessary IT_NEXUS info to process PGR in userspace,
    so have those commands be processed in kernel.

    HA support is not available yet, we will work on it if this patch
    is acceptable.

    Signed-off-by: Bryant G. Ly
    Signed-off-by: Nicholas Bellinger

    Bryant G. Ly
     
  • This patch addresses clients that need write_verify_16 for
    large volume groups, such as AIX.

    Signed-off-by: Bryant G. Ly
    Signed-off-by: Nicholas Bellinger

    Bryant G. Ly
     
  • A multiplication for the size determination of a memory allocation
    indicated that an array data structure should be processed.
    Thus use the corresponding function "kmalloc_array".
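
    The reason for preferring an array allocator over an open-coded
    multiplication is the overflow check. A user-space analogue, assuming
    malloc() as the underlying allocator, might look as follows;
    alloc_array() is a hypothetical helper, not the kernel's
    kmalloc_array().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * User-space analogue of an overflow-checked array allocation (hypothetical
 * helper, not the kernel's kmalloc_array()): refuse the request if
 * n * size would wrap around instead of silently allocating too little.
 */
static void *alloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;            /* multiplication would overflow */
    return malloc(n * size);
}
```

    A caller would typically write `p = alloc_array(count, sizeof(*p))`,
    so the element size follows the pointer type automatically.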

    This issue was detected by using the Coccinelle software.

    Signed-off-by: Markus Elfring
    Signed-off-by: Nicholas Bellinger

    Markus Elfring
     
  • * A multiplication for the size determination of a memory allocation
    indicated that an array data structure should be processed.
    Thus use the corresponding function "kmalloc_array".

    This issue was detected by using the Coccinelle software.

    * Replace the specification of a data structure by a pointer dereference
    to make the corresponding size determination a bit safer according to
    the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: Nicholas Bellinger

    Markus Elfring
     
  • Replace the specification of two data structures by pointer dereferences
    as the parameter for the operator "sizeof" to make the corresponding size
    determinations a bit safer according to the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: Nicholas Bellinger

    Markus Elfring
     
  • The script "checkpatch.pl" pointed information out like the following.

    WARNING: Possible unnecessary 'out of memory' message

    Thus remove such statements here.

    Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
    Signed-off-by: Markus Elfring
    Signed-off-by: Nicholas Bellinger

    Markus Elfring
     
  • * Multiplications for the size determination of memory allocations
    indicated that array data structures should be processed.
    Thus use the corresponding function "kcalloc".

    This issue was detected by using the Coccinelle software.

    * Replace the specification of data structures by pointer dereferences
    to make the corresponding size determination a bit safer according to
    the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: Nicholas Bellinger

    Markus Elfring
     
  • Replace the specification of four data structures by pointer dereferences
    as the parameter for the operator "sizeof" to make the corresponding size
    determinations a bit safer according to the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: Nicholas Bellinger

    Markus Elfring
     
  • The script "checkpatch.pl" pointed information out like the following.

    WARNING: Possible unnecessary 'out of memory' message

    Thus remove such statements here.

    Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
    Signed-off-by: Markus Elfring
    Signed-off-by: Nicholas Bellinger

    Markus Elfring
     
  • * A multiplication for the size determination of a memory allocation
    indicated that an array data structure should be processed.
    Thus use the corresponding function "kcalloc".

    This issue was detected by using the Coccinelle software.

    * Replace the specification of a data structure by a pointer dereference
    to make the corresponding size determination a bit safer according to
    the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: Nicholas Bellinger

    Markus Elfring
     
  • Currently ramdisk and fileio always perform PI verification
    before and after backend IO. This approach is not very flexible,
    because someone may want to postpone this work to other layers in
    the IO stack, for example if we want to test blk_integrity_profile.

    testcase:
    https://github.com/dmonakhov/xfstests/commit/dee408c868861d6b6871dbb3381facee7effdbe4
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Nicholas Bellinger

    Dmitry Monakhov
     
  • If we fail to read data from the backing file (probably because someone
    truncated the file under us), we must zero-fill the cmd's data, otherwise
    it will be returned as is. Most likely the cmd's data are uninitialized
    pages from the page cache. This results in an information leak.

    (Change BUG_ON into -EINVAL se_cmd failure - nab)

    testcase: https://github.com/dmonakhov/xfstests/commit/e11a1b7b907ca67b1be51a1594025600767366d5
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Nicholas Bellinger

    Dmitry Monakhov
     
  • Use the value of the BYTCHK field to determine the size of the
    Data-Out buffer. For VERIFY, honor the VRPROTECT, DPO and FUA
    fields. This patch avoids that LIO complains about a mismatch
    between the expected transfer length and the SCSI CDB length
    if the value of the BYTCHK field is 0.

    Signed-off-by: Bart Van Assche
    Cc: Max Lohrmann
    Cc:
    Signed-off-by: Nicholas Bellinger

    Bart Van Assche
     
  • This commit updated the persistent reservation out service action
    code table in SPC-5 for development.

    Signed-off-by: Zhu Lingshan
    Signed-off-by: Nicholas Bellinger

    Zhu Lingshan
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.
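
    The overflow protection can be illustrated with a toy, single-threaded
    counter that saturates instead of wrapping. This is only a model under
    that assumption: the kernel's refcount_t is atomic and differs in
    detail, and the names below are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model; the kernel's refcount_t is atomic. */
struct toy_refcount { unsigned int refs; };

/* Take a reference; saturate at UINT_MAX instead of wrapping around to 0. */
static void toy_refcount_inc(struct toy_refcount *r)
{
    assert(r != NULL);
    if (r->refs != UINT_MAX)
        r->refs++;
}

/*
 * Drop a reference; returns true when the last one is gone. A saturated
 * counter stays pinned, so the object is leaked rather than freed early.
 */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
    assert(r != NULL);
    if (r->refs == UINT_MAX)
        return false;
    assert(r->refs > 0);
    return --r->refs == 0;
}
```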

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Nicholas Bellinger

    Elena Reshetova
     
  • Pull block layer updates from Jens Axboe:

    - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
    was initially a fork of CFQ, but subsequently changed to implement
    fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
    to be used on desktop type single drives, providing good fairness.
    From Paolo.

    - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
    using a scalable token based algorithm that throttles IO based on
    live completion IO stats, similarly to blk-wbt. From Omar.

    - A series from Jan, moving users to separately allocated backing
    devices. This continues the work of separating backing device life
    times, solving various problems with hot removal.

    - A series of updates for lightnvm, mostly from Javier. Includes a
    'pblk' target that exposes an open channel SSD as a physical block
    device.

    - A series of fixes and improvements for nbd from Josef.

    - A series from Omar, removing queue sharing between devices on mostly
    legacy drivers. This helps us clean up other bits, if we know that a
    queue only has a single device backing. This has been overdue for
    more than a decade.

    - Fixes for the blk-stats, and improvements to unify the stats and user
    windows. This both improves blk-wbt, and enables other users to
    register a need to receive IO stats for a device. From Omar.

    - blk-throttle improvements from Shaohua. This provides a scalable
    framework for implementing scalable prioritization - particularly for
    blk-mq, but applicable to any type of block device. The interface is
    marked experimental for now.

    - Bucketized IO stats for IO polling from Stephen Bates. This improves
    efficiency of polled workloads in the presence of mixed block size
    IO.

    - A few fixes for opal, from Scott.

    - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
    From a variety of folks, mostly Sagi and James Smart.

    - A series from Bart, improving our exposed info and capabilities from
    the blk-mq debugfs support.

    - A series from Christoph, cleaning up how we handle WRITE_ZEROES.

    - A series from Christoph, cleaning up the block layer handling of how
    we track errors in a request. On top of being a nice cleanup, it also
    shrinks the size of struct request a bit.

    - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
    never used by platforms, and the latter has outlived its usefulness.

    - Various little bug fixes and cleanups from a wide variety of folks.

    * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
    block: hide badblocks attribute by default
    blk-mq: unify hctx delay_work and run_work
    block: add kblock_mod_delayed_work_on()
    blk-mq: unify hctx delayed_run_work and run_work
    nbd: fix use after free on module unload
    MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
    blk-mq-sched: alloate reserved tags out of normal pool
    mtip32xx: use runtime tag to initialize command header
    scsi: Implement blk_mq_ops.show_rq()
    blk-mq: Add blk_mq_ops.show_rq()
    blk-mq: Show operation, cmd_flags and rq_flags names
    blk-mq: Make blk_flags_show() callers append a newline character
    blk-mq: Move the "state" debugfs attribute one level down
    blk-mq: Unregister debugfs attributes earlier
    blk-mq: Only unregister hctxs for which registration succeeded
    blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
    blk-mq: Let blk_mq_debugfs_register() look up the queue name
    blk-mq: Register /queue/mq after having registered /queue
    ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
    ide-pm: always pass 0 error to __blk_end_request_all
    ..

    Linus Torvalds
     

21 Apr, 2017

1 commit

  • This passes on the scsi_cmnd result field to users of passthrough
    requests. Currently we abuse req->errors for this purpose, but that
    field will go away in its current form.

    Note that the old IDE code abuses the errors field in very creative
    ways and stores all kinds of different values in it. I didn't dare
    to touch this magic, so the abuses are brought forward 1:1.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

09 Apr, 2017

1 commit


06 Apr, 2017

1 commit


04 Apr, 2017

1 commit


03 Apr, 2017

2 commits

  • For the bidirectional case, the Data-Out buffer blocks will always be
    at the head of the tcmu_cmd's bitmap, so before gathering the Data-In
    buffer, the Data-Out blocks must be skipped first, or a device
    supporting BIDI commands won't work.
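
    The skipping step can be sketched with a 64-bit word standing in for
    the command's block bitmap. This model and the helper name are
    hypothetical; the driver uses its own bitmap type and helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: bit i of `bitmap` is set if data block i belongs to
 * the command. Return the index of the first owned block after skipping
 * `skip` owned blocks (the Data-Out ones), or -1 if there is none.
 */
static int first_block_after(uint64_t bitmap, unsigned int skip)
{
    for (int i = 0; i < 64; i++) {
        if (!(bitmap & (UINT64_C(1) << i)))
            continue;           /* block i is not part of this command */
        if (skip == 0)
            return i;
        skip--;
    }
    return -1;
}
```

    For a command owning blocks 2, 3, 5 and 9 with two Data-Out blocks,
    the Data-In gathering would start at block 5 in this model.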

    Fixed: 26418649eead ("target/user: Introduce data_bitmap, replace
    data_length/data_head/data_tail")
    Reported-by: Ilias Tsitsimpis
    Tested-by: Ilias Tsitsimpis
    Signed-off-by: Xiubo Li
    Cc: stable@vger.kernel.org # 4.6+
    Signed-off-by: Nicholas Bellinger

    Xiubo Li
     
  • Once upon a time back in 2009, a work-around was added to support
    the GlobalSAN iSCSI initiator v3.3 for MacOSX, which during login
    did not propose nor respond to MaxBurstLength, FirstBurstLength,
    DefaultTime2Wait and DefaultTime2Retain keys.

    The work-around in iscsi_check_proposer_for_optional_reply()
    allowed the missing keys to be proposed, but did not require
    waiting for a response before moving to full feature phase
    operation. This allowed GlobalSAN v3.3 to work out-of-the-box,
    and for many years we didn't run into login interop
    issues with any other initiators.

    Until recently, when Martin tried a QLogic 57840S iSCSI Offload
    HBA on Windows 2016 which completed login, but subsequently
    failed with:

    Got unknown iSCSI OpCode: 0x43

    The issue was that the QLogic MSFT side did not propose DefaultTime2Wait +
    DefaultTime2Retain, so LIO proposes them itself, and immediately
    transitions to full feature phase because of the GlobalSAN hack.
    However, the QLogic MSFT side still attempts to respond to
    DefaultTime2Retain + DefaultTime2Wait, even though LIO has set
    ISCSI_FLAG_LOGIN_NEXT_STAGE3 + ISCSI_FLAG_LOGIN_TRANSIT
    in last login response.

    So while the QLogic MSFT side should have been proposing these
    two keys to start, it was doing the correct thing per RFC-3720
    attempting to respond to proposed keys before transitioning to
    full feature phase.

    All that said, recent versions of GlobalSAN iSCSI (v5.3.0.541)
    do correctly propose the four keys during login, making the
    original work-around moot.

    So in order to allow QLogic MSFT to run unmodified as-is, go
    ahead and drop this long standing work-around.

    Reported-by: Martin Svec
    Cc: Martin Svec
    Cc: Himanshu Madhani
    Cc: Arun Easi
    Cc: stable@vger.kernel.org # 3.1+
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     

31 Mar, 2017

3 commits

  • Multiple threads could be writing to alua_access_state at
    the same time, or there could be multiple STPGs in flight
    (different initiators sending them or one initiator sending
    them to different ports), or a combo of both and the
    core_alua_do_transition_tg_pt calls will race with each other.

    Because from the last patches we no longer delay running
    core_alua_do_transition_tg_pt_work, there does not seem to be
    any point in running that in a workqueue. And, we always
    wait for it to complete one way or another, so we can sleep
    in this code path. So, this patch made over target-pending just adds a
    mutex and does the work core_alua_do_transition_tg_pt_work was doing in
    core_alua_do_transition_tg_pt.

    There is also no need to use an atomic for the
    tg_pt_gp_alua_access_state. In core_alua_do_transition_tg_pt we will
    test and set it under the transition mutex. And, it is an int (32 bits),
    so in the other places where it is read, we will never see it partially
    updated.

    Signed-off-by: Mike Christie
    Signed-off-by: Nicholas Bellinger

    Mike Christie
     
  • This patch changes iscsi-target to propagate iscsit_transport
    ->iscsit_queue_data_in() and ->iscsit_queue_status() callback
    errors, back up into target-core.

    This allows target-core to retry failed iscsit_transport
    callbacks using internal queue-full logic.

    Reported-by: Potnuri Bharat Teja
    Reviewed-by: Potnuri Bharat Teja
    Tested-by: Potnuri Bharat Teja
    Cc: Potnuri Bharat Teja
    Reported-by: Steve Wise
    Cc: Steve Wise
    Cc: Sagi Grimberg
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • This patch fixes a set of queue-full response handling
    bugs, where outgoing responses are leaked when a fabric
    driver is propagating non -EAGAIN or -ENOMEM errors
    to target-core.

    It introduces TRANSPORT_COMPLETE_QF_ERR state used to
    signal when CHECK_CONDITION status should be generated,
    when fabric driver ->write_pending(), ->queue_data_in(),
    or ->queue_status() callbacks fail with non -EAGAIN or
    -ENOMEM errors, and data-transfer should not be retried.

    Note all fabric driver -EAGAIN and -ENOMEM errors are
    still retried indefinitely with associated data-transfer
    callbacks, following existing queue-full logic.

    Also fix two missing ->queue_status() queue-full cases
    related to CMD_T_ABORTED w/ TAS status handling.
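
    The retry rule described above reduces to a small classification of
    the callback's return value. The enum and function below are
    hypothetical labels for illustration and do not match the kernel's
    state names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome labels for illustration only. */
enum qf_action { QF_DONE, QF_RETRY, QF_FAIL_CHECK_CONDITION };

/*
 * Classify the return value of a fabric callback as described above:
 * 0 means the response was queued, -EAGAIN and -ENOMEM are retried,
 * and any other error stops retrying and reports CHECK_CONDITION.
 */
static enum qf_action classify_fabric_ret(int ret)
{
    if (ret == 0)
        return QF_DONE;
    if (ret == -EAGAIN || ret == -ENOMEM)
        return QF_RETRY;
    return QF_FAIL_CHECK_CONDITION;
}
```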

    Reported-by: Potnuri Bharat Teja
    Reviewed-by: Potnuri Bharat Teja
    Tested-by: Potnuri Bharat Teja
    Cc: Potnuri Bharat Teja
    Reported-by: Steve Wise
    Cc: Steve Wise
    Cc: Sagi Grimberg
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     

30 Mar, 2017

3 commits

  • The t_data_nents and t_bidi_data_nents are the numbers of
    segments, but there is no guarantee that the block size equals the
    size of a segment.

    In the worst case, all the blocks are discontiguous and the same
    number of iovecs is needed, that is: blocks == iovs. So just set
    the number of iovs to the block count needed by the tcmu cmd.
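
    Why the block count is a safe upper bound can be seen from a sketch
    that counts iovecs while merging runs of consecutive block indexes.
    The helper is hypothetical and assumes the indexes are given in
    ascending order.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the iovecs needed for a list of block indexes, merging runs of
 * consecutive indexes into one iovec (hypothetical helper for illustration).
 */
static size_t iovecs_needed(const unsigned int *blocks, size_t n)
{
    size_t iovs = 0;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Start a new iovec unless this block directly follows the previous one. */
        if (i == 0 || blocks[i] != blocks[i - 1] + 1)
            iovs++;
    }
    return iovs;
}
```

    Fully contiguous blocks need one iovec, while fully discontiguous
    blocks need as many iovecs as there are blocks.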

    Tested-by: Ilias Tsitsimpis
    Reviewed-by: Mike Christie
    Signed-off-by: Xiubo Li
    Cc: stable@vger.kernel.org # 3.18+
    Signed-off-by: Nicholas Bellinger

    Xiubo Li
     
  • If there is BIDI data, its first iov[] will overwrite the last
    iov[] for se_cmd->t_data_sg.

    To fix this, we can just increase the iov pointer, but this may
    introduce a new memory leakage bug: if se_cmd->data_length and
    se_cmd->t_bidi_data_sg->length are not both aligned to
    DATA_BLOCK_SIZE, the actual length needed may be larger than just
    the sum of them.

    This can be avoided by rounding all the data lengths up
    to DATA_BLOCK_SIZE.
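
    A small sketch of the rounding, assuming a 4096-byte DATA_BLOCK_SIZE
    for illustration. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DATA_BLOCK_SIZE 4096u   /* assumed block size for this sketch */

/* Round len up to the next multiple of DATA_BLOCK_SIZE. */
static size_t round_up_block(size_t len)
{
    return (len + DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE * DATA_BLOCK_SIZE;
}

/* Data area bytes needed when each buffer starts on its own block boundary. */
static size_t data_area_needed(size_t data_len, size_t bidi_len)
{
    return round_up_block(data_len) + round_up_block(bidi_len);
}
```

    With buffers of 5000 and 3000 bytes, rounding each length gives
    12288 bytes, whereas rounding only their sum would give 8192 and
    leave the second buffer short.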

    Reviewed-by: Mike Christie
    Tested-by: Ilias Tsitsimpis
    Reviewed-by: Bryant G. Ly
    Signed-off-by: Xiubo Li
    Cc: stable@vger.kernel.org # 3.18+
    Signed-off-by: Nicholas Bellinger

    Xiubo Li
     
  • This patch closes a race between se_lun deletion during configfs
    unlink in target_fabric_port_unlink() -> core_dev_del_lun()
    -> core_tpg_remove_lun(), when transport_clear_lun_ref() blocks
    waiting for percpu_ref RCU grace period to finish, but a new
    NodeACL mappedlun is added before the RCU grace period has
    completed.

    This can happen in target_fabric_mappedlun_link() because it
    only checks for se_lun->lun_se_dev, which is not cleared until
    after transport_clear_lun_ref() percpu_ref RCU grace period
    finishes.

    This bug originally manifested as NULL pointer dereference
    OOPsen in target_stat_scsi_att_intr_port_show_attr_dev() on
    v4.1.y code, because it dereferences lun->lun_se_dev without
    an explicit NULL pointer check.

    In post v4.1 code with target-core RCU conversion, the code
    in target_stat_scsi_att_intr_port_show_attr_dev() no longer
    uses se_lun->lun_se_dev, but the same race still exists.

    To address the bug, go ahead and set se_lun->lun_shutdown as
    early as possible in core_tpg_remove_lun(), and ensure new
    NodeACL mappedlun creation in target_fabric_mappedlun_link()
    fails during se_lun shutdown.

    Reported-by: James Shen
    Cc: James Shen
    Tested-by: James Shen
    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger