28 Jan, 2015

5 commits

  • commit 046ba64285a4389ae5e9a7dfa253c6bff3d7c341 upstream.

    This patch drops the arbitrary maximum I/O size limit in sbc_parse_cdb(),
    which for fabric_max_sectors is currently hardcoded to 8192 (4 MB for
    512-byte sector devices), and for hw_max_sectors is a backend-driver-dependent
    value.

    This limit is problematic because Linux initiators have only recently
    started to honor the block limits MAXIMUM TRANSFER LENGTH, and other
    non-Linux initiators (e.g. MSFT Fibre Channel) can also generate I/Os
    larger than 4 MB in size.

    Currently when this happens, the following message appears on the
    target, and the I/Os are returned with non-recoverable status:

    SCSI OP 28h with too big sectors 16384 exceeds fabric_max_sectors: 8192

    Instead, drop both [fabric,hw]_max_sectors checks in sbc_parse_cdb(),
    and convert the existing hw_max_sectors into a purely informational
    attribute representing the granularity at which backend driver and/or
    subsystem code splits I/Os.

    Also, update FILEIO with an explicit FD_MAX_BYTES check in fd_execute_rw()
    to deal with the one special iovec limitation case.

    v2 changes:
    - Drop hw_max_sectors check in sbc_parse_cdb()

    Reported-by: Lance Gropper
    Reported-by: Stefan Priebe
    Cc: Christoph Hellwig
    Cc: Martin K. Petersen
    Cc: Roland Dreier
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Nicholas Bellinger
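
A user-space model of the check this patch removes may make the numbers concrete. This is a hedged sketch: the constant names follow the commit text, not the actual kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the limit check dropped from sbc_parse_cdb().
 * 8192 sectors of 512 bytes is the 4 MB ceiling the commit cites. */
enum { FABRIC_MAX_SECTORS = 8192, SECTOR_SIZE = 512 };

/* Before this patch, any I/O above the limit was rejected outright. */
static bool exceeds_fabric_max(unsigned int sectors)
{
    return sectors > FABRIC_MAX_SECTORS;
}
```

The quoted log line shows a 16384-sector I/O tripping exactly this check; after the patch such I/Os are passed through and split by the backend instead.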
     
  • commit 23a548ee656c8ba6da8cb2412070edcd62e2ac5d upstream.

    iSER will report supported protection operations based on the tpg
    attribute t10_pi settings and the HCA's PI offload capabilities. If the
    HCA does not support PI offload, or the tpg attribute t10_pi is not
    set, we fall back to SW PI mode.

    In order to do that, we move iscsit_get_sup_prot_ops after connection
    tpg assignment.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
     
  • commit 954f23722b5753305be490330cf2680b7a25f4a3 upstream.

    Since commit 0fc4ea701fcf ("Target/iser: Don't put isert_conn inside
    disconnected handler") we put the conn kref in isert_wait_conn, so we
    need .wait_conn to be invoked also in the error path.

    Introduce a call to isert_conn_terminate (called under lock) which
    transitions the connection state to TERMINATING and calls
    rdma_disconnect. If the state is already TERMINATING, just bail out
    (termination has already started).

    Also, make sure to destroy the connection when getting a connect
    error event if the connection never reached the connected state (UP).
    The same applies to the handling of REJECTED and UNREACHABLE cma events.

    Squashed:

    iscsi-target: Add call to wait_conn in establishment error flow

    Reported-by: Slava Shwartsman
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
     
  • commit 6bf6ca7515c1df06f5c03737537f5e0eb191e29e upstream.

    This patch changes iscsit_do_tx_data() to fail on short writes when
    kernel_sendmsg() returns a value different from the requested transfer
    length, returning -EPIPE and thus causing a connection reset to occur.

    This avoids a potential bug in the original code where a short
    write would result in kernel_sendmsg() being called again with
    the original iovec base + length.

    In practice this has not been an issue, because iscsit_do_tx_data()
    is only used for transferring 48-byte headers + 4-byte digests, along
    with seldom-used control payloads from NOPIN + TEXT_RSP + REJECT with
    less than 32k of data.

    So following Al's audit of iovec consumers, go ahead and fail
    the connection on short writes for now, and remove the bogus
    logic ahead of his proper upstream fix.

    Reported-by: Al Viro
    Cc: David S. Miller
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Nicholas Bellinger
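
The new behaviour can be modelled in a few lines. This is an illustrative sketch, not the kernel code: a short write now fails the transfer outright instead of retrying with the original (stale) iovec base and length.

```c
#include <assert.h>
#include <errno.h>  /* EPIPE */

/* Model: kernel_sendmsg() returned `sent` bytes of a `requested`-byte
 * transfer. Any short write is now an error; the caller resets the
 * connection on -EPIPE. */
static long tx_result(long sent, long requested)
{
    if (sent != requested)
        return -EPIPE;
    return sent;
}
```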
     
  • commit 506787a2c7daed45f0a213674ca706cbc83a9089 upstream.

    tcm_loop has the I_T nexus associated with the HBA. This causes
    commands to become misdirected if the HBA has more than one target
    portal group; any command is then sent to the first target portal
    group instead of the correct one.

    The nexus needs to be associated with the target portal group
    instead.

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Hannes Reinecke
     

03 Nov, 2014

1 commit

  • PREEMPT (and PREEMPT AND ABORT) should return CONFLICT iff a
    SERVICE ACTION RESERVATION KEY is specified and matches no existing
    persistent reservation.

    Without this patch, a PREEMPT will return CONFLICT if either all
    reservations are held by the initiator (self preemption) or there is
    nothing to preempt. According to the spec, both of these cases should
    succeed.

    Signed-off-by: Steven Allen
    Signed-off-by: Nicholas Bellinger

    Steven Allen
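
The corrected rule reduces to a single predicate. The following is a hedged model (boolean stand-ins, not the target code's actual data structures):

```c
#include <assert.h>
#include <stdbool.h>

/* CONFLICT only when a SERVICE ACTION RESERVATION KEY was supplied and
 * matches no existing registration. Self-preemption and preempts with
 * nothing left to preempt both succeed. */
static bool preempt_conflicts(bool key_specified, bool key_matches_existing)
{
    return key_specified && !key_matches_existing;
}
```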
     

29 Oct, 2014

2 commits

  • The fact that a target is published on the wildcard ("any") address has
    no bearing on which port(s) it is published on. SendTargets should
    always send the portal's port, not the port used for discovery.

    Signed-off-by: Steven Allen
    Signed-off-by: Nicholas Bellinger

    Steven Allen
     
  • If an initiator sends a zero-length command (e.g. TEST UNIT READY) but
    sets the transfer direction in the transport layer to indicate a
    data-out phase, we still shouldn't try to transfer data. At best it's
    a NOP, and depending on the transport, we might crash on an
    uninitialized sg list.

    Reported-by: Craig Watson
    Signed-off-by: Roland Dreier
    Cc: # 3.1
    Signed-off-by: Nicholas Bellinger

    Roland Dreier
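
The guard described above amounts to one condition. A minimal sketch, with illustrative names (the real code works on se_cmd fields, not these stand-ins):

```c
#include <assert.h>
#include <stdbool.h>

enum dma_dir { DIR_NONE, DIR_FROM_DEVICE, DIR_TO_DEVICE };

/* A zero-length command never enters a data phase, even if the
 * transport-level direction bit claims a data-out. */
static bool should_transfer_data(unsigned int data_length, enum dma_dir dir)
{
    return data_length != 0 && dir != DIR_NONE;
}
```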
     

22 Oct, 2014

1 commit

  • Pull SCSI target updates from Nicholas Bellinger:
    "Here are the target updates for v3.18-rc2 code. These were
    originally destined for -rc1, but due to the combination of travel
    last week for KVM Forum and my mistake of taking the three-week merge
    window literally, the pull request slipped. Apologies for that.

    Things were reasonably quiet this round. The highlights include:

    - New userspace backend driver (target_core_user.ko) by Shaohua Li
    and Andy Grover
    - A number of cleanups in target, iscsi-target and qla_target code
    from Joern Engel
    - Fix an OOPs related to queue full handling with CHECK_CONDITION
    status from Quinn Tran
    - Fix to disable TX completion interrupt coalescing in iser-target,
    that was causing problems on some hardware
    - Fix for PR APTPL metadata handling with demo-mode ACLs

    I'm most excited about the new backend driver that uses UIO + shared
    memory ring to dispatch I/O and control commands into user-space.
    This was probably the most requested feature by users over the last
    couple of years, and opens up a new area of development + porting of
    existing user-space storage applications to LIO. Thanks to Shaohua +
    Andy for making this happen.

    Also another honorable mention, a new Xen PV SCSI driver was merged
    via the xen/tip.git tree recently, which puts us now at 10 target
    drivers in upstream! Thanks to David Vrabel + Juergen Gross for their
    work to get this code merged"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
    target/file: fix inclusive vfs_fsync_range() end
    iser-target: Disable TX completion interrupt coalescing
    target: Add force_pr_aptpl device attribute
    target: Fix APTPL metadata handling for dynamic MappedLUNs
    qla_target: don't delete changed nacls
    target/user: Recalculate pad size inside is_ring_space_avail()
    tcm_loop: Fixup tag handling
    iser-target: Fix smatch warning
    target/user: Fix up smatch warnings in tcmu_netlink_event
    target: Add a user-passthrough backstore
    target: Add documentation on the target userspace pass-through driver
    uio: Export definition of struct uio_device
    target: Remove unneeded check in sbc_parse_cdb
    target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
    qla_target: rearrange struct qla_tgt_prm
    qla_target: improve qlt_unmap_sg()
    qla_target: make some global functions static
    qla_target: remove unused parameter
    target: simplify core_tmr_abort_task
    target: encapsulate smp_mb__after_atomic()
    ...

    Linus Torvalds
     

19 Oct, 2014

1 commit

  • Pull core block layer changes from Jens Axboe:
    "This is the core block IO pull request for 3.18. Apart from the new
    and improved flush machinery for blk-mq, this is all mostly bug fixes
    and cleanups.

    - blk-mq timeout updates and fixes from Christoph.

    - Removal of REQ_END, also from Christoph. We pass it through the
    ->queue_rq() hook for blk-mq instead, freeing up one of the request
    bits. The space was overly tight on 32-bit, so Martin also killed
    REQ_KERNEL since it's no longer used.

    - blk integrity updates and fixes from Martin and Gu Zheng.

    - Update to the flush machinery for blk-mq from Ming Lei. Now we
    have a per-hardware-context flush request, which both cleans up the
    code and should scale better for flush-intensive workloads on blk-mq.

    - Improve the error printing, from Rob Elliott.

    - Backing device improvements and cleanups from Tejun.

    - Fixup of a misplaced rq_complete() tracepoint from Hannes.

    - Make blk_get_request() return error pointers, fixing up issues
    where we NULL deref when a device goes bad or missing. From Joe
    Lawrence.

    - Prep work for drastically reducing the memory consumption of dm
    devices from Junichi Nomura. This allows creating clone bio sets
    without preallocating a lot of memory.

    - Fix a blk-mq hang on certain combinations of queue depths and
    hardware queues from me.

    - Limit memory consumption for blk-mq devices for crash dump
    scenarios and drivers that use crazy high depths (certain SCSI
    shared tag setups). We now just use a single queue and limited
    depth for that"

    * 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
    block: Remove REQ_KERNEL
    blk-mq: allocate cpumask on the home node
    bio-integrity: remove the needless fail handle of bip_slab creating
    block: include func name in __get_request prints
    block: make blk_update_request print prefix match ratelimited prefix
    blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
    block: fix alignment_offset math that assumes io_min is a power-of-2
    blk-mq: Make bt_clear_tag() easier to read
    blk-mq: fix potential hang if rolling wakeup depth is too high
    block: add bioset_create_nobvec()
    block: use bio_clone_fast() in blk_rq_prep_clone()
    block: misplaced rq_complete tracepoint
    sd: Honor block layer integrity handling flags
    block: Replace strnicmp with strncasecmp
    block: Add T10 Protection Information functions
    block: Don't merge requests if integrity flags differ
    block: Integrity checksum flag
    block: Relocate bio integrity flags
    block: Add a disk flag to block integrity profile
    block: Add prefix to block integrity profile flags
    ...

    Linus Torvalds
     

08 Oct, 2014

1 commit

  • Both of the file target's calls to vfs_fsync_range() got the end
    offset off by one. The range is inclusive, not exclusive, so it would
    sync a bit more data than was required.

    The sync path already tested the length of the range and fell back to
    LLONG_MAX, so I copied that pattern in the rw path.

    This is untested. I found the errors by inspection while following other
    code.

    Signed-off-by: Zach Brown
    Signed-off-by: Nicholas Bellinger

    Zach Brown
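
The off-by-one and the zero-length fallback fit in one helper. A sketch under the commit's description (not the actual fd_do_rw() code):

```c
#include <assert.h>
#include <limits.h>

typedef long long loff_t;

/* vfs_fsync_range() takes an inclusive end offset, so a write of `len`
 * bytes at `start` ends at start + len - 1, not start + len. A zero
 * length falls back to syncing everything, mirroring the sync path. */
static loff_t fsync_end(loff_t start, loff_t len)
{
    return len ? start + len - 1 : LLONG_MAX;
}
```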
     

04 Oct, 2014

6 commits

  • This patch adds a force_pr_aptpl device attribute used to force SPC-3
    PR Activate Persistence across Target Power Loss (APTPL) operation.
    This makes PR metadata write-out occur on state change, regardless of
    whether new PERSISTENT_RESERVE_OUT CDBs have their APTPL feature bit set.

    This is useful during H/A failover in active/passive setups where all PR
    state is being re-created on a different node, driven by configfs backend
    device + export layout and pre-loaded $DEV/pr/res_aptpl_metadata.

    Cc: Mike Christie
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • This patch fixes a bug in handling of SPC-3 PR Activate Persistence
    across Target Power Loss (APTPL) logic where re-creation of state for
    MappedLUNs from dynamically generated NodeACLs did not occur during
    I_T Nexus establishment.

    It adds the missing core_scsi3_check_aptpl_registration() call during
    core_tpg_check_initiator_node_acl() -> core_tpg_add_node_to_devs() in
    order to replay any pre-loaded APTPL metadata state associated with
    the newly connected SCSI Initiator Port.

    Cc: Mike Christie
    Cc:
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • If more than one thread is waiting for command ring space that includes
    a PAD, then when the first one finishes (inserts a PAD and a CMD at the
    start of the cmd ring), the second one will incorrectly think it still
    needs to insert a PAD (i.e. cmdr_space_needed is now wrong). This will
    lead to it asking for more space than it actually needs, and then
    inserting a PAD somewhere other than at the end -- not what we want.

    This patch moves the pad calculation inside is_ring_space_avail(), so
    in the above scenario the second thread would then ask for space not
    including a PAD. The patch also inserts the PAD op based on an
    up-to-date cmd_head, instead of a potentially stale value.

    Signed-off-by: Andy Grover
    Signed-off-by: Nicholas Bellinger

    Andy Grover
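
The pad rule being moved can be modelled as a pure function of the current head. A hedged sketch with invented sizes (the real ring is much larger and the entries variable-sized):

```c
#include <assert.h>

enum { RING_SIZE = 64 };  /* illustrative only */

/* A PAD is needed exactly when the command does not fit between the
 * current head and the end of the ring; it then fills out the tail.
 * Computing this from the live head each time space is checked means a
 * waiter with a stale earlier estimate asks only for what it still needs. */
static unsigned int pad_needed(unsigned int cmd_head, unsigned int cmd_size)
{
    unsigned int space_to_end = RING_SIZE - cmd_head;
    return (cmd_size > space_to_end) ? space_to_end : 0;
}
```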
     
  • The SCSI command tag is set to the tag assigned from the block
    layer, not the SCSI-II tag message. So we need to convert
    it into the correct SCSI-II tag message based on the
    device flags, not the tag value itself.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Nicholas Bellinger

    Hannes Reinecke
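
The distinction can be sketched as follows. The MSG_* values are the real SCSI-2 tag message codes; the flag names and selection logic here are illustrative, not tcm_loop's actual code:

```c
#include <assert.h>
#include <stdbool.h>

#define MSG_SIMPLE_TAG  0x20  /* SCSI-2 SIMPLE QUEUE TAG message */
#define MSG_ORDERED_TAG 0x22  /* SCSI-2 ORDERED QUEUE TAG message */

/* The tag message is chosen from the device's tagged-queuing flags;
 * the block-layer tag value itself is just an identifier, not a message. */
static int tag_msg_for_device(bool tagged_queuing, bool ordered)
{
    if (!tagged_queuing)
        return 0;  /* untagged */
    return ordered ? MSG_ORDERED_TAG : MSG_SIMPLE_TAG;
}
```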
     
  • This patch fixes up the following unused return smatch warnings:

    drivers/target/target_core_user.c:778 tcmu_netlink_event warn: unused return: ret = nla_put_string()
    drivers/target/target_core_user.c:780 tcmu_netlink_event warn: unused return: ret = nla_put_u32()

    (Fix up missing semicolon: grover)

    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • Add a LIO storage engine that presents commands to userspace for
    execution. This allows more complex backstores to be implemented
    out-of-kernel, and also makes experimentation à la FUSE (but at the
    SCSI level -- "SUSE"?) possible.

    It uses a mmap()able UIO device per LUN to share a command ring and data
    area. The commands are raw SCSI CDBs and iovs for in/out data. The command
    ring is also reused for returning scsi command status and optional sense
    data.

    This implementation is based on Shaohua Li's earlier version but heavily
    modified. Differences include:

    * Shared memory allocated by kernel, not locked-down user pages
    * Single ring for command request and response
    * Offsets instead of embedded pointers
    * Generic SCSI CDB passthrough instead of per-cmd specialization in ring
    format.
    * Uses UIO device instead of anon_file passed in mailbox.
    * Optional in-kernel handling of some commands.

    The main reason for these differences is to permit greater resiliency
    if the user process dies or hangs.

    Things not yet implemented (on purpose):

    * Zero copy. The data area is flexible enough to allow page flipping or
    backend-allocated pages to be used by fabrics, but it's not clear these
    are performance wins. Can come later.
    * Out-of-order command completion by userspace. Possible to add by just
    allowing userspace to change cmd_id in rsp cmd entries, but currently
    not supported.
    * No locks between kernel cmd submission and completion routines. Sounds
    like it's possible, but this can come later.
    * Sparse allocation of the mmapped area. Current code vmallocs the
    whole thing. If the mapped area were larger and not fully mapped, the
    driver would have more freedom to change cmd and data area sizes based
    on demand.

    Current code open issues:

    * The use of idrs may be overkill -- we may be able to replace them
    with a simple counter to generate cmd_ids, and a hash table to look up
    a cmd_id's associated pointer.
    * Use of a free-running counter for cmd ring instead of explicit modulo
    math. This would require power-of-2 cmd ring size.

    (Add kconfig depends NET - Randy)

    Signed-off-by: Andy Grover
    Signed-off-by: Nicholas Bellinger

    Andy Grover
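
The "offsets instead of embedded pointers" choice deserves a one-liner of illustration. Because kernel and user map the same shared memory at different base addresses, ring entries refer to the data area by offset, and each side resolves it against its own mapping (a sketch; the real entry layout differs):

```c
#include <assert.h>

/* Resolve a data-area offset against whichever mapping the caller holds.
 * The same `data_off` stored in a ring entry is valid on both sides. */
static void *entry_data(void *map_base, unsigned int data_off)
{
    return (char *)map_base + data_off;
}
```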
     

03 Oct, 2014

1 commit


02 Oct, 2014

8 commits

  • During temporary resource starvation at the lower transport layer, a
    command is placed on the queue-full retry path, which exposes this
    problem. The TCM queue-full handling of SCF_TRANSPORT_TASK_SENSE
    currently sends the same cmd twice to the lower layer. The first send
    leads to the cmd's normal free path; the second causes a NULL pointer
    access.

    This regression was originally introduced in v3.1-rc code by the
    following commit:

    commit e057f53308a5f071556ee80586b99ee755bf07f5
    Author: Christoph Hellwig
    Date: Mon Oct 17 13:56:41 2011 -0400

    target: remove the transport_qf_callback se_cmd callback

    Signed-off-by: Quinn Tran
    Signed-off-by: Saurav Kashyap
    Cc: # v3.1+
    Signed-off-by: Nicholas Bellinger

    Quinn Tran
     
  • list_for_each_entry_safe is only necessary if list objects are deleted
    from the list while traversing it. That is not the case here, so we can
    use the base list_for_each_entry variant.

    Signed-off-by: Joern Engel
    Signed-off-by: Nicholas Bellinger

    Joern Engel
     
  • The target code has a rather generous helping of smp_mb__after_atomic()
    throughout the code base. Most atomic operations were followed by one
    and none were preceded by smp_mb__before_atomic(), nor accompanied by a
    comment explaining the need for a barrier.

    Instead of trying to prove for every case whether or not it is needed,
    this patch introduces atomic_inc_mb() and atomic_dec_mb(), which
    explicitly include the memory barriers before and after the atomic
    operation. For now they are defined in a target header, although they
    could be of general use.

    Most of the existing atomic/mb combinations were replaced by the new
    helpers. In a few cases the atomic was sandwiched in
    spin_lock/spin_unlock and I simply removed the barrier.

    I suspect that in most cases the correct conversion would have been to
    drop the barrier. I also suspect that a few cases exist where a) the
    barrier was necessary and b) a second barrier before the atomic would
    have been necessary and got added by this patch.

    Signed-off-by: Joern Engel
    Signed-off-by: Nicholas Bellinger

    Joern Engel
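
The helpers being introduced are small enough to show. A user-space model, with the barrier primitives mocked as counters so the before/after contract is visible (the real smp_mb__before/after_atomic() are of course memory barriers, not counters):

```c
#include <assert.h>

static int barriers;  /* counts mocked barrier calls */
static void smp_mb__before_atomic(void) { barriers++; }
static void smp_mb__after_atomic(void)  { barriers++; }

typedef struct { int counter; } atomic_t;
static void atomic_inc(atomic_t *v) { v->counter++; }
static void atomic_dec(atomic_t *v) { v->counter--; }

/* The new helpers: atomic op explicitly bracketed by both barriers. */
static void atomic_inc_mb(atomic_t *v)
{
    smp_mb__before_atomic();
    atomic_inc(v);
    smp_mb__after_atomic();
}

static void atomic_dec_mb(atomic_t *v)
{
    smp_mb__before_atomic();
    atomic_dec(v);
    smp_mb__after_atomic();
}
```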
     
  • atomic_inc_return() already does an implicit memory barrier and the
    second case was moved from an atomic to a plain flag operation. If a
    barrier were needed in the second case, it would have to be smp_mb(),
    not a variant optimized away for x86 and other architectures.

    Signed-off-by: Joern Engel
    Signed-off-by: Nicholas Bellinger

    Joern Engel
     
  • And while at it, do minimal coding style fixes in the area.

    Signed-off-by: Joern Engel
    Signed-off-by: Nicholas Bellinger

    Joern Engel
     
  • Simple and just called from one place.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Andy Grover
    Signed-off-by: Nicholas Bellinger

    Andy Grover
     
  • Remove core_tpg_pre_dellun entirely, since we don't need to get/check
    a pointer we already have.

    Nothing else can return an error, so core_dev_del_lun can return void.

    Rename core_tpg_post_dellun to remove_lun - a clearer name, now that
    pre_dellun is gone.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Andy Grover
    Signed-off-by: Nicholas Bellinger

    Andy Grover
     
  • Nothing in it can raise an error.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Andy Grover
    Signed-off-by: Nicholas Bellinger

    Andy Grover
     

25 Sep, 2014

2 commits

  • With the recent addition of percpu_ref_reinit(), percpu_ref now can be
    used as a persistent switch which can be turned on and off repeatedly
    where turning off maps to killing the ref and waiting for it to drain;
    however, there currently isn't a way to initialize a percpu_ref in its
    off (killed and drained) state, which can be inconvenient for certain
    persistent switch use cases.

    Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
    selection of operation mode; however, currently a newly initialized
    percpu_ref is always in percpu mode making it impossible to avoid the
    latency overhead of switching to atomic mode.

    This patch adds @flags to percpu_ref_init() and implements the
    following flags.

    * PERCPU_REF_INIT_ATOMIC : start ref in atomic mode
    * PERCPU_REF_INIT_DEAD : start ref killed and drained

    These flags should be able to serve the above two use cases.

    v2: target_core_tpg.c conversion was missing. Fixed.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Johannes Weiner

    Tejun Heo
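
The two flags and their interaction can be modelled briefly. This is a sketch: the flag names come from the commit, but the bit values and the struct are illustrative, and a ref started dead is assumed to also be in atomic mode (a dead ref cannot be in percpu mode).

```c
#include <assert.h>
#include <stdbool.h>

#define PERCPU_REF_INIT_ATOMIC (1u << 0)  /* start ref in atomic mode */
#define PERCPU_REF_INIT_DEAD   (1u << 1)  /* start ref killed and drained */

struct ref_model { bool in_atomic_mode; bool dead; };

static void ref_init_model(struct ref_model *ref, unsigned int flags)
{
    /* DEAD implies atomic mode; without flags, start live in percpu mode */
    ref->in_atomic_mode =
        (flags & (PERCPU_REF_INIT_ATOMIC | PERCPU_REF_INIT_DEAD)) != 0;
    ref->dead = (flags & PERCPU_REF_INIT_DEAD) != 0;
}
```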
     
  • …linux-block into for-3.18

    This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a
    kludge for SCSI blk-mq stall during probe") which implements
    __percpu_ref_kill_expedited() to work around the SCSI blk-mq stall.
    That commit will be reverted, and patches to implement a proper fix
    will be added.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Kent Overstreet <kmo@daterainc.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Christoph Hellwig <hch@lst.de>

    Tejun Heo
     

18 Sep, 2014

12 commits