01 Aug, 2013

8 commits

  • …'ocrdma' and 'qib' into for-next

    Roland Dreier
     
  • IPoIB's required behaviour w.r.t. the pkey used by the device is the following:

    - For "parent" interfaces (e.g. ib0, ib1, etc.) which are created
    automatically as a result of hot-plug events from the IB core, the
    driver needs to take whatever pkey value it finds in index 0, and
    stick to that index.

    - For child interfaces (e.g. ib0.8001, etc.) created by admin directive,
    the driver needs to use and stick to the value provided during its
    creation.

    In an SR-IOV environment it's possible for the VF probe to take place
    before the cloud management software provisions the suitable pkey for
    the VF in the paravirtualized PKEY table index 0. When this is the case,
    the VF IB stack will find in index 0 an invalid pkey, which is all
    zeros.

    Moreover, the cloud management can assign the pkey value at index 0 at
    any time during the guest life cycle.

    The correct behavior for IPoIB to address these requirements for
    parent interfaces is to use PKEY_CHANGE event as trigger to optionally
    re-init the device pkey value and re-create all the relevant resources
    accordingly, if the value of the pkey in index 0 has changed (from
    invalid to valid or from valid value X to invalid value Y).

    This patch enhances the heavy flushing code, which is triggered by the
    pkey change event, to behave correctly for parent devices. For child
    devices, the code remains the same: it chases the pkey value and not
    the index.

    Signed-off-by: Erez Shitrit
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Erez Shitrit
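
    The parent/child distinction described above can be illustrated with a
    small userspace sketch. It is not the driver code: struct ipoib_sketch,
    its fields, and pkey_change_needs_reinit() are hypothetical names, and
    the pkey table is passed in as a plain array snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-interface IPoIB state. */
struct ipoib_sketch {
    bool     is_child;    /* created by admin directive (e.g. ib0.8001) */
    uint16_t pkey;        /* pkey value the interface currently uses */
    uint16_t pkey_index;  /* table index the interface currently uses */
};

/*
 * Handle a PKEY_CHANGE event against a snapshot of the port's pkey table.
 * Returns true when the interface's resources would need re-creating.
 */
bool pkey_change_needs_reinit(struct ipoib_sketch *p,
                              const uint16_t *table, size_t len)
{
    assert(p != NULL && (table != NULL || len == 0));

    if (!p->is_child) {
        /* Parent: stick to index 0 and adopt whatever value is there. */
        uint16_t now = len ? table[0] : 0;
        if (now == p->pkey)
            return false;
        p->pkey = now;
        return true;
    }

    /* Child: stick to the value and chase it to its current index. */
    for (size_t i = 0; i < len; i++) {
        if (table[i] == p->pkey) {
            bool moved = (i != p->pkey_index);
            p->pkey_index = (uint16_t)i;
            return moved;
        }
    }
    return false; /* value absent: left to the caller in this sketch */
}
```

    With a table of {0x8001, 0xFFFF}, a parent that previously saw 0x0000 at
    index 0 adopts 0x8001 and reports a re-init, while a child created for
    0x8001 at index 1 follows the value to index 0.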
     
  • Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for
    child devices.

    Also, make sure to always set the full membership bit for the pkey of
    devices created by rtnl link ops.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
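
    As a rough illustration of the two checks named above, the sketch below
    treats a pkey as invalid when its low 15 bits are zero (which covers
    both 0x0000 and 0x8000) and forces the top bit for full membership. The
    macro and function names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER_BIT 0x8000u  /* top bit: full membership */
#define PKEY_BASE_MASK       0x7FFFu  /* low 15 bits: partition number */

/* 0x0000 and 0x8000 both have an all-zero base value, so both are invalid. */
bool pkey_is_invalid(uint16_t pkey)
{
    return (pkey & PKEY_BASE_MASK) == 0;
}

/*
 * Validate a pkey requested for a child interface and force the
 * full-membership bit. Returns 0 on success, -1 for an invalid pkey.
 */
int child_pkey_prepare(uint16_t requested, uint16_t *out)
{
    assert(out != NULL);
    if (pkey_is_invalid(requested))
        return -1;
    *out = (uint16_t)(requested | PKEY_FULL_MEMBER_BIT);
    return 0;
}
```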
     
  • Currently, QP1 is created using pkey index 0. This patch simply looks
    for the index containing the default pkey, rather than hard-coding
    pkey index 0.

    This change will have no effect in native mode, since QP0 and QP1 are
    created before the SM configures the port, so pkey table will still be
    the default table defined by the IB Spec, in C10-123: "If non-volatile
    storage is not used to hold P_Key Table contents, then if a PM
    (Partition Manager) is not present, and prior to PM initialization of
    the P_Key Table, the P_Key Table must act as if it contains a single
    valid entry, at P_Key_ix = 0, containing the default partition
    key. All other entries in the P_Key Table must be invalid."

    Thus, in the native mode case, the driver will find the default pkey
    at index 0 (so it will be no different than the hard-coding).

    However, in SR-IOV mode, for VFs, the pkey table may be
    paravirtualized, so that the VF's pkey index zero may not necessarily
    be mapped to the real pkey index 0. For VFs, therefore, it is
    important to find the virtual index which maps to the real default
    pkey.

    This commit does the following for QP1 creation:

    1. Find the pkey index containing the default pkey, and use that index
    if found. ib_find_pkey() returns the index of the
    limited-membership default pkey (0x7FFF) if the full-member default
    pkey is not in the table.

    2. If neither form of the default pkey is found, use pkey index 0
    (previous behavior).

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Reviewed-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Jack Morgenstein
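
    The two-step selection above can be sketched in userspace as follows.
    This is not ib_find_pkey() itself: qp1_pkey_index() is a hypothetical
    helper that scans a plain array standing in for the port's pkey table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_PKEY_FULL    0xFFFFu
#define DEFAULT_PKEY_LIMITED 0x7FFFu

/*
 * Pick the pkey index for QP1: prefer the full-member default pkey,
 * then the limited-member one, and fall back to index 0 otherwise.
 */
size_t qp1_pkey_index(const uint16_t *table, size_t len)
{
    size_t limited = len; /* len means "not seen yet" */

    assert(table != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (table[i] == DEFAULT_PKEY_FULL)
            return i;
        if (table[i] == DEFAULT_PKEY_LIMITED && limited == len)
            limited = i;
    }
    return limited < len ? limited : 0; /* index 0 is the old behavior */
}
```

    In native mode the default table has 0xFFFF at index 0, so the helper
    returns 0 and nothing changes; only a paravirtualized table that moves
    the default pkey elsewhere produces a different index.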
     
  • In the sq_overhead() function, if qp_type is equal to IB_QPT_RC, size
    will be used uninitialized.

    Signed-off-by: Andi Shyti
    Acked-by: Eli Cohen
    Signed-off-by: Roland Dreier

    Andi Shyti
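
    A minimal sketch of this class of bug and its usual remedy, assuming an
    accumulator that some switch cases add to without ever assigning it.
    The enum values and segment sizes are invented for illustration and do
    not match the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical QP types and segment sizes, for illustration only. */
enum qp_type_sketch { QPT_RC, QPT_UC, QPT_UD, QPT_OTHER };
enum { CTRL_SEG = 16, RADDR_SEG = 16, DATAGRAM_SEG = 48 };

/* Returns the per-WQE overhead in bytes, or -1 for an unsupported type. */
int sq_overhead_sketch(enum qp_type_sketch type)
{
    int size = 0; /* initialized, so every += below starts from 0 */

    switch (type) {
    case QPT_RC:
    case QPT_UC:
        size += CTRL_SEG + RADDR_SEG;
        break;
    case QPT_UD:
        size += CTRL_SEG + DATAGRAM_SEG;
        break;
    default:
        return -1;
    }
    assert(size > 0); /* every supported type adds at least one segment */
    return size;
}
```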
     
  • We don't set "resp.reserved". Since it's at the end of the struct
    that means we don't have to copy it to the user.

    Signed-off-by: Dan Carpenter
    Acked-by: Eli Cohen
    Signed-off-by: Roland Dreier

    Dan Carpenter
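
    One way to copy only the initialized part of such a struct is to stop
    at the offset of the trailing field. The sketch below assumes a
    hypothetical struct resp_sketch and uses memcpy() as a userspace
    stand-in for copy_to_user(); it may not match the expression used in
    the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response layout with a trailing field that is never set. */
struct resp_sketch {
    uint32_t qp_tab_size;
    uint32_t bf_reg_size;
    uint32_t reserved; /* last member, intentionally left unset */
};

/*
 * Copy only the initialized prefix of the response. memcpy() stands in
 * for copy_to_user(); returns the number of bytes copied.
 */
size_t copy_resp_prefix(void *dst, const struct resp_sketch *resp)
{
    size_t n = offsetof(struct resp_sketch, reserved);

    assert(dst != NULL && resp != NULL);
    memcpy(dst, resp, n);
    return n;
}
```

    Because the unset field is never copied, whatever happened to be on the
    stack in that slot does not reach the destination buffer.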
     
  • Fix to return a negative error code from the error handling case
    instead of 0, as done elsewhere in this function.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Roland Dreier

    Wei Yongjun
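
    The pattern being fixed is an error path that jumps to cleanup while
    the return variable still holds 0. A small sketch, with a hypothetical
    setup_sketch() whose fail_second flag forces the second step to fail:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical two-step setup; fail_second simulates a failing step. */
int setup_sketch(int fail_second)
{
    int err = 0;
    void *a;
    void *b = NULL;

    a = malloc(16);
    if (!a)
        return -ENOMEM;

    if (!fail_second)
        b = malloc(16);
    if (!b) {
        err = -ENOMEM; /* without this, the function would return 0 */
        goto out_free_a;
    }

    free(b);
out_free_a:
    free(a);
    assert(err <= 0); /* convention: 0 or a negative errno */
    return err;
}
```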
     
  • When creating tunnel QPs for special QP tunneling, look for the
    default pkey in the slave's virtual pkey table. If it is present, use
    the real pkey index where the default pkey is located.

    If the default pkey is not found in the pkey table, use the real pkey
    index which is stored at index 0 in the slave's virtual pkey table
    (this is the current behavior).

    This change is required to support cloud computing, where the
    paravirtualized index of the default pkey is moved to index 1 or
    higher. The pkey at paravirtualized index 0 is used for the default
    IPoIB interface created by the VF.

    It's possible for the pkey value at paravirtualized index 0 to be
    invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey
    index 127, which contains pkey = 0).

    At some point after the VF probe, the cloud computing interface at the
    hypervisor maps virtual index 0 for the VF to the pkey index
    containing the pkey that IPoIB will use in its operation. However,
    when the tunnel QP is created, the pkey at the slave's virtual index 0
    is still mapped to the invalid pkey index, so tunnel QP creation
    fails.

    This commit causes the hypervisor to search for the default pkey in
    the slave's pkey table -- and this pkey is present in the table (at
    index > 0) at tunnel QP creation time, so that the tunnel QP creation
    will succeed.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Jack Morgenstein
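
    The lookup described above can be sketched as a search through the
    slave's virtual-to-real index mapping. The arrays and the function name
    are hypothetical, and the sketch assumes that either membership form of
    the default pkey counts as a match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_PKEY_BASE 0x7FFFu /* default pkey, membership bit masked */

/*
 * virt2phys[i] is the real pkey index behind the slave's virtual index i;
 * phys_table holds the port's real pkey values. Returns the real index
 * to use for the tunnel QP.
 */
uint16_t tunnel_qp_pkey_index(const uint16_t *virt2phys, size_t nvirt,
                              const uint16_t *phys_table)
{
    assert(virt2phys != NULL && phys_table != NULL && nvirt > 0);

    for (size_t i = 0; i < nvirt; i++) {
        uint16_t real = virt2phys[i];
        if ((phys_table[real] & DEFAULT_PKEY_BASE) == DEFAULT_PKEY_BASE)
            return real;
    }
    return virt2phys[0]; /* default pkey not found: previous behavior */
}
```

    With virtual index 0 mapped to real index 127 (pkey 0) and virtual
    index 1 mapped to real index 0 (pkey 0xFFFF), the search returns real
    index 0, so the tunnel QP no longer depends on what virtual index 0
    holds at probe time.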
     

31 Jul, 2013

9 commits


27 Jul, 2013

1 commit


14 Jul, 2013

1 commit

  • Pull InfiniBand/RDMA changes from Roland Dreier:
    - AF_IB (native IB addressing) for CMA from Sean Hefty
    - new mlx5 driver for Mellanox Connect-IB adapters (including post
    merge request fixes)
    - SRP fixes from Bart Van Assche (including fix to first merge request)
    - qib HW driver updates
    - resurrection of ocrdma HW driver development
    - uverbs conversion to create fds with O_CLOEXEC set
    - other small changes and fixes

    * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
    mlx5: Return -EFAULT instead of -EPERM
    IB/qib: Log all SDMA errors unconditionally
    IB/qib: Fix module-level leak
    mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
    IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
    IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
    mlx5_core: Fixes for sparse warnings
    IB/mlx5: Make profile[] static in main.c
    mlx5: Fix parameter type of health_handler_t
    mlx5: Add driver for Mellanox Connect-IB adapters
    IB/core: Add reserved values to enums for low-level driver use
    IB/srp: Bump driver version and release date
    IB/srp: Make HCA completion vector configurable
    IB/srp: Maintain a single connection per I_T nexus
    IB/srp: Fail I/O fast if target offline
    IB/srp: Skip host settle delay
    IB/srp: Avoid skipping srp_reset_host() after a transport error
    IB/srp: Fix remove_one crash due to resource exhaustion
    IB/qib: New transmitter tunning settings for Dell 1.1 backplane
    IB/core: Fix error return code in add_port()
    ...

    Linus Torvalds
     

12 Jul, 2013

6 commits

  • Roland Dreier
     
  • For copy_to/from_user() failure, the correct error code is -EFAULT not
    -EPERM.

    Signed-off-by: Dan Carpenter
    Acked-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Dan Carpenter
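
    A userspace sketch of the convention, assuming a stand-in for
    copy_to_user() that, like the kernel helper, returns the number of
    bytes it could not copy. Both function names are hypothetical, and a
    NULL destination is used here to simulate a faulting user pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for copy_to_user(): returns the number of bytes NOT copied.
 * A NULL destination simulates a bad user address.
 */
size_t copy_out_sketch(void *dst, const void *src, size_t n)
{
    if (!dst)
        return n;
    memcpy(dst, src, n);
    return 0;
}

/* Report a failed copy as -EFAULT (bad address), not -EPERM. */
int send_resp_sketch(void *user_buf, const void *resp, size_t n)
{
    assert(resp != NULL);
    if (copy_out_sketch(user_buf, resp, n))
        return -EFAULT;
    return 0;
}
```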
     
  • This patch adds code to log SDMA errors for supportability purposes.

    Signed-off-by: Dean Luick
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Dean Luick
     
  • The vzalloc()'ed field physshadow is leaked on module unload.

    This patch adds vfree after the sibling page shadow is freed.

    Reported-by: Dean Luick
    Reviewed-by: Dean Luick
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Mike Marciniszyn
     
  • If the transport layer is offline it is more appropriate to let
    srp_abort() return FAST_IO_FAIL instead of SUCCESS.

    Reported-by: Sebastian Riemer
    Acked-by: David Dillow
    Signed-off-by: Bart Van Assche
    Signed-off-by: Roland Dreier

    Bart Van Assche
     
  • Pull SCSI target updates from Nicholas Bellinger:
    "Lots of activity this round on performance improvements in target-core
    while benchmarking the prototype scsi-mq initiator code with
    vhost-scsi fabric ports, along with a number of iscsi/iser-target
    improvements and hardening fixes for exception path cases post v3.10
    merge.

    The highlights include:

    - Make persistent reservations APTPL buffer allocated on-demand, and
    drop per t10_reservation buffer. (grover)
    - Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO
    device pages (grover)
    - Add transport_cmd_check_stop write_pending bit to avoid extra
    access of ->t_state_lock in WRITE I/O submission fast-path. (nab)
    - Drop unnecessary CMD_T_DEV_ACTIVE check from
    transport_lun_remove_cmd to avoid extra access of ->t_state_lock in
    release fast-path. (nab)
    - Avoid extra t_state_lock access in __target_execute_cmd fast-path
    (nab)
    - Drop unnecessary vhost-scsi wait_for_tasks=true usage +
    ->t_state_lock access in release fast-path. (nab)
    - Convert vhost-scsi to use modern se_cmd->cmd_kref
    TARGET_SCF_ACK_KREF usage (nab)
    - Add tracepoints for SCSI commands being processed (roland)
    - Refactoring of iscsi-target handling of ISCSI_OP_NOOP +
    ISCSI_OP_TEXT to be transport independent (nab)
    - Add iscsi-target SendTargets=$IQN support for in-band discovery
    (nab)
    - Add iser-target support for in-band discovery (nab + Or)
    - Add iscsi-target demo-mode TPG authentication context support (nab)
    - Fix isert_put_reject payload buffer post (nab)
    - Fix iscsit_add_reject* usage for iser (nab)
    - Fix iscsit_sequence_cmd reject handling for iser (nab)
    - Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab)
    - Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab)

    The last five iscsi/iser-target items are CC'ed to stable, as they do
    address issues present in v3.10 code. They are certainly larger than
    I'd like for stable patch set, but are important to ensure proper
    REJECT exception handling in iser-target for 3.10.y"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
    iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
    target: make queue_tm_rsp() return void
    target: remove unused codes from enum tcm_tmrsp_table
    iscsi-target: kstrtou* configfs attribute parameter cleanups
    iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow
    iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow
    iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
    iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
    iser-target: Add vendor_err debug output
    target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10)
    target: Return correct sense data for IO past the end of a device
    target: Add tracepoints for SCSI commands being processed
    iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
    iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
    iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
    iscsi-target: Fix iscsit_add_reject* usage for iser
    iser-target: Fix isert_put_reject payload buffer post
    iscsi-target: missing kfree() on error path
    iscsi-target: Drop left-over iscsi_conn->bad_hdr
    target: Make core_scsi3_update_and_write_aptpl return sense_reason_t
    ...

    Linus Torvalds
     

10 Jul, 2013

1 commit

  • Pull networking updates from David Miller:
    "This is a re-do of the net-next pull request for the current merge
    window. The only difference from the one I made the other day is that
    this has Eliezer's interface renames and the timeout handling changes
    made based upon your feedback, as well as a few bug fixes that have
    trickled in.

    Highlights:

    1) Low latency device polling, eliminating the cost of interrupt
    handling and context switches. Allows direct polling of a network
    device from socket operations, such as recvmsg() and poll().

    Currently ixgbe, mlx4, and bnx2x support this feature.

    Full high level description, performance numbers, and design in
    commit 0a4db187a999 ("Merge branch 'll_poll'")

    From Eliezer Tamir.

    2) With the routing cache removed, ip_check_mc_rcu() gets exercised
    more than ever before in the case where we have lots of multicast
    addresses. Use a hash table instead of a simple linked list, from
    Eric Dumazet.

    3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from
    Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
    Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

    4) Support reporting the TUN device persist flag to userspace, from
    Pavel Emelyanov.

    5) Allow controlling network device VF link state using netlink, from
    Rony Efraim.

    6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

    7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
    Daniel Borkmann and Eric Dumazet.

    8) Allow controlling of TCP quickack behavior on a per-route basis,
    from Cong Wang.

    9) Several bug fixes and improvements to vxlan from Stephen
    Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
    support receiving on multiple UDP ports.

    10) Major cleanups, particularly in the area of debugging and cookie
    lifetime handling, to the SCTP protocol code. From Daniel
    Borkmann.

    11) Allow packets to cross network namespaces when traversing tunnel
    devices. From Nicolas Dichtel.

    12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
    manner akin to how we monitor real network traffic via ptype_all.
    From Daniel Borkmann.

    13) Several bug fixes and improvements for the new alx device driver,
    from Johannes Berg.

    14) Fix scalability issues in the netem packet scheduler's time queue,
    by using an rbtree. From Eric Dumazet.

    15) Several bug fixes in TCP loss recovery handling, from Yuchung
    Cheng.

    16) Add support for GSO segmentation of MPLS packets, from Simon
    Horman.

    17) Make network notifiers have a real data type for the opaque
    pointer that's passed into them. Use this to properly handle
    network device flag changes in arp_netdev_event(). From Jiri
    Pirko and Timo Teräs.

    18) Convert several drivers over to module_pci_driver(), from Peter
    Huewe.

    19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
    O(1) calculation instead. From Eric Dumazet.

    20) Support setting of explicit tunnel peer addresses in ipv6, just
    like ipv4. From Nicolas Dichtel.

    21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

    22) Prevent a single high rate flow from overrunning an individual cpu
    during RX packet processing via selective flow shedding. From
    Willem de Bruijn.

    23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
    Dumazet.

    24) Don't just drop GSO packets which are above the TBF scheduler's
    burst limit, chop them up so they are in-bounds instead. Also
    from Eric Dumazet.

    25) VLAN offloads are missed when configured on top of a bridge, fix
    from Vlad Yasevich.

    26) Support IPV6 in ping sockets. From Lorenzo Colitti.

    27) Receive flow steering targets should be updated at poll() time
    too, from David Majnemer.

    28) Fix several corner case regressions in PMTU/redirect handling due
    to the routing cache removal, from Timo Teräs.

    29) We have to be mindful of ipv4 mapped ipv6 sockets in
    udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

    30) Fix L2TP sequence number handling bugs, from James Chapman."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
    drivers/net: caif: fix wrong rtnl_is_locked() usage
    drivers/net: enic: release rtnl_lock on error-path
    vhost-net: fix use-after-free in vhost_net_flush
    net: mv643xx_eth: do not use port number as platform device id
    net: sctp: confirm route during forward progress
    virtio_net: fix race in RX VQ processing
    virtio: support unlocked queue poll
    net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
    Documentation: Fix references to defunct linux-net@vger.kernel.org
    net/fs: change busy poll time accounting
    net: rename low latency sockets functions to busy poll
    bridge: fix some kernel warning in multicast timer
    sfc: Fix memory leak when discarding scattered packets
    sit: fix tunnel update via netlink
    dt:net:stmmac: Add dt specific phy reset callback support.
    dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
    dt:net:stmmac: Allocate platform data only if its NULL.
    net:stmmac: fix memleak in the open method
    ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
    net: ipv6: fix wrong ping_v6_sendmsg return value
    ...

    Linus Torvalds
     

09 Jul, 2013

4 commits


08 Jul, 2013

6 commits

  • This patch adds a check in isert_rx_opcode() to ignore non TEXT + LOGOUT
    opcodes when SessionType=Discovery has been negotiated.

    Cc: Or Gerlitz
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • The return value wasn't checked by any of the callers. Assuming this is
    correct behaviour, we can simplify some code by not bothering to
    generate it.

    nab: Add srpt_queue_data_in() + srpt_queue_tm_rsp() nops around
    srpt_queue_response() void return

    Signed-off-by: Joern Engel
    Signed-off-by: Nicholas Bellinger

    Joern Engel
     
  • This patch adds isert_handle_text_cmd() to handle incoming
    ISCSI_OP_TEXT PDU processing, along with isert_put_text_rsp()
    for posting ISCSI_OP_TEXT_RSP ib_send_wr response.

    It copies ISCSI_OP_TEXT payload using unsolicited payload at
    &iser_rx_desc->data[0] into iscsi_cmd->text_in_ptr for usage
    with outgoing isert_put_text_rsp() -> iscsit_build_text_rsp()

    v2 changes:
    - Let iscsit_build_text_rsp() determine any extra padding

    Reported-by: Or Gerlitz
    Cc: Or Gerlitz
    Cc: Mike Christie
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • Now that these two variables are used for REJECT payloads as well
    as SCSI response sense payloads, rename them to something that
    makes more sense.

    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • Add output for ib_wc.vendor_err in isert_cq_[t,r]x_work(), which
    is useful for debugging future issues.

    Reported-by: Or Gerlitz
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • This patch addresses a bug where RDMA_CM_EVENT_DISCONNECTED may occur
    before the connection shutdown has been completed by rx/tx threads,
    which causes isert_free_conn() to wait indefinitely on ->conn_wait.

    This patch allows isert_disconnect_work code to invoke rdma_disconnect
    when isert_disconnect_work() process context is started by client
    session reset before isert_free_conn() code has been reached.

    It also adds isert_conn->conn_mutex protection for ->state within
    isert_disconnect_work(), isert_cq_comp_err() and isert_free_conn()
    code, along with isert_check_state() for wait_event usage.

    (v2: Add explicit iscsit_cause_connection_reinstatement call
    during isert_disconnect_work() to force conn reset)

    Cc: stable@vger.kernel.org # 3.10+
    Cc: Or Gerlitz
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     

07 Jul, 2013

4 commits

  • This patch adds target_get_sess_cmd reference counting for
    iscsit_handle_task_mgt_cmd(), and adds a target_put_sess_cmd()
    for the failure case.

    It also fixes a bug where ISCSI_OP_SCSI_TMFUNC type commands
    were leaking iscsi_cmd->i_conn_node and eventually triggering
    an OOPS during struct isert_conn shutdown.

    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
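
    The get/put pairing described above can be sketched with a plain
    counter. This is only an illustration of balancing a reference on the
    failure path: struct cmd_sketch and the helpers are hypothetical and
    are not the target_get_sess_cmd() / target_put_sess_cmd() API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command object; refs counts outstanding references. */
struct cmd_sketch {
    int refs;
};

void cmd_get(struct cmd_sketch *cmd)
{
    assert(cmd != NULL);
    cmd->refs++;
}

void cmd_put(struct cmd_sketch *cmd)
{
    assert(cmd != NULL && cmd->refs > 0);
    cmd->refs--;
}

/*
 * Take a reference before processing a task-management request and
 * drop it again if setup fails, so the failure path leaks nothing.
 */
int handle_tmr_sketch(struct cmd_sketch *cmd, int setup_fails)
{
    cmd_get(cmd);
    if (setup_fails) {
        cmd_put(cmd); /* balance the get on the error path */
        return -1;
    }
    return 0; /* the completion path is expected to do the final put */
}
```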
     
  • This patch moves ISCSI_OP_REJECT failures into iscsit_sequence_cmd()
    in order to avoid external iscsit_reject_cmd() reject usage for all
    PDU types.

    It also updates PDU-specific handlers for traditional iscsi-target
    code to not reset the session after posting an ISCSI_OP_REJECT during
    setup.

    (v2: Fix CMDSN_LOWER_THAN_EXP for ISCSI_OP_SCSI to call
    target_put_sess_cmd() after iscsit_sequence_cmd() failure)

    Cc: Or Gerlitz
    Cc: Mike Christie
    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • This patch changes iscsit_add_reject() + iscsit_add_reject_from_cmd()
    usage to not sleep on iscsi_cmd->reject_comp to address a use-after-free
    bug in v3.10 with iser-target code.

    It saves ->reject_reason for use within iscsit_build_reject() so the
    correct value is used for both transport cases. It also drops the legacy
    fail_conn parameter usage throughout iscsi-target code and adds
    two helper functions, iscsit_add_reject_cmd() and iscsit_reject_cmd(),
    along with various small cleanups.

    (v2: Re-enable target_put_sess_cmd() to be called from
    iscsit_add_reject_from_cmd() for rejects invoked after
    target_get_sess_cmd() has been called)

    Cc: Or Gerlitz
    Cc: Mike Christie
    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • This patch adds the missing isert_put_reject() logic to post
    an outgoing payload buffer to hold the 48 bytes of original PDU
    header request payload for the rejected cmd.

    It also fixes ISTATE_SEND_REJECT handling in isert_response_completion()
    -> isert_do_control_comp() code, and drops incorrect iscsi_cmd_t->reject_comp
    usage.

    Cc: Or Gerlitz
    Cc: Mike Christie
    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
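
    As a rough picture of the payload involved, the sketch below copies the
    48-byte header of the rejected PDU into a buffer attached to the reject
    response. The struct and function are hypothetical and leave out the
    DMA mapping and send work request that the real code has to set up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HDR_LEN_SKETCH 48 /* size of an iSCSI basic header segment */

/* Hypothetical reject descriptor carrying the rejected PDU's header. */
struct reject_sketch {
    unsigned char payload[HDR_LEN_SKETCH];
    size_t        payload_len;
};

/* Copy the original 48-byte header into the reject's payload buffer. */
void reject_fill_payload(struct reject_sketch *rej,
                         const unsigned char *orig_hdr)
{
    assert(rej != NULL && orig_hdr != NULL);
    memcpy(rej->payload, orig_hdr, HDR_LEN_SKETCH);
    rej->payload_len = HDR_LEN_SKETCH;
}
```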