27 May, 2016

9 commits

  • Doug Ledford
     
  • The pio map initialization function is off by 1 causing the last
    kernel send context that is allocated to not get mapped into the
    pio map which leads to the last kernel send context not being used
    by any of the qps.

    The send context reserved for VL15 is taken care of by setting the
    scontext variable that is used as the index into the kernel send
    context array to 1 and does not need to be accounted for in the
    kernel send context counting loop as it is currently done.

    Fix the kernel send context counting loop to account for all the
    allocated send contexts and map all of them to the different VLs.
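
    The off-by-one can be illustrated with a small user-space sketch. It
    assumes an array whose index 0 is the VL15 context and whose indices
    1..num_kernel are the remaining kernel send contexts. The function name
    and the round-robin assignment are hypothetical, chosen only for
    illustration; this is not the driver's mapping algorithm.

```c
/*
 * Hypothetical sketch: assign kernel send contexts to VLs.
 * Index 0 is reserved for VL15, so the walk starts at index 1 and the
 * loop itself must cover all num_kernel remaining contexts.
 * vl_of_ctx must have room for num_kernel + 1 entries.
 * Returns the number of contexts that were mapped.
 */
static int map_contexts(int num_kernel, int num_vls, int *vl_of_ctx)
{
    int scontext = 1;   /* skip index 0, reserved for VL15 */
    int mapped = 0;

    for (int i = 0; i <= num_kernel; i++)
        vl_of_ctx[i] = -1;              /* -1 means "not mapped" */

    /* Looping to num_kernel - 1 here would leave the last context unused. */
    for (int i = 0; i < num_kernel; i++) {
        vl_of_ctx[scontext++] = i % num_vls;   /* round robin (assumption) */
        mapped++;
    }
    return mapped;
}
```

    With five contexts and four VLs, every index from 1 to 5 ends up with a
    VL, and index 0 stays untouched.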

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Jianxin Xiong
    Signed-off-by: Jubin John
    Signed-off-by: Doug Ledford

    Jubin John
     
  • Two 8051 link settings, external device config and tuning method,
    were written in the wrong location and the previous settings were
    not cleared. For both, clear the old value and write the new
    value.

    Fixes: 8ebd4cf1852a ("staging/rdma/hfi1: Add active and optical cable support")
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Dean Luick
    Signed-off-by: Doug Ledford

    Dean Luick
     
  • When FM is disabled, and the HFI port on the switch is
    changed from MgmtAllowed=YES to MgmtAllowed=NO and the
    link is bounced, FULL_MGMT_P_KEY doesn't get cleared
    from the pkey table. This also occurs when the QSFP
    cable is moved from a switch port with MgmtAllowed=YES
    to a MgmtAllowed=NO port. Clear pkey entry properly.

    Also, when the driver is loaded and the switch port is
    set to MgmtAllowed=NO, FULL_MGMT_P_KEY shouldn't be added
    to pkey table after FM is started. Only set FULL_MGMT_P_KEY
    in the pkey table if switch port is configured to
    MgmtAllowed=YES.

    Reviewed-by: Dean Luick
    Signed-off-by: Sebastian Sanchez
    Signed-off-by: Doug Ledford

    Sebastian Sanchez
     
  • rdmavt allows the driver to specify the size of the ack queue, but
    only uses it for the modify QP limit testing for setting the atomic
    limit value.

    The driver dependent size is now used to size the s_ack_queue ring
    dynamically.

    Since the driver knows its size, the driver will use its define
    for any ring size dependent code.

    Reviewed-by: Mitko Haralanov
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • This matches the ib_qp_attr size and
    avoids an extremely large value when the lower level
    driver registers.

    As part of the patch, the u8 ordinals are moved to the
    end of the struct to reduce pahole noted excesses.

    Reviewed-by: Mitko Haralanov
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • Commit b9b06cb6feda
    ("IB/hfi1: Fix missing lock/unlock in verbs drain callback")
    added a spin lock.

    Unfortunately, the new lock code can be called from a base
    level interrupt state, and an interrupt that can get stacked
    will attempt to get the same lock.

    Fix by using the flag save/restore spin lock variation.

    Cc: stable@vger.kernel.org # 4.6+
    Reviewed-by: Sebastian Sanchez
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • Enable trace generation for packets with the "Send Last with
    Invalidate" and "Send Only with Invalidate" opcodes.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jianxin Xiong
    Signed-off-by: Doug Ledford

    Jianxin Xiong
     
  • A new union member "ieth" (Invalidate Extended Transport Header) is
    added to the packet header definition in preparation of supporting
    the send with invalidate opcode.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jianxin Xiong
    Signed-off-by: Doug Ledford

    Jianxin Xiong
     

26 May, 2016

24 commits

  • Doug Ledford
     
  • The TODO list for the hfi1 driver was completed during 4.6. In addition
    other objections raised (which are far beyond what was in the TODO list)
    have been addressed as well. It is now time to move the driver out of
    staging and into the drivers/infiniband sub-tree.

    Reviewed-by: Jubin John
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • The deletion of a cdev is not a fence for holding off references to the
    structure. The driver attempts to delete the cdev and then proceeds to
    free the parent structure, the hfi1_devdata, or dd. This can potentially
    lead to a kernel panic in situations where a user has an FD for the cdev
    open, and the pci device gets removed. If the user then closes the FD
    there will be a NULL dereference when trying to do put on the cdev's
    kobject.

    Fix this by pointing the cdev's kobject.parent at a new kobject embedded
    in its parent structure. Also take a reference when the device is opened
    and put it back when it is closed.
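
    The lifetime rule described above can be sketched in user space. All
    names here (struct devdata, dd_get, dd_put, dev_open, dev_close,
    dev_remove) are hypothetical, and a plain integer stands in for the
    kobject reference count; the sketch only shows that the parent structure
    is freed by whoever drops the last reference, not by device removal
    itself.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the parent structure that owns the cdev. */
struct devdata {
    int refcount;   /* plain int for brevity; real code needs atomics */
    int *freed;     /* observation hook: set to 1 when the struct is freed */
};

static struct devdata *dd_create(int *freed)
{
    struct devdata *dd = calloc(1, sizeof(*dd));

    if (!dd)
        return NULL;
    dd->refcount = 1;   /* reference held by the driver itself */
    dd->freed = freed;
    return dd;
}

static void dd_get(struct devdata *dd)
{
    dd->refcount++;
}

static void dd_put(struct devdata *dd)
{
    if (--dd->refcount == 0) {
        *dd->freed = 1;
        free(dd);
    }
}

/* Opening the device takes a reference; closing it puts it back. */
static void dev_open(struct devdata *dd)   { dd_get(dd); }
static void dev_close(struct devdata *dd)  { dd_put(dd); }

/* Removal only drops the driver's own reference. */
static void dev_remove(struct devdata *dd) { dd_put(dd); }
```

    If a user still has the device open when dev_remove() runs, the
    structure survives until the matching dev_close().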

    Reviewed-by: Mitko Haralanov
    Signed-off-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Add a trace message to HFI1's user IOCTL handling. This allows debugging
    of which IOCTLs are being handled by the driver.

    Reviewed-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Remove the write() handler for user space commands now that ioctl
    handling is available. User apps will need to change to use ioctl from
    this point forward.

    Reviewed-by: Mitko Haralanov
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • IOCTL is more suited to what user space commands need to do than the
    write() interface. Add IOCTL definitions for all existing write commands
    and the handling for those. The write() interface will be removed in a
    follow on patch.

    Reviewed-by: Mitko Haralanov
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • The HFI1_CMD_SDMA_STATUS_UPD command was never implemented, so it has no
    reason to live in the driver. Remove it.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mitko Haralanov
    Reviewed-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • The snoop/diag interface is better served by an implementation which is
    more general and usable by other drivers perhaps. Go ahead and remove
    the code now and get rid of the char dev. We can put the feature back
    when we have a more agreeable solution.

    Reviewed-by: Dean Luick
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Remove EPROM handling from the cdev which is used for user application
    data traffic.

    Reviewed-by: Dean Luick
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Remove UI char device which exposes direct access to registers for user
    space. This was put in to aid in debugging the hardware. We are looking
    into alternative means of providing the same functionality. This
    removes another char device from HFI1's footprint.

    Reviewed-by: Dean Luick
    Reviewed-by: Mitko Haralanov
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • hfi1 currently exports a cdev that can be used to target all of the hfi's
    in the system. However there is a problem with this approach in
    that the devices could be on different subnets. This is a problem that
    user space can figure out and explicitly tell the driver on which device
    to create a context.

    Remove the multi-purpose cdev leaving a dedicated cdev for each port.
    Also remove the striping capability that is dependent upon the user
    choosing the multi-purpose cdev. It is now up to user space to determine
    how to stripe contexts.

    Reviewed-by: Dean Luick
    Reviewed-by: Mitko Haralanov
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Remove the usage of an anti-pattern goto in hfi1_cdev_init to improve
    code readability.

    Suggested-by: Jason Gunthorpe
    Reviewed-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • During the processing of a user SDMA request, if there was an
    error before the request counter was increased, the state of
    the packet queue could be updated incorrectly, causing the
    counter to underflow. As a result, the process could get
    stuck later since the counter could never get back to 0.

    This patch adds a condition to guard the packet queue update
    so that the counter is only decreased if it has been increased
    before the error happens.
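
    A minimal user-space sketch of such a guard follows. The structures and
    names (struct pkt_queue, struct sdma_req, the counted flag,
    submit_request) are hypothetical and not taken from the driver; the
    fail_early and fail_late parameters only simulate an error before or
    after the counter is increased.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's packet queue and request. */
struct pkt_queue {
    unsigned int n_reqs;    /* outstanding requests; must return to 0 */
};

struct sdma_req {
    struct pkt_queue *pq;
    bool counted;           /* set once n_reqs has been incremented */
};

/* Undo the accounting only if this request was actually counted. */
static void req_cleanup(struct sdma_req *req)
{
    if (req->counted) {
        req->pq->n_reqs--;
        req->counted = false;
    }
}

/* Returns 0 on success and -1 on a (simulated) error. */
static int submit_request(struct pkt_queue *pq, struct sdma_req *req,
                          bool fail_early, bool fail_late)
{
    req->pq = pq;
    req->counted = false;

    if (fail_early)         /* error before the counter is increased */
        goto err;

    pq->n_reqs++;
    req->counted = true;

    if (fail_late)          /* error after the counter is increased */
        goto err;
    return 0;

err:
    req_cleanup(req);       /* never decrements an uncounted request */
    return -1;
}
```

    Without the counted check, the early error path would decrement an
    unsigned counter that is still 0 and wrap it around.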

    Reviewed-by: Mitko Haralanov
    Signed-off-by: Jianxin Xiong
    Signed-off-by: Doug Ledford

    Jianxin Xiong
     
  • Building the qib driver with gcc version 6.1.0 raises the following
    build warning:
    drivers/infiniband/hw/qib/qib_iba7322.c:1311:39: warning:
    'qib_7322_intr_msgs' defined but not used [-Wunused-const-variable=]
    static const struct qib_hwerror_msgs qib_7322_intr_msgs[] = {
    ^~~~~~~~~~~~~~~~~~
    Remove the unused qib_7322_intr_msgs[]

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Jubin John
    Signed-off-by: Doug Ledford

    Jubin John
     
  • This comment was old, the MTU enums have been defined.

    Reviewed-by: Mitko Haralanov
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Ira Weiny
    Signed-off-by: Doug Ledford

    Ira Weiny
     
  • sdma_event_names[] is only used within CONFIG_SDMA_VERBOSITY ifdefs, so
    when CONFIG_SDMA_VERBOSITY is disabled, it results in the following
    0-day build warning:
    >> drivers/infiniband/hw/hfi1/sdma.c:137:27: warning: 'sdma_event_names'
    >> defined but not used [-Wunused-const-variable=]
    static const char * const sdma_event_names[] = {
    ^~~~~~~~~~~~~~~~
    This occurs on the following compiler:
    compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430

    For more information check:
    https://lists.01.org/pipermail/kbuild-all/2016-May/020060.html

    Fix this warning by defining sdma_event_names[] only within the
    CONFIG_SDMA_VERBOSITY ifdefs.
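
    The pattern can be sketched as follows. The array entries and the
    event_label() helper are placeholders invented for this illustration;
    only the sdma_event_names and CONFIG_SDMA_VERBOSITY names come from the
    message above.

```c
#include <stddef.h>

/* Uncomment to build the verbose variant. */
/* #define CONFIG_SDMA_VERBOSITY */

#ifdef CONFIG_SDMA_VERBOSITY
/* Defined only where it is used, so no unused-variable warning. */
static const char * const sdma_event_names[] = {
    "event_a",      /* placeholder strings, not the driver's table */
    "event_b",
};

static const char *event_label(size_t ev)
{
    size_t n = sizeof(sdma_event_names) / sizeof(sdma_event_names[0]);

    return ev < n ? sdma_event_names[ev] : "unknown";
}
#else
/* Non-verbose build: the table does not exist at all. */
static const char *event_label(size_t ev)
{
    (void)ev;
    return "n/a";
}
#endif
```

    Keeping the definition and every use behind the same ifdef means both
    configurations compile without the warning.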

    Reported-by: kbuild test robot
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jubin John
    Signed-off-by: Doug Ledford

    Jubin John
     
  • Use kzalloc_node instead of kzalloc for rdmavt memory region segment
    allocation to optimize for performance on NUMA platforms.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jubin John
    Signed-off-by: Doug Ledford

    Jubin John
     
  • The usage of the various vmalloc APIs does not consistently zero memory
    when allocating the swqe. Ensure zeroing variants are used.

    Reviewed-by: Mitko Haralanov
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • Commit e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which
    may cause corruption") introduced a bug which may cause a reference
    count of an interval RB node to be leaked in the case where an SDMA
    transfer from that node completes at the same time as the node is
    being extended.

    If a node is being extended, it is first removed from the RB tree
    in order to be processed without the risk of an invalidation event
    removing the node at the same time.

    If an SDMA completion happens during that time, the completion handler
    will fail to find the node in the RB tree and, therefore, fail to
    correctly decrement its refcount. This leaves the node in the tree and
    its pages pinned for the duration of the user process.

    To prevent this from happening the io vector adds a reference to the
    RB node, which is used during the SDMA completion instead of looking
    up the node in the RB tree.

    This change adds a performance improvement as a side effect by avoiding
    the RB tree lookup.

    Fixes: e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which may cause corruption")
    Reviewed-by: Dean Luick
    Reviewed-by: Harish Chegondi
    Signed-off-by: Mitko Haralanov
    Signed-off-by: Doug Ledford

    Mitko Haralanov
     
  • In IB networks, and specifically in IPoIB/rdmacm traffic, the device
    address of an IPoIB interface is used as a means to exchange information
    between nodes needed for communication.

    Currently an IPoIB interface will always be created with a device
    address based on its node GUID without a way to change that.

    This change adds the ability to set the device address of an IPoIB
    interface by value. We use the set mac address ndo to do that.

    The flow breaks down into two cases:
    1) The GID value is already in the GID table.
    In this case the interface will be able to set carrier up.

    2) The GID value is not yet in the GID table.
    In this case the interface won't try to join the multicast group
    and will wait (listening for the GID_CHANGE event) until the GID is
    inserted.

    In order to track those changes, we add a new flag:
    * IPOIB_FLAG_DEV_ADDR_SET.

    When set, it means the dev_addr is based on a value in the GID
    table. This bit will be cleared upon a dev_addr change triggered
    by the user and set after validation.

    Per the IB spec, the port GUID can't change while the module is loaded.
    The port GUID is the basis for the GID at index 0, which is the basis for
    the default device address of an IPoIB interface.

    The issue is that some devices don't follow the spec: they change the
    port GUID while the HCA is powered on. In order not to break userspace
    applications, we need to check whether the user wanted to control the
    device address, and we assume that if the user sets the device address
    back to be based on GID index 0, they no longer wish to control it.

    In order to track this, we add an additional flag:
    * IPOIB_FLAG_DEV_ADDR_CTRL

    When setting the device address, there is no validation of the upper
    twelve bytes of the device address (flags, qpn, subnet prefix) as those
    bytes are not under the control of the user.
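
    As a rough sketch of what "no validation of the upper twelve bytes"
    implies, the comparison below looks only at the last 8 bytes of a
    20-byte IPoIB device address and matches them against the lower 8 bytes
    (the interface ID) of a 16-byte GID. The function and macro names are
    hypothetical, and the layout (4 bytes of flags and QPN, 8 bytes of
    subnet prefix, 8 bytes of GUID) is stated here as an assumption of the
    sketch.

```c
#include <stdint.h>
#include <string.h>

#define DEV_ADDR_LEN   20   /* flags + QPN (4), subnet prefix (8), GUID (8) */
#define GID_LEN        16   /* subnet prefix (8), interface ID (8) */
#define GUID_LEN        8

/*
 * Hypothetical helper: returns 1 if the user-controlled part of the
 * device address (its last 8 bytes) equals the interface ID of the GID.
 * The upper twelve bytes of the address are deliberately ignored.
 */
static int addr_matches_gid(const uint8_t addr[DEV_ADDR_LEN],
                            const uint8_t gid[GID_LEN])
{
    return memcmp(addr + (DEV_ADDR_LEN - GUID_LEN),
                  gid + (GID_LEN - GUID_LEN),
                  GUID_LEN) == 0;
}
```

    Two addresses that differ only in flags, QPN, or subnet prefix therefore
    compare as equal for the purpose of this check.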

    Signed-off-by: Mark Bloch
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Mark Bloch
     
  • Check (via an SA query) if the SM supports the new option for SendOnly
    multicast joins.
    If the SM supports that option, the driver uses the new join state to
    create such a multicast group.
    In other words, when SendOnlyFullMember is supported, the driver no
    longer fakes a FullMember join for a SendOnly MCG and uses the correct
    state instead.

    This check is performed at every invocation of mcast_restart task, to be
    sure that the driver stays in sync with the current state of the SM.
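
    The selection logic can be sketched as below. The enum and function
    names are hypothetical, and the bit positions (FullMember at bit 0
    through SendOnlyFullMember at bit 3) are an assumption of this sketch
    that should be checked against the IB specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed JoinState bit positions; names are illustrative only. */
enum join_state_bit {
    JOIN_FULL_MEMBER          = 0,
    JOIN_NON_MEMBER           = 1,
    JOIN_SENDONLY_NON_MEMBER  = 2,
    JOIN_SENDONLY_FULL_MEMBER = 3,
};

static uint8_t join_state_mask(enum join_state_bit bit)
{
    return (uint8_t)(1u << bit);
}

/*
 * Pick the join state for a send-only group: use SendOnlyFullMember when
 * the SM advertises support, otherwise fall back to a FullMember join.
 */
static uint8_t sendonly_join_state(bool sm_supports_sendonly_full)
{
    return join_state_mask(sm_supports_sendonly_full ?
                           JOIN_SENDONLY_FULL_MEMBER : JOIN_FULL_MEMBER);
}
```

    Because the SM's capabilities can change, the supported flag would be
    refreshed before each call rather than cached indefinitely.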

    Signed-off-by: Erez Shitrit
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Erez Shitrit
     
  • There are four join types for an MCG: FullMember, NonMember,
    SendOnlyNonMember, and the newly added SendOnlyFullMember.
    Add support for the new SendOnlyFullMember join state.

    The new type allows a host to send a join request as send-only. The
    group is created, but packets sent to this multicast group are not
    delivered back to the host.

    Signed-off-by: Erez Shitrit
    Reviewed-by: Leon Romanovsky
    Reviewed-by: Christoph Lameter
    Reviewed-by: Ira Weiny
    Signed-off-by: Doug Ledford

    Erez Shitrit
     
  • New SA query function to return the ClassPortInfo struct from the SA.
    If the SM supports FullMemberSendOnly mode for MCG's, it sets a
    capability bit in the capability_mask2 field of the response.

    Signed-off-by: Erez Shitrit
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Erez Shitrit
     
  • Change struct ib_class_port_info to conform to IB Spec 1.3.
    This is needed in order to get a specific capability mask from the
    ClassPortInfo MAD.

    From the IB Spec, ClassPortInfo section:
    "CapabilityMask2 Bits 0-26: Additional class-specific capabilities...
    RespTimeValue the rest 5 bits"

    The new struct now has one field for capabilitymask2 (previously was the
    reserved field) and the resp_time field.
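
    A sketch of how such a combined 32-bit field can be split is shown
    below. It assumes the value has already been converted to host byte
    order, with the 5-bit response time in the low bits and the 27-bit
    CapabilityMask2 above it; the macro and function names are hypothetical
    and are not the kernel's accessors.

```c
#include <stdint.h>

#define RESP_TIME_BITS  5u
#define RESP_TIME_MASK  0x1Fu   /* low 5 bits */

/* Combine a 27-bit capability mask and a 5-bit response time value. */
static inline uint32_t cpi_pack(uint32_t capmask2, uint8_t resp_time)
{
    return (capmask2 << RESP_TIME_BITS) | (resp_time & RESP_TIME_MASK);
}

/* Extract the 5-bit response time value. */
static inline uint8_t cpi_resp_time(uint32_t v)
{
    return (uint8_t)(v & RESP_TIME_MASK);
}

/* Extract the 27-bit CapabilityMask2. */
static inline uint32_t cpi_capmask2(uint32_t v)
{
    return v >> RESP_TIME_BITS;
}
```

    Callers that previously read the response time from its own field would
    go through an accessor like cpi_resp_time() instead.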

    And it fixes up qib and srpt, use of the field repurposed to be used as
    capabilitymask2:
    IB/qib: Change pma_get_classportinfo
    IB/srpt: Adjust the use of ib_class_port_info

    Signed-off-by: Erez Shitrit
    Reviewed-by: Leon Romanovsky
    Reviewed-by: Hal Rosenstock
    Signed-off-by: Doug Ledford

    Erez Shitrit
     

25 May, 2016

6 commits

  • There is an assumption that rdmacm is used only between nodes
    in the same IB subnet, which is why ARP resolution can be used to turn
    IP to GID in rdmacm.

    When dealing with IB communication between subnets this assumption
    is no longer valid. ARP resolution will get us the next hop device
    address and not the peer node's device address.

    To solve this issue, we will ask user space whether it can provide the
    GID of the peer node, and fail if not.

    We add a sequence number to identify each request and fill in the GID
    upon answer from userspace.

    Signed-off-by: Mark Bloch
    Signed-off-by: Doug Ledford

    Mark Bloch
     
  • Move SA ibnl client registration to ib_core module init.
    This will allow us to register a single client to handle
    all RDMA_NL_LS operations and make it SA independent.

    Signed-off-by: Mark Bloch
    Signed-off-by: Doug Ledford

    Mark Bloch
     
  • This commit adds a new RDMA local service operation:
    - IP to GID resolution.

    The client request would include the ifindex of the outgoing interface
    and would place in an attribute (LS_NLA_TYPE_IPV4 or LS_NLA_TYPE_IPV6)
    the destination IP.

    The local service would answer with a message that has the attribute:
    - LS_NLA_TYPE_DGID - The destination GID.

    Signed-off-by: Mark Bloch
    Signed-off-by: Doug Ledford

    Mark Bloch
     
  • Consolidate ib_sa into ib_core, this commit eliminates
    ib_sa.ko and makes it part of ib_core.ko

    Signed-off-by: Mark Bloch
    Signed-off-by: Doug Ledford

    Mark Bloch
     
  • Consolidate ib_mad into ib_core, this commit eliminates
    ib_mad.ko and makes it part of ib_core.ko

    Signed-off-by: Mark Bloch
    Signed-off-by: Doug Ledford

    Mark Bloch
     
  • IB address resolution is declared as a module (ib_addr.ko) which loads
    itself before IB core module (ib_core.ko).

    This leads to a scenario where IB netlink, which is initialized by IB
    core, can't be used by ib_addr.ko.

    In order to solve it, we are converting ib_addr.ko to be part of
    IB core module.

    Signed-off-by: Leon Romanovsky
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Mark Bloch
    Signed-off-by: Doug Ledford

    Leon Romanovsky
     

24 May, 2016

1 commit

  • [ 598.852037] ------------[ cut here ]------------
    [ 598.856698] WARNING: at lib/dma-debug.c:887 check_unmap+0xf8/0x920()
    [ 598.863079] cxgb3 0000:01:00.0: DMA-API: device driver frees DMA memory with different size [device address=0x0000000003310000] [map size=17 bytes] [unmap size=16 bytes]
    [ 598.878265] Modules linked in: xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad kvm_amd kvm ipmi_devintf ipmi_ssif dcdbas pcspkr ipmi_si sg ipmi_msghandler acpi_power_meter amd64_edac_mod shpchp edac_core sp5100_tco k10temp edac_mce_amd i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic iw_cxgb3 pata_acpi ib_core ib_addr mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm pata_atiixp drm ahci libahci serio_raw i2c_core cxgb3 libata bnx2 mdio dm_mirror dm_region_hash dm_log dm_mod
    [ 598.946822] CPU: 3 PID: 11820 Comm: cmtime Not tainted 3.10.0-327.el7.x86_64.debug #1
    [ 598.954681] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS 2.0.2 10/22/2012
    [ 598.962193] ffff8808077479a8 000000000381a432 ffff880807747960 ffffffff81700918
    [ 598.969663] ffff880807747998 ffffffff8108b6c0 ffff880807747a80 ffff8808063f55c0
    [ 598.977132] ffffffff833ca850 0000000000000282 ffff88080b1bb800 ffff880807747a00
    [ 598.984602] Call Trace:
    [ 598.987062] [] dump_stack+0x19/0x1b
    [ 598.992224] [] warn_slowpath_common+0x70/0xb0
    [ 598.998254] [] warn_slowpath_fmt+0x5c/0x80
    [ 599.004033] [] check_unmap+0xf8/0x920
    [ 599.009369] [] ? sched_clock+0x9/0x10
    [ 599.014702] [] debug_dma_free_coherent+0x7e/0xa0
    [ 599.021008] [] cxio_destroy_cq+0xcc/0x160 [iw_cxgb3]
    [ 599.027654] [] iwch_destroy_cq+0xf0/0x140 [iw_cxgb3]
    [ 599.034307] [] ib_destroy_cq+0x1e/0x30 [ib_core]
    [ 599.040601] [] ib_uverbs_close+0x302/0x4d0 [ib_uverbs]
    [ 599.047417] [] __fput+0x102/0x310
    [ 599.052401] [] ____fput+0xe/0x10
    [ 599.057297] [] task_work_run+0xb4/0xe0
    [ 599.062719] [] do_exit+0x304/0xc60
    [ 599.067789] [] ? native_sched_clock+0x35/0x80
    [ 599.073820] [] ? sched_clock+0x9/0x10
    [ 599.079153] [] ? _raw_spin_unlock_irq+0x2c/0x50
    [ 599.085358] [] do_group_exit+0x4c/0xc0
    [ 599.090779] [] get_signal_to_deliver+0x2e1/0x960
    [ 599.097071] [] do_signal+0x57/0x6e0
    [ 599.102229] [] ? sysret_signal+0x5/0x4e
    [ 599.107738] [] do_notify_resume+0x5f/0xb0
    [ 599.113418] [] int_signal+0x12/0x17
    [ 599.118576] ---[ end trace 1e4653102e7e7019 ]---
    [ 599.123211] Mapped at:
    [ 599.125577] [] debug_dma_alloc_coherent+0x2b/0x80
    [ 599.131968] [] cxio_create_cq+0xf2/0x1f0 [iw_cxgb3]
    [ 599.139920] [] iwch_create_cq+0x105/0x4e0 [iw_cxgb3]
    [ 599.147895] [] create_cq.constprop.14+0x184/0x2e0 [ib_uverbs]
    [ 599.156649] [] ib_uverbs_create_cq+0x10b/0x140 [ib_uverbs]

    Fixes: b955150ea784 ('RDMA/cxgb3: When a user QP is marked in error, also mark the CQs in error')
    Signed-off-by: Honggang Li
    Reviewed-by: Leon Romanovsky
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford

    Honggang Li