27 May, 2016
9 commits
-
The pio map initialization function is off by one, so the last
kernel send context that is allocated does not get mapped into the
pio map and is therefore never used by any of the qps.

The send context reserved for VL15 is taken care of by setting the
scontext variable that is used as the index into the kernel send
context array to 1 and does not need to be accounted for in the
kernel send context counting loop as it is currently done.

Fix the kernel send context counting loop to account for all the
allocated send contexts and map all of them to the different VLs.

Reviewed-by: Dennis Dalessandro
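The off-by-one described in this commit can be illustrated with a small user-space sketch. Everything here is hypothetical and is not the driver's code: `struct pio_map_sketch`, `map_contexts()` and `all_mapped()` are invented names, and the round-robin assignment is only an assumption made to keep the example short. The point is that the slot index already starts at 1 to skip the VL15 context, so reducing the count a second time leaves the last slot unmapped.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CTX 16

/*
 * Hypothetical layout: slot 0 of the kernel send context array is the
 * VL15 context; slots 1..n hold the n contexts available to data VLs.
 */
struct pio_map_sketch {
	int vl_of_slot[MAX_CTX + 1];	/* -1 means "not mapped to any VL" */
};

/*
 * Map `counted` contexts across `num_vls` VLs in round-robin order.
 * The slot index starts at 1, which is what skips the VL15 context,
 * so `counted` must not be reduced again to account for it.
 */
static void map_contexts(struct pio_map_sketch *m, int num_vls, int counted)
{
	int scontext = 1;	/* slot 0 is reserved for VL15 */
	int i;

	assert(num_vls > 0 && counted >= 0 && counted <= MAX_CTX);
	for (i = 0; i <= MAX_CTX; i++)
		m->vl_of_slot[i] = -1;
	for (i = 0; i < counted; i++, scontext++)
		m->vl_of_slot[scontext] = i % num_vls;
}

/* Return true if every data slot 1..n was assigned to some VL. */
static bool all_mapped(const struct pio_map_sketch *m, int n)
{
	int slot;

	for (slot = 1; slot <= n; slot++)
		if (m->vl_of_slot[slot] < 0)
			return false;
	return true;
}
```

With 8 allocated contexts, passing a count of 7 leaves slot 8 unmapped, while passing 8 maps slots 1 through 8 and still leaves slot 0 to VL15.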
Reviewed-by: Mike Marciniszyn
Reviewed-by: Jianxin Xiong
Signed-off-by: Jubin John
Signed-off-by: Doug Ledford -
Two 8051 link settings, external device config and tuning method,
were written in the wrong location and the previous settings were
not cleared. For both, clear the old value and write the new
value.

Fixes: 8ebd4cf1852a ("staging/rdma/hfi1: Add active and optical cable support")
Reviewed-by: Dennis Dalessandro
Signed-off-by: Dean Luick
Signed-off-by: Doug Ledford -
When FM is disabled, and the HFI port on the switch is
changed from MgmtAllowed=YES to MgmtAllowed=NO and the
link is bounced, FULL_MGMT_P_KEY doesn't get cleared
from the pkey table. This also occurs when the QSFP
cable is moved from a switch port with MgmtAllowed=YES
to a MgmtAllowed=NO port. Clear the pkey entry properly.

Also, when the driver is loaded and the switch port is
set to MgmtAllowed=NO, FULL_MGMT_P_KEY shouldn't be added
to the pkey table after FM is started. Only set FULL_MGMT_P_KEY
in the pkey table if the switch port is configured to
MgmtAllowed=YES.

Reviewed-by: Dean Luick
Signed-off-by: Sebastian Sanchez
Signed-off-by: Doug Ledford -
rdmavt allows the driver to specify the size of the ack queue, but
only uses it for the modify QP limit testing for setting the atomic
limit value.

The driver dependent size is now used to size the s_ack_queue ring
dynamically.

Since the driver knows its size, the driver will use its define
for any ring size dependent code.

Reviewed-by: Mitko Haralanov
Signed-off-by: Mike Marciniszyn
Signed-off-by: Doug Ledford -
This matches the ib_qp_attr size and
avoids an extremely large value when the lower level
driver registers.

As part of the patch, the u8 ordinals are moved to the
end of the struct to reduce pahole noted excesses.

Reviewed-by: Mitko Haralanov
Reviewed-by: Dennis Dalessandro
Signed-off-by: Mike Marciniszyn
Signed-off-by: Doug Ledford -
Commit b9b06cb6feda
("IB/hfi1: Fix missing lock/unlock in verbs drain callback")
added a spin lock.

Unfortunately, the new lock code can be called from a base
level interrupt state, and an interrupt that can get stacked
will attempt to get the same lock.

Fix by using the flag save/restore spin lock variation.
Cc: stable@vger.kernel.org # 4.6+
Reviewed-by: Sebastian Sanchez
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
Enable trace generation for packets with the "Send Last with
Invalidate" and "Send Only with Invalidate" opcodes.

Reviewed-by: Mike Marciniszyn
Reviewed-by: Dennis Dalessandro
Signed-off-by: Jianxin Xiong
Signed-off-by: Doug Ledford -
A new union member "ieth" (Invalidate Extended Transport Header) is
added to the packet header definition in preparation for supporting
the send with invalidate opcode.

Reviewed-by: Mike Marciniszyn
Reviewed-by: Dennis Dalessandro
Signed-off-by: Jianxin Xiong
Signed-off-by: Doug Ledford
26 May, 2016
24 commits
-
The TODO list for the hfi1 driver was completed during 4.6. In addition,
other objections raised (which are far beyond what was in the TODO list)
have been addressed as well. It is now time to move the driver out of
staging and into the drivers/infiniband sub-tree.

Reviewed-by: Jubin John
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
The deletion of a cdev is not a fence for holding off references to the
structure. The driver attempts to delete the cdev and then proceeds to
free the parent structure, the hfi1_devdata, or dd. This can potentially
lead to a kernel panic in situations where a user has an FD for the cdev
open, and the pci device gets removed. If the user then closes the FD
there will be a NULL dereference when trying to do a put on the cdev's
kobject.

Fix this by pointing the cdev's kobject.parent at a new kobject embedded
in its parent structure. Also take a reference when the device is opened
and put it back when it is closed.

Reviewed-by: Mitko Haralanov
Signed-off-by: Ira Weiny
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
Add a trace message to HFI1's user IOCTL handling. This allows debugging
of which IOCTLs are being handled by the driver.

Reviewed-by: Ira Weiny
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
Remove the write() handler for user space commands now that ioctl
handling is available. User apps will need to change to use ioctl from
this point forward.

Reviewed-by: Mitko Haralanov
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
IOCTL is more suited to what user space commands need to do than the
write() interface. Add IOCTL definitions for all existing write commands
and the handling for those. The write() interface will be removed in a
follow on patch.

Reviewed-by: Mitko Haralanov
Reviewed-by: Mike Marciniszyn
Reviewed-by: Ira Weiny
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
The HFI1_CMD_SDMA_STATUS_UPD command was never implemented, so it has no
reason to live in the driver. Remove it.

Reviewed-by: Christoph Hellwig
Reviewed-by: Mitko Haralanov
Reviewed-by: Ira Weiny
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
The snoop/diag interface is better served by an implementation which is
more general and perhaps usable by other drivers. Go ahead and remove
the code now and get rid of the char dev. We can put the feature back
when we have a more agreeable solution.

Reviewed-by: Dean Luick
Reviewed-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
Remove EPROM handling from the cdev which is used for user application
data traffic.

Reviewed-by: Dean Luick
Reviewed-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
Remove UI char device which exposes direct access to registers for user
space. This was put in to aid in debugging the hardware. We are looking
into alternative means of providing the same functionality. This
removes another char device from HFI1's footprint.

Reviewed-by: Dean Luick
Reviewed-by: Mitko Haralanov
Reviewed-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
hfi1 currently exports a cdev that can be used to target all of the HFIs
in the system. However, there is a problem with this approach in
that the devices could be on different subnets. This is a problem that
user space can figure out and explicitly tell the driver on which device
to create a context.

Remove the multi-purpose cdev, leaving a dedicated cdev for each port.
Also remove the striping capability that is dependent upon the user
choosing the multi-purpose cdev. It is now up to user space to determine
how to stripe contexts.

Reviewed-by: Dean Luick
Reviewed-by: Mitko Haralanov
Reviewed-by: Mike Marciniszyn
Reviewed-by: Ira Weiny
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
Remove the usage of an anti-pattern goto in hfi1_cdev_init to improve
code readability.

Suggested-by: Jason Gunthorpe
Reviewed-by: Ira Weiny
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
During the processing of a user SDMA request, if there was an
error before the request counter was increased, the state of
the packet queue could be updated incorrectly, causing the
counter to underflow. As a result, the process could get
stuck later since the counter could never get back to 0.

This patch adds a condition to guard the packet queue update
so that the counter is only decreased if it has been increased
before the error happens.

Reviewed-by: Mitko Haralanov
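The guard described in this commit can be sketched in user-space C. This is an illustration under stated assumptions, not the driver's code: `struct pkt_queue_sketch`, `process_request()` and `complete_request()` are hypothetical names, and the two boolean arguments only simulate an error before or after the counter is increased.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the packet queue and its request counter. */
struct pkt_queue_sketch {
	int n_reqs;	/* requests currently in flight */
};

/* Simulate the completion path, which releases one in-flight request. */
static void complete_request(struct pkt_queue_sketch *pq)
{
	assert(pq->n_reqs > 0);
	pq->n_reqs--;
}

/*
 * Process one request. `fail_early` simulates an error before the
 * counter is increased, `fail_late` an error after it.
 */
static int process_request(struct pkt_queue_sketch *pq,
			   bool fail_early, bool fail_late)
{
	bool counted = false;	/* has this request touched the counter? */
	int ret = 0;

	if (fail_early) {
		ret = -1;
		goto out;
	}

	pq->n_reqs++;
	counted = true;

	if (fail_late) {
		ret = -1;
		goto out;
	}
	return 0;	/* request stays in flight until completion */

out:
	/* Only undo the increment if it actually happened. */
	if (counted)
		pq->n_reqs--;
	return ret;
}
```

Without the `counted` check, the early-error path would take the counter below zero, and it could then never return to 0.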
Signed-off-by: Jianxin Xiong
Signed-off-by: Doug Ledford -
Building the qib driver with gcc version 6.1.0 raises the following
build warning:
drivers/infiniband/hw/qib/qib_iba7322.c:1311:39: warning:
'qib_7322_intr_msgs' defined but not used [-Wunused-const-variable=]
static const struct qib_hwerror_msgs qib_7322_intr_msgs[] = {
^~~~~~~~~~~~~~~~~~
Remove the unused qib_7322_intr_msgs[].

Reviewed-by: Dennis Dalessandro
Reviewed-by: Mike Marciniszyn
Signed-off-by: Jubin John
Signed-off-by: Doug Ledford -
This comment was old; the MTU enums have been defined.
Reviewed-by: Mitko Haralanov
Reviewed-by: Dennis Dalessandro
Signed-off-by: Ira Weiny
Signed-off-by: Doug Ledford -
sdma_event_names[] is only used within CONFIG_SDMA_VERBOSITY ifdefs, so
when CONFIG_SDMA_VERBOSITY is disabled, it results in the following
0-day build warning:
>> drivers/infiniband/hw/hfi1/sdma.c:137:27: warning: 'sdma_event_names'
>> defined but not used [-Wunused-const-variable=]
static const char * const sdma_event_names[] = {
^~~~~~~~~~~~~~~~
This occurs on the following compiler:
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430

For more information check:
https://lists.01.org/pipermail/kbuild-all/2016-May/020060.html

Fix this warning by defining sdma_event_names[] only within the
CONFIG_SDMA_VERBOSITY ifdefs.

Reported-by: kbuild test robot
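The pattern behind this fix can be shown with a minimal, self-contained sketch. The names `SKETCH_VERBOSITY`, `sketch_event_names[]` and `sketch_event_name()` are hypothetical stand-ins for the driver's config option and table. The idea is that a table whose only users sit behind an ifdef should be defined behind the same ifdef.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event identifiers, for illustration only. */
enum sketch_event { EV_START, EV_STOP, EV_COUNT };

#ifdef SKETCH_VERBOSITY
/* Defined only when its single user below is compiled in. */
static const char * const sketch_event_names[] = {
	[EV_START] = "start",
	[EV_STOP]  = "stop",
};
#endif

/* Return a printable name, or an empty string when verbosity is off. */
static const char *sketch_event_name(enum sketch_event ev)
{
	assert(ev < EV_COUNT);
#ifdef SKETCH_VERBOSITY
	return sketch_event_names[ev];
#else
	(void)ev;
	return "";
#endif
}
```

If the table were defined unconditionally, a build without `SKETCH_VERBOSITY` would leave it unreferenced, which is the situation gcc 6 reports with -Wunused-const-variable.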
Reviewed-by: Mike Marciniszyn
Reviewed-by: Dennis Dalessandro
Signed-off-by: Jubin John
Signed-off-by: Doug Ledford -
Use kzalloc_node instead of kzalloc for rdmavt memory region segment
allocation to optimize for performance on NUMA platforms.

Reviewed-by: Dennis Dalessandro
Signed-off-by: Jubin John
Signed-off-by: Doug Ledford -
The usage of the various vmalloc APIs does not consistently zero memory
when allocating the swqe. Ensure zeroing variants are used.

Reviewed-by: Mitko Haralanov
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Doug Ledford -
Commit e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which
may cause corruption") introduced a bug which may cause a reference
count of an interval RB node to be leaked in the case where an SDMA
transfer from that node completes at the same time as the node is
being extended.

If a node is being extended, it is first removed from the RB tree
in order to be processed without the risk of an invalidation event
removing the node at the same time.

If an SDMA completion happens during that time, the completion handler
will fail to find the node in the RB tree and, therefore, fail to
correctly decrement its refcount. This leaves the node in the tree and
its pages pinned for the duration of the user process.

To prevent this from happening, the io vector adds a reference to the
RB node, which is used during the SDMA completion instead of looking
up the node in the RB tree.

This change adds a performance improvement as a side effect by avoiding
the RB tree lookup.

Fixes: e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which may cause corruption")
Reviewed-by: Dean Luick
Reviewed-by: Harish Chegondi
Signed-off-by: Mitko Haralanov
Signed-off-by: Doug Ledford -
In IB networks, and specifically in IPoIB/rdmacm traffic, the device
address of an IPoIB interface is used as a means to exchange information
between nodes needed for communication.

Currently an IPoIB interface will always be created with a device
address based on its node GUID without a way to change that.

This change adds the ability to set the device address of an IPoIB
interface by value. We use the set mac address ndo to do that.

The flow breaks down into two cases:
1) The GID value is already in the GID table,
in this case the interface will be able to set carrier up.
2) The GID value is not yet in the GID table,
in this case the interface won't try to join the multicast group
and will wait (listen on GID_CHANGE event) until the GID is inserted.

In order to track those changes, we add a new flag:
* IPOIB_FLAG_DEV_ADDR_SET.

When set, it means the dev_addr is based on a value in the GID
table. This bit will be cleared upon a dev_addr change triggered
by the user and set after validation.

Per IB spec the port GUID can't change if the module is loaded.
The port GUID is the basis for the GID at index 0, which is the basis for
the default device address of an IPoIB interface.

The issue is that there are devices that don't follow the spec:
they change the port GUID while the HCA is powered on. In order
not to break userspace applications, we need to check if the
user wanted to control the device address, and we assume that
if he sets the device address back to be based on GID index 0,
he no longer wishes to control it.

In order to track this, we add an additional flag:
* IPOIB_FLAG_DEV_ADDR_CTRL

When setting the device address, there is no validation of the upper
twelve bytes of the device address (flags, qpn, subnet prefix) as those
bytes are not under the control of the user.

Signed-off-by: Mark Bloch
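The address handling described in this commit can be sketched in user-space C. This is a simplified illustration, not the IPoIB code: the helper names and the table of 8-byte values are hypothetical. It assumes the 20-byte IPoIB hardware address layout of flags (1 byte), QPN (3 bytes), subnet prefix (8 bytes) and an 8-byte user-controlled remainder, and that only the last 8 bytes of a requested address are taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ADDR_LEN   20	/* length of an IPoIB hardware address */
#define SKETCH_UPPER_LEN  12	/* flags (1) + QPN (3) + subnet prefix (8) */
#define SKETCH_GUID_LEN   (SKETCH_ADDR_LEN - SKETCH_UPPER_LEN)

/*
 * Take only the user-controlled low 8 bytes from the requested address.
 * The upper 12 bytes of dev_addr are left untouched and never validated.
 */
static void sketch_set_dev_addr(uint8_t *dev_addr, const uint8_t *requested)
{
	assert(dev_addr && requested);
	memcpy(dev_addr + SKETCH_UPPER_LEN, requested + SKETCH_UPPER_LEN,
	       SKETCH_GUID_LEN);
}

/*
 * Distinguish case 1 from case 2: is the low part of the address
 * already present in a (hypothetical, simplified) table of values?
 */
static bool sketch_addr_in_table(const uint8_t *dev_addr,
				 const uint8_t (*table)[SKETCH_GUID_LEN],
				 size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!memcmp(dev_addr + SKETCH_UPPER_LEN, table[i],
			    SKETCH_GUID_LEN))
			return true;
	return false;
}
```

In this sketch a `true` result corresponds to case 1 (the interface may bring carrier up) and `false` to case 2 (wait for a GID change event before joining the multicast group).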
Reviewed-by: Leon Romanovsky
Signed-off-by: Leon Romanovsky
Signed-off-by: Doug Ledford -
Check (via an SA query) if the SM supports the new option for SendOnly
multicast joins.
If the SM supports that option, the driver will use the new join state to
create such a multicast group.
If SendOnlyFullMember is supported, we no longer use a faked FullMember state
join for a SendOnly MCG; we use the correct state instead.

This check is performed at every invocation of the mcast_restart task, to be
sure that the driver stays in sync with the current state of the SM.

Signed-off-by: Erez Shitrit
Reviewed-by: Leon Romanovsky
Signed-off-by: Doug Ledford -
There are four join state types for an MCG: FullMember, NonMember,
SendOnlyNonMember, and the newly added type, SendOnlyFullMember.
Add support for the new SendOnlyFullMember join state.

The new type allows a host to send a join request as send-only. This causes the
group to be created, but packets from this multicast group are not delivered
back to the host.

Signed-off-by: Erez Shitrit
Reviewed-by: Leon Romanovsky
Reviewed-by: Christoph Lameter
Reviewed-by: Ira Weiny
Signed-off-by: Doug Ledford -
New SA query function to return the ClassPortInfo struct from the SA.
If the SM supports FullMemberSendOnly mode for MCG's, it sets a
capability bit in the capability_mask2 field of the response.

Signed-off-by: Erez Shitrit
Reviewed-by: Leon Romanovsky
Signed-off-by: Doug Ledford -
Change struct ib_class_port_info to conform to IB Spec 1.3.
This is needed in order to get a specific capability mask from the
ClassPortInfo MAD.

From the IB Spec, ClassPortInfo section:
"CapabilityMask2 Bits 0-26: Additional class-specific capabilities...
RespTimeValue the rest 5 bits"

The new struct now has one field for capabilitymask2 (previously the
reserved field) and the resp_time field.

This also fixes up qib and srpt, which used the field that is now repurposed
as capabilitymask2:
IB/qib: Change pma_get_classportinfo
IB/srpt: Adjust the use of ib_class_port_info

Signed-off-by: Erez Shitrit
Reviewed-by: Leon Romanovsky
Reviewed-by: Hal Rosenstock
Signed-off-by: Doug Ledford
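The 27-bit/5-bit split quoted in the ClassPortInfo commit above can be sketched as a pair of accessors over one 32-bit word. This is an illustration only: the helper names are hypothetical, the word is handled in host byte order (a real MAD field would need a byte-order conversion first), and placing CapabilityMask2 in the 27 most significant bits with RespTimeValue in the 5 least significant bits is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RESP_TIME_BITS 5
#define SKETCH_RESP_TIME_MASK 0x1Fu
#define SKETCH_CAPMASK2_MAX   ((1u << 27) - 1)

/* Combine a 27-bit capability mask and a 5-bit response time value. */
static uint32_t sketch_pack(uint32_t capmask2, uint8_t resp_time)
{
	assert(capmask2 <= SKETCH_CAPMASK2_MAX);
	assert(resp_time <= SKETCH_RESP_TIME_MASK);
	return (capmask2 << SKETCH_RESP_TIME_BITS) | resp_time;
}

/* Extract the 27-bit capability mask from the combined word. */
static uint32_t sketch_get_capmask2(uint32_t word)
{
	return word >> SKETCH_RESP_TIME_BITS;
}

/* Extract the 5-bit response time value from the combined word. */
static uint8_t sketch_get_resp_time(uint32_t word)
{
	return (uint8_t)(word & SKETCH_RESP_TIME_MASK);
}
```

Keeping both values behind accessors means callers such as the qib and srpt users mentioned above do not need to know how the two fields share the word.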
25 May, 2016
6 commits
-
There is an assumption that rdmacm is used only between nodes
in the same IB subnet, which is why ARP resolution can be used to turn
an IP address into a GID in rdmacm.

When dealing with IB communication between subnets this assumption
is no longer valid. ARP resolution will get us the next hop device
address and not the peer node's device address.

To solve this issue, we ask user space whether it can provide the
GID of the peer node, and fail if not.

We add a sequence number to identify each request and fill in the GID
upon answer from userspace.

Signed-off-by: Mark Bloch
Signed-off-by: Doug Ledford -
Move SA ibnl client registration to ib_core module init.
This will allow us to register a single client to handle
all RDMA_NL_LS operations and make it SA independent.

Signed-off-by: Mark Bloch
Signed-off-by: Doug Ledford -
This commit adds a new RDMA local service operation:
- IP to GID resolution.

The client request would include the ifindex of the outgoing interface
and would place in an attribute (LS_NLA_TYPE_IPV4 or LS_NLA_TYPE_IPV6)
the destination IP.

The local service would answer with a message that has the attribute:
- LS_NLA_TYPE_DGID - The destination GID.

Signed-off-by: Mark Bloch
Signed-off-by: Doug Ledford -
Consolidate ib_sa into ib_core, this commit eliminates
ib_sa.ko and makes it part of ib_core.ko.

Signed-off-by: Mark Bloch
Signed-off-by: Doug Ledford -
Consolidate ib_mad into ib_core, this commit eliminates
ib_mad.ko and makes it part of ib_core.ko.

Signed-off-by: Mark Bloch
Signed-off-by: Doug Ledford -
IB address resolution is declared as a module (ib_addr.ko) which loads
itself before the IB core module (ib_core.ko).

This leads to a scenario where IB netlink, which is initialized by IB
core, can't be used by ib_addr.ko.

In order to solve it, we are converting ib_addr.ko to be part of
the IB core module.

Signed-off-by: Leon Romanovsky
Signed-off-by: Leon Romanovsky
Signed-off-by: Mark Bloch
Signed-off-by: Doug Ledford
24 May, 2016
1 commit
-
[ 598.852037] ------------[ cut here ]------------
[ 598.856698] WARNING: at lib/dma-debug.c:887 check_unmap+0xf8/0x920()
[ 598.863079] cxgb3 0000:01:00.0: DMA-API: device driver frees DMA memory with different size [device address=0x0000000003310000] [map size=17 bytes] [unmap size=16 bytes]
[ 598.878265] Modules linked in: xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad kvm_amd kvm ipmi_devintf ipmi_ssif dcdbas pcspkr ipmi_si sg ipmi_msghandler acpi_power_meter amd64_edac_mod shpchp edac_core sp5100_tco k10temp edac_mce_amd i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic iw_cxgb3 pata_acpi ib_core ib_addr mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm pata_atiixp drm ahci libahci serio_raw i2c_core cxgb3 libata bnx2 mdio dm_mirror dm_region_hash dm_log dm_mod
[ 598.946822] CPU: 3 PID: 11820 Comm: cmtime Not tainted 3.10.0-327.el7.x86_64.debug #1
[ 598.954681] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS 2.0.2 10/22/2012
[ 598.962193] ffff8808077479a8 000000000381a432 ffff880807747960 ffffffff81700918
[ 598.969663] ffff880807747998 ffffffff8108b6c0 ffff880807747a80 ffff8808063f55c0
[ 598.977132] ffffffff833ca850 0000000000000282 ffff88080b1bb800 ffff880807747a00
[ 598.984602] Call Trace:
[ 598.987062] [] dump_stack+0x19/0x1b
[ 598.992224] [] warn_slowpath_common+0x70/0xb0
[ 598.998254] [] warn_slowpath_fmt+0x5c/0x80
[ 599.004033] [] check_unmap+0xf8/0x920
[ 599.009369] [] ? sched_clock+0x9/0x10
[ 599.014702] [] debug_dma_free_coherent+0x7e/0xa0
[ 599.021008] [] cxio_destroy_cq+0xcc/0x160 [iw_cxgb3]
[ 599.027654] [] iwch_destroy_cq+0xf0/0x140 [iw_cxgb3]
[ 599.034307] [] ib_destroy_cq+0x1e/0x30 [ib_core]
[ 599.040601] [] ib_uverbs_close+0x302/0x4d0 [ib_uverbs]
[ 599.047417] [] __fput+0x102/0x310
[ 599.052401] [] ____fput+0xe/0x10
[ 599.057297] [] task_work_run+0xb4/0xe0
[ 599.062719] [] do_exit+0x304/0xc60
[ 599.067789] [] ? native_sched_clock+0x35/0x80
[ 599.073820] [] ? sched_clock+0x9/0x10
[ 599.079153] [] ? _raw_spin_unlock_irq+0x2c/0x50
[ 599.085358] [] do_group_exit+0x4c/0xc0
[ 599.090779] [] get_signal_to_deliver+0x2e1/0x960
[ 599.097071] [] do_signal+0x57/0x6e0
[ 599.102229] [] ? sysret_signal+0x5/0x4e
[ 599.107738] [] do_notify_resume+0x5f/0xb0
[ 599.113418] [] int_signal+0x12/0x17
[ 599.118576] ---[ end trace 1e4653102e7e7019 ]---
[ 599.123211] Mapped at:
[ 599.125577] [] debug_dma_alloc_coherent+0x2b/0x80
[ 599.131968] [] cxio_create_cq+0xf2/0x1f0 [iw_cxgb3]
[ 599.139920] [] iwch_create_cq+0x105/0x4e0 [iw_cxgb3]
[ 599.147895] [] create_cq.constprop.14+0x184/0x2e0 [ib_uverbs]
[ 599.156649] [] ib_uverbs_create_cq+0x10b/0x140 [ib_uverbs]

Fixes: b955150ea784 ('RDMA/cxgb3: When a user QP is marked in error, also mark the CQs in error')
Signed-off-by: Honggang Li
Reviewed-by: Leon Romanovsky
Reviewed-by: Steve Wise
Signed-off-by: Doug Ledford