26 Sep, 2008
1 commit
-
Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event. This
change introduces a problem if the path record query triggered by such an
event fails, causing path->ah to become NULL. A later successful path query
will then trigger WARN_ON() in path_rec_completion(), and crash
because path->ah has already been freed, so the ipoib_put_ah() inside
the lock in path_rec_completion() may actually drop the last reference
(contrary to the comment that claims this is safe).

Fix this by updating path->ah and freeing old_ah only when the path
record query is successful. This prevents the neighbour AH and that
path AH from getting out of sync.

This fixes
Reported-by: Rabah Salem
Debugged-by: Eli Cohen
Signed-off-by: Roland Dreier
Signed-off-by: Linus Torvalds
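The rule this fix describes (install the new address handle and drop the old one only when the query succeeds) can be sketched in userspace C. This is a minimal illustration, not the ipoib code: struct ah, struct path, ah_create(), ah_put() and path_query_done() are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted address handle. */
struct ah {
    int refs;
};

/* Hypothetical stand-in for a cached path entry. */
struct path {
    struct ah *ah;  /* current handle; NULL until the first successful query */
};

static struct ah *ah_create(void)
{
    struct ah *ah = malloc(sizeof(*ah));

    if (ah)
        ah->refs = 1;
    return ah;
}

/* Drop one reference; frees the handle when the count reaches zero. */
static void ah_put(struct ah *ah)
{
    if (ah && --ah->refs == 0)
        free(ah);
}

/*
 * Completion handler sketch. On failure the existing handle is left in
 * place; on success the new handle is installed first and only then is
 * the old one released. Returns 0 on success, -1 otherwise.
 */
static int path_query_done(struct path *path, int status)
{
    struct ah *new_ah, *old_ah;

    if (status != 0)
        return -1;      /* path->ah stays untouched */

    new_ah = ah_create();
    if (!new_ah)
        return -1;

    old_ah = path->ah;
    path->ah = new_ah;
    ah_put(old_ah);     /* safe when old_ah is NULL */
    return 0;
}
```

With this shape, a failed refresh never leaves the path without a handle, so a later successful query has nothing stale to trip over.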
20 Sep, 2008
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()
RDMA/nes: Fix client side QP destroy
IB/mlx4: Fix up fast register page list format
mlx4_core: Set RAE and init mtt_sz field in FRMR MPT entries
17 Sep, 2008
3 commits
-
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop(). We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly. This
works because we only flush the ipoib_workqueue with the RTNL not held.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes. This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").
Signed-off-by: Yossi Etigin
Signed-off-by: Roland Dreier
-
Fix QP not being destroyed properly on the client, which leads to
userspace programs hanging on exit. This is a missing chunk from the
connection management rewrite in commit 6492cdf3 ("RDMA/nes: CM
connection setup/teardown rework").
Signed-off-by: Faisal Latif
Signed-off-by: Roland Dreier
16 Sep, 2008
1 commit
-
Byte swap the addresses in the page list for fast register work requests
to big endian to match what the HCA expects. Also, the addresses must
have the "present" bit set so that the HCA knows it can access them.
Otherwise the HCA will fault the first time it accesses the memory
region.
Signed-off-by: Vladimir Sokolovsky
Signed-off-by: Roland Dreier
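As a rough userspace illustration of the page-list format described in this commit, each address is tagged with a "present" bit and then stored in big-endian byte order. The flag value and the names PAGE_PRESENT_FLAG, to_be64() and fill_page_list() are assumptions made for this sketch, not the mlx4 driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed flag value for illustration; the real bit is defined by the driver. */
#define PAGE_PRESENT_FLAG 1ULL

/* Return a value whose in-memory byte layout is the big-endian form of v. */
static uint64_t to_be64(uint64_t v)
{
    uint8_t bytes[8];
    uint64_t out;

    for (int i = 0; i < 8; i++)
        bytes[i] = (uint8_t)(v >> (56 - 8 * i));  /* most significant byte first */
    memcpy(&out, bytes, sizeof(out));
    return out;
}

/* Build the device-visible page list from host-order page addresses. */
static void fill_page_list(uint64_t *dst, const uint64_t *pages, size_t n)
{
    assert(dst != NULL && pages != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = to_be64(pages[i] | PAGE_PRESENT_FLAG);
}
```

Setting the flag before the swap keeps the flag in the least significant bit of the logical value regardless of host endianness.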
28 Aug, 2008
1 commit
-
Initialize the L_Key and R_Key for memory regions returned from
mlx4_ib_alloc_fast_reg_mr(). Otherwise callers just get garbage for
the memory keys and can't do anything useful with these MRs.
Signed-off-by: Vladimir Sokolovsky
Signed-off-by: Roland Dreier
Signed-off-by: Linus Torvalds
24 Aug, 2008
1 commit
-
This patch lets the files using linux/version.h match the files that
#include it.
Signed-off-by: Adrian Bunk
Signed-off-by: Linus Torvalds
20 Aug, 2008
2 commits
-
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue. However, ipoib_stop() (which is run
inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
if the join task is pending.

Fix this by simply not flushing the workqueue from ipoib_stop(). It
turns out that we really don't care about workqueue tasks running
during or after ipoib_stop(), as long as we make sure to flush the
workqueue before unregistering a netdev.

This fixes .
Signed-off-by: Roland Dreier
16 Aug, 2008
2 commits
-
The check for max physical address was incorrect, thus limiting the
range of allowed physical addresses.
Signed-off-by: Dave Olson
Signed-off-by: Roland Dreier
-
If a UD QP has some work requests queued to be sent by the DMA engine
followed by a local loopback work request, we have to wait for the
previous work requests to finish or the completion for the local
loopback work request would be generated out of order. The problem
was that the work request queue pointer was already updated so that
the request would not be processed when the DMA queue drained.
Signed-off-by: Ralph Campbell
Signed-off-by: Roland Dreier
13 Aug, 2008
6 commits
-
Under rare circumstances, the ehca hardware might erroneously generate
two CQEs for the same WQE, which is not compliant with the IB spec and
will cause unpredictable errors like memory being freed twice. To
avoid this problem, the driver needs to detect the second CQE and
discard it.

For this purpose, introduce an array holding as many elements as the
SQ of the QP, called sq_map. Each sq_map entry stores a "reported"
flag for one WQE in the SQ. When a work request is posted to the SQ,
the respective "reported" flag is set to zero. After the arrival of a
CQE, the flag is set to 1, which makes it possible to detect the
occurrence of a second CQE.

The mapping between WQE / CQE and the corresponding sq_map element is
implemented by replacing the lowest 16 Bits of the wr_id with the
index in the queue map. The original 16 Bits are stored in the sq_map
entry and are restored when the CQE is passed to the application.
Signed-off-by: Alexander Schmidt
Signed-off-by: Roland Dreier
-
The idr_find() function may fail when trying to get the QP that is
associated with a CQE, e.g. when a QP has been destroyed between the
generation of a CQE and the poll request for it. In consequence, the
return value of idr_find() must be checked and the CQE must be
discarded when the QP cannot be found.
Signed-off-by: Alexander Schmidt
Signed-off-by: Roland Dreier
-
When the ehca driver detects an invalid opcode in a CQE, it currently
passes the CQE to the application and returns with success. This patch
changes the CQE handling to discard CQEs with invalid opcodes and to
continue reading the next CQE from the CQ.
Signed-off-by: Alexander Schmidt
Signed-off-by: Roland Dreier
-
Rename the "poll_cq_one_read_cqe" goto label to what it actually does,
namely "repoll".
Signed-off-by: Alexander Schmidt
Signed-off-by: Roland Dreier
-
Since the introduction of the port auto-detect mode for ehca, calls to
modify_qp() may be cached in the device driver when the ports are not
activated yet. When a modify_qp() call is cached, the qp state remains
untouched until the port is activated, which will leave the qp in the
reset state. In the reset state, however, it is not allowed to post SQ
WQEs, which confuses applications like ib_mad.

The solution for this problem is to immediately set the qp state as
requested by modify_qp(), even when the call is cached.
Signed-off-by: Alexander Schmidt
Signed-off-by: Roland Dreier
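The sq_map bookkeeping described in the duplicate-CQE commit earlier in this group (a "reported" flag per send-queue slot, with the low 16 bits of wr_id borrowed to carry the slot index) can be sketched as follows. This is a minimal userspace model under assumed names (struct sq_map, sq_map_post(), sq_map_complete()); it is not the ehca driver's actual data layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry per send-queue slot (hypothetical layout). */
struct sq_map_entry {
    uint16_t saved_low16;  /* caller's original low 16 bits of wr_id */
    uint8_t  reported;     /* 0 after post, 1 once a CQE has been seen */
};

struct sq_map {
    struct sq_map_entry *entries;
    uint16_t size;
};

/* Allocate a zeroed map with one entry per slot. Returns 0 or -1. */
static int sq_map_init(struct sq_map *m, uint16_t size)
{
    m->entries = calloc(size, sizeof(*m->entries));
    m->size = m->entries ? size : 0;
    return m->entries ? 0 : -1;
}

/*
 * Post path: remember the caller's low 16 bits, clear the flag, and
 * return a wr_id whose low 16 bits hold the slot index instead.
 */
static uint64_t sq_map_post(struct sq_map *m, uint16_t slot, uint64_t wr_id)
{
    assert(slot < m->size);
    m->entries[slot].saved_low16 = (uint16_t)(wr_id & 0xFFFF);
    m->entries[slot].reported = 0;
    return (wr_id & ~0xFFFFULL) | slot;
}

/*
 * Completion path: returns 0 and restores the original wr_id for the
 * first CQE of a slot, or -1 if the CQE is a duplicate or out of range.
 */
static int sq_map_complete(struct sq_map *m, uint64_t *wr_id)
{
    uint16_t slot = (uint16_t)(*wr_id & 0xFFFF);
    struct sq_map_entry *e;

    if (slot >= m->size)
        return -1;
    e = &m->entries[slot];
    if (e->reported)
        return -1;          /* second CQE for the same WQE: discard */
    e->reported = 1;
    *wr_id = (*wr_id & ~0xFFFFULL) | e->saved_low16;
    return 0;
}
```

A caller would invoke sq_map_post() when queuing a work request and sq_map_complete() for every polled CQE, dropping any CQE for which the latter returns -1.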
09 Aug, 2008
1 commit
-
There are users that are running UDP applications that require a large
receive queue size in order to get good performance. To prevent
allocation failures for rx_rings when using non-SRQ mode and large
recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to
allocate rx_rings.
Signed-off-by: David Wilder
Signed-off-by: Roland Dreier
08 Aug, 2008
4 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mad: Test ib_create_send_mad() return with IS_ERR(), not == NULL
IB/mlx4: Allow 4K messages for UD QPs
mlx4_core: Add ethernet fields to CQE struct
IB/ipath: Fix printk format warnings
RDMA/cxgb3: Fix deadlock initializing iw_cxgb3 device
RDMA/cxgb3: Fix up MW access rights
RDMA/cxgb3: Fix QP capabilities
RDMA/cma: Remove padding arrays by using struct sockaddr_storage
IB/ipath: Use unsigned long for irq flags
IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr()
-
In case of error, the function ib_create_send_mad() returns an ERR
pointer, but never returns a NULL pointer. So testing the return
value for error should be done with IS_ERR, not by comparing with
NULL.

A simplified version of the semantic patch that makes this change is
as follows:
(http://www.emn.fr/x-info/coccinelle/)
//
@correct_null_test@
expression x,E;
statement S1, S2;
@@
x = ib_create_send_mad(...)
? x = E;
//
Signed-off-by: Julien Brunel
Signed-off-by: Julia Lawall
Signed-off-by: Roland Dreier
-
Current code limits the max message size to 2K for UD QPs, while MTU
might be as big as 4K. This patch sets the maximum message size to
4K, which is needed for UD to work correctly on fabrics with a 4K MTU.
Signed-off-by: Alex Naslednikov
Signed-off-by: Eli Cohen
Signed-off-by: Roland Dreier
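The error-pointer convention behind the ib_create_send_mad() fix earlier in this group can be shown with a small userspace model. The helpers err_ptr(), ptr_err() and is_err() below are simplified stand-ins written for this sketch (the kernel's own ERR_PTR/PTR_ERR/IS_ERR live in linux/err.h), and create_buf() is a hypothetical function that, like ib_create_send_mad(), never returns NULL on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Errors are encoded in the top MAX_ERRNO values of the address space. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
    return (void *)error;
}

static inline long ptr_err(const void *ptr)
{
    return (long)ptr;
}

static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical allocator: failure is reported as an encoded errno,
 * never as NULL, so a "== NULL" test would miss every error.
 */
static void *create_buf(size_t len)
{
    void *p;

    if (len == 0)
        return err_ptr(-EINVAL);
    p = malloc(len);
    return p ? p : err_ptr(-ENOMEM);
}
```

A caller written against this convention checks is_err(buf) and extracts the code with ptr_err(buf); comparing against NULL would treat the failure value as a valid pointer.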
07 Aug, 2008
1 commit
-
Add ethernet-related fields to struct mlx4_cqe so that the mlx4_en
ethernet NIC driver can share the same definition.
Signed-off-by: Yevgeny Petrilin
Signed-off-by: Roland Dreier
05 Aug, 2008
5 commits
-
ipath_driver.c:1260: warning: format '%Lx' expects type 'long long unsigned int', but argument 6 has type 'long unsigned int'
ipath_driver.c:1459: warning: format '%Lx' expects type 'long long unsigned int', but argument 4 has type 'u64'
ipath_intr.c:358: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64'
ipath_intr.c:358: warning: format '%Lu' expects type 'long long unsigned int', but argument 6 has type 'u64'
ipath_intr.c:1119: warning: format '%Lx' expects type 'long long unsigned int', but argument 5 has type 'u64'
ipath_intr.c:1119: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64'
ipath_intr.c:1123: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64'
ipath_intr.c:1130: warning: format '%Lx' expects type 'long long unsigned int', but argument 4 has type 'u64'
ipath_iba7220.c:1032: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
ipath_iba7220.c:1045: warning: format '%llX' expects type 'long long unsigned int', but argument 3 has type 'u64'
ipath_iba7220.c:2506: warning: format '%Lu' expects type 'long long unsigned int', but argument 4 has type 'u64'
Signed-off-by: Alexander Beregalov
Cc: Sean Hefty
Cc: Hal Rosenstock
Signed-off-by: Roland Dreier
-
Running 'ifconfig up' on the cxgb3 interface with iw_cxgb3 loaded
causes a deadlock. The rtnl lock is already held in this path. The
function fw_supports_fastreg() was introduced in 2.6.27 to
conditionally set the IB_DEVICE_MEM_MGT_EXTENSIONS bit iff the
firmware was at 7.0 or greater, and this function also acquires the
rtnl lock, which thus causes a deadlock. Further, if iw_cxgb3 is
loaded _after_ the nic interface is brought up, then the deadlock does
not occur and therefore fw_supports_fastreg() does need to grab the
rtnl lock in that path.

It turns out this code is all useless anyway. The low level driver
will NOT allow the open if the firmware isn't 7.0, so iw_cxgb3 can
always set the MEM_MGT_EXTENSIONS bit. Simplify...
Signed-off-by: Steve Wise
Signed-off-by: Roland Dreier
-
- MWs don't have local read/write permissions.
- Set the MW_BIND enabled bit if a MR has MW_BIND access.
Signed-off-by: Steve Wise
Signed-off-by: Roland Dreier
-
- Set the stag0 and fastreg capability bits only for kernel qps.
- QP_PRIV flag is no longer used, so don't set it.
Signed-off-by: Steve Wise
Signed-off-by: Roland Dreier
-
There are a few places where the RDMA CM code handles IPv6 by doing

    struct sockaddr addr;
    u8 pad[sizeof(struct sockaddr_in6) -
           sizeof(struct sockaddr)];

This is fragile and ugly; handle this in a better way with just

    struct sockaddr_storage addr;

[ Also roll in patch from Aleksey Senin to
  switch to struct sockaddr_storage and get rid of padding arrays in
  struct rdma_addr. ]
Signed-off-by: Roland Dreier
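To illustrate why struct sockaddr_storage removes the need for hand-sized padding, here is a small POSIX userspace sketch. The container struct addr_pair and the helper addr_set() are hypothetical names chosen for this example; they are not the RDMA CM structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical address pair; each slot is large enough for any family. */
struct addr_pair {
    struct sockaddr_storage src;
    struct sockaddr_storage dst;
};

/*
 * Copy an IPv4 or IPv6 address into a storage slot.
 * Returns 0 on success, -1 for an unsupported address family.
 */
static int addr_set(struct sockaddr_storage *dst, const struct sockaddr *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    switch (src->sa_family) {
    case AF_INET:
        len = sizeof(struct sockaddr_in);
        break;
    case AF_INET6:
        len = sizeof(struct sockaddr_in6);
        break;
    default:
        return -1;
    }
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, len);  /* always fits: storage covers every family */
    return 0;
}
```

Because the slot is sized by the type itself, no manual arithmetic on sizeof(struct sockaddr_in6) is needed, and the field can be cast to the family-specific structure after checking ss_family.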
04 Aug, 2008
1 commit
-
from include/asm-powerpc. This is the result of a
    mkdir arch/powerpc/include/asm
    git mv include/asm-powerpc/* arch/powerpc/include/asm

Followed by a few documentation/comment fixups and a couple of places
where was being used explicitly. Of the latter only
one was outside the arch code and it is a driver only built for powerpc.
Signed-off-by: Stephen Rothwell
Signed-off-by: Paul Mackerras
02 Aug, 2008
1 commit
-
Some module parameters with only one line have the '\n' at the end of the
description. This is not needed nor wanted as after the description the
type (i.e. int) is followed by a newline.

Some modules contain a multi-line description, these are not affected
by this patch.
Signed-off-by: Niels de Vos
Acked-by: Randy Dunlap
Cc: John W. Linville
Cc: Ed L. Cashin
Cc: Dave Airlie
Cc: Roland Dreier
Acked-by: Mauro Carvalho Chehab
Cc: Jeff Garzik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Jul, 2008
1 commit
-
A few functions in the ipath driver incorrectly use unsigned int to
hold irq flags for spin_lock_irqsave().

This patch was generated using the Coccinelle framework with the
following semantic patch:

The semantic patch I used was this:

@@
expression lock;
identifier flags;
expression subclass;
@@
- unsigned int flags;
+ unsigned long flags;
...
Cc: Ralph Campbell
Cc: Julia Lawall
Cc: Alexey Dobriyan
Signed-off-by: Vegard Nossum
Signed-off-by: Roland Dreier
30 Jul, 2008
1 commit
-
wr->sg_list should be set to the sge pointer passed in, not
priv->cm.rx_sge.
Reported-by: Hoang-Nam Nguyen
Signed-off-by: Roland Dreier
27 Jul, 2008
3 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files
mlx4_core: Add VLAN tag field to WQE control segment struct
RDMA/nes: CM connection setup/teardown rework
IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG
IPoIB/cm: Connected mode is no longer EXPERIMENTAL
RDMA/ucm: BKL is not needed for ib_ucm_open()
RDMA/ucma: BKL is not needed for ucma_open()
-
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori
Cc: Muli Ben-Yehuda
Cc: Andi Kleen
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Avi Kivity
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
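The lookup rule described in this commit (use the per-device ops pointer when it is set, otherwise fall back to the system-wide one) and the reason the error check needs a device argument can be sketched in plain C. Everything here is a simplified assumption for illustration: struct device, struct dma_ops, pick_dma_ops() and check_mapping_error() are invented for the sketch and do not mirror the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct device;

/* Hypothetical, cut-down ops table with only the error check. */
struct dma_ops {
    int (*mapping_error)(struct device *dev, uint64_t dma_addr);
};

/* Hypothetical device: a NULL ops pointer means "use the system-wide ops". */
struct device {
    const struct dma_ops *dma_ops;
};

/* Two IOMMU flavours that disagree on what a failed mapping looks like. */
static int default_mapping_error(struct device *dev, uint64_t dma_addr)
{
    (void)dev;
    return dma_addr == 0;
}

static int iommu_mapping_error(struct device *dev, uint64_t dma_addr)
{
    (void)dev;
    return dma_addr == UINT64_MAX;
}

static const struct dma_ops system_dma_ops = { default_mapping_error };
static const struct dma_ops iommu_dma_ops  = { iommu_mapping_error };

/* Per-device ops win; otherwise fall back to the system-wide table. */
static const struct dma_ops *pick_dma_ops(struct device *dev)
{
    if (dev && dev->dma_ops)
        return dev->dma_ops;
    return &system_dma_ops;
}

/* Without the dev argument there would be no way to pick the right check. */
static int check_mapping_error(struct device *dev, uint64_t dma_addr)
{
    return pick_dma_ops(dev)->mapping_error(dev, dma_addr);
}
```

The same address can be valid for one device and an error for another, which is why the error check has to be routed through the device.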
26 Jul, 2008
1 commit
-
Update existing Mellanox copyright lines to 2008, and add such lines
to files where they are missing.
Signed-off-by: Jack Morgenstein
Signed-off-by: Roland Dreier
25 Jul, 2008
3 commits
-
Major rework of CM connection setup/teardown. We had a number of issues
with MPI applications not starting/terminating properly over time.
With these changes we were able to run longer on larger clusters.

* Remove memory allocation from nes_connect() and nes_cm_connect().
* Fix mini_cm_dec_refcnt_listen() when destroying listener.
* Remove unnecessary code from schedule_nes_timer() and nes_cm_timer_tick().
* Functionalize mini_cm_recv_pkt() and process_packet().
* Clean up cm_node->ref_count usage.
* Reuse skbs if available.
Signed-off-by: Faisal Latif
Signed-off-by: Roland Dreier
-
The help text for INFINIBAND_IPOIB_DEBUG refers to "ipoib_debugfs,"
which no longer exists. Correct this to talk about the files under
debugfs that are really created.
Signed-off-by: Roland Dreier
-
Connected mode is now tested and used by lots of people. No need to
hide it under CONFIG_EXPERIMENTAL.
Signed-off-by: Roland Dreier