26 Sep, 2008

1 commit

  • Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
    change events") changed how paths are flushed on an SM event. This
    change introduces a problem if the resulting path record query
    fails, causing path->ah to become NULL. A later successful path query
    will then trigger WARN_ON() in path_rec_completion(), and crash
    because path->ah has already been freed, so the ipoib_put_ah() inside
    the lock in path_rec_completion() may actually drop the last reference
    (contrary to the comment that claims this is safe).

    Fix this by updating path->ah and freeing old_ah only when the path
    record query is successful. This prevents the neighbour AH and the
    path AH from getting out of sync.

    This fixes

    Reported-by: Rabah Salem
    Debugged-by: Eli Cohen
    Signed-off-by: Roland Dreier
    Signed-off-by: Linus Torvalds

    Roland Dreier
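
The ordering described in this fix can be illustrated with a minimal userspace sketch. It is not the IPoIB code: `struct ah`, `struct path`, `put_ah()` and `path_query_done()` are hypothetical stand-ins, and locking is left out. The point shown is only that the old handle is replaced and released on success, and left untouched on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the address handle and the path entry. */
struct ah { int refcount; };
struct path { struct ah *ah; };

/* Drop one reference; free the handle when the last one goes away. */
static void put_ah(struct ah *ah)
{
    if (ah && --ah->refcount == 0)
        free(ah);
}

/*
 * Completion handler sketch: status == 0 means the query succeeded and
 * new_ah is the freshly created handle. On failure the existing
 * path->ah stays in place, so it never becomes NULL or stale.
 */
static void path_query_done(struct path *path, int status, struct ah *new_ah)
{
    struct ah *old_ah;

    assert(path != NULL);
    if (status != 0)
        return;             /* keep the old handle on failure */

    old_ah = path->ah;
    path->ah = new_ah;      /* publish the new handle first */
    put_ah(old_ah);         /* then release the old one */
}
```

With this shape, a failed query followed by a successful one sees a valid old handle exactly once, which is the invariant the commit message describes.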
     

20 Sep, 2008

1 commit


17 Sep, 2008

3 commits

  • Roland Dreier
     
  • Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
    ipoib_stop(). We avoid it by scheduling the piece of code that takes
    the lock on ipoib_workqueue instead of executing it directly. This
    works because we only flush the ipoib_workqueue with the RTNL not held.

    The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
    which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
    which calls ipoib_mcast_leave(). The latter calls
    ib_sa_free_multicast(), and this waits until the multicast completion
    handler finishes. This handler is ipoib_mcast_join_complete(), which
    waits for the rtnl_lock(), which was already taken by ipoib_stop().

    This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
    RTNL in ipoib_stop()").

    Signed-off-by: Yossi Etigin
    Signed-off-by: Roland Dreier

    Yossi Etigin
     
  • Fix QP not being destroyed properly on the client, which leads to
    userspace programs hanging on exit. This is a missing chunk from the
    connection management rewrite in commit 6492cdf3 ("RDMA/nes: CM
    connection setup/teardown rework").

    Signed-off-by: Faisal Latif
    Signed-off-by: Roland Dreier

    Faisal Latif
     

16 Sep, 2008

1 commit

  • Byte swap the addresses in the page list for fast register work requests
    to big endian to match what the HCA expects. Also, the addresses must
    have the "present" bit set so that the HCA knows it can access them.
    Otherwise the HCA will fault the first time it accesses the memory
    region.

    Signed-off-by: Vladimir Sokolovsky
    Signed-off-by: Roland Dreier

    Vladimir Sokolovsky
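
As a rough illustration of the conversion described above, the sketch below sets a "present" bit on each page address and stores the result in big-endian byte order. The flag value (`PAGE_PRESENT_FLAG`, bit 0) and the helper names are assumptions made for this example, not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag value: bit 0 marks a page-list entry as present. */
#define PAGE_PRESENT_FLAG 1ULL

/* Return v with its bytes laid out most-significant first in memory. */
static uint64_t to_be64(uint64_t v)
{
    uint64_t out = 0;
    unsigned char *p = (unsigned char *)&out;

    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (56 - 8 * i));
    return out;
}

/* Fill dst with the device view of src: present bit set, big endian. */
static void prepare_page_list(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = to_be64(src[i] | PAGE_PRESENT_FLAG);
}
```

The byte-wise `to_be64()` works on hosts of either endianness; kernel code would normally use `cpu_to_be64()` instead.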
     

28 Aug, 2008

1 commit


24 Aug, 2008

1 commit


20 Aug, 2008

2 commits

  • Roland Dreier
     
  • Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
    flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
    is run from the ipoib_workqueue. However, ipoib_stop() (which is run
    inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
    if the join task is pending.

    Fix this by simply not flushing the workqueue from ipoib_stop(). It
    turns out that we really don't care about workqueue tasks running
    during or after ipoib_stop(), as long as we make sure to flush the
    workqueue before unregistering a netdev.

    This fixes .

    Signed-off-by: Roland Dreier

    Roland Dreier
     

16 Aug, 2008

2 commits

  • The check for max physical address was incorrect, thus limiting the
    range of allowed physical addresses.

    Signed-off-by: Dave Olson
    Signed-off-by: Roland Dreier

    Dave Olson
     
  • If a UD QP has some work requests queued to be sent by the DMA engine
    followed by a local loopback work request, we have to wait for the
    previous work requests to finish or the completion for the local
    loopback work request would be generated out of order. The problem
    was that the work request queue pointer was already updated so that
    the request would not be processed when the DMA queue drained.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     

13 Aug, 2008

6 commits

  • Roland Dreier
     
  • Under rare circumstances, the ehca hardware might erroneously generate
    two CQEs for the same WQE, which does not comply with the IB spec and
    will cause unpredictable errors like memory being freed twice. To
    avoid this problem, the driver needs to detect the second CQE and
    discard it.

    For this purpose, introduce an array holding as many elements as the
    SQ of the QP, called sq_map. Each sq_map entry stores a "reported"
    flag for one WQE in the SQ. When a work request is posted to the SQ,
    the respective "reported" flag is set to zero. After the arrival of a
    CQE, the flag is set to 1, which makes it possible to detect the
    occurrence of a second CQE.

    The mapping between WQE / CQE and the corresponding sq_map element is
    implemented by replacing the lowest 16 bits of the wr_id with the
    index in the queue map. The original 16 bits are stored in the sq_map
    entry and are restored when the CQE is passed to the application.

    Signed-off-by: Alexander Schmidt
    Signed-off-by: Roland Dreier

    Alexander Schmidt
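
The sq_map scheme above can be sketched in plain C as follows. This is a simplified model, not the ehca implementation: the queue depth `SQ_SIZE`, the entry layout and the function names `sq_post()` and `sq_complete()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SQ_SIZE 4   /* hypothetical send queue depth */

struct sq_map_entry {
    uint16_t saved_bits;   /* original low 16 bits of wr_id */
    bool reported;         /* true once a CQE was seen for this WQE */
};

static struct sq_map_entry sq_map[SQ_SIZE];

/* Post: save the low 16 bits, clear the flag, embed the map index. */
static uint64_t sq_post(uint64_t wr_id, uint16_t idx)
{
    assert(idx < SQ_SIZE);
    sq_map[idx].saved_bits = (uint16_t)(wr_id & 0xFFFF);
    sq_map[idx].reported = false;
    return (wr_id & ~0xFFFFULL) | idx;
}

/* Completion: returns false for a duplicate CQE, else restores wr_id. */
static bool sq_complete(uint64_t hw_wr_id, uint64_t *orig_wr_id)
{
    uint16_t idx = (uint16_t)(hw_wr_id & 0xFFFF);

    assert(idx < SQ_SIZE);
    if (sq_map[idx].reported)
        return false;      /* second CQE for the same WQE: discard */
    sq_map[idx].reported = true;
    *orig_wr_id = (hw_wr_id & ~0xFFFFULL) | sq_map[idx].saved_bits;
    return true;
}
```

The first completion for a WQE hands the original wr_id back to the caller; a second completion carrying the same index is rejected.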
     
  • The idr_find() function may fail when trying to get the QP that is
    associated with a CQE, e.g. when a QP has been destroyed between the
    generation of a CQE and the poll request for it. In consequence, the
    return value of idr_find() must be checked and the CQE must be
    discarded when the QP cannot be found.

    Signed-off-by: Alexander Schmidt
    Signed-off-by: Roland Dreier

    Alexander Schmidt
     
  • When the ehca driver detects an invalid opcode in a CQE, it currently
    passes the CQE to the application and returns with success. This patch
    changes the CQE handling to discard CQEs with invalid opcodes and to
    continue reading the next CQE from the CQ.

    Signed-off-by: Alexander Schmidt
    Signed-off-by: Roland Dreier

    Alexander Schmidt
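
A minimal sketch of the discard-and-repoll behaviour, under the assumption that the CQ can be modelled as an array and that opcodes 0..3 are the valid ones (both are made up for illustration; `struct cqe` and `poll_cq_one()` here are not the driver's definitions):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VALID_OPCODE 3   /* hypothetical: opcodes 0..3 are valid */

struct cqe { int opcode; int wr_id; };

/* Returns 1 and fills *out if a valid CQE was found, 0 if the CQ is empty. */
static int poll_cq_one(const struct cqe *cq, size_t n, size_t *pos,
                       struct cqe *out)
{
    assert(cq != NULL && pos != NULL && out != NULL);
repoll:
    if (*pos >= n)
        return 0;
    *out = cq[(*pos)++];
    if (out->opcode < 0 || out->opcode > MAX_VALID_OPCODE)
        goto repoll;         /* discard and read the next CQE */
    return 1;
}
```

The caller never sees a CQE with an invalid opcode; it either gets the next valid entry or is told the queue is empty.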
     
  • Rename the "poll_cq_one_read_cqe" goto label to what it actually does,
    namely "repoll".

    Signed-off-by: Alexander Schmidt
    Signed-off-by: Roland Dreier

    Alexander Schmidt
     
  • Since the introduction of the port auto-detect mode for ehca, calls to
    modify_qp() may be cached in the device driver when the ports are not
    activated yet. When a modify_qp() call is cached, the qp state remains
    untouched until the port is activated, which will leave the qp in the
    reset state. In the reset state, however, it is not allowed to post SQ
    WQEs, which confuses applications like ib_mad.

    The solution for this problem is to immediately set the qp state as
    requested by modify_qp(), even when the call is cached.

    Signed-off-by: Alexander Schmidt
    Signed-off-by: Roland Dreier

    Alexander Schmidt
     

09 Aug, 2008

1 commit

  • There are users that are running UDP applications that require a large
    receive queue size in order to get good performance. To prevent
    allocation failures for rx_rings when using non-SRQ mode and large
    recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to
    allocate rx_rings.

    Signed-off-by: David Wilder
    Signed-off-by: Roland Dreier

    David J. Wilder
     

08 Aug, 2008

4 commits


07 Aug, 2008

1 commit


05 Aug, 2008

5 commits

  • ipath_driver.c:1260: warning: format '%Lx' expects type 'long long unsigned int', but argument 6 has type 'long unsigned int'
    ipath_driver.c:1459: warning: format '%Lx' expects type 'long long unsigned int', but argument 4 has type 'u64'
    ipath_intr.c:358: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64'
    ipath_intr.c:358: warning: format '%Lu' expects type 'long long unsigned int', but argument 6 has type 'u64'
    ipath_intr.c:1119: warning: format '%Lx' expects type 'long long unsigned int', but argument 5 has type 'u64'
    ipath_intr.c:1119: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64'
    ipath_intr.c:1123: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64'
    ipath_intr.c:1130: warning: format '%Lx' expects type 'long long unsigned int', but argument 4 has type 'u64'
    ipath_iba7220.c:1032: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
    ipath_iba7220.c:1045: warning: format '%llX' expects type 'long long unsigned int', but argument 3 has type 'u64'
    ipath_iba7220.c:2506: warning: format '%Lu' expects type 'long long unsigned int', but argument 4 has type 'u64'

    Signed-off-by: Alexander Beregalov
    Cc: Sean Hefty
    Cc: Hal Rosenstock
    Signed-off-by: Roland Dreier

    Alexander Beregalov
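
Warnings of this kind are commonly silenced with an explicit cast to unsigned long long, since the underlying type of a 64-bit integer differs between architectures. Whether this commit used casts is an assumption; the sketch below only shows the general pattern in userspace C, with `format_u64_hex()` as a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value as hex into buf; returns the snprintf() result. */
static int format_u64_hex(char *buf, size_t len, uint64_t val)
{
    assert(buf != NULL);
    /* The cast makes the argument match %llx on every architecture. */
    return snprintf(buf, len, "%llx", (unsigned long long)val);
}
```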
     
  • Running 'ifconfig up' on the cxgb3 interface with iw_cxgb3 loaded
    causes a deadlock. The rtnl lock is already held in this path. The
    function fw_supports_fastreg() was introduced in 2.6.27 to
    conditionally set the IB_DEVICE_MEM_MGT_EXTENSIONS bit iff the
    firmware was at 7.0 or greater. This function also acquires the
    rtnl lock, which causes the deadlock. Further, if iw_cxgb3 is
    loaded _after_ the nic interface is brought up, then the deadlock does
    not occur and therefore fw_supports_fastreg() does need to grab the
    rtnl lock in that path.

    It turns out this code is all useless anyway. The low level driver
    will NOT allow the open if the firmware isn't 7.0, so iw_cxgb3 can
    always set the MEM_MGT_EXTENSIONS bit. Simplify...

    Signed-off-by: Steve Wise
    Signed-off-by: Roland Dreier

    Steve Wise
     
  • - MWs don't have local read/write permissions.
    - Set the MW_BIND enabled bit if a MR has MW_BIND access.

    Signed-off-by: Steve Wise
    Signed-off-by: Roland Dreier

    Steve Wise
     
  • - Set the stag0 and fastreg capability bits only for kernel qps.
    - QP_PRIV flag is no longer used, so don't set it.

    Signed-off-by: Steve Wise
    Signed-off-by: Roland Dreier

    Steve Wise
     
  • There are a few places where the RDMA CM code handles IPv6 by doing

    struct sockaddr addr;
    u8 pad[sizeof(struct sockaddr_in6) -
    sizeof(struct sockaddr)];

    This is fragile and ugly; handle this in a better way with just

    struct sockaddr_storage addr;

    [ Also roll in patch from Aleksey Senin to
    switch to struct sockaddr_storage and get rid of padding arrays in
    struct rdma_addr. ]

    Signed-off-by: Roland Dreier

    Roland Dreier
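
For illustration, here is a small userspace sketch of the sockaddr_storage approach. The `struct endpoint` type and `endpoint_set()` helper are hypothetical and not part of the RDMA CM code; they only show that one storage member can hold either address family without a hand-sized padding array.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical holder for an endpoint address of either family. */
struct endpoint {
    struct sockaddr_storage addr;   /* large enough for IPv4 and IPv6 */
};

/* Copy an IPv4 or IPv6 address into the endpoint; returns 0 or -1. */
static int endpoint_set(struct endpoint *ep, const struct sockaddr *sa)
{
    size_t len;

    assert(ep != NULL && sa != NULL);
    switch (sa->sa_family) {
    case AF_INET:
        len = sizeof(struct sockaddr_in);
        break;
    case AF_INET6:
        len = sizeof(struct sockaddr_in6);
        break;
    default:
        return -1;                  /* unsupported family */
    }
    memset(&ep->addr, 0, sizeof(ep->addr));
    memcpy(&ep->addr, sa, len);
    return 0;
}
```

Because struct sockaddr_storage is defined to be large enough for any supported address family, the size arithmetic from the padded version disappears.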
     

04 Aug, 2008

1 commit

  • Move the powerpc header files from include/asm-powerpc to
    arch/powerpc/include/asm. This is the result of a

    mkdir arch/powerpc/include/asm
    git mv include/asm-powerpc/* arch/powerpc/include/asm

    Followed by a few documentation/comment fixups and a couple of places
    where the include/asm-powerpc path was being used explicitly. Of the
    latter, only one was outside the arch code, and it is a driver only
    built for powerpc.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Paul Mackerras

    Stephen Rothwell
     

02 Aug, 2008

1 commit

  • Some module parameters with only one line have the '\n' at the end of the
    description. This is not needed nor wanted as after the description the
    type (i.e. int) is followed by a newline.

    Some modules contain a multi-line description, these are not affected
    by this patch.

    Signed-off-by: Niels de Vos
    Acked-by: Randy Dunlap
    Cc: John W. Linville
    Cc: Ed L. Cashin
    Cc: Dave Airlie
    Cc: Roland Dreier
    Acked-by: Mauro Carvalho Chehab
    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Niels de Vos
     

31 Jul, 2008

1 commit

  • A few functions in the ipath driver incorrectly use unsigned int to
    hold irq flags for spin_lock_irqsave().

    This patch was generated using the Coccinelle framework with the
    following semantic patch:

    @@
    expression lock;
    identifier flags;
    expression subclass;
    @@

    - unsigned int flags;
    + unsigned long flags;

    ...

    Cc: Ralph Campbell
    Cc: Julia Lawall
    Cc: Alexey Dobriyan
    Signed-off-by: Vegard Nossum
    Signed-off-by: Roland Dreier

    Vegard Nossum
     

30 Jul, 2008

1 commit


27 Jul, 2008

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
    mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files
    mlx4_core: Add VLAN tag field to WQE control segment struct
    RDMA/nes: CM connection setup/teardown rework
    IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG
    IPoIB/cm: Connected mode is no longer EXPERIMENTAL
    RDMA/ucm: BKL is not needed for ib_ucm_open()
    RDMA/ucma: BKL is not needed for ucma_open()

    Linus Torvalds
     
  • Roland Dreier
     
  • Add per-device dma_mapping_ops support for CONFIG_X86_64, as the POWER
    architecture already does:

    This enables us to cleanly fix the Calgary IOMMU issue that some devices
    are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

    I think that per-device dma_mapping_ops support would be also helpful for
    KVM people to support PCI passthrough but Andi thinks that this makes it
    difficult to support the PCI passthrough (see the above thread). So I
    CC'ed this to KVM camp. Comments are appreciated.

    A pointer to dma_mapping_ops is added to struct dev_archdata. If the
    pointer is non-NULL, DMA operations in asm/dma-mapping.h use it. If it's
    NULL, the system-wide dma_ops pointer is used as before.

    If it's useful for KVM people, I plan to implement a mechanism to register
    a hook called when a new pci (or dma capable) device is created (it works
    with hot plugging). It enables IOMMUs to set up an appropriate
    dma_mapping_ops per device.

    The major obstacle is that dma_mapping_error doesn't take a pointer to the
    device unlike other DMA operations. So x86 can't have dma_mapping_ops per
    device. Note all the POWER IOMMUs use the same dma_mapping_error function
    so this is not a problem for POWER but x86 IOMMUs use different
    dma_mapping_error functions.

    The first patch adds the device argument to dma_mapping_error. The patch
    is trivial but large since it touches lots of drivers and dma-mapping.h in
    all the architectures.

    This patch:

    dma_mapping_error() doesn't take a pointer to the device unlike other DMA
    operations. So we can't have dma_mapping_ops per device.

    Note that POWER already has dma_mapping_ops per device but all the POWER
    IOMMUs use the same dma_mapping_error function. x86 IOMMUs use the
    device argument.

    [akpm@linux-foundation.org: fix sge]
    [akpm@linux-foundation.org: fix svc_rdma]
    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix bnx2x]
    [akpm@linux-foundation.org: fix s2io]
    [akpm@linux-foundation.org: fix pasemi_mac]
    [akpm@linux-foundation.org: fix sdhci]
    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix sparc]
    [akpm@linux-foundation.org: fix ibmvscsi]
    Signed-off-by: FUJITA Tomonori
    Cc: Muli Ben-Yehuda
    Cc: Andi Kleen
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
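
The fallback rule described above (use the per-device ops if set, otherwise the system-wide ops) and the reason dma_mapping_error() needs the device argument can be sketched as below. This is a reduced userspace model: the ops table has a single member, the error cookies are invented, and none of the definitions match the kernel's real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct device;

/* Hypothetical, reduced ops table: only the error check is modelled. */
struct dma_ops {
    int (*mapping_error)(struct device *dev, uint64_t dma_addr);
};

struct device {
    const struct dma_ops *dma_ops;   /* NULL means "use system-wide ops" */
};

static int default_mapping_error(struct device *dev, uint64_t dma_addr)
{
    (void)dev;
    return dma_addr == 0;            /* assumed error cookie */
}

static int iommu_mapping_error(struct device *dev, uint64_t dma_addr)
{
    (void)dev;
    return dma_addr == UINT64_MAX;   /* assumed error cookie */
}

static const struct dma_ops system_dma_ops = { default_mapping_error };
static const struct dma_ops iommu_dma_ops = { iommu_mapping_error };

/* Per-device ops win; otherwise fall back to the system-wide table. */
static const struct dma_ops *get_dma_ops(struct device *dev)
{
    if (dev && dev->dma_ops)
        return dev->dma_ops;
    return &system_dma_ops;
}

/* Without the dev argument there would be no way to pick the ops. */
static int dma_mapping_error(struct device *dev, uint64_t dma_addr)
{
    const struct dma_ops *ops = get_dma_ops(dev);

    assert(ops->mapping_error != NULL);
    return ops->mapping_error(dev, dma_addr);
}
```

Two devices with different ops can then disagree about which address value signals an error, which is the x86 situation the message refers to.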
     

26 Jul, 2008

1 commit


25 Jul, 2008

3 commits

  • Major rework of CM connection setup/teardown. We had a number of issues
    with MPI applications not starting/terminating properly over time.
    With these changes we were able to run longer on larger clusters.

    * Remove memory allocation from nes_connect() and nes_cm_connect().
    * Fix mini_cm_dec_refcnt_listen() when destroying listener.
    * Remove unnecessary code from schedule_nes_timer() and nes_cm_timer_tick().
    * Functionalize mini_cm_recv_pkt() and process_packet().
    * Clean up cm_node->ref_count usage.
    * Reuse skbs if available.

    Signed-off-by: Faisal Latif
    Signed-off-by: Roland Dreier

    Faisal Latif
     
  • The help text for INFINIBAND_IPOIB_DEBUG refers to "ipoib_debugfs,"
    which no longer exists. Correct this to talk about the files under
    debugfs that are really created.

    Signed-off-by: Roland Dreier

    Roland Dreier
     
  • Connected mode is now tested and used by lots of people. No need to
    hide it under CONFIG_EXPERIMENTAL.

    Signed-off-by: Roland Dreier

    Roland Dreier