21 Sep, 2009

1 commit


27 Mar, 2009

1 commit

  • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits)
    ixgbe: Allow Priority Flow Control settings to survive a device reset
    net: core: remove unneeded include in net/core/utils.c.
    e1000e: update version number
    e1000e: fix close interrupt race
    e1000e: fix loss of multicast packets
    e1000e: commonize tx cleanup routine to match e1000 & igb
    netfilter: fix nf_logger name in ebt_ulog.
    netfilter: fix warning in ebt_ulog init function.
    netfilter: fix warning about invalid const usage
    e1000: fix close race with interrupt
    e1000: cleanup clean_tx_irq routine so that it completely cleans ring
    e1000: fix tx hang detect logic and address dma mapping issues
    bridge: bad error handling when adding invalid ether address
    bonding: select current active slave when enslaving device for mode tlb and alb
    gianfar: reallocate skb when headroom is not enough for fcb
    Bump release date to 25Mar2009 and version to 0.22
    r6040: Fix second PHY address
    qeth: fix wait_event_timeout handling
    qeth: check for completion of a running recovery
    qeth: unregister MAC addresses during recovery.
    ...

    Manually fixed up conflicts in:
    drivers/infiniband/hw/cxgb3/cxio_hal.h
    drivers/infiniband/hw/nes/nes_nic.c

    Linus Torvalds
     

25 Mar, 2009

1 commit


28 Feb, 2009

1 commit

  • Fix ib_set_rmpp_flags() to use the correct bit mask for RRespTime. In
    the 8-bit field of the RMPP header, the first 5 bits are RRespTime and
    the next 3 bits are RMPPFlags. Hence, to retain the first 5 bits, the
    mask should be 0xF8 instead of 0xF1.

    ack_recv()-->format_ack() calls ib_set_rmpp_flags(), and due to the
    incorrect ANDing with 0xF1, RRespTime got changed incorrectly and RMPP
    Acks sent back always had an RRespTime of 0x1E (30), which caused the
    other end to consider the timeouts to be approximately 4297 seconds
    (i.e. in the order of 4*2^30) instead of the usual ~4 seconds (order
    of 4*2^20).
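
    A minimal sketch of the corrected helper, assuming RRespTime and the
    RMPP flags share the single rmpp_rtime_flags byte as described:

        static inline void ib_set_rmpp_flags(struct ib_rmpp_hdr *rmpp_hdr, u8 flags)
        {
                /* Keep the top 5 RRespTime bits (0xF8); set the low 3 flag bits. */
                rmpp_hdr->rmpp_rtime_flags = (rmpp_hdr->rmpp_rtime_flags & 0xF8) |
                                             (flags & 0x7);
        }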

    Signed-off-by: Ramachandra K
    Acked-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Ramachandra K
     

15 Feb, 2009

1 commit


18 Jan, 2009

1 commit

  • The base versions handle constant folding just fine; use them
    directly. The replacements are OK in the include/ files as they are
    not exported to userspace, so we don't need the __-prefixed versions.

    This patch does not affect code generation at all.

    Signed-off-by: Harvey Harrison
    Signed-off-by: Roland Dreier

    Harvey Harrison
     

05 Aug, 2008

1 commit

  • There are a few places where the RDMA CM code handles IPv6 by doing

    struct sockaddr addr;
    u8 pad[sizeof(struct sockaddr_in6) - sizeof(struct sockaddr)];

    This is fragile and ugly; handle this in a better way with just

    struct sockaddr_storage addr;

    [ Also roll in patch from Aleksey Senin to
    switch to struct sockaddr_storage and get rid of padding arrays in
    struct rdma_addr. ]

    Signed-off-by: Roland Dreier

    Roland Dreier
     

27 Jul, 2008

1 commit

  • Add per-device dma_mapping_ops support for CONFIG_X86_64, as the
    POWER architecture does:

    This enables us to cleanly fix the Calgary IOMMU issue that some devices
    are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

    I think that per-device dma_mapping_ops support would also be helpful
    for KVM people to support PCI passthrough, but Andi thinks that this
    makes it difficult to support PCI passthrough (see the above thread).
    So I CC'ed this to the KVM camp. Comments are appreciated.

    A pointer to dma_mapping_ops is added to struct dev_archdata. If the
    pointer is non-NULL, DMA operations in asm/dma-mapping.h use it. If it's
    NULL, the system-wide dma_ops pointer is used as before.
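
    A sketch of that fallback logic; the helper name get_dma_ops() is an
    assumption for illustration:

        extern struct dma_mapping_ops *dma_ops;   /* existing system-wide table */

        static inline struct dma_mapping_ops *get_dma_ops(struct device *dev)
        {
                if (dev && dev->archdata.dma_ops)
                        return dev->archdata.dma_ops;  /* per-device override */
                return dma_ops;                        /* system-wide default */
        }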

    If it's useful for KVM people, I plan to implement a mechanism to
    register a hook called when a new PCI (or DMA-capable) device is
    created (it works with hot plugging). It enables IOMMUs to set up an
    appropriate dma_mapping_ops per device.

    The major obstacle is that dma_mapping_error doesn't take a pointer
    to the device, unlike other DMA operations, so x86 can't have
    dma_mapping_ops per device. Note that all the POWER IOMMUs use the
    same dma_mapping_error function, so this is not a problem for POWER,
    but x86 IOMMUs use different dma_mapping_error functions.

    The first patch adds the device argument to dma_mapping_error. The
    patch is trivial but large, since it touches lots of drivers and
    dma-mapping.h in all the architectures.

    This patch:

    dma_mapping_error() doesn't take a pointer to the device, unlike other
    DMA operations, so we can't have dma_mapping_ops per device.

    Note that POWER already has dma_mapping_ops per device, but all the
    POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use
    different dma_mapping_error functions, so x86 needs the device argument.
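
    With the device argument in place, the check can be routed through the
    per-device ops; a rough sketch (bad_dma_address stands in for the x86
    "no mapping" convention):

        static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
        {
                struct dma_mapping_ops *ops = get_dma_ops(dev);

                if (ops->mapping_error)
                        return ops->mapping_error(dev, dma_addr);
                return dma_addr == bad_dma_address;
        }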

    [akpm@linux-foundation.org: fix sge]
    [akpm@linux-foundation.org: fix svc_rdma]
    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix bnx2x]
    [akpm@linux-foundation.org: fix s2io]
    [akpm@linux-foundation.org: fix pasemi_mac]
    [akpm@linux-foundation.org: fix sdhci]
    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix sparc]
    [akpm@linux-foundation.org: fix ibmvscsi]
    Signed-off-by: FUJITA Tomonori
    Cc: Muli Ben-Yehuda
    Cc: Andi Kleen
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     

23 Jul, 2008

2 commits

  • Consumers that want to re-use their QPs in new connections need to
    know when the QP has exited the timewait state. Report the timewait
    event through the rdma_cm.

    Signed-off-by: Amir Vadai
    Acked-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Amir Vadai
     
  • Add an RDMA_CM_EVENT_ADDR_CHANGE event that can be used by rdma-cm
    consumers that wish to have their RDMA sessions always use the same
    links as the IP stack does. In the current code, this does not happen
    when bonding is used and a fail-over has happened, but the IB link
    used by an already existing session is still operating fine.

    Use the netevent notification for sensing that a change has happened
    in the IP stack, then scan the rdma-cm ID list to see if there is an
    ID that is "misaligned" with respect to the IP stack, and deliver
    RDMA_CM_EVENT_ADDR_CHANGE for this ID. The consumer can act on the
    event or just ignore it.
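
    A sketch of how a consumer might act on the new event inside the usual
    rdma_cm callback (the tear-down step is illustrative):

        static int cma_event_handler(struct rdma_cm_id *id,
                                     struct rdma_cm_event *event)
        {
                if (event->event == RDMA_CM_EVENT_ADDR_CHANGE) {
                        /* The IP stack moved to another link: tear down and
                         * re-establish the session, or simply ignore it. */
                }
                return 0;
        }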

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     

15 Jul, 2008

7 commits

  • Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
    and copy it in as part of rdma_copy_addr(). Use rdma_translate_ip()
    in cma_new_conn_id() to reduce some code duplication and also make
    sure the src_dev member gets set.

    In a high-availability configuration the netdevice pointer can be used
    by the RDMA CM to align RDMA sessions to use the same links as the IP
    stack does under fail-over and route change cases.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     
  • - Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name
    IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0
    STag support and IB HCAs to indicate reserved L_Key support.

    - Add a u32 local_dma_lkey member to struct ib_device. Drivers fill
    this in with the appropriate local DMA L_Key (if they support it).

    - Fix up the drivers using this flag.
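
    In a kernel consumer this might be used roughly as follows (the helper
    and its parameters are illustrative):

        static void build_local_sge(struct ib_device *dev, u64 dma_addr, u32 len,
                                    struct ib_sge *sge)
        {
                sge->addr   = dma_addr;
                sge->length = len;
                /* Valid when the device advertises IB_DEVICE_LOCAL_DMA_LKEY. */
                sge->lkey   = dev->local_dma_lkey;
        }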

    Signed-off-by: Steve Wise
    Signed-off-by: Roland Dreier

    Steve Wise
     
  • This patch adds a creation flag for QPs,
    IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK, which when set means that
    multicast sends from the QP to a group that the QP is attached to will
    not be looped back to the QP's receive queue. This can be used to
    save receive resources when a consumer does not want a local copy of
    multicast traffic; for example IPoIB must waste CPU time throwing away
    such local copies of multicast traffic.

    This patch also adds a device capability flag that shows whether a
    device supports this feature or not.
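
    A sketch of requesting the behavior at QP creation, using the flag name
    given above (error handling elided):

        static struct ib_qp *create_mcast_qp(struct ib_pd *pd,
                                             struct ib_qp_init_attr *attr)
        {
                /* Ask the device not to loop our multicast sends back to us. */
                attr->create_flags |= IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK;
                return ib_create_qp(pd, attr);
        }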

    Signed-off-by: Ron Livne
    Signed-off-by: Roland Dreier

    Ron Livne
     
  • This patch adds a sysfs attribute group called "proto_stats" under
    /sys/class/infiniband/$device/ and populates this group with protocol
    statistics if they exist for a given device. Currently, only iWARP
    stats are defined, but the code is designed to allow InfiniBand
    protocol stats if they become available. These stats are per-device
    and more importantly -not- per port.

    Details:

    - Add union rdma_protocol_stats in ib_verbs.h. This union allows
    defining transport-specific stats. Currently only iwarp stats are
    defined.

    - Add struct iw_protocol_stats to define the current set of iwarp
    protocol stats.

    - Add new ib_device method called get_proto_stats() to return protocol
    statistics.

    - Add logic in core/sysfs.c to create iwarp protocol stats attributes
    if the device is an RNIC and has a get_proto_stats() method.
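
    Roughly, the new pieces fit together as sketched below; the counter
    names inside iw_protocol_stats are illustrative assumptions, not the
    full MIB set:

        struct iw_protocol_stats {
                u64 tcpInSegs;          /* example iWARP MIB counters */
                u64 tcpOutSegs;
        };

        union rdma_protocol_stats {
                struct iw_protocol_stats iw;
        };

        /* New ib_device method, called from core/sysfs.c for RNICs: */
        int (*get_proto_stats)(struct ib_device *device,
                               union rdma_protocol_stats *stats);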

    Signed-off-by: Steve Wise
    Signed-off-by: Roland Dreier

    Steve Wise
     
  • This patch adds support for the IB "base memory management extension"
    (BMME) and the equivalent iWARP operations (which the iWARP verbs
    mandates all devices must implement). The new operations are:

    - Allocate an ib_mr for use in fast register work requests.

    - Allocate/free physical buffer lists for use in fast register work
    requests. This allows device drivers to allocate this memory as
    needed for use in posting send requests (eg via dma_alloc_coherent).

    - New send queue work requests:
    * send with remote invalidate
    * fast register memory region
    * local invalidate memory region
    * RDMA read with invalidate local memory region (iWARP only)

    Consumer interface details:

    - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
    to indicate device support for these features.

    - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
    IB_WR_RDMA_READ_WITH_INV are added.

    - A new consumer API function, ib_alloc_mr() is added to allocate
    fast register memory regions.

    - New consumer API functions, ib_alloc_fast_reg_page_list() and
    ib_free_fast_reg_page_list() are added to allocate and free
    device-specific memory for fast registration page lists.

    - A new consumer API function, ib_update_fast_reg_key(), is added to
    allow the key portion of the R_Key and L_Key of a fast registration
    MR to be updated. Consumers call this if desired before posting
    a IB_WR_FAST_REG_MR work request.

    Consumers can use this as follows:

    - MR is allocated with ib_alloc_mr().

    - Page list memory is allocated with ib_alloc_fast_reg_page_list().

    - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().

    - MR made VALID and bound to a specific page list via
    ib_post_send(IB_WR_FAST_REG_MR)

    - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
    ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
    invalidate operation.

    - MR is deallocated with ib_dereg_mr()

    - Page lists are deallocated via ib_free_fast_reg_page_list().
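
    In code, the lifecycle above might look roughly like this; function
    names follow the list, but the exact argument lists are assumptions and
    error handling is elided:

        static int fast_reg_example(struct ib_pd *pd, struct ib_qp *qp, int npages)
        {
                struct ib_mr *mr;
                struct ib_fast_reg_page_list *pl;
                struct ib_send_wr wr = {}, *bad_wr;

                mr = ib_alloc_mr(pd, npages);                         /* once */
                pl = ib_alloc_fast_reg_page_list(pd->device, npages);

                ib_update_fast_reg_key(mr, 0x42);   /* optionally rotate the key */

                wr.opcode = IB_WR_FAST_REG_MR;      /* bind MR to this page list */
                /* ... fill in page list, length, access flags, rkey ... */
                ib_post_send(qp, &wr, &bad_wr);

                /* Later: IB_WR_LOCAL_INV (or a remote invalidate) makes the MR
                 * INVALID; ib_dereg_mr(mr) and ib_free_fast_reg_page_list(pl)
                 * then release the resources. */
                return 0;
        }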

    Applications can allocate a fast register MR once, and then can
    repeatedly bind the MR to different physical block lists (PBLs) via
    posting work requests to a send queue (SQ). For each outstanding
    MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
    allocated (the fast_reg_page_list is owned by the low-level driver
    from the consumer posting a work request until the request completes).
    Thus pipelining can be achieved while still allowing device-specific
    page_list processing.

    The 32-bit fast register memory key/STag is composed of a 24-bit index
    and an 8-bit key. The application can change the key each time it
    fast registers thus allowing more control over the peer's use of the
    key/STag (ie it can effectively be changed each time the rkey is
    rebound to a page list).

    Signed-off-by: Steve Wise
    Signed-off-by: Roland Dreier

    Steve Wise
     
  • Remove subversion $Id lines and improve readability by fixing other
    coding style problems pointed out by checkpatch.pl.

    Signed-off-by: Dotan Barak
    Signed-off-by: Roland Dreier

    Dotan Barak
     
  • The license text for several files references a third software license
    that was inadvertently copied in. Update the license to what was
    intended. This update was based on a request from HP.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     

10 Jun, 2008

1 commit

  • In 2.6.26, we added some support for send with invalidate work
    requests, including a device capability flag to indicate whether a
    device supports such requests. However, the support was incomplete:
    the completion structure was not extended with a field for the key
    contained in incoming send with invalidate requests.

    Full support for memory management extensions (send with invalidate,
    local invalidate, fast register through a send queue, etc) is planned
    for 2.6.27. Since send with invalidate is not very useful by itself,
    just remove the IB_DEVICE_SEND_W_INV bit before the 2.6.26 final
    release; we will add an IB_DEVICE_MEM_MGT_EXTENSIONS bit in 2.6.27,
    which makes things simpler for applications, since they will not have
    quite as confusing an array of fine-grained bits to check.

    Signed-off-by: Roland Dreier

    Roland Dreier
     

29 Apr, 2008

1 commit

  • Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1
    when mapping user-allocated CQs with ib_umem_get().
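
    The resulting prototype, per this description (parameter order assumed):

        struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
                                    size_t size, int access, int dmasync);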

    Signed-off-by: Arthur Kepner
    Cc: Tony Luck
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Cc: James Bottomley
    Cc: David Miller
    Cc: Benjamin Herrenschmidt
    Cc: Grant Grundler
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Kepner
     

20 Apr, 2008

1 commit


17 Apr, 2008

5 commits

  • Add support for modifying CQ parameters for controlling event
    generation moderation.
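
    The consumer-visible entry point this adds; a sketch, assuming cq_count
    is a number of completions and cq_period a time in microseconds before
    an event is generated:

        int ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);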

    Signed-off-by: Eli Cohen
    Signed-off-by: Roland Dreier

    Eli Cohen
     
  • Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
    "send with invalidate" work request as defined in the iWARP verbs and
    the InfiniBand base memory management extensions. Also put "imm_data"
    and a new "invalidate_rkey" member in a new "ex" union in struct
    ib_send_wr. The invalidate_rkey member can be used to pass in an
    R_Key/STag to be invalidated. Add this new union to struct
    ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in
    ib_uverbs_post_send().
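
    Posting such a request might look like this (a sketch; other WR fields
    and error handling elided):

        static int post_send_with_inv(struct ib_qp *qp, u32 peer_rkey)
        {
                struct ib_send_wr wr = {}, *bad_wr;

                wr.opcode             = IB_WR_SEND_WITH_INV;
                wr.ex.invalidate_rkey = peer_rkey;  /* R_Key/STag to invalidate */
                return ib_post_send(qp, &wr, &bad_wr);
        }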

    Fix up low-level drivers to deal with the change to struct ib_send_wr,
    and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
    since that code never does any send with immediate operations.

    Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
    the iWARP drivers currently in the tree set the bit. The amso1100
    driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
    if passed in as part of userspace send requests (since it does not
    implement kernel bypass work request queueing). Remove the flag from
    all existing drivers that set it until we know which ones are OK.

    The value chosen for the new flag is not consecutive, to avoid
    clashing with flags defined in the XRC patches, which are not merged
    yet but which are already in use and are likely to be merged soon.

    This resurrects a patch sent long ago by Mikkel Hagen.

    Signed-off-by: Roland Dreier

    Roland Dreier
     
  • LSO (large send offload) allows the networking stack to pass SKBs with
    data size larger than the MTU to the IPoIB driver and have the HCA HW
    fragment the data to multiple MSS-sized packets. Add a device
    capability flag IB_DEVICE_UD_TSO for devices that can perform TCP
    segmentation offload, a new send work request opcode IB_WR_LSO,
    header, hlen and mss fields for the work request structure, and a new
    IB_WC_LSO completion type.
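
    A sketch of filling the new fields for an LSO send; their exact
    placement inside the UD part of the work request is an assumption:

        static void fill_lso_wr(struct ib_send_wr *wr, void *hdr, int hlen, int mss)
        {
                wr->opcode       = IB_WR_LSO;
                wr->wr.ud.header = hdr;    /* headers HW replicates per segment */
                wr->wr.ud.hlen   = hlen;
                wr->wr.ud.mss    = mss;
        }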

    Signed-off-by: Eli Cohen
    Signed-off-by: Roland Dreier

    Eli Cohen
     
  • Add a create_flags member to struct ib_qp_init_attr that will allow a
    kernel verbs consumer to pass special flags when creating a QP. Add a
    flag value for telling low-level drivers that a QP will be used for
    IPoIB UD LSO. The create_flags member will also be useful for XRC and
    ehca low-latency QP support.

    Since no create_flags handling is implemented yet, add code to all
    low-level drivers to return -EINVAL if create_flags is non-zero.
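
    For example, an IPoIB UD LSO QP would be requested roughly as follows
    (a sketch; the flag name matches what was merged for IPoIB):

        static struct ib_qp *create_lso_qp(struct ib_pd *pd,
                                           struct ib_qp_init_attr *attr)
        {
                attr->create_flags = IB_QP_CREATE_IPOIB_UD_LSO;
                return ib_create_qp(pd, attr);
        }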

    Signed-off-by: Eli Cohen
    Signed-off-by: Roland Dreier

    Eli Cohen
     
  • IDR IDs are signed, so struct ib_uobject.id should be signed. This
    avoids some sparse pointer signedness warnings.

    Signed-off-by: Roland Dreier

    Roland Dreier
     

09 Feb, 2008

2 commits


26 Jan, 2008

2 commits

  • This is based on user feedback from Doug Ledford at Red Hat:

    Events that occur on an rdma_cm_id are reported to userspace through an
    event channel. Connection request events are reported on the event
    channel associated with the listen. When the connection is accepted, a
    new rdma_cm_id is created and automatically uses the listen event
    channel. This is suboptimal where the user only wants listen events on
    that channel.

    Additionally, it may be desirable to have events related to connection
    establishment use a different event channel than those related to
    already established connections.

    Allow the user to migrate an rdma_cm_id between event channels. All
    pending events associated with the rdma_cm_id are moved to the new event
    channel.
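
    In userspace this maps to rdma_migrate_id() in librdmacm; a sketch of
    moving an accepted id off the shared listen channel:

        #include <rdma/rdma_cma.h>

        static int move_to_own_channel(struct rdma_cm_id *conn_id)
        {
                struct rdma_event_channel *ch = rdma_create_event_channel();

                if (!ch)
                        return -1;
                /* Pending events on conn_id migrate to ch as well. */
                return rdma_migrate_id(conn_id, ch);
        }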

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • To allow ULPs to tune timeout values and capture retry statistics,
    report the number of times that a mad send operation was retried.

    For RMPP mads, report the total number of times that any portion
    (send window) of the send operation was retried.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     

25 Jan, 2008

1 commit


02 Nov, 2007

1 commit


10 Oct, 2007

7 commits

  • The IB CM provides a message received acknowledged (MRA) message that
    can be sent to indicate that a REQ or REP message has been received, but
    will require more time to process than the timeout specified by those
    messages. In many cases, the application may not know how long it will
    take to respond to a CM message, but the majority of the time, it will
    usually respond before a retry has been sent. Rather than sending an
    MRA in response to all messages just to handle the case where a longer
    timeout is needed, it is more efficient to queue the MRA for sending in
    case a duplicate message is received.

    This avoids sending an MRA when it is not needed, but limits the number
    of times that a REQ or REP will be resent. It also provides for a
    simpler implementation than generating the MRA based on a timer event
    (that is, trying to send the MRA after receiving the first REQ or REP
    if a response has not been generated, so that it is received at the
    remote side before a duplicate REQ or REP arrives).

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • The declaration of struct ib_user_mad_reg_req.method_mask[] exported
    to userspace was an array of __u32, but the kernel internally treated
    it as a bitmap made up of longs. This makes a difference for 64-bit
    big-endian kernels, where numbering the bits in an array of __u32
    gives:

    |31.....0|63....32|95....64|127...96|

    while numbering the bits in an array of longs gives:

    |63..............0|127............64|

    64-bit userspace can handle this by just treating method_mask[] as an
    array of longs, but 32-bit userspace is really stuck: the meaning of
    the bits in method_mask[] depends on whether the kernel is 32-bit or
    64-bit, and there's no sane way for userspace to know that.

    Fix this by updating the declaration to make it clear that
    method_mask[] is an array of longs, and using a compat_ioctl method to
    convert to an array of 64-bit longs to handle the 32-on-64 problem.
    This fixes the interface description to match existing behavior (so
    working binaries continue to work) in almost all situations, and gives
    consistent semantics in the case of 32-bit userspace that can run on
    either a 32-bit or 64-bit kernel, so that the same binary can work for
    both 32-on-32 and 32-on-64 systems.

    Signed-off-by: Roland Dreier

    Roland Dreier
     
  • Add support for setting the P_Key index of sent MADs and getting the
    P_Key index of received MADs. This requires a change to the layout of
    the ABI structure struct ib_user_mad_hdr, so to avoid breaking
    compatibility, we default to the old (unchanged) ABI and add a new
    ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware
    of the new ABI to opt into using it.

    We plan on switching to the new ABI by default in a year or so, and
    this patch adds a warning that is printed when an application uses the
    old ABI, to push people towards converting to the new ABI.
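
    An aware application opts in on the open umad file descriptor before
    relying on the new layout; a sketch (the ioctl constant comes from the
    updated ib_user_mad.h):

        #include <sys/ioctl.h>

        static int enable_pkey_abi(int fd)
        {
                /* Opt in to the P_Key-aware ABI; no argument is needed. */
                return ioctl(fd, IB_USER_MAD_ENABLE_PKEY);
        }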

    Signed-off-by: Roland Dreier
    Reviewed-by: Sean Hefty
    Reviewed-by: Hal Rosenstock

    Roland Dreier
     
  • During ib_umem_get(), determine whether all pages from the memory
    region are hugetlb pages and report this in the "hugetlb" member.
    Low-level drivers can use this information if they need it.

    Signed-off-by: Joachim Fenkes
    Signed-off-by: Roland Dreier

    Joachim Fenkes
     
  • Export the ability to set the type of service to user space. Model
    the interface after setsockopt.
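
    In librdmacm the setsockopt-style interface is rdma_set_option(); a
    sketch of setting the ToS on a cm id:

        #include <stdint.h>
        #include <rdma/rdma_cma.h>

        static int set_tos(struct rdma_cm_id *id, uint8_t tos)
        {
                return rdma_set_option(id, RDMA_OPTION_ID, RDMA_OPTION_ID_TOS,
                                       &tos, sizeof(tos));
        }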

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Provide support to specify a type of service for a communication
    identifier. A new function call is used when dealing with IPv4
    addresses. For IPv6 addresses, the ToS is specified through the
    traffic class field in the sockaddr_in6 structure.

    Signed-off-by: Sean Hefty

    [ The comments Eitan Zahavi and myself have made over the v1 post
    were fully addressed. ]

    Reviewed-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • The QoS annex defines new fields for path records. Add them to the
    ib_sa for consumers that want to use them.

    Signed-off-by: Sean Hefty
    Reviewed-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Sean Hefty
     

04 Aug, 2007

2 commits