05 Jan, 2012

1 commit


02 Nov, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
    mlx4_core: Deprecate log_num_vlan module param
    IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
    IB/mlx4: Enable 4K mtu for IBoE
    RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
    RDMA/cxgb4: Serialize calls to CQ's comp_handler
    RDMA/cxgb3: Serialize calls to CQ's comp_handler
    IB/qib: Fix issue with link states and QSFP cables
    IB/mlx4: Configure extended active speeds
    mlx4_core: Add extended port capabilities support
    IB/qib: Hold links until tuning data is available
    IB/qib: Clean up checkpatch issue
    IB/qib: Remove s_lock around header validation
    IB/qib: Precompute timeout jiffies to optimize latency
    IB/qib: Use RCU for qpn lookup
    IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
    IB/qib: Decode path MTU optimization
    IB/qib: Optimize RC/UC code by IB operation
    IPoIB: Use the right function to do DMA unmap pages
    RDMA/cxgb4: Use correct QID in insert_recv_cqe()
    RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
    ...

    Linus Torvalds
     
  • …sc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next

    Roland Dreier
     

14 Oct, 2011

10 commits

  • Allow processes that share the same XRC domain to open an existing
    shareable QP. This permits those processes to receive events on the
    shared QP and transfer ownership, so that any process may modify the
    QP. The latter allows the creating process to exit, while a remaining
    process can still transition it for path migration purposes.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • XRC TGT QPs are shared resources among multiple processes. Since the
    creating process may exit, allow other processes which share the same
    XRC domain to open an existing QP. This allows us to transfer
    ownership of an XRC TGT QP to another process.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
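
    A minimal kernel-side sketch of how a second consumer might attach to an
    existing shareable TGT QP, assuming the ib_open_qp()/ib_qp_open_attr
    interface this series introduces (field names are an assumption; see
    ib_verbs.h for the authoritative layout):

    #include <rdma/ib_verbs.h>

    static void tgt_qp_event(struct ib_event *event, void *context)
    {
            /* forward async events for the shared QP to this consumer */
    }

    static struct ib_qp *open_shared_tgt_qp(struct ib_xrcd *xrcd, u32 qp_num)
    {
            struct ib_qp_open_attr attr = {
                    .event_handler = tgt_qp_event,
                    .qp_context    = NULL,
                    .qp_num        = qp_num,          /* QP created by another process */
                    .qp_type       = IB_QPT_XRC_TGT,
            };

            /* The reference obtained here keeps the QP usable by this
             * consumer until it is released back to the XRCD. */
            return ib_open_qp(xrcd, &attr);
    }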
     
  • Allow the user to indicate the QP type separately from the port space
    when allocating an rdma_cm_id. With RDMA_PS_IB, there is no longer a
    1:1 relationship between the QP type and port space, so we need to
    switch on the QP type to select between UD and connected QPs.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Add RDMA_PS_IB. XRC QP types will use the IB port space when operating
    over the RDMA CM. For the 'IP protocol' field value, we select 0x3F,
    which is listed as being for 'any local network'.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Creating XRC SRQs requires more information than we can exchange using
    the existing create SRQ ABI. Provide an enhanced create ABI for
    extended SRQ types.

    Based on patches by Jack Morgenstein
    and Roland Dreier

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Allow user space to create XRC domains. Because XRCDs are expected to
    be shared among multiple processes, we use inodes to identify an XRCD.

    Based on patches by Jack Morgenstein

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • XRC TGT QPs are intended to be shared among multiple users and
    processes. Allow the destruction of an XRC TGT QP to be done explicitly
    through ib_destroy_qp() or when the XRCD is destroyed.

    To support destroying an XRC TGT QP, we need to track TGT QPs with the
    XRCD. When the XRCD is destroyed, all tracked XRC TGT QPs are also
    cleaned up.

    To avoid stale reference issues, if a user is holding a reference on a
    TGT QP, we increment a reference count on the QP. The user releases the
    reference by calling ib_release_qp. This releases any access to the QP
    from a user above verbs, but allows the QP to continue to exist until
    destroyed by the XRCD.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • XRC ("eXtended reliable connected") is an IB transport that provides
    better scalability by allowing senders to specify which shared receive
    queue (SRQ) should be used to receive a message, which essentially
    allows one transport context (QP connection) to serve multiple
    destinations (as long as they share an adapter, of course).

    XRC communication is between an initiator (INI) QP and a target (TGT)
    QP. Target QPs are associated with SRQs through an XRCD. An XRC TGT QP
    behaves like a receive-only RD QP. XRC INI QPs behave similarly to RC
    QPs, except that work requests posted to an XRC INI QP must specify the
    remote SRQ that is the target of the work request.

    We define two new QP types for XRC, to distinguish between INI and TGT
    QPs, and update the core layer to support XRC QPs.

    This patch is derived from work by Jack Morgenstein

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
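
    A hedged sketch of how the two new QP types might be created with
    ib_create_qp(), assuming ib_qp_init_attr carries an xrcd pointer for TGT
    QPs as described (the NULL PD for the TGT case is also an assumption):

    #include <rdma/ib_verbs.h>

    static struct ib_qp *create_xrc_ini_qp(struct ib_pd *pd, struct ib_cq *send_cq)
    {
            struct ib_qp_init_attr attr = {
                    .qp_type     = IB_QPT_XRC_INI,   /* sender side, RC-like */
                    .send_cq     = send_cq,
                    .sq_sig_type = IB_SIGNAL_ALL_WR,
                    .cap         = { .max_send_wr = 64, .max_send_sge = 1 },
            };

            return ib_create_qp(pd, &attr);
    }

    static struct ib_qp *create_xrc_tgt_qp(struct ib_xrcd *xrcd)
    {
            struct ib_qp_init_attr attr = {
                    .qp_type = IB_QPT_XRC_TGT,       /* receive side, tied to the XRCD */
                    .xrcd    = xrcd,
            };

            return ib_create_qp(NULL, &attr);        /* assumption: no PD of its own */
    }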
     
  • XRC ("eXtended reliable connected") is an IB transport that provides
    better scalability by allowing senders to specify which shared receive
    queue (SRQ) should be used to receive a message, which essentially
    allows one transport context (QP connection) to serve multiple
    destinations (as long as they share an adapter, of course).

    XRC defines SRQs that are specifically used by XRC connections. Expand
    the SRQ code to support XRC SRQs. An XRC SRQ is currently restricted to
    only XRC use according to the IB XRC Annex.

    Portions of this patch were derived from work by Jack Morgenstein.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
     
  • Currently, there is only a single ("basic") type of SRQ, but with XRC
    support we will add a second. Prepare for this by defining an SRQ type
    and setting all current users to IB_SRQT_BASIC.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
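
    A small sketch of a current user after this change: the only difference
    is that srq_type is now set explicitly to IB_SRQT_BASIC (field names
    assumed from the description):

    #include <rdma/ib_verbs.h>

    static struct ib_srq *create_basic_srq(struct ib_pd *pd)
    {
            struct ib_srq_init_attr init = {
                    .srq_type = IB_SRQT_BASIC,   /* the only type before XRC */
                    .attr = {
                            .max_wr  = 256,      /* receive queue depth */
                            .max_sge = 1,
                    },
            };

            return ib_create_srq(pd, &init);
    }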
     

13 Oct, 2011

1 commit

  • XRC ("eXtended reliable connected") is an IB transport that provides
    better scalability by allowing senders to specify which shared receive
    queue (SRQ) should be used to receive a message, which essentially
    allows one transport context (QP connection) to serve multiple
    destinations (as long as they share an adapter, of course).

    A few new concepts are introduced to support this. This patch adds:

    - A new device capability flag, IB_DEVICE_XRC, which low-level
    drivers set to indicate that a device supports XRC.
    - A new object type, XRC domains (struct ib_xrcd), and new verbs
    ib_alloc_xrcd()/ib_dealloc_xrcd(). XRCDs are used to limit which
    XRC SRQs an incoming message can target.

    This patch is derived from work by Jack Morgenstein.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
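
    A hedged sketch of consuming the two additions, assuming the
    one-argument ib_alloc_xrcd() form implied by the description and a
    previously queried ib_device_attr:

    #include <linux/err.h>
    #include <rdma/ib_verbs.h>

    static struct ib_xrcd *get_xrcd(struct ib_device *device,
                                    const struct ib_device_attr *dev_attr)
    {
            if (!(dev_attr->device_cap_flags & IB_DEVICE_XRC))
                    return ERR_PTR(-ENOSYS);     /* low-level driver lacks XRC */

            /* Paired with ib_dealloc_xrcd() once all SRQs/QPs are gone. */
            return ib_alloc_xrcd(device);
    }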
     

12 Oct, 2011

1 commit

  • Introduce support for the following extended speeds:

    FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with
    64b/66b encoding rather than 8b/10b encoding.
    FDR: IBA extended speed 14.0625 Gbps.
    EDR: IBA extended speed 25.78125 Gbps.

    Signed-off-by: Marcel Apfelbaum
    Reviewed-by: Hal Rosenstock
    Reviewed-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Marcel Apfelbaum
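
    For reference, the per-lane data rate implied by these numbers combines
    the signalling rate with the line encoding; the sketch below is plain
    illustrative arithmetic, not a kernel API:

    #include <stdio.h>

    static double lane_data_rate(double signal_gbps, int enc_num, int enc_den)
    {
            return signal_gbps * enc_num / enc_den;
    }

    int main(void)
    {
            printf("QDR   : %.3f Gb/s\n", lane_data_rate(10.0,     8, 10));  /*  8.00 */
            printf("FDR-10: %.3f Gb/s\n", lane_data_rate(10.3125, 64, 66));  /* 10.00 */
            printf("FDR   : %.3f Gb/s\n", lane_data_rate(14.0625, 64, 66));  /* 13.64 */
            printf("EDR   : %.3f Gb/s\n", lane_data_rate(25.78125, 64, 66)); /* 25.00 */
            return 0;
    }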
     

16 Sep, 2011

1 commit

  • This patch does several things:
    - introduces __ethtool_get_settings(), which is called both from ethtool
    code and from drivers, and puts an ASSERT_RTNL there.
    - dev_ethtool_get_settings() is replaced by __ethtool_get_settings().
    - changes callers in drivers so that rtnl locking is respected. In
    iboe_get_rate(), ->get_settings() was previously called unlocked; this
    fixes it. prb_calc_retire_blk_tmo() in af_packet.c had the same
    problem, fixed by calling __dev_get_by_index() instead of
    dev_get_by_index() and holding rtnl_lock around both calls.
    - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
    so that bnx2fc_if_create() and fcoe_if_create() are called locked, as
    they are from other places.
    - uses __ethtool_get_settings() in the bonding code.

    Signed-off-by: Jiri Pirko

    v2->v3:
    - removed dev_ethtool_get_settings()
    - added ASSERT_RTNL into __ethtool_get_settings()
    - prb_calc_retire_blk_tmo(): use __dev_get_by_index() and take the lock
    around it and the __ethtool_get_settings() call
    v1->v2:
    - added the missing EXPORT_SYMBOL
    Reviewed-by: Ben Hutchings [except FCoE bits]
    Acked-by: Ralf Baechle
    Signed-off-by: David S. Miller

    Jiri Pirko
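
    A hedged sketch of the calling convention this patch establishes:
    resolve the device and read its settings under one rtnl_lock(), using
    __dev_get_by_index() plus __ethtool_get_settings() (which now asserts
    RTNL):

    #include <linux/ethtool.h>
    #include <linux/netdevice.h>
    #include <linux/rtnetlink.h>

    static u32 read_link_speed(struct net *net, int ifindex)
    {
            struct ethtool_cmd cmd;
            struct net_device *dev;
            u32 speed = 0;

            rtnl_lock();
            dev = __dev_get_by_index(net, ifindex);    /* no extra reference taken */
            if (dev && !__ethtool_get_settings(dev, &cmd))
                    speed = ethtool_cmd_speed(&cmd);   /* merges speed and speed_hi */
            rtnl_unlock();

            return speed;
    }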
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

19 Jul, 2011

2 commits

  • Move the various definitions and mad structures needed for software
    implementation of IBA PM agent from the ipath and qib drivers into a
    single include file, which in turn could be used by more consumers.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     
  • Add an IB GID change event type. This is needed for IBoE: when the HW
    driver updates the GID table (e.g. when VLANs are added or deleted),
    the change should be reflected in the IB core cache.

    Signed-off-by: Eli Cohen
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     

27 May, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
    RDMA/cma: Save PID of ID's owner
    RDMA/cma: Add support for netlink statistics export
    RDMA/cma: Pass QP type into rdma_create_id()
    RDMA: Update exported headers list
    RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
    RDMA/nes: Add a check for strict_strtoul()
    RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
    RDMA/cxgb4: Use completion objects for event blocking
    IB/srp: Fix integer -> pointer cast warnings
    IB: Add devnode methods to cm_class and umad_class
    IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
    IB/uverbs: Add devnode method to set path/mode
    RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
    RDMA: Add netlink infrastructure
    RDMA: Add error handling to ib_core_init()

    Linus Torvalds
     

26 May, 2011

4 commits

  • Add callbacks and data types for statistics export of all current
    devices/ids. The schema for RDMA CM is a series of netlink messages.
    Each one contains an rdma_cm_id_stats struct. Additionally, two netlink
    attributes are created for the addresses of each message (if
    applicable).

    The attribute types used are:
    RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
    RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
    sockaddr_* structs are encapsulated within these attributes.

    In other words, every transaction contains a series of messages like:

    -------message 1-------
    struct rdma_cm_id_stats {
            __u32 qp_num;
            __u32 bound_dev_if;
            __u32 port_space;
            __s32 pid;
            __u8  cm_state;
            __u8  node_type;
            __u8  port_num;
            __u8  reserved;
    }
    RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
    RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
    -------end 1-------
    -------message 2-------
    struct rdma_cm_id_stats
    RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
    RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
    -------end 2-------

    Signed-off-by: Nir Muchtar
    Signed-off-by: Roland Dreier

    Nir Muchtar
     
  • The RDMA CM currently infers the QP type from the port space selected
    by the user. In the future (eg with RDMA_PS_IB or XRC), there may not
    be a 1-1 correspondence between port space and QP type. For netlink
    export of RDMA CM state, we want to export the QP type to userspace,
    so it is cleaner to explicitly associate a QP type to an ID.

    Modify rdma_create_id() to allow the user to specify the QP type, and
    use it to make our selections of datagram versus connected mode.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Sean Hefty
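
    A minimal sketch of the new calling convention, assuming the signature
    described here (handler, context, port space, QP type):

    #include <rdma/rdma_cm.h>

    static int cm_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
    {
            return 0;    /* real consumers dispatch on event->event */
    }

    static struct rdma_cm_id *make_connected_id(void *ctx)
    {
            /* Connected (RC) service over the TCP port space ... */
            return rdma_create_id(cm_handler, ctx, RDMA_PS_TCP, IB_QPT_RC);
    }

    static struct rdma_cm_id *make_datagram_id(void *ctx)
    {
            /* ... versus datagram (UD) service over the UDP port space. */
            return rdma_create_id(cm_handler, ctx, RDMA_PS_UDP, IB_QPT_UD);
    }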
     
  • Various RDMA headers are intended to be exported to userspace, so add
    them to the headers-y list. Add a (strictly speaking, superfluous)
    include of <linux/types.h> to avoid a headers_check warning.

    Signed-off-by: Roland Dreier

    Roland Dreier
     
  • Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
    in an exported header so that it can be exported via RDMA netlink.

    Signed-off-by: Nir Muchtar
    Signed-off-by: Roland Dreier

    Nir Muchtar
     

21 May, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     
  • Add basic RDMA netlink infrastructure that allows for registration of
    RDMA clients for which data is to be exported and supplies message
    construction callbacks.

    Signed-off-by: Nir Muchtar

    [ Reorganize a few things, add CONFIG_NET dependency. - Roland ]

    Signed-off-by: Roland Dreier

    Roland Dreier
     

10 May, 2011

2 commits

  • The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places;
    cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4
    drivers -- only nes was using the enum values (with the mild consequence
    that all nes connection failures were treated as generic errors rather
    than reported as timeouts or rejections).

    We can fix this confusion by getting rid of enum iw_cm_event_status and
    using a plain int for struct iw_cm_event.status, and converting nes to
    use -Exxx as the other iWARP drivers do.

    This also gets rid of the warning

    drivers/infiniband/core/cma.c: In function 'cma_iw_handler':
    drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status'
    drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status'
    drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status'

    Signed-off-by: Roland Dreier
    Reviewed-by: Steve Wise
    Reviewed-by: Sean Hefty
    Reviewed-by: Faisal Latif

    Roland Dreier
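
    With the status now a plain int, a consumer can tell these cases apart;
    the errno values below correspond to the case values in the quoted
    warnings (a hedged illustration, not code from the patch):

    #include <linux/errno.h>
    #include <rdma/iw_cm.h>

    static const char *iw_status_str(const struct iw_cm_event *event)
    {
            switch (event->status) {
            case 0:
                    return "established";
            case -ECONNREFUSED:     /* 4294967185 in the enum warning */
                    return "rejected by peer";
            case -ETIMEDOUT:        /* 4294967186 */
                    return "timed out";
            case -ECONNRESET:       /* 4294967192 */
                    return "connection reset";
            default:
                    return "error";
            }
    }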
     
  • Lustre requires that clients bind to a privileged port number before
    connecting to a remote server. On larger clusters (typically more
    than about 1000 nodes), the number of privileged ports is exhausted,
    resulting in Lustre being unusable.

    To handle this, we add support for reusable addresses to the rdma_cm.
    This mimics the behavior of the socket option SO_REUSEADDR. A user
    may set an rdma_cm_id to reuse an address before calling
    rdma_bind_addr() (explicitly or implicitly). If set, other
    rdma_cm_id's may be bound to the same address, provided that they all
    have reuse enabled, and there are no active listens.

    If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
    will only succeed if there are no other id's bound to that same
    address. The reuse option is exported to user space. The behavior of
    the kernel reuse implementation was verified against that given by
    sockets.

    This patch is derived from a patch by Ira Weiny.

    Signed-off-by: Sean Hefty
    Signed-off-by: Roland Dreier

    Hefty, Sean
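
    A short sketch of the intended usage, assuming the rdma_set_reuseaddr()
    helper this patch exports (name taken from the SO_REUSEADDR analogy
    described above):

    #include <rdma/rdma_cm.h>

    static int bind_reusable(struct rdma_cm_id *id, struct sockaddr *addr)
    {
            int ret;

            ret = rdma_set_reuseaddr(id, 1);   /* must precede rdma_bind_addr() */
            if (ret)
                    return ret;

            /* Succeeds even if other reuse-enabled IDs hold the same address,
             * as long as none of them is actively listening. */
            return rdma_bind_addr(id, addr);
    }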
     

30 Apr, 2011

1 commit

  • This makes sure that when a driver calls another driver's ethtool
    get/set_settings() callback, the data passed to it is clean. This
    guarantees that speed_hi will be zeroed correctly if the called
    callback doesn't explicitly set it: we are sure we don't get a
    corrupted speed from the underlying driver. We also take care of
    setting the cmd field appropriately (ETHTOOL_GSET/SSET).

    This applies to dev_ethtool_get_settings(), which now makes sure it
    sets up that ethtool command parameter correctly before passing it to
    drivers. This also means that whoever calls dev_ethtool_get_settings()
    does not have to clean the ethtool command parameter. This function
    also becomes an exported symbol instead of an inline.

    All drivers visible to make allyesconfig under x86_64 have been
    updated.

    Signed-off-by: David Decotigny
    Signed-off-by: David S. Miller

    David Decotigny
     

17 Jan, 2011

1 commit

  • * ib_wq is added, which is used as the common workqueue for infiniband
    instead of the system workqueue. All system workqueue usages
    including flush_scheduled_work() callers are converted to use and
    flush ib_wq.

    * cancel_delayed_work() + flush_scheduled_work() converted to
    cancel_delayed_work_sync().

    * qib_wq is removed and ib_wq is used instead.

    This is to prepare for deprecation of flush_scheduled_work().

    Signed-off-by: Tejun Heo
    Signed-off-by: Roland Dreier

    Tejun Heo
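
    A minimal sketch of the conversion pattern (ib_wq is exported by the
    core via ib_verbs.h):

    #include <linux/workqueue.h>
    #include <rdma/ib_verbs.h>

    static void deferred_cleanup(struct work_struct *work)
    {
            /* work that previously ran on the system workqueue */
    }

    static DECLARE_WORK(cleanup_work, deferred_cleanup);

    static void schedule_and_drain(void)
    {
            queue_work(ib_wq, &cleanup_work);   /* was schedule_work()        */
            flush_workqueue(ib_wq);             /* was flush_scheduled_work() */
    }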
     

14 Oct, 2010

1 commit

  • Add support for IBoE device binding and IP --> GID resolution. Path
    resolving and multicast joining are implemented within cma.c by
    filling in the responses and running callbacks in the CMA work queue.

    IP --> GID resolution always yields IPv6 link local addresses; remote
    GIDs are derived from the destination MAC address of the remote port.
    Multicast GIDs are always mapped to multicast MACs as is done in IPv6.
    (IPv4 multicast is enabled by translating IPv4 multicast addresses to
    IPv6 multicast as described in
    .)

    Some helper functions are added to ib_addr.h.

    Signed-off-by: Eli Cohen
    Signed-off-by: Roland Dreier

    Eli Cohen
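
    For illustration (not a quote of the cma.c code), a remote GID derived
    from a destination MAC follows the usual IPv6 modified-EUI-64
    link-local construction:

    #include <linux/string.h>
    #include <linux/types.h>

    static void mac_to_link_local_gid(const u8 mac[6], u8 gid[16])
    {
            memset(gid, 0, 16);
            gid[0] = 0xfe;                /* fe80::/64 link-local prefix */
            gid[1] = 0x80;

            gid[8]  = mac[0] ^ 0x02;      /* flip the universal/local bit */
            gid[9]  = mac[1];
            gid[10] = mac[2];
            gid[11] = 0xff;               /* insert ff:fe between the OUI */
            gid[12] = 0xfe;               /* and the NIC-specific bytes   */
            gid[13] = mac[3];
            gid[14] = mac[4];
            gid[15] = mac[5];
    }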
     

28 Sep, 2010

1 commit

  • This patch allows ports to have different link layers:
    IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET. This is required
    for adding IBoE (InfiniBand-over-Ethernet, aka RoCE) support. For
    devices that do not provide an implementation for querying the link
    layer property of a port, we return a default value based on the
    transport: RDMA_TRANSPORT_IB nodes will return IB_LINK_LAYER_INFINIBAND
    and RDMA_TRANSPORT_IWARP nodes will return IB_LINK_LAYER_ETHERNET.

    Signed-off-by: Eli Cohen
    Signed-off-by: Roland Dreier

    Eli Cohen
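
    A small sketch of querying the new property (rdma_port_get_link_layer()
    is the verb this patch adds):

    #include <rdma/ib_verbs.h>

    static bool port_is_ethernet(struct ib_device *device, u8 port_num)
    {
            return rdma_port_get_link_layer(device, port_num) ==
                   IB_LINK_LAYER_ETHERNET;
    }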
     

05 Aug, 2010

1 commit

  • Change abbreviated IB_QPT_RAW_ETY to IB_QPT_RAW_ETHERTYPE to make
    the special QP type easier to understand.

    cf http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04530.html

    Signed-off-by: Aleksey Senin
    Signed-off-by: Roland Dreier

    Aleksey Senin
     

22 May, 2010

1 commit

  • Add a new parameter to ib_register_device() so that low-level device
    drivers can pass in a pointer to a callback function that will be
    called for each port that is registered in sysfs. This allows
    low-level device drivers to create files in

    /sys/class/infiniband/<hca>/ports/<N>/

    without having to poke through the internals of the RDMA sysfs handling.

    There is no need for an unregister function since the kobject
    reference will go to zero when ib_unregister_device() is called.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
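
    A hedged sketch of a low-level driver using the new parameter, assuming
    the callback receives the device, port number and the port's kobject
    (the attribute name here is purely hypothetical):

    #include <linux/kernel.h>
    #include <linux/sysfs.h>
    #include <rdma/ib_verbs.h>

    static ssize_t vendor_counter_show(struct kobject *kobj,
                                       struct kobj_attribute *attr, char *buf)
    {
            return sprintf(buf, "0\n");
    }

    static struct kobj_attribute vendor_counter_attr =
            __ATTR(vendor_counter, 0444, vendor_counter_show, NULL);

    static int my_create_port_files(struct ib_device *ibdev, u8 port_num,
                                    struct kobject *port_kobj)
    {
            /* Called once per port while the core builds the sysfs tree. */
            return sysfs_create_file(port_kobj, &vendor_counter_attr.attr);
    }

    static int my_register(struct ib_device *ibdev)
    {
            return ib_register_device(ibdev, my_create_port_files);
    }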
     

22 Apr, 2010

1 commit

  • - Add new IB_WR_MASKED_ATOMIC_CMP_AND_SWP and IB_WR_MASKED_ATOMIC_FETCH_AND_ADD
    send opcodes that can be used to post "masked atomic compare and
    swap" and "masked atomic fetch and add" work requests, respectively.
    - Add a masked_atomic_cap capability.
    - Add mask fields to the atomic struct of ib_send_wr.
    - Add new opcodes to ib_wc_opcode.

    The new operations are described more precisely below:

    * Masked Compare and Swap (MskCmpSwap)

    The MskCmpSwap atomic operation is an extension to the CmpSwap
    operation defined in the IB spec. MskCmpSwap allows the user to
    select a portion of the 64 bit target data for the “compare” check as
    well as to restrict the swap to a (possibly different) portion. The
    pseudo code below describes the operation:

    | atomic_response = *va
    | if (!((compare_add ^ *va) & compare_add_mask)) then
    | *va = (*va & ~(swap_mask)) | (swap & swap_mask)
    |
    | return atomic_response

    The additional operands are carried in the Extended Transport Header.
    Atomic response generation and packet format for MskCmpSwap is as for
    standard IB Atomic operations.

    * Masked Fetch and Add (MFetchAdd)

    The MFetchAdd Atomic operation extends the functionality of the
    standard IB FetchAdd by allowing the user to split the target into
    multiple fields of selectable length. The atomic add is done
    independently on each one of this fields. A bit set in the
    field_boundary parameter specifies the field boundaries. The pseudo
    code below describes the operation:

    | bit_adder(ci, b1, b2, *co)
    | {
    | value = ci + b1 + b2
    | *co = !!(value & 2)
    |
    | return value & 1
    | }
    |
    | #define MASK_IS_SET(mask, attr) (!!((mask)&(attr)))
    | bit_position = 1
    | carry = 0
    | atomic_response = 0
    |
    | for i = 0 to 63
    | {
    | if ( i != 0 )
    | bit_position = bit_position << 1
    |
    | bit_add_res = bit_adder(carry, MASK_IS_SET(*va, bit_position),
    | MASK_IS_SET(compare_add, bit_position), &new_carry)
    | if (bit_add_res)
    | atomic_response |= bit_position
    |
    | carry = ((new_carry) && (!MASK_IS_SET(compare_add_mask, bit_position)))
    | }
    |
    | return atomic_response

    Signed-off-by: Vladimir Sokolovsky
    Signed-off-by: Roland Dreier

    Vladimir Sokolovsky
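
    The MskCmpSwap semantics restated as plain C (host-side arithmetic only,
    mirroring the pseudo code above; not a kernel API):

    #include <stdint.h>

    static uint64_t masked_cmp_swap(uint64_t *va, uint64_t compare_add,
                                    uint64_t compare_add_mask, uint64_t swap,
                                    uint64_t swap_mask)
    {
            uint64_t old = *va;

            /* Compare only the bits selected by compare_add_mask ... */
            if (!((compare_add ^ old) & compare_add_mask))
                    /* ... and swap only the bits selected by swap_mask. */
                    *va = (old & ~swap_mask) | (swap & swap_mask);

            return old;    /* the atomic response is the original value */
    }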
     

02 Mar, 2010

1 commit