08 Nov, 2013

2 commits

  • For each RX/TX ring and its CQ, allocation is done on a NUMA node that
    corresponds to the core that the data structure should operate on.
    The assumption is that the core number is reflected by the ring index.
    The affected allocations are the ring/CQ data structures,
    the TX/RX info and the shared HW/SW buffer.
    For TX rings, each core has rings of all UPs.

    Signed-off-by: Yevgeny Petrilin
    Signed-off-by: Eugenia Emantayev
    Reviewed-by: Hadar Hen Zion
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Eugenia Emantayev
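
The NUMA placement rule in the commit above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: the function name, the topology dictionary and the modulo wrap for rings beyond the CPU count are all assumptions made for the example. Only the "ring index reflects the core number" rule comes from the commit text.

```python
def numa_node_for_ring(ring_index, cpu_to_node):
    """Pick the NUMA node on which a ring and its CQ would be allocated.

    From the commit text: the ring index is assumed to equal the core number.
    The modulo wrap for extra rings is an illustrative choice only.
    """
    cpus = sorted(cpu_to_node)
    cpu = cpus[ring_index % len(cpus)]
    return cpu_to_node[cpu]


# Hypothetical two-socket topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1.
topology = {cpu: (0 if cpu < 4 else 1) for cpu in range(8)}

# One ring per core: each ring lands on the node of "its" core.
placement = [numa_node_for_ring(i, topology) for i in range(8)]
```

With this topology, rings 0-3 would be allocated on node 0 and rings 4-7 on node 1, so the ring/CQ structures, the TX/RX info and the shared buffer all sit close to the core that uses them.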
     
  • This is done to optimize FW/HW access to host memory.

    Signed-off-by: Yevgeny Petrilin
    Signed-off-by: Eugenia Emantayev
    Reviewed-by: Hadar Hen Zion
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Eugenia Emantayev
     

05 Nov, 2013

3 commits

  • This is step #1 for implementing SRIOV resource quotas for VFs.

    Quotas are implemented per resource type for VFs and the PF, to prevent
    any entity from simply grabbing all the resources for itself and leaving
    the other entities unable to obtain such resources.

    Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs, MAC,
    VLAN, and Counters.

    The quota system works as follows:
    Each entity (VF or PF) is given a max number of a given resource (its quota),
    and a guaranteed minimum number for each resource (starvation prevention).

    For QPs, CQs, SRQs, MPTs and MTTs:
    50% of the available quantity for the resource is divided equally among
    the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
    parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
    The other 50% is the "free pool", allocated on a first-come-first-serve basis.
    For each VF/PF, resources are first allocated from its "guaranteed-minimum"
    pool. When that pool is exhausted, the driver attempts to allocate from
    the resource "free-pool".

    The quota (i.e., max) for the VFs and the PF is:
    The free-pool amount (50% of the real max) + the guaranteed minimum

    For MACs:
    Guarantee 2 MACs per VF/PF per port. As a result, since we have only
    128 MACs per port, reduce the allowable number of VFs from 64 to 63.
    Any remaining MACs are put into a free pool.

    For VLANs:
    For the PF, the per-port quota is 128 and guarantee is 64
    (to allow the PF to register at least a VLAN per VF in VST mode).
    For the VFs, the per-port quota is 64 and the guarantee is 0.
    We assume that VGT VFs are trusted not to abuse the VLAN resource.

    For Counters:
    For all functions (PF and VFs), the quota is 128 and the guarantee is 0.

    In this patch, we define the needed structures, which are added to the
    resource-tracker struct. In addition, we do initialization
    for the resource quota, and adjust the query_device response to use quotas
    rather than resource maxima.

    As part of the implementation, we introduce a new field in
    mlx4_dev: quotas. This field holds the resource quotas used
    to report maxima to the upper layers (ib_core, via query_device).

    The HCA maxima of these values are passed to the VFs (via
    QUERY_HCA) so that they may continue to use these in handling
    QPs, CQs, SRQs and MPTs.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Jack Morgenstein
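
The quota arithmetic for QPs, CQs, SRQs, MPTs and MTTs described in the commit above can be sketched as below. This is a simplified model under stated assumptions: the class and function names are hypothetical, function slot 0 stands for the PF, and integer division is used for the 50% split. It is not the resource tracker's actual implementation.

```python
def split_quota(total, num_vfs):
    """Return (guaranteed, quota) per function for one resource type."""
    functions = num_vfs + 1              # the PF plus all VFs
    half = total // 2                    # 50% guaranteed pool, 50% free pool
    guaranteed = half // functions       # equal share of the guaranteed pool
    quota = half + guaranteed            # free-pool amount + guaranteed minimum
    return guaranteed, quota


class ResourcePool:
    """Hypothetical model: guaranteed share first, then the shared free pool."""

    def __init__(self, total, num_vfs):
        self.guaranteed, self.quota = split_quota(total, num_vfs)
        self.free = total // 2
        self.used = [0] * (num_vfs + 1)  # slot 0 is the PF in this sketch

    def alloc(self, func, count=1):
        used = self.used[func]
        if used + count > self.quota:
            return False                 # would exceed the function's maximum
        from_guaranteed = max(0, min(count, self.guaranteed - used))
        from_free = count - from_guaranteed
        if from_free > self.free:
            return False                 # free pool exhausted
        self.free -= from_free
        self.used[func] = used + count
        return True


# MAC case from the commit text: 128 MACs per port, 2 guaranteed per function,
# so at most 128 // 2 = 64 functions, i.e. the PF plus 63 VFs.
max_vfs_for_macs = 128 // 2 - 1
```

For example, with 1024 QPs and 3 VFs each function is guaranteed 128 QPs and may hold at most 640. Even if one VF drains the entire free pool, every other function can still allocate its guaranteed 128.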
     
  • Use of vlan_index created problems unregistering vlans on guests.

    In addition, tools delete VLANs by tag, not by index, so let's follow that.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Jack Morgenstein
     
  • The functions mlx4_register_vlan, mlx4_unregister_vlan, mlx4_register_mac,
    mlx4_unregister_mac all made illegal use of the out_param in multifunc mode
    to pass the port number. The firmware spec specifies that the port number
    should be passed in bits 8..15 of the input-modifier field for ALLOC_RES and
    FREE_RES (sections 20.15.1 and 20.15.2).

    For MAC register/unregister, this patch contains workarounds so that guests
    running previous kernels continue to work on a new Hypervisor, and guests
    running the new kernel will continue to work on old hypervisors.

    VLAN registration capability is still not operational in multifunction mode,
    since the vlan wrapper functions are not implemented in this patch.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Jack Morgenstein
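
As a small illustration of the encoding named in the commit above, the sketch below places the port number in bits 8..15 of an input-modifier value. The helper names and the example resource-type value are hypothetical, and using the low byte for the resource type is an assumption of this sketch; only the position of the port field comes from the commit text.

```python
PORT_SHIFT = 8          # port number lives in bits 8..15 (from the commit text)
PORT_MASK = 0xFF
RES_TYPE_MASK = 0xFF    # assumption: resource type occupies the low byte

EXAMPLE_RES_TYPE = 0x05  # hypothetical value, for illustration only


def encode_in_modifier(res_type, port):
    """Build an input-modifier value carrying the resource type and port."""
    if not 0 <= port <= PORT_MASK:
        raise ValueError("port does not fit in bits 8..15")
    return (port << PORT_SHIFT) | (res_type & RES_TYPE_MASK)


def decode_port(in_modifier):
    """Extract the port number from bits 8..15."""
    return (in_modifier >> PORT_SHIFT) & PORT_MASK


modifier = encode_in_modifier(EXAMPLE_RES_TYPE, port=2)
```

Carrying the port in the input modifier leaves the output parameter free for the command's result, which is the misuse this commit removes.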
     

18 Oct, 2013

1 commit

  • Small code cleanup:

    1. change MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN to MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN

    2. put MLX4_SET_PORT_PRIO2TC and MLX4_SET_PORT_SCHEDULER in the same union with the
    other MLX4_SET_PORT_yyy

    Signed-off-by: Or Gerlitz
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Or Gerlitz
     

06 Sep, 2013

1 commit

  • Pull networking changes from David Miller:
    "Noteworthy changes this time around:

    1) Multicast rejoin support for team driver, from Jiri Pirko.

    2) Centralize and simplify TCP RTT measurement handling in order to
    reduce the impact of bad RTO seeding from SYN/ACKs. Also, when
    both timestamps and local RTT measurements are available prefer
    the latter because there are broken middleware devices which
    scramble the timestamp.

    From Yuchung Cheng.

    3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel
    memory consumed to queue up unsent user data. From Eric Dumazet.

    4) Add a "physical port ID" abstraction for network devices, from
    Jiri Pirko.

    5) Add a "suppress" operation to influence fib_rules lookups, from
    Stefan Tomanek.

    6) Add a networking development FAQ, from Paul Gortmaker.

    7) Extend the information provided by tcp_probe and add ipv6 support,
    from Daniel Borkmann.

    8) Use RCU locking more extensively in openvswitch data paths, from
    Pravin B Shelar.

    9) Add SCTP support to openvswitch, from Joe Stringer.

    10) Add EF10 chip support to SFC driver, from Ben Hutchings.

    11) Add new SYNPROXY netfilter target, from Patrick McHardy.

    12) Compute a rate approximation for sending in TCP sockets, and use
    this to more intelligently coalesce TSO frames. Furthermore, add
    a new packet scheduler which takes advantage of this estimate when
    available. From Eric Dumazet.

    13) Allow AF_PACKET fanouts with random selection, from Daniel
    Borkmann.

    14) Add ipv6 support to vxlan driver, from Cong Wang"

    Resolved conflicts as per discussion.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits)
    openvswitch: Fix alignment of struct sw_flow_key.
    netfilter: Fix build errors with xt_socket.c
    tcp: Add missing braces to do_tcp_setsockopt
    caif: Add missing braces to multiline if in cfctrl_linkup_request
    bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize
    vxlan: Fix kernel panic on device delete.
    net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls
    net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
    icplus: Use netif_running to determine device state
    ethernet/arc/arc_emac: Fix huge delays in large file copies
    tuntap: orphan frags before trying to set tx timestamp
    tuntap: purge socket error queue on detach
    qlcnic: use standard NAPI weights
    ipv6:introduce function to find route for redirect
    bnx2x: VF RSS support - VF side
    bnx2x: VF RSS support - PF side
    vxlan: Notify drivers for listening UDP port changes
    net: usbnet: update addr_assign_type if appropriate
    driver/net: enic: update enic maintainers and driver
    driver/net: enic: Exposing symbols for Cisco's low latency driver
    ...

    Linus Torvalds
     

29 Aug, 2013

1 commit

  • Implement ib_create_flow() and ib_destroy_flow().

    Translate the verbs structures provided by the user to HW structures
    and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands.

    On the ATTACH command completion, the firmware provides a 64-bit
    registration ID, which is placed into struct mlx4_ib_flow that wraps
    the instance of struct ib_flow which is returned to the caller. Later,
    this reg ID is used for detaching that flow from the firmware.

    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Hadar Hen Zion
     

03 Aug, 2013

1 commit


29 Jul, 2013

1 commit

  • This commit adds a new firmware command and a new firmware event. The firmware
    raises the MLX4_EVENT_TYPE_OP_REQUIRED event in order to signal the driver that it
    needs to perform an administrative operation through the MLX4_CMD_GET_OP_REQ
    command. At the moment the supported operation is adding/removing multicast
    entries which are used by the firmware for handling NCSI traffic in B0
    steering mode.

    Also, the order of mlx4_init_mcg_table() and mlx4_init_eq_table() had to be
    swapped to make sure that the driver gets events only after the resources
    needed to handle them are initialized.

    Signed-off-by: Yevgeny Petrilin
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Eugenia Emantayev
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Yevgeny Petrilin
     

02 Jul, 2013

2 commits

  • When the firmware supports the UPDATE_QP command, if the VF link is disabled,
    block all QPs opened by the VF, by programming the UPDATE_QP command to drop
    all RX & TX traffic to/from these QPs. Operates only in VST mode.

    Signed-off-by: Rony Efraim
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Rony Efraim
     
  • Within VST mode, enable modifying the vlan and/or qos
    for a VF without requiring unbind/rebind.

    This requires firmware which supports the UPDATE_QP command.
    (If the command is not available, we fall back to requiring
    unbind/bind to activate these changes).

    To avoid race conditions with modify-qp on QPs that are affected
    by update-qp, this operation is performed on the comm_wq.

    If the update operation succeeds for all the necessary QPs, a
    vlan_unregister is performed for the abandoned vlan id.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Jack Morgenstein
     

14 Jun, 2013

1 commit


12 May, 2013

1 commit

  • Make sure that the following steps are taken:

    - drop packets sent by the VF with vlan tag
    - block packets with vlan tag which are steered to the VF
    - drop/block tagged packets when the policy is priority-tagged
    - make sure VLAN stripping for received packets is set
    - make sure force UP bit for the VF QP is set

    Use enum values for all the above instead of numerical bit offsets.

    Signed-off-by: Rony Efraim
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Rony Efraim
     

09 May, 2013

1 commit

  • Pull InfiniBand/RDMA changes from Roland Dreier:
    - XRC transport fixes
    - Fix DHCP on IPoIB
    - mlx4 preparations for flow steering
    - iSER fixes
    - miscellaneous other fixes

    * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits)
    IB/iser: Add support for iser CM REQ additional info
    IB/iser: Return error to upper layers on EAGAIN registration failures
    IB/iser: Move informational messages from error to info level
    IB/iser: Add module version
    mlx4_core: Expose a few helpers to fill DMFS HW strucutures
    mlx4_core: Directly expose fields of DMFS HW rule control segment
    mlx4_core: Change a few DMFS fields names to match firmare spec
    mlx4: Match DMFS promiscuous field names to firmware spec
    mlx4_core: Move DMFS HW structs to common header file
    IB/mlx4: Set link type for RAW PACKET QPs in the QP context
    IB/mlx4: Disable VLAN stripping for RAW PACKET QPs
    mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level
    RDMA/iwcm: Don't touch cmid after dropping reference
    IB/qib: Correct qib_verbs_register_sysfs() error handling
    IB/ipath: Correct ipath_verbs_register_sysfs() error handling
    RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled
    SRPT: Fix odd use of WARN_ON()
    IPoIB: Fix ipoib_hard_header() return value
    RDMA: Rename random32() to prandom_u32()
    RDMA/cxgb3: Fix uninitialized variable
    ...

    Linus Torvalds
     

27 Apr, 2013

4 commits


25 Apr, 2013

8 commits

  • Re-arrange some of the code which fills DMFS HW structures so we can use
    it from within the core driver and from the IB driver too, e.g. when
    verbs DMFS structures are transformed into mlx4 hardware structs.

    Also, add struct mlx4_flow_handle, which will be used by the
    DMFS verbs flow in the IB driver.

    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Hadar Hen Zion
     
  • Some of struct mlx4_net_trans_rule_hw_ctrl fields were packed into u32
    and accessed through bit field operations. Expose and access them
    directly as u8 or u16.

    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Hadar Hen Zion
     
  • Change struct mlx4_net_trans_rule_hw_eth :: vlan_id name to vlan_tag

    Change struct mlx4_net_trans_rule_hw_ib :: r_u_qpn name to l3_qpn

    The patch doesn't introduce any functional change or API change
    towards the firmware.

    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Hadar Hen Zion
     
  • Align the names used by enum mlx4_net_trans_promisc_mode with the
    actual firmware specification. The patch doesn't introduce any
    functional change or API change towards the firmware.

    Remove MLX4_FS_PROMISC_FUNCTION_PORT which isn't of use. Add new
    enums MLX4_FS_{UC/MC}_SNIFFER as a preparation step for sniffer
    support.

    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Hadar Hen Zion
     
  • Move flow steering HW structures to be on the public mlx4 include
    directory, as a pre-step for the mlx4 IB driver to use them too.

    Signed-off-by: Hadar Hen Zion
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Hadar Hen Zion
     
  • The patch allows enabling/disabling HW timestamping for incoming and/or
    outgoing packets. It adds and initializes all structs and callbacks
    needed by the kernel TS API.
    To enable/disable HW timestamping, the appropriate ioctl should be used.
    Currently only HWTSTAMP_FILTER_ALL/NONE and HWTSTAMP_TX_ON/OFF are
    supported.
    When TS is enabled on the receive flow, VLAN stripping is disabled.
    All relevant changes were also made in the RX/TX flows to consider TS
    requests and plant HW timestamps into the relevant structures.
    mlx4_ib was fixed to compile with the new mlx4_cq_alloc() signature.

    Signed-off-by: Eugenia Emantayev
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Amir Vadai
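
For orientation, the ioctl mentioned in the commit above takes a struct hwtstamp_config with three int fields (flags, tx_type, rx_filter). The sketch below only builds that payload for the combinations the commit lists as supported. The helper name is hypothetical, and actually issuing the SIOCSHWTSTAMP ioctl (which needs a socket, an ifreq naming the interface, and sufficient privileges) is deliberately left out.

```python
import struct

# Values from the Linux UAPI header linux/net_tstamp.h.
HWTSTAMP_TX_OFF = 0
HWTSTAMP_TX_ON = 1
HWTSTAMP_FILTER_NONE = 0
HWTSTAMP_FILTER_ALL = 1

SIOCSHWTSTAMP = 0x89B0  # ioctl request number, shown for reference only


def pack_hwtstamp_config(tx_on, rx_all):
    """Pack a hwtstamp_config payload: int flags, int tx_type, int rx_filter."""
    tx_type = HWTSTAMP_TX_ON if tx_on else HWTSTAMP_TX_OFF
    rx_filter = HWTSTAMP_FILTER_ALL if rx_all else HWTSTAMP_FILTER_NONE
    flags = 0  # reserved, must be zero
    return struct.pack("iii", flags, tx_type, rx_filter)


enable_both = pack_hwtstamp_config(tx_on=True, rx_all=True)
disable_both = pack_hwtstamp_config(tx_on=False, rx_all=False)
```

Note that, per the commit text, requesting RX timestamping has the side effect of disabling VLAN stripping on the receive path.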
     
  • Read HCA frequency, read PCI clock bar and offset, map internal clock to
    PCI bar.

    Signed-off-by: Eugenia Emantayev
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Eugenia Emantayev
     
  • Add new device capability for timestamping support and query FW to retrieve it.

    Signed-off-by: Eugenia Emantayev
    Signed-off-by: Or Gerlitz
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Eugenia Emantayev
     

17 Apr, 2013

1 commit

  • Expose a new API mlx4_srq_lookup() to retrieve an SRQ based on its
    number. This API is needed in the mlx4_ib driver CQ polling logic,
    when a work completion is associated with an XRC TGT QP. Since a
    target QP may redirect to more than one XRC SRQ, the srq field in the
    QP is of no use and the real XRC SRQ needs to be retrieved using the
    information from the XRCETH IB header which is placed in the HW CQE.

    Signed-off-by: Shlomo Pongratz
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Shlomo Pongratz
     

08 Apr, 2013

1 commit


27 Feb, 2013

1 commit

  • Pull infiniband update from Roland Dreier:
    "Main batch of InfiniBand/RDMA changes for 3.9:

    - SRP error handling fixes from Bart Van Assche

    - Implementation of memory windows for mlx4 from Shani Michaeli

    - Lots of cxgb4 HW driver fixes from Vipul Pandya

    - Make iSER work for virtual functions, other fixes from Or Gerlitz

    - Fix for bug in qib HW driver from Mike Marciniszyn

    - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman

    - Various cleanups and warning fixes from Julia Lawall, Paul Bolle,
    Wei Yongjun"

    * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (41 commits)
    IB/mlx4: Advertise MW support
    IB/mlx4: Support memory window binding
    mlx4: Implement memory windows allocation and deallocation
    mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
    mlx4_core: Disable memory windows for virtual functions
    IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
    IB/srp: Fail I/O requests if the transport is offline
    IB/srp: Avoid endless SCSI error handling loop
    IB/srp: Avoid sending a task management function needlessly
    IB/srp: Track connection state properly
    IB/mlx4: Remove redundant NULL check before kfree
    IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable
    IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool
    IB/iser: Enable iser when FMRs are not supported
    IB/iser: Avoid error prints on EAGAIN registration failures
    IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
    IB/uverbs: Implement memory windows support in uverbs
    IB/core: Add "type 2" memory windows support
    mlx4_core: Propagate MR deregistration failures to caller
    mlx4_core: Rename MPT-related functions to have mpt_ prefix
    ...

    Linus Torvalds
     

26 Feb, 2013

2 commits

  • * Implement memory windows binding in mlx4_ib_post_send.

    * Implement mlx4_ib_bind_mw by deferring to mlx4_ib_post_send.

    * Rename MLX4_WQE_FMR_PERM_* flags to MLX4_WQE_FMR_AND_BIND_PERM_*,
    indicating that they are used both for fast registration work
    requests, and for memory window bind work requests.

    Signed-off-by: Haggai Eran
    Signed-off-by: Shani Michaeli
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Shani Michaeli
     
  • Implement MW allocation and deallocation in mlx4_core and mlx4_ib.
    Pass down the enable bind flag when registering memory regions.

    Signed-off-by: Haggai Eran
    Signed-off-by: Shani Michaeli
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Shani Michaeli
     

22 Feb, 2013

2 commits


08 Feb, 2013

1 commit

  • Move low-level code that deals with management of Ethernet MACs and QPs from mlx4_core to mlx4_en.
    Also convert the new functions to deal with MACs in the form of a char array instead of a u64.

    Actual functions moved:
    mlx4_replace_mac
    mlx4_get_eth_qp
    mlx4_put_eth_qp

    To make this change, some functionality had to be exported from the core.
    The following functions were added:
    mlx4_get_base_qp
    __mlx4_replace_mac (low level function for CX1/A0 compatibility)

    Signed-off-by: Yan Burman
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Yan Burman
     

01 Feb, 2013

1 commit


20 Dec, 2012

1 commit

  • Device managed flow steering will be enabled only under administrator
    directive provided through setting the existing module parameter
    log_num_mgm_entry_size to -1 (if the device actually supports flow
    steering). If flow steering isn't requested or not available, the
    driver will use the value of log_num_mgm_entry_size and B0 steering.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Jack Morgenstein
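
The selection rule in the commit above amounts to a two-input decision, sketched here. The function name and the returned labels are hypothetical. The sketch does not model which entry size the driver falls back to when -1 is given but the device lacks flow steering support, since the commit text does not spell that out.

```python
def select_steering_mode(log_num_mgm_entry_size, device_supports_dmfs):
    """Return the steering mode implied by the module parameter.

    Device-managed flow steering is chosen only when the administrator asks
    for it (parameter set to -1) AND the device supports it; otherwise B0.
    """
    if log_num_mgm_entry_size == -1 and device_supports_dmfs:
        return "device-managed"
    return "B0"


# Example parameter value 10 is arbitrary and used for illustration only.
modes = {
    "requested and supported": select_steering_mode(-1, True),
    "requested, unsupported": select_steering_mode(-1, False),
    "not requested": select_steering_mode(10, True),
}
```

Reusing the existing parameter with a sentinel value keeps the opt-in explicit: nothing changes for systems that never set it to -1.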
     

27 Nov, 2012

1 commit

  • ConnectX-3 devices can use either 64- or 32-byte completion queue
    entries (CQEs) and event queue entries (EQEs). Using 64-byte
    EQEs/CQEs performs better because each entry is aligned to a complete
    cacheline. This patch queries the HCA's capabilities, and if it
    supports 64-byte CQEs and EQEs the driver will configure the HW to
    work in 64-byte mode.

    The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.

    Since this mode is global, userspace (libmlx4) must be updated to work
    with the configured CQE size, and guests using SR-IOV virtual
    functions need to know both EQE and CQE size.

    In case one of the 64-byte CQE/EQE capabilities is activated, the
    patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
    command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
    they need an update to be able to work with the PPF. This is done by
    changing the returned pf_context_behaviour not to be zero any more. In
    case none of these capabilities is activated that value remains zero
    and older guest drivers can run OK.

    The SRIOV related flow is as follows

    1. the PPF does the detection of the new capabilities using
    QUERY_DEV_CAP command.

    2. the PPF activates the new capabilities using INIT_HCA.

    3. the VF detects if the PPF activated the capabilities using
    QUERY_HCA, and if this is the case activates them for itself too.

    Note that the VF detects that it must be aware of the new PF behaviour
    using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode.

    User space notification is done through a new field introduced in
    struct mlx4_ib_ucontext which holds device capabilities for which user
    space must take action. This changes the binary interface so the ABI
    towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
    when **needed** i.e. only when the driver does use 64-byte CQEs or
    future device capabilities which must be kept in sync with user space.
    This practice allows unmodified libmlx4 to keep working on older
    devices (e.g. A0, B0) which don't support 64-byte CQEs.

    In order to keep existing systems functional when they update to a
    newer kernel that contains these changes in VF and userspace ABI, a
    module parameter enable_64b_cqe_eqe must be set to enable 64-byte
    mode; the default is currently false.

    Signed-off-by: Eli Cohen
    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
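
The three-step SR-IOV flow and the conditional ABI bump from the commit above can be modelled roughly as follows. All function names are hypothetical, and the sketch collapses the separate CQE and EQE capabilities into a single flag for brevity. It illustrates the decision logic only, not the firmware commands themselves.

```python
def ppf_entry_size(device_supports_64b, enable_64b_cqe_eqe):
    """Steps 1 and 2: the PPF detects the capability (QUERY_DEV_CAP) and
    activates it (INIT_HCA) only if the module parameter allows it."""
    return 64 if (device_supports_64b and enable_64b_cqe_eqe) else 32


def vf_entry_size(ppf_size):
    """Step 3: the VF learns what the PPF activated (QUERY_HCA) and follows,
    because the mode is global per HCA."""
    return ppf_size


def uverbs_abi_version(cqe_size):
    """The ABI towards libmlx4 is bumped from 3 to 4 only when needed."""
    return 4 if cqe_size == 64 else 3


# Default: module parameter is false, so everything stays in 32-byte mode.
default_size = ppf_entry_size(device_supports_64b=True, enable_64b_cqe_eqe=False)
# Opt-in on a capable device.
opt_in_size = ppf_entry_size(device_supports_64b=True, enable_64b_cqe_eqe=True)
```

With the default setting the ABI stays at 3, which is why existing libmlx4 installations and older guests keep working after a kernel update.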
     

01 Oct, 2012

2 commits

  • Previously, the structure of a guest's proxy QPs followed the
    structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
    qp1 port 2, ...). The guest then did offset calculations on the
    sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().

    This is now changed so that the guest does no offset calculations
    regarding proxy or tunnel QPs to use. This change frees the PPF from
    needing to adhere to a specific order in allocating proxy and tunnel
    QPs.

    Now QUERY_FUNC_CAP provides each port individually with its proxy
    qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
    used directly where required (with no offset calculations).

    To accomplish this change, several fields were added to the phys_caps
    structure for use by the PPF and by non-SR-IOV mode:

    base_sqpn -- in non-sriov mode, this was formerly sqp_start.
    base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
    base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.

    The current code in the PPF still adheres to the previous layout of
    sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this
    layout without affecting VF or (paravirtualized) PF code.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Roland Dreier

    Jack Morgenstein
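
To make the difference concrete, the sketch below contrasts the old offset calculation with the new direct lookup described in the commit above. The QP numbers, dictionary keys and function names are invented for illustration, and the old formula assumes a two-port device laid out exactly as the commit lists it (qp0 port 1, qp0 port 2, qp1 port 1, qp1 port 2).

```python
def old_proxy_qpn(sqp_base, qp_type, port, num_ports=2):
    """Old scheme: the guest derives the QP number from sqp_base by offset.

    qp_type is 0 for qp0 and 1 for qp1; port is 1-based.
    """
    return sqp_base + qp_type * num_ports + (port - 1)


def new_qpn(func_cap_by_port, port, kind):
    """New scheme: use the number reported per port, no arithmetic."""
    return func_cap_by_port[port][kind]


# Hypothetical QUERY_FUNC_CAP results. The numbers are deliberately not
# contiguous to show that the PPF is free to choose any layout.
func_cap = {
    1: {"qp0_proxy": 0x120, "qp1_proxy": 0x98, "qp0_tunnel": 0x300, "qp1_tunnel": 0x41},
    2: {"qp0_proxy": 0x77, "qp1_proxy": 0x205, "qp0_tunnel": 0x11, "qp1_tunnel": 0x500},
}

legacy = old_proxy_qpn(0x40, qp_type=1, port=2)
direct = new_qpn(func_cap, port=2, kind="qp1_proxy")
```

Because the guest only ever reads the reported numbers, the PPF can later rearrange its special, proxy and tunnel QPs without any change on the VF side.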
     
  • This is necessary in order to support > 1 VF/PF in a VM for software
    that uses the node guid as a discriminator, such as librdmacm.

    Signed-off-by: Jack Morgenstein
    Signed-off-by: Roland Dreier

    Jack Morgenstein