23 Mar, 2016

1 commit

  • Pull more rdma updates from Doug Ledford:
    "Round two of 4.6 merge window patches.

    This is a monster pull request. I held off on the hfi1 driver updates
    (the hfi1 driver is intimately tied to the qib driver and the new
    rdmavt software library that was created to help both of them) in my
    first pull request. The hfi1/qib/rdmavt update is probably 90% of
    this pull request. The hfi1 driver is being left in staging so that
    it can be fixed up in regards to the API that Al and yourself didn't
    like. Intel has agreed to do the work, but in the meantime, this
    clears out 300+ patches in the backlog queue and brings my tree and
    their tree closer to sync.

    This also includes about 10 patches to the core and a few to mlx5 to
    create an infrastructure for configuring SRIOV ports on IB devices.
    That series includes one patch to the net core that we sent to netdev@
    and Dave Miller with each of the three revisions to the series. We
    didn't get any response to the patch, so we took that as implicit
    approval.

    Finally, this series includes Intel's new iWARP driver for their x722
    cards. It's not nearly the beast that the hfi1 driver is. It also has a
    linux-next merge issue, but that has been resolved and it now passes
    just fine.

    Summary:

    - A few minor core fixups needed for the next patch series

    - The IB SRIOV series. This has bounced around for several versions.
    Of note is the fact that the first patch in this series affects the
    net core. It was directed to netdev and DaveM for each iteration
    of the series (three versions total). Dave did not object, but did
    not respond either. I've taken this as permission to move forward
    with the series.

    - The new Intel X722 iWARP driver

    - A huge set of updates to the Intel hfi1 driver. Of particular
    interest here is that we have left the driver in staging since it
    still has an API that people object to. Intel is working on a fix,
    but getting these patches in now helps keep me sane as the upstream
    and Intel's trees were over 300 patches apart"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits)
    IB/ipoib: Allow mcast packets from other VFs
    IB/mlx5: Implement callbacks for manipulating VFs
    net/mlx5_core: Implement modify HCA vport command
    net/mlx5_core: Add VF param when querying vport counter
    IB/ipoib: Add ndo operations for configuring VFs
    IB/core: Add interfaces to control VF attributes
    IB/core: Support accessing SA in virtualized environment
    IB/core: Add subnet prefix to port info
    IB/mlx5: Fix decision on using MAD_IFC
    net/core: Add support for configuring VF GUIDs
    IB/{core, ulp} Support above 32 possible device capability flags
    IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
    net/mlx5_core: Introduce offload arithmetic hardware capabilities
    net/mlx5_core: Refactor device capability function
    net/mlx5_core: Fix caching ATOMIC endian mode capability
    ib_srpt: fix a WARN_ON() message
    i40iw: Replace the obsolete crypto hash interface with shash
    IB/hfi1: Add SDMA cache eviction algorithm
    IB/hfi1: Switch to using the pin query function
    IB/hfi1: Specify mm when releasing pages
    ...

    Linus Torvalds
     

22 Mar, 2016

5 commits

  • Doug Ledford
     
  • Following the practice exercised for network devices which allow the PF
    net device to configure attributes of its virtual functions, we
    introduce the following functions to be used by IPoIB which is the
    network driver implementation for IB devices.

    ib_set_vf_link_state - set the policy for a VF link. More below.
    ib_get_vf_config - read configuration information of a VF
    ib_get_vf_stats - read VF statistics
    ib_set_vf_guid - set the node or port GUID of a VF

    Also add a flag in the device cap flags indicating that this IB device
    is based on a virtual function.

    A VF shares the physical port with the PF and other VFs. When setting
    the link state we have three options:

    1. Auto - in this mode, the virtual port follows the state of the
    physical port and becomes active only if the physical port's state is
    active. In all other cases it remains in a Down state.
    2. Down - sets the state of the virtual port to Down
    3. Up - causes the virtual port to transition into Initialize state if
    it was not already in this state. A virtualization aware subnet manager
    can then bring the state of the port into the Active state.
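
    The three link-state options above can be sketched as a small state
    function. This is only an illustration: the enum and function names
    below are hypothetical and are not the identifiers used by the patch.

```c
/* Hypothetical names for illustration; not the kernel's definitions. */
enum vf_link_policy { VF_LINK_AUTO, VF_LINK_DOWN, VF_LINK_UP };
enum port_state { PORT_DOWN, PORT_INIT, PORT_ACTIVE };

/*
 * Return the state a virtual port moves to when a policy is applied.
 * phys is the current state of the shared physical port.
 */
static enum port_state vf_apply_policy(enum vf_link_policy policy,
                                       enum port_state phys)
{
    switch (policy) {
    case VF_LINK_AUTO:
        /* Follow the physical port: Active only if it is Active. */
        return phys == PORT_ACTIVE ? PORT_ACTIVE : PORT_DOWN;
    case VF_LINK_DOWN:
        return PORT_DOWN;
    case VF_LINK_UP:
        /* Enter Initialize; a subnet manager may later make it Active. */
        return PORT_INIT;
    }
    return PORT_DOWN;
}
```

    Note that in this sketch Up never yields Active directly; that
    transition is left to a virtualization aware subnet manager, as the
    description states.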

    Signed-off-by: Eli Cohen
    Reviewed-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Eli Cohen
     
  • Per the ongoing standardisation process, when virtual HCAs are present
    in a network, traffic is routed based on a destination GID. In order to
    access the SA we use the well known SA GID.

    We also add a GRH required boolean field to the port attributes which is
    used to report to the verbs consumer whether this port is connected to a
    virtual network. We use this field to realize whether we need to create
    an address vector with GRH to access the subnet administrator. We clear
    the port attributes struct before calling the hardware driver to make
    sure the default remains that GRH is not required.

    Signed-off-by: Eli Cohen
    Reviewed-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Eli Cohen
     
  • The subnet prefix is a part of the port_info MAD returned and should be
    available at the ib_port_attr struct. We define it here and provide a
    default implementation in case the hardware driver does not provide one.
    The subnet prefix is required when creating the address vector to access
    the SA in networks where GRH must be used.

    Signed-off-by: Eli Cohen
    Reviewed-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Eli Cohen
     
  • The old bitwise device_cap_flags variable was limited to u32, and all
    of its bits are already defined. To overcome this limit, we converted
    the device_cap_flags variable to the u64 type.
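
    A minimal sketch of why the wider type matters, assuming a made-up
    flag at bit 32 (EXAMPLE_CAP_BIT32 and both helpers are hypothetical
    and do not come from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag for illustration: the first bit a u32 cannot hold. */
#define EXAMPLE_CAP_BIT32 (1ULL << 32)

/* Test a capability bit in a 64-bit flags word. */
static bool cap_is_set(uint64_t device_cap_flags, uint64_t flag)
{
    return (device_cap_flags & flag) != 0;
}

/* Storing the flags in 32 bits silently drops every bit above bit 31. */
static uint32_t truncate_to_u32(uint64_t device_cap_flags)
{
    return (uint32_t)device_cap_flags;
}
```

    Any consumer that still copies the flags into a 32-bit variable would
    lose the new bits in the same way.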

    Signed-off-by: Leon Romanovsky
    Reviewed-by: Matan Barak
    Signed-off-by: Doug Ledford

    Leon Romanovsky
     

20 Mar, 2016

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support more Realtek wireless chips, from Jes Sorenson.

    2) New BPF types for per-cpu hash and array maps, from Alexei
    Starovoitov.

    3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

    4) Allow the use of SO_REUSEPORT in order to do per-thread processing
    of incoming TCP/UDP connections. The muxing can be done using a
    BPF program which hashes the incoming packet. From Craig Gallek.

    5) Add a multiplexer for TCP streams, to provide a message-based
    interface. BPF programs can be used to determine the message
    boundaries. From Tom Herbert.

    6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

    7) Avoid factorial complexity when taking down an inetdev interface
    with lots of configured addresses. We were doing things like
    traversing the entire address list for each address removed, and
    flushing the entire netfilter conntrack table for every address as
    well.

    8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

    9) Allow offloading u32 classifiers to hardware, and implement for
    ixgbe, from John Fastabend.

    10) Allow configuring IRQ coalescing parameters on a per-queue basis,
    from Kan Liang.

    11) Extend ethtool so that larger link mode masks can be supported.
    From David Decotigny.

    12) Introduce devlink, which can be used to configure port link types
    (ethernet vs Infiniband, etc.), port splitting, and switch device
    level attributes as a whole. From Jiri Pirko.

    13) Hardware offload support for flower classifiers, from Amir Vadai.

    14) Add "Local Checksum Offload". Basically, for a tunneled packet
    the checksum of the outer header is 'constant' (because with the
    checksum field filled into the inner protocol header, the payload
    of the outer frame checksums to 'zero'), and we can take advantage
    of that in various ways. From Edward Cree"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
    bonding: fix bond_get_stats()
    net: bcmgenet: fix dma api length mismatch
    net/mlx4_core: Fix backward compatibility on VFs
    phy: mdio-thunder: Fix some Kconfig typos
    lan78xx: add ndo_get_stats64
    lan78xx: handle statistics counter rollover
    RDS: TCP: Remove unused constant
    RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
    net: smc911x: convert pxa dma to dmaengine
    team: remove duplicate set of flag IFF_MULTICAST
    bonding: remove duplicate set of flag IFF_MULTICAST
    net: fix a comment typo
    ethernet: micrel: fix some error codes
    ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
    bpf, dst: add and use dst_tclassid helper
    bpf: make skb->tc_classid also readable
    net: mvneta: bm: clarify dependencies
    cls_bpf: reset class and reuse major in da
    ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
    ldmvsw: Add ldmvsw.c driver code
    ...

    Linus Torvalds
     

18 Mar, 2016

1 commit


17 Mar, 2016

3 commits


15 Mar, 2016

1 commit


11 Mar, 2016

28 commits

  • The change requires a new pio_busy field in the iowait structure to
    track the number of outstanding pios. The new counter together
    with the sdma counter serve as the basis for a packet by packet decision
    as to which egress mechanism to use. Since packets given to different
    egress mechanisms are not ordered, this scheme will preserve the order.

    The iowait drain/wait mechanisms are extended for a pio case. An
    additional qp wait flag is added for the PIO drain wait case.

    Currently the only pio wait is for buffers, so the no_bufs_available()
    routine name is changed to pio_wait() and a third argument is passed
    with one of the two pio wait flags to generalize the routine. A module
    parameter is added to hold a configurable threshold. For now, the
    module parameter is zero.

    A heuristic routine is added to return the func pointer of the proper
    egress routine to use.

    The heuristic is as follows:
    - SMI always uses pio
    - GSI,UD qps threshold use sdma
      o No coordination with sdma is required because order is not required
        and this qp pio count is not maintained for UD
    - RC/UC ONLY packets threshold use SDMA
      o If pio's are pending the pio_wait with the new wait flag is
        called to delay for pios to drain

    The threshold is potentially reduced by the QP's mtu.

    The sc_buffer_alloc() has two additional args (a callback, a void *)
    which are exploited by the RC/UC cases to pass a new complete routine
    and a qp *.

    When the shadow ring completes the credit associated with a packet,
    the new complete routine is called. The verbs_pio_complete() will then
    decrement the busy count and trigger any drain waiters in qp destroy
    or reset.

    Reviewed-by: Jubin John
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • pahole noted the wasted 4 bytes after s_lock and r_lock.

    Move s_flags and r_psn to fill the holes.
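
    The kind of hole pahole reports can be reproduced with a small
    sketch. The layouts below are hypothetical: the field names echo the
    commit, but the types and the s_other member are assumptions, with a
    4-byte integer standing in for the lock.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout with a hole: the 8-byte member may need padding before it. */
struct qp_with_hole {
    uint32_t s_lock;   /* stand-in for a 4-byte lock */
    uint64_t s_other;  /* assumed 8-byte member following the lock */
    uint32_t s_flags;
};

/* Same members, with s_flags moved up to occupy the former hole. */
struct qp_filled {
    uint32_t s_lock;
    uint32_t s_flags;
    uint64_t s_other;
};

/* Padding bytes inserted after s_lock in the first layout. */
static size_t hole_bytes(void)
{
    return offsetof(struct qp_with_hole, s_other) - sizeof(uint32_t);
}
```

    On an ABI where uint64_t is 8-byte aligned, hole_bytes() is 4 and
    the reordered struct is smaller; on ABIs with 4-byte alignment the
    two layouts are the same size.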

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • Remove exported functions which are no longer required as the
    functionality has moved into rdmavt. This also requires re-ordering some
    of the functions since their prototype no longer appears in a header
    file. Rather than add forward declarations it is just cleaner to
    re-order some of the functions.

    Reviewed-by: Jubin John
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Initially it was intended that rdmavt would support some signaling
    between the underlying driver and itself. However this turned out to be
    unnecessary for qib and hfi1. If we need to add something like this in
    later to support another driver we should do it then. As of now this is
    essentially dead code, so remove it.

    Reviewed-by: Ira Weiny
    Reviewed-by: Jubin John
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • While hfi1 and qib were still supporting bits and pieces of core verbs
    components there needed to be a way to convey if rdmavt should handle
    allocation and initialization of resources like the queue pair table. Now
    that all of this is moved into rdmavt there is no need for these flags.
    They are no longer used in the drivers.

    Reviewed-by: Ira Weiny
    Reviewed-by: Jubin John
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Rdmavt adopted an smi_ah from qib which is not needed by hfi1. Move this
    back to qib and get it out of the common library.

    Reviewed-by: Ira Weiny
    Reviewed-by: Jubin John
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • For each verb validate that all requirements for driver callbacks are met.
    If a function is called without checking for a valid pointer, it is a
    required function. Also document what each callback function does.
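
    A rough sketch of this kind of validation, assuming a hypothetical
    ops structure with one required and one optional callback (none of
    these names are taken from rdmavt):

```c
#include <stddef.h>

struct drv_ops {
    int  (*query_port_state)(int port_idx);  /* treated as required here */
    void (*notify)(int event);               /* treated as optional here */
};

/* Return 0 if every required callback is present, -1 otherwise. */
static int check_required_ops(const struct drv_ops *ops)
{
    if (!ops->query_port_state)
        return -1;
    /* notify is optional, so callers must test it before each call. */
    return 0;
}

/* Example driver callback used to populate the required slot. */
static int example_query_port_state(int port_idx)
{
    return port_idx >= 0;
}
```

    Checking once at registration lets the common code call required
    callbacks later without a NULL test on every use.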

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Add, remove, and otherwise clean up existing comments that are leftover
    from the initial code postings of rdmavt. Many of the comments were added
    to provide an idea on the direction we were thinking of going. Now that the
    design is solidified make a pass over and clean everything up. Also add
    details where lacking.

    Ensure all non static functions have nano comments.

    Reviewed-by: Jubin John
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • This patch adds an additional lock to reduce contention on the s_lock.

    This lock is used in post_send() so that the post_send is not
    serialized with the send engine and other send related processing.

    To do this the s_next_psn is now maintained on post_send() while
    post_send() related fields are moved to a new cache line. There is
    an s_avail maintained for the post_send() to mitigate trading cache
    lines with the send engine. The lock is released/acquired around
    releasing the just built packet to the egress mechanism.

    Reviewed-by: Jubin John
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Dean Luick
    Signed-off-by: Harish Chegondi
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Ira Weiny
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • A busy_jiffies variable is maintained and updated when rc qps are
    created and deleted. busy_jiffies is a scaled value of the number
    of rc qps in the device. busy_jiffies is incremented every rc qp
    scaling interval. busy_jiffies is added to the rc timeout
    in add_retry_timer and mod_retry_timer. The rc qp scaling interval
    is selected based on extensive performance evaluation of targeted
    workloads.
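
    One way to read the scaling described above is sketched below. The
    interval value and all names are assumptions made for illustration;
    the commit text does not state the interval that was chosen.

```c
/* Assumed value for illustration; the commit does not give the interval. */
#define RC_QP_SCALING_INTERVAL 5

struct dev_info {
    unsigned int n_rc_qps;       /* number of RC QPs on the device */
    unsigned long busy_jiffies;  /* extra timeout derived from n_rc_qps */
};

/* busy_jiffies grows by one for every full scaling interval of RC QPs. */
static void update_busy_jiffies(struct dev_info *d)
{
    d->busy_jiffies = d->n_rc_qps / RC_QP_SCALING_INTERVAL;
}

static void rc_qp_created(struct dev_info *d)
{
    d->n_rc_qps++;
    update_busy_jiffies(d);
}

static void rc_qp_destroyed(struct dev_info *d)
{
    if (d->n_rc_qps)
        d->n_rc_qps--;
    update_busy_jiffies(d);
}

/* The retry timers add busy_jiffies on top of the base RC timeout. */
static unsigned long retry_timeout(const struct dev_info *d, unsigned long base)
{
    return base + d->busy_jiffies;
}
```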

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Vennila Megavannan
    Signed-off-by: Jubin John
    Signed-off-by: Doug Ledford

    Vennila Megavannan
     
  • The field is a vestige from ipath.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • LinkDownReason LocalMediaNotInstalled lacked an underscore
    and was inconsistent with other defines in the same family.
    This patch fixes this.

    Reviewed-by: Ira Weiny
    Signed-off-by: Easwar Hariharan
    Signed-off-by: Doug Ledford

    Easwar Hariharan
     
  • rvt_query_port calls into the driver through a call back function
    query_port_state to populate the rest of ib_port_attr elements.
    rvt_modify_port calls into the driver if needed through a call back
    function shut_down_port().

    Signed-off-by: Harish Chegondi
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Harish Chegondi
     
  • Add in query gid support. Rdmavt still relies on the driver to maintain
    the gid table. Rdmavt simply calls into the driver to retrieve the guid
    for a particular port.

    Reviewed-by: Harish Chegondi
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • IB core uses 1 relative indexing for ports. All of our data structures
    use 0 based indexing. Add an inline function that we can use whenever we
    need to validate a legal value and try to convert a port number to a
    port index at the entrance into rdmavt.

    Try to follow the policy that when we are talking about a port from IB
    core point of view we refer to it as a port number. When port is an
    index into our arrays refer to it as a port index.
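
    The conversion could look roughly like the sketch below. The function
    name and signature are hypothetical and are not the helper added by
    the patch.

```c
/*
 * Convert a 1-based IB core port number to a 0-based port index.
 * Returns -1 when the port number is out of range.
 * Hypothetical helper; num_ports is how many ports the device has.
 */
static inline int port_num_to_idx(unsigned int num_ports, unsigned int port_num)
{
    if (port_num < 1 || port_num > num_ports)
        return -1;
    return (int)(port_num - 1);
}
```

    Doing the validation and the conversion in one place means the rest
    of the code only ever sees a checked port index.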

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Harish Chegondi
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Some hardware drivers require additional checks on send WRs. Create an
    optional call back to allow hardware drivers to reject a send WR.
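
    A minimal sketch of the optional-callback pattern, with hypothetical
    structure and function names, and an example size limit chosen
    purely for illustration:

```c
#include <stddef.h>

struct send_wr { unsigned int length; };  /* hypothetical, minimal WR */

struct driver_ops {
    /* Optional: return 0 to accept the WR, nonzero to reject it. */
    int (*check_send_wr)(const struct send_wr *wr);
};

/* Common-layer post path: consult the driver only if it set the hook. */
static int post_one_send(const struct driver_ops *ops, const struct send_wr *wr)
{
    if (ops->check_send_wr && ops->check_send_wr(wr))
        return -1;  /* driver rejected the WR */
    /* ... the WR would be queued here ... */
    return 0;
}

/* Example driver hook: reject anything larger than 4096 bytes. */
static int reject_large(const struct send_wr *wr)
{
    return wr->length > 4096;
}
```

    A driver that needs no extra checks simply leaves the pointer NULL
    and every WR is accepted by this step.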

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Ira Weiny
    Signed-off-by: Doug Ledford

    Ira Weiny
     
  • Fill in srq function stubs with code derived from hfi1 and qib.
    Move necessary functions and data structure members as well.

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Harish Chegondi
    Signed-off-by: Jubin John
    Signed-off-by: Doug Ledford

    Jubin John
     
  • Update all files added by rdmavt which do not yet have 2016 as the
    copyright year.

    Reviewed-by: Ira Weiny
    Reviewed-by: Harish Chegondi
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • This patch adds mad agent create and free to rdmavt.

    Reviewed-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • This patch adds rdmavt device structure allocation in rdmavt. The
    ib_device alloc is now done in rdmavt instead of the driver. Drivers
    need to tell rdmavt the number of ports when calling.

    A side effect of this patch is fixing a bug with port initialization
    where the device structure port array was allocated over top of an
    existing one.

    Reviewed-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Low level drivers need to be able to check incoming attributes as well as be
    able to adjust their private data on queue pair modification. Add 2 driver
    callbacks, check_modify_qp and modify_qp, to facilitate this.

    Signed-off-by: Ira Weiny
    Signed-off-by: Doug Ledford

    Ira Weiny
     
  • s_sde should be in the low level driver QP private data.

    Remove the definition from rvt_qp.

    Signed-off-by: Ira Weiny
    Signed-off-by: Doug Ledford

    Ira Weiny
     
  • This patch adds in the multicast add and remove functions as well as the
    ancillary infrastructure needed.

    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Add modify qp and supporting functions.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Add in a post_send and post_one_send to rdmavt. The ULP will provide a WQE
    to rdmavt which will then walk and queue each element. Rdmavt will then
    queue the work to be done in the driver or kick the driver's progress
    routine.

    There needs to be a follow on patch which adds in another lock for the
    head of the queue so that it can be added to and read from in parallel.
    This will touch protocol handlers and require other changes in the
    drivers. This will be done separately.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • Brings in completion queue functionality. A kthread worker is added to
    the rvt_dev_info to serve as a worker for completion queues.

    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Harish Chegondi
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Dennis Dalessandro
     
  • The current code is problematic when QP creation and ipoib are
    used to support NFS and NFS needs to do IO for paging purposes.
    In that case, the GFP_KERNEL allocation within create_qp causes
    a deadlock in tight memory situations.

    This fix adds support to create queue pair with GFP_NOIO flag for
    connected mode only to cleanly fail the create queue pair in those
    situations.

    This was previously fixed in qib but needed to get ported to hfi1.
    This patch handles that for both drivers in the new rdmavt common
    layer.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • With this commit, drivers using rdmavt no longer need to define a
    query_device function, but they should fill in the IB device
    attributes structure rvt_dev_info.dparms.props.

    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Harish Chegondi
    Signed-off-by: Doug Ledford

    Harish Chegondi