25 Dec, 2016

1 commit


23 Dec, 2016

1 commit

  • Code that dereferences the struct net_device ip_ptr member must be
    protected with an in_dev_get() / in_dev_put() pair. Hence insert
    calls to these functions.

    Fixes: commit 7b85627b9f02 ("IB/cma: IBoE (RoCE) IP-based GID addressing")
    Signed-off-by: Bart Van Assche
    Reviewed-by: Moni Shoua
    Cc: Or Gerlitz
    Cc: Roland Dreier
    Cc:
    Signed-off-by: Doug Ledford

    Bart Van Assche
     

15 Dec, 2016

6 commits


14 Dec, 2016

6 commits

  • Add a new member, rate_limit, to ib_qp_attr. It holds the packet pacing
    rate in kbps; 0 means unlimited.

    IB_QP_RATE_LIMIT is added to ib_attr_mask and can be used by RAW
    QPs when changing QP state from RTR to RTS or from RTS to RTS.

    Signed-off-by: Bodong Wang
    Reviewed-by: Matan Barak
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Bodong Wang
     
  • Add struct ib_udata to the signature of create_ah callback that is
    implemented by IB device drivers. This allows HW drivers to return extra
    data to the userspace library.
    This patch prepares the ground for mlx5 driver to resolve destination
    mac address for a given GID and return it to userspace.
    This patch was previously submitted by Knut Omang as a part of the
    patch set to support Oracle's Infiniband HCA (SIF).

    Signed-off-by: Knut Omang
    Signed-off-by: Moni Shoua
    Reviewed-by: Yishai Hadas
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Moni Shoua
     
  • The function ib_resolve_eth_dmac() takes struct ib_qp_attr * and
    qp_attr_mask as parameters, but it could also be useful for resolving
    the dmac for address handles. This patch changes the signature of the
    function so it can be used in the flow of creating an address handle.

    Signed-off-by: Moni Shoua
    Reviewed-by: Yishai Hadas
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Moni Shoua
     
  • For a tunneled packet which contains external and internal headers,
    we refer to the external headers as "outer fields" and the internal
    headers as "inner fields".

    Example of a tunneled packet:

    { L2 | L3 | L4 | tunnel header | L2 | L3 | L4 | data }
    {            outer fields            }{  inner fields  }

    This patch introduces a new flag for flow steering rules
    - IB_FLOW_SPEC_INNER - which specifies that the rule applies
    to the inner fields, rather than to the outer fields of the packet.

    Signed-off-by: Moses Reuben
    Reviewed-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Moses Reuben
     
  • Align the indentation of the ib_flow_spec_type structure after
    adding a new definition.

    Signed-off-by: Moses Reuben
    Reviewed-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Moses Reuben
     
  • To support tunneling that can be used by the QP, both
    struct ib_flow_spec_tunnel and struct ib_flow_tunnel_filter can be
    used to match IP- or UDP-based tunneling protocols (e.g. NVGRE,
    GRE, etc.).

    The IB_FLOW_SPEC_VXLAN_TUNNEL flow specification type is added to
    use this functionality and match specific VXLAN packets.

    As with IPv6, we check for overflow of the vni value by comparing
    it against the maximum size.

    Signed-off-by: Moses Reuben
    Reviewed-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Moses Reuben
     

12 Dec, 2016

3 commits


17 Nov, 2016

1 commit

  • When a MAD arrives at the hypervisor, we need to identify which
    slave it should be sent to based on the destination GID. When the
    L3 protocol is IPv4, the GRH is replaced by an IPv4 header. This
    patch detects when an IPv4 header needs to be parsed instead of a
    GRH.

    Fixes: b6ffaeffaea4 ('mlx4: In RoCE allow guests to have multiple GIDS')
    Signed-off-by: Moni Shoua
    Signed-off-by: Daniel Jurgens
    Reviewed-by: Mark Bloch
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Moni Shoua
     

16 Nov, 2016

2 commits

  • Save a cacheline by grouping the hot-path calldowns together.

    Reviewed-by: Sebastian Sanchez
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     
  • Profiling shows that key validation is susceptible to cache line
    trading when accessing the lkey table.

    Fix by separating the read-mostly fields from the write fields. In
    addition, the shift amount, which is a function of the lkey table
    size, is precomputed and stored with the table pointer. Since both
    the shift and the table pointer are in the same read-mostly
    cacheline, this saves a cache line in this hot path.

    Reviewed-by: Sebastian Sanchez
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     

10 Oct, 2016

1 commit

  • Pull main rdma updates from Doug Ledford:
    "This is the main pull request for the rdma stack this release. The
    code has been through 0day and I had it tagged for linux-next testing
    for a couple days.

    Summary:

    - updates to mlx5

    - updates to mlx4 (two conflicts, both minor and easily resolved)

    - updates to iw_cxgb4 (one conflict, not so obvious to resolve,
    proper resolution is to keep the code in cxgb4_main.c as it is in
    Linus' tree as attach_uld was refactored and moved into
    cxgb4_uld.c)

    - improvements to uAPI (moved vendor specific API elements to uAPI
    area)

    - add hns-roce driver and hns and hns-roce ACPI reset support

    - conversion of all rdma code away from deprecated
    create_singlethread_workqueue

    - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
    staging)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
    staging/lustre: Disable InfiniBand support
    iw_cxgb4: add fast-path for small REG_MR operations
    cxgb4: advertise support for FR_NSMR_TPTE_WR
    IB/core: correctly handle rdma_rw_init_mrs() failure
    IB/srp: Fix infinite loop when FMR sg[0].offset != 0
    IB/srp: Remove an unused argument
    IB/core: Improve ib_map_mr_sg() documentation
    IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
    IB/mthca: Move user vendor structures
    IB/nes: Move user vendor structures
    IB/ocrdma: Move user vendor structures
    IB/mlx4: Move user vendor structures
    IB/cxgb4: Move user vendor structures
    IB/cxgb3: Move user vendor structures
    IB/mlx5: Move and decouple user vendor structures
    IB/{core,hw}: Add constant for node_desc
    ipoib: Make ipoib_warn ratelimited
    IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
    IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
    IB/ipoib: Remove deprecated create_singlethread_workqueue
    ...

    Linus Torvalds
     

08 Oct, 2016

5 commits

  • Signed-off-by: Yuval Shaia
    Signed-off-by: Doug Ledford

    Yuval Shaia
     
  • Add the following fields to IPv6 flow filter specification:
    1. Traffic Class
    2. Flow Label
    3. Next Header
    4. Hop Limit

    Signed-off-by: Maor Gottlieb
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Maor Gottlieb
     
  • Add the following fields to IPv4 flow filter specification:
    1. Type of Service
    2. Time to Live
    3. Flags
    4. Protocol

    Signed-off-by: Maor Gottlieb
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Maor Gottlieb
     
  • Flow steering specification structures were implemented in an
    extensible way that allows one to add new filters and new fields
    to existing filters.
    These specifications have never been extended, so until now the
    kernel flow specification size and the user flow specification
    size had to be equal.

    In a downstream patch, the IPv4 flow specification type is
    extended to support the TOS and TTL fields.

    To support an extension, we change the flow specification size
    condition test to the following:

    * If the user flow specification is bigger than the kernel
    specification, we verify that all the bits not covered by the
    kernel specification are zero, and the flow is added with the
    kernel specification fields only.

    * Otherwise, we add the flow rule with the user specification
    fields only.

    User space filters must be 32-bit aligned.

    Signed-off-by: Maor Gottlieb
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Maor Gottlieb
     
  • Expose RSS-related capabilities. These include both direct ones
    (i.e. struct ib_rss_caps) and max_wq_type_rq, which may be used in
    both RSS and non-RSS flows.

    Specifically,
    supported_qpts:
    - QP types that support RSS on the device.

    max_rwq_indirection_tables:
    - Max number of receive work queue indirection tables that
    could be opened on the device.

    max_rwq_indirection_table_size:
    - Max size of a receive work queue indirection table.

    max_wq_type_rq:
    - Max number of work queues of receive type that
    could be opened on the device.

    Signed-off-by: Yishai Hadas
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Yishai Hadas
     

07 Oct, 2016

1 commit

  • This patch fixes the kernel crash below on memory registration for
    rxe and other transport drivers that have dma_ops extensions.

    IB/core invokes ib_map_sg_attrs() in a generic manner with DMA
    attributes, which is used by the mlx5 and mthca adapters. However,
    in doing so it ignored the dma_ops extensions of software-based
    transports for sg map/unmap operations. This resulted in calling
    dma_map_sg_attrs of the hardware virtual device, causing a crash
    on a NULL reference.

    We extend the core to support the sg map/unmap_attrs dma_ops
    callbacks and the transport drivers to implement them.

    Verified using perftest applications.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] check_addr+0x35/0x60
    ...
    Call Trace:
    [] ? nommu_map_sg+0x99/0xd0
    [] ib_umem_get+0x3d6/0x470 [ib_core]
    [] rxe_mem_init_user+0x49/0x270 [rdma_rxe]
    [] ? rxe_add_index+0xca/0x100 [rdma_rxe]
    [] rxe_reg_user_mr+0x9f/0x130 [rdma_rxe]
    [] ib_uverbs_reg_mr+0x14e/0x2c0 [ib_uverbs]
    [] ib_uverbs_write+0x15b/0x3b0 [ib_uverbs]
    [] ? mem_cgroup_commit_charge+0x76/0xe0
    [] ? page_add_new_anon_rmap+0x89/0xc0
    [] ? lru_cache_add_active_or_unevictable+0x39/0xc0
    [] __vfs_write+0x28/0x120
    [] ? rw_verify_area+0x49/0xb0
    [] vfs_write+0xb2/0x1b0
    [] SyS_write+0x46/0xa0
    [] entry_SYSCALL_64_fastpath+0x1a/0xa4

    Signed-off-by: Parav Pandit
    Signed-off-by: Doug Ledford

    Parav Pandit
     

02 Oct, 2016

1 commit

  • Add IB headers, defines, and accessors that are identical
    in both qib and hfi1 into the core includes.

    The accessors handle maintenance of __be64 fields, since
    alignment is potentially invalid and can differ based on
    the presence of the GRH.

    {hfi1,qib}_ib_headers will be ib_headers.
    {hfi1,qib}_other_headers will be ib_other_headers.

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Don Hiatt
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford

    Mike Marciniszyn
     

24 Sep, 2016

3 commits

  • We now only use it from ib_alloc_pd to create a local DMA lkey if the
    device doesn't provide one, or a global rkey if the ULP requests it.

    This patch removes ib_get_dma_mr and open codes the functionality in
    ib_alloc_pd so that we can simplify the code and prevent abuse of the
    functionality. As a side effect we can also simplify things by removing
    the valid access bit check, and the PD refcounting.

    In the future I hope to also remove the per-PD global MR entirely by
    shifting this work into the HW drivers, as one step towards avoiding
    the struct ib_mr overload for various different use cases.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • Instead of exposing ib_get_dma_mr to ULPs and letting them use it
    more or less unchecked, this moves the capability of creating a
    global rkey into the RDMA core, where it can be easily audited. It
    also prints a warning every time this feature is used.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     
  • This has two reasons: a) to clearly mark that drivers don't have any
    business using it, and b) because we're going to use it for the
    (dangerous) global rkey soon, so that drivers don't create one
    themselves.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Steve Wise
    Signed-off-by: Doug Ledford

    Christoph Hellwig
     

17 Sep, 2016

1 commit


24 Aug, 2016

1 commit


05 Aug, 2016

2 commits

  • Pull second round of rdma updates from Doug Ledford:
    "This can be split out into just two categories:

    - fixes to the RDMA R/W API in regards to SG list length limits
    (about 5 patches)

    - fixes/features for the Intel hfi1 driver (everything else)

    The hfi1 driver is still being brought to full feature support by
    Intel, and they have a lot of people working on it, so that amounts to
    almost the entirety of this pull request"

    * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits)
    IB/hfi1: Add cache evict LRU list
    IB/hfi1: Fix memory leak during unexpected shutdown
    IB/hfi1: Remove unneeded mm argument in remove function
    IB/hfi1: Consistently call ops->remove outside spinlock
    IB/hfi1: Use evict mmu rb operation
    IB/hfi1: Add evict operation to the mmu rb handler
    IB/hfi1: Fix TID caching actions
    IB/hfi1: Make the cache handler own its rb tree root
    IB/hfi1: Make use of mm consistent
    IB/hfi1: Fix user SDMA racy user request claim
    IB/hfi1: Fix error condition that needs to clean up
    IB/hfi1: Release node on insert failure
    IB/hfi1: Validate SDMA user iovector count
    IB/hfi1: Validate SDMA user request index
    IB/hfi1: Use the same capability state for all shared contexts
    IB/hfi1: Prevent null pointer dereference
    IB/hfi1: Rename TID mmu_rb_* functions
    IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
    IB/hfi1: Restructure hfi1_file_open
    IB/hfi1: Make iovec loop index easy to understand
    ...

    Linus Torvalds
     
  • Pull base rdma updates from Doug Ledford:
    "Round one of 4.8 code: while this is mostly normal, there is a new
    driver in here (the driver was hosted outside the kernel for several
    years and is actually a fairly mature and well coded driver). It
    amounts to 13,000 of the 16,000 lines of added code in here.

    Summary:

    - Updates/fixes for iw_cxgb4 driver
    - Updates/fixes for mlx5 driver
    - Add flow steering and RSS API
    - Add hardware stats to mlx4 and mlx5 drivers
    - Add firmware version API for RDMA driver use
    - Add the rxe driver (this is a software RoCE driver that makes any
    Ethernet device a RoCE device)
    - Fixes for i40iw driver
    - Support for send only multicast joins in the cma layer
    - Other minor fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
    Soft RoCE driver
    IB/core: Support for CMA multicast join flags
    IB/sa: Add cached attribute containing SM information to SA port
    IB/uverbs: Fix race between uverbs_close and remove_one
    IB/mthca: Clean up error unwind flow in mthca_reset()
    IB/mthca: NULL arg to pci_dev_put is OK
    IB/hfi1: NULL arg to sc_return_credits is OK
    IB/mlx4: Add diagnostic hardware counters
    net/mlx4: Query performance and diagnostics counters
    net/mlx4: Add diagnostic counters capability bit
    Use smaller 512 byte messages for portmapper messages
    IB/ipoib: Report SG feature regardless of HW UD CSUM capability
    IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
    IB/hfi1: Disable by default
    IB/rdmavt: Disable by default
    IB/mlx5: Fix port counter ID association to QP offset
    IB/mlx5: Fix iteration overrun in GSI qps
    i40iw: Add NULL check for puda buffer
    i40iw: Change dup_ack_thresh to u8
    i40iw: Remove unnecessary check for moving CQ head
    ...

    Linus Torvalds
     

04 Aug, 2016

4 commits

  • Doug Ledford
     
  • The dma-mapping core and the implementations do not change the DMA
    attributes passed by pointer. Thus the pointer can point to const data.
    However the attributes do not have to be a bitfield. Instead unsigned
    long will do fine:

    1. This is just simpler. Both in terms of reading the code and setting
    attributes. Instead of initializing local attributes on the stack
    and passing pointer to it to dma_set_attr(), just set the bits.

    2. It brings safeness and checking for const correctness because the
    attributes are passed by value.

    Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

    and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

    Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
    Signed-off-by: Krzysztof Kozlowski
    Acked-by: Vineet Gupta
    Acked-by: Robin Murphy
    Acked-by: Hans-Christian Noren Egtvedt
    Acked-by: Mark Salter [c6x]
    Acked-by: Jesper Nilsson [cris]
    Acked-by: Daniel Vetter [drm]
    Reviewed-by: Bart Van Assche
    Acked-by: Joerg Roedel [iommu]
    Acked-by: Fabien Dessenne [bdisp]
    Reviewed-by: Marek Szyprowski [vb2-core]
    Acked-by: David Vrabel [xen]
    Acked-by: Konrad Rzeszutek Wilk [xen swiotlb]
    Acked-by: Joerg Roedel [iommu]
    Acked-by: Richard Kuo [hexagon]
    Acked-by: Geert Uytterhoeven [m68k]
    Acked-by: Gerald Schaefer [s390]
    Acked-by: Bjorn Andersson
    Acked-by: Hans-Christian Noren Egtvedt [avr32]
    Acked-by: Vineet Gupta [arc]
    Acked-by: Robin Murphy [arm64 and dma-iommu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Kozlowski
     
  • Doug Ledford
     
  • Added UCMA and CMA support for multicast join flags. Flags are
    passed using previously reserved fields of the UCMA CM join
    command. Two join flags are currently supported, indicating two
    different multicast JoinStates:

    1. Full Member:
    The initiator creates the Multicast group(MCG) if it wasn't
    previously created, can send Multicast messages to the group
    and receive messages from the MCG.

    2. Send Only Full Member:
    The initiator creates the Multicast group(MCG) if it wasn't
    previously created, can send Multicast messages to the group
    but doesn't receive any messages from the MCG.

    IB: Send Only Full Member requires a query of ClassPortInfo
    to determine if SM/SA supports this option. If SM/SA
    doesn't support Send-Only there will be no join request
    sent and an error will be returned.

    ETH: When Send Only Full Member is requested no IGMP join
    will be sent.

    Signed-off-by: Alex Vesker
    Reviewed-by: Hal Rosenstock
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Alex Vesker
     

03 Aug, 2016

1 commit

  • The use of the specific opcode test is redundant since
    all ack entry users correctly manipulate the mr pointer
    to selectively trigger the reference clearing.

    The overly specific test hinders the use of
    implementation-specific operations.

    The change needs to get rid of the union to ensure that
    an atomic value is not seen as an MR pointer.

    Reviewed-by: Ashutosh Dixit
    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Ira Weiny
    Signed-off-by: Doug Ledford

    Ira Weiny