02 Apr, 2020

1 commit

  • Pull rdma updates from Jason Gunthorpe:
    "The majority of the patches are cleanups, refactorings and clarity
    improvements.

    This cycle saw some more activity from Syzkaller, I think we are now
    clean on all but one of those bugs, including the long standing and
    obnoxious rdma_cm locking design defect. Continue to see many drivers
    getting cleanups, with a few new user visible features.

    Summary:

    - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1

    - Lots of cleanup patches for hns

    - Convert more places to use refcount

    - Aggressively lock the RDMA CM code that syzkaller says isn't
    working

    - Work to clarify ib_cm

    - Use the new ib_device lifecycle model in bnxt_re

    - Fix mlx5's MR cache which seems to be failing more often with the
    new ODP code

    - mlx5 'dynamic uar' and 'tx steering' user interfaces"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits)
    RDMA/bnxt_re: make bnxt_re_ib_init static
    IB/qib: Delete struct qib_ivdev.qp_rnd
    RDMA/hns: Fix uninitialized variable bug
    RDMA/hns: Modify the mask of QP number for CQE of hip08
    RDMA/hns: Reduce the maximum number of extend SGE per WQE
    RDMA/hns: Reduce PFC frames in congestion scenarios
    RDMA/mlx5: Add support for RDMA TX flow table
    net/mlx5: Add support for RDMA TX steering
    IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
    IB/hfi1: Fix memory leaks in sysfs registration and unregistration
    IB/mlx5: Move to fully dynamic UAR mode once user space supports it
    IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
    IB/mlx5: Extend QP creation to get uar page index from user space
    IB/mlx5: Extend CQ creation to get uar page index from user space
    IB/mlx5: Expose UAR object and its alloc/destroy commands
    IB/hfi1: Get rid of a warning
    RDMA/hns: Remove redundant judgment of qp_type
    RDMA/hns: Remove redundant assignment of wc->smac when polling cq
    RDMA/hns: Remove redundant qpc setup operations
    RDMA/hns: Remove meaningless prints
    ...

    Linus Torvalds
     

30 Mar, 2020

1 commit


27 Mar, 2020

1 commit

  • struct mlx5_bfreg_info is used by mlx5_ib only but is exposed to both RDMA
    and netdev parts of mlx5 driver. Move that struct to mlx5_ib namespace,
    clean vertical space alignment and convert lib_uar_4k from bool to
    bitfield.

    Link: https://lore.kernel.org/r/20200324060143.1569116-5-leon@kernel.org
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Leon Romanovsky
     

13 Mar, 2020

2 commits


10 Mar, 2020

1 commit

  • This series adds some HW bits and definitions for mlx5 driver, to be
    used by downstream features in both rdma and netdev branches.

    * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
    net/mlx5: HW bit for goto chain offload support
    net/mlx5: Expose link speed directly
    net/mlx5: Introduce TLS and IPSec objects enums
    net/mlx5: Introduce egress acl forward-to-vport capability
    net/mlx5: Expose raw packet pacing APIs
    net/mlx5e: Replace zero-length array with flexible-array member
    net/mlx5: fix spelling mistake "reserverd" -> "reserved"

    Signed-off-by: Saeed Mahameed

    Saeed Mahameed
     

05 Mar, 2020

1 commit

  • Expose raw packet pacing APIs to be used by DEVX based applications.
    The existing code was refactored to have a single flow with the new raw
    APIs.

    The new raw APIs consider the 'pp_rate_limit_context', uid, and
    'dedicated' inputs when looking for an existing entry.

    This raw mode enables future device specification data in the raw
    context without changing the existing logic and code.

    The ability to ask for a dedicated entry lets an application
    allocate entries according to its needs.

    A dedicated entry cannot be used by any other process. It also lets a
    process spread its resources across several entries, so that different
    hardware resources are used to enforce the rate.

    The per-entry counter was changed to u64 to prevent overflow.

    Signed-off-by: Yishai Hadas
    Acked-by: Saeed Mahameed
    Signed-off-by: Leon Romanovsky

    Yishai Hadas
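
The lookup rule described in the commit above can be modelled in a few lines. This is a minimal Python sketch, not the driver code: `RateTable`, `add_rate_raw` and `remove_rate_raw` are hypothetical names, and the raw context is treated as opaque bytes. It assumes that a non-dedicated request shares an existing entry only when both context and uid match and that entry is itself non-dedicated, while a dedicated request always takes a fresh slot.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    ctx: bytes         # opaque stand-in for pp_rate_limit_context
    uid: int
    dedicated: bool
    refcount: int = 0  # Python ints are unbounded; the driver uses a u64


class RateTable:
    """Toy model of a fixed-size rate-limit table (hypothetical names)."""

    def __init__(self, size):
        self.slots = [None] * size

    def add_rate_raw(self, ctx, uid, dedicated):
        """Return the index of a matching shared entry or of a new one."""
        if not dedicated:
            for i, e in enumerate(self.slots):
                if e and not e.dedicated and (e.ctx, e.uid) == (ctx, uid):
                    e.refcount += 1
                    return i
        # Dedicated request, or no shareable entry: take a free slot.
        for i, e in enumerate(self.slots):
            if e is None:
                self.slots[i] = Entry(ctx, uid, dedicated, refcount=1)
                return i
        raise RuntimeError("rate table is full")

    def remove_rate_raw(self, index):
        """Drop one reference and free the slot when it reaches zero."""
        e = self.slots[index]
        e.refcount -= 1
        if e.refcount == 0:
            self.slots[index] = None
```

In this model two identical non-dedicated requests land on the same index, whereas the same request with `dedicated=True` gets its own entry.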
     

19 Feb, 2020

1 commit

  • On driver load:
    - Initialize resource dump data structure and memory access tools (mkey
    & pd).
    - Read the resource dump's menu which contains the FW segment
    identifier. Each record is identified by the segment name (ASCII).

    During the driver's lifetime, users (such as reporters) may request
    dumps per segment. The user creates a command by providing the
    segment identifier (SW enumeration) and command keys, and receives a
    command context in return. To receive the dump, the user supplies the
    command context and a page-aligned memory buffer into which the dump
    content will be written. Since the dump may be larger than the given
    buffer, the user may resubmit the command until it receives an
    end-of-dump indication. It is the user's responsibility to destroy
    the command.

    Signed-off-by: Aya Levin
    Reviewed-by: Moshe Shemesh
    Acked-by: Jiri Pirko
    Signed-off-by: Saeed Mahameed

    Aya Levin
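
The create / resubmit / destroy flow described above can be sketched as follows. This is an illustrative Python model only: `create_cmd`, `dump_next`, `destroy_cmd` and `read_segment` are hypothetical names, the command keys are omitted, and a dictionary of byte strings stands in for the firmware segments.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class DumpCmd:
    """Hypothetical command context returned by create_cmd()."""

    def __init__(self, data):
        self.data = data
        self.offset = 0


def create_cmd(segments, segment_id):
    """Create a command for one segment (command keys omitted)."""
    return DumpCmd(segments[segment_id])


def dump_next(cmd, page):
    """Fill `page` with the next chunk; return (bytes_written, more)."""
    chunk = cmd.data[cmd.offset:cmd.offset + len(page)]
    page[:len(chunk)] = chunk
    cmd.offset += len(chunk)
    return len(chunk), cmd.offset < len(cmd.data)


def destroy_cmd(cmd):
    """Release the command; the caller must always do this."""
    cmd.data = None


def read_segment(segments, segment_id):
    """Resubmit the command until the end-of-dump indication arrives."""
    cmd = create_cmd(segments, segment_id)
    page = bytearray(PAGE_SIZE)
    out = bytearray()
    try:
        more = True
        while more:
            written, more = dump_next(cmd, page)
            out += page[:written]
    finally:
        destroy_cmd(cmd)
    return bytes(out)
```

The `try`/`finally` mirrors the rule that destroying the command is the caller's responsibility, even if reading fails part-way.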
     

01 Feb, 2020

1 commit

  • Pull rdma updates from Jason Gunthorpe:
    "A very quiet cycle with few notable changes. Mostly the usual list of
    one or two patches to drivers changing something that isn't quite rc
    worthy. The subsystem seems to be seeing a larger number of rework and
    cleanup style patches right now, I feel that several vendors are
    prepping their drivers for new silicon.

    Summary:

    - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
    rxe, i40iw

    - Larger series doing cleanup and rework for hns and hfi1.

    - Some general reworking of the CM code to make it a little more
    understandable

    - Unify the different code paths connected to the uverbs FD scheme

    - New UAPI ioctls conversions for get context and get async fd

    - Trace points for CQ and CM portions of the RDMA stack

    - mlx5 driver support for virtio-net formatted rings as RDMA raw
    ethernet QPs

    - verbs support for setting the PCI-E relaxed ordering bit on DMA
    traffic connected to a MR

    - A couple of bug fixes that came too late to make rc7"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
    RDMA/core: Make the entire API tree static
    RDMA/efa: Mask access flags with the correct optional range
    RDMA/cma: Fix unbalanced cm_id reference count during address resolve
    RDMA/umem: Fix ib_umem_find_best_pgsz()
    IB/mlx4: Fix leak in id_map_find_del
    IB/opa_vnic: Spelling correction of 'erorr' to 'error'
    IB/hfi1: Fix logical condition in msix_request_irq
    RDMA/cm: Remove CM message structs
    RDMA/cm: Use IBA functions for complex structure members
    RDMA/cm: Use IBA functions for simple structure members
    RDMA/cm: Use IBA functions for swapping get/set acessors
    RDMA/cm: Use IBA functions for simple get/set acessors
    RDMA/cm: Add SET/GET implementations to hide IBA wire format
    RDMA/cm: Add accessors for CM_REQ transport_type
    IB/mlx5: Return the administrative GUID if exists
    RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
    IB/mlx4: Fix memory leak in add_gid error flow
    IB/mlx5: Expose RoCE accelerator counters
    RDMA/mlx5: Set relaxed ordering when requested
    RDMA/core: Add the core support field to METHOD_GET_CONTEXT
    ...

    Linus Torvalds
     

26 Jan, 2020

1 commit

  • A user can change the operational GUID (a.k.a. effective GUID) through
    link/infiniband. Therefore it is preferred to return the currently set
    GUID, if it exists, instead of the operational one.

    This way the PF can query which VF GUID will be set in the next bind. To
    align with the MAC address behavior, zero is returned if the
    administrative GUID is not set.

    For example, before setting administrative GUID:
    $ ip link show
    ib0: mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
    link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
    spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off

    Then:

    $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00
    $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00

    After setting administrative GUID:
    $ ip link show
    ib0: mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
    link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
    spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off

    Fixes: 9c0015ef0928 ("IB/mlx5: Implement callbacks for getting VFs GUID attributes")
    Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org
    Signed-off-by: Danit Goldberg
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Danit Goldberg
     

17 Jan, 2020

4 commits

  • This merge syncs with the latest mlx5-next HW bits and layout updates for
    upcoming features, plus one patch that improves the
    mlx5_create_auto_grouped_flow_table() API across all mlx5 users.

    * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
    net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
    net/mlx5e: Add discard counters per priority
    net/mlx5e: Expose FEC feilds and related capability bit
    net/mlx5: Add mlx5_ifc definitions for connection tracking support
    net/mlx5: Add copy header action struct layout
    net/mlx5: Expose resource dump register mapping
    net/mlx5: Add structures and defines for MIRC register
    net/mlx5: Read MCAM register groups 1 and 2
    net/mlx5: Add structures layout for new MCAM access reg groups
    net/mlx5: Expose vDPA emulation device capabilities
    net/mlx5: Add Virtio Emulation related device capabilities

    Signed-off-by: Saeed Mahameed

    Saeed Mahameed
     
  • Add new register enumeration for resource dump. Add layout mapping for
    resource dump: access command and response.

    Signed-off-by: Aya Levin
    Reviewed-by: Moshe Shemesh
    Signed-off-by: Saeed Mahameed

    Aya Levin
     
  • Add needed structures, layouts and defines for MIRC (Management Image
    Re-activation Control) register. This structure will be used for the FSM
    reactivation flow in the downstream patches.

    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed

    Eran Ben Elisha
     
  • On load, the driver caches MCAM (Management Capabilities Mask Register)
    registers. In addition to MCAM register group 0, the only group the
    driver already reads, this patch adds support for reading groups 1 and 2.

    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed

    Eran Ben Elisha
     

08 Jan, 2020

1 commit


25 Nov, 2019

1 commit

  • Danit Goldberg says:

    ====================
    This series extends RTNETLINK to provide IB port and node GUIDs, which
    were configured for Infiniband VFs.

    The functionality to set VF GUIDs already existed for a long time, and
    here we are adding the missing "get" so that netlink will be symmetric and
    various cloud orchestration tools will be able to manage such VFs more
    naturally.

    The iproute2 was extended too to present those GUIDs.

    - ip link show

    For example:
    - ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
    - ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
    - ip link show ib4
    ib4: mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
    spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
    ====================

    Based on the mlx5-next branch from
    git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
    dependencies

    * branch 'ib-guids': (35 commits)
    IB/mlx5: Implement callbacks for getting VFs GUID attributes
    IB/ipoib: Add ndo operation for getting VFs GUID attributes
    IB/core: Add interfaces to get VF node and port GUIDs
    net/core: Add support for getting VF GUIDs

    net/mlx5: Add new chain for netfilter flow table offload
    net/mlx5: Refactor creating fast path prio chains
    net/mlx5: Accumulate levels for chains prio namespaces
    net/mlx5: Define fdb tc levels per prio
    net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines
    net/mlx5: Simplify fdb chain and prio eswitch defines
    IB/mlx5: Load profile according to RoCE enablement state
    IB/mlx5: Rename profile and init methods
    net/mlx5: Handle "enable_roce" devlink param
    net/mlx5: Document flow_steering_mode devlink param
    devlink: Add new "enable_roce" generic device param
    net/mlx5: fix spelling mistake "metdata" -> "metadata"
    net/mlx5: fix kvfree of uninitialized pointer spec
    IB/mlx5: Introduce and use mlx5_core_is_vf()
    net/mlx5: E-switch, Enable metadata on own vport
    net/mlx5: Refactor ingress acl configuration
    ...

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

12 Nov, 2019

1 commit

  • Register "enable_roce" param, default value is RoCE enabled.
    Current configuration is stored on mlx5_core_dev and exposed to user
    through the cmode runtime devlink param.
    Changing configuration requires changing the cmode driverinit devlink
    param and calling devlink reload.

    Signed-off-by: Michael Guralnik
    Acked-by: Jiri Pirko
    Signed-off-by: Saeed Mahameed

    Michael Guralnik
     

02 Nov, 2019

1 commit

  • Instead of deciding whether a given device is a virtual function
    based on whether it is a PF, use the already defined
    MLX5_COREDEV_VF by introducing a helper API, mlx5_core_is_vf().

    This makes it possible to clearly identify PF, VF and non-virtual functions.

    Signed-off-by: Parav Pandit
    Reviewed-by: Vu Pham
    Signed-off-by: Saeed Mahameed

    Parav Pandit
     

29 Oct, 2019

1 commit


02 Sep, 2019

2 commits

  • Merge mlx5-next patches needed for upcoming mlx5 software steering.

    1) Alex adds HW bits and definitions required for SW steering
    2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
    3) Maor, Cleanups and fixups for eswitch mode and RoCE
    4) Mark, Set only stag for match untagged packets

    Signed-off-by: Saeed Mahameed

    Saeed Mahameed
     
  • Move the device memory allocation and deallocation commands for
    SW ICM memory to mlx5_core to expose this API to all
    mlx5_core users.

    This comes as preparation for supporting SW steering in kernel
    where it will be required to allocate and register device
    memory for direct rule insertion.

    In addition, an API to register this device memory for future
    remote access operations is introduced using the create_mkey
    commands.

    Signed-off-by: Ariel Levkovich
    Reviewed-by: Mark Bloch
    Signed-off-by: Saeed Mahameed

    Ariel Levkovich
     

29 Aug, 2019

1 commit


22 Aug, 2019

1 commit

  • HV VHCA is a layer which provides a PF to VF communication channel based on
    the HyperV PCI config channel. It implements Mellanox's Inter VHCA control
    communication protocol. The protocol contains a control block in order to
    pass messages between the PF and VF drivers, and data blocks in order to
    pass actual data.

    The infrastructure is agent based. Each agent is responsible for
    contiguous buffer blocks in the VHCA config space. The infrastructure
    binds agents to their blocks, and those agents can only read/write
    the buffer blocks assigned to them. Each agent provides three
    callbacks (control, invalidate, cleanup). Control is invoked when
    block-0 is invalidated with a command that concerns this agent. The
    invalidate callback is invoked if one of the blocks assigned to this
    agent was invalidated. Cleanup is invoked before the agent is freed in
    order to clean all of its open resources or deferred works.

    Block-0 serves as the control block. All execution commands from the PF
    will be written by the PF over this block. VF will ack on those by
    writing on block-0 as well. Its format is described by struct
    mlx5_hv_vhca_control_block layout.

    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Eran Ben Elisha
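
The agent dispatch described above can be illustrated with a small Python model. It is a sketch under stated assumptions, not the driver's implementation: the class and method names are hypothetical, and a control command is simplified to an `(agent_name, payload)` pair rather than the real struct mlx5_hv_vhca_control_block layout.

```python
class Agent:
    """Hypothetical agent owning a contiguous range of buffer blocks."""

    def __init__(self, name, blocks, control, invalidate, cleanup):
        self.name = name
        self.blocks = set(blocks)
        self.control = control
        self.invalidate = invalidate
        self.cleanup = cleanup


class HvVhca:
    CONTROL_BLOCK = 0

    def __init__(self):
        self.agents = []

    def register(self, agent):
        """Bind an agent to its blocks; block 0 and overlaps are refused."""
        taken = {b for a in self.agents for b in a.blocks}
        if self.CONTROL_BLOCK in agent.blocks or taken & agent.blocks:
            raise ValueError("block already owned")
        self.agents.append(agent)

    def block_invalidated(self, block, command=None):
        """Dispatch an invalidation to the owning agent's callback."""
        if block == self.CONTROL_BLOCK:
            # Assumption: `command` is an (agent_name, payload) pair.
            name, payload = command
            for a in self.agents:
                if a.name == name:
                    a.control(payload)
            return
        for a in self.agents:
            if block in a.blocks:
                a.invalidate(block)

    def unregister(self, agent):
        """Run the agent's cleanup before dropping it."""
        agent.cleanup()
        self.agents.remove(agent)
```

Invalidations of blocks that no agent owns are silently ignored in this model; how the real layer treats them is not stated in the commit message.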
     

11 Aug, 2019

1 commit

  • When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    This cleans up a lot of unneeded code and logic around the debugfs
    files, making all of this much simpler and easier to understand as we
    don't need to keep the dentries saved anymore.

    Cc: Saeed Mahameed
    Cc: Leon Romanovsky
    Cc: netdev@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    Greg Kroah-Hartman
     

08 Aug, 2019

1 commit

  • refcount_t is preferred over atomic_t for reference counters,
    because the refcount_t implementation can prevent overflows
    and detect possible use-after-free.
    So convert the atomic_t ref counters to refcount_t.

    Signed-off-by: Chuhong Yuan
    Acked-by: Leon Romanovsky
    Signed-off-by: Saeed Mahameed

    Chuhong Yuan
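
To illustrate why a saturating counter is safer than a plain atomic integer, here is a toy Python model. It only loosely follows the behaviour the commit attributes to refcount_t: the limit, the sentinel value and the warning texts are assumptions made for the sketch, and no atomicity is modelled.

```python
import warnings

REFCOUNT_MAX = 2**31 - 1   # assumed limit for this toy model
SATURATED = -1             # sentinel: the counter is pinned forever


class Refcount:
    def __init__(self, value=1):
        self.value = value

    def inc(self):
        if self.value == SATURATED:
            return
        if self.value == 0:
            # The object was already released: flag it instead of reviving it.
            warnings.warn("increment on 0: possible use-after-free")
            self.value = SATURATED
        elif self.value == REFCOUNT_MAX:
            # Leak the object rather than wrap around to a small value.
            warnings.warn("overflow: saturating, object is leaked")
            self.value = SATURATED
        else:
            self.value += 1

    def dec_and_test(self):
        """Return True when the caller dropped the last reference."""
        if self.value == SATURATED:
            return False
        if self.value == 0:
            warnings.warn("decrement on 0: underflow")
            self.value = SATURATED
            return False
        self.value -= 1
        return self.value == 0
```

A plain wrapping counter would silently go from its maximum back to a small number and later free an object that is still in use; the saturating model trades that for a leak plus a warning.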
     

02 Aug, 2019

2 commits

  • Add a pool of flow counters, based on flow counter bulks, removing the
    need to allocate a new counter via a costly FW command during the flow
    creation process. The time it takes to acquire/release a flow counter
    is cut from ~50 [us] to ~50 [ns].

    The pool is part of the mlx5 driver instance, and provides flow
    counters for aging flows. mlx5_fc_create() was modified to provide
    counters for aging flows from the pool by default, and
    mlx5_destroy_fc() was modified to release counters back to the pool
    for later reuse. If bulk allocation is not supported or fails, and for
    non-aging flows, the fallback behavior is to allocate and free
    individual counters.

    The pool is comprised of three lists of flow counter bulks, one of
    fully used bulks, one of partially used bulks, and one of unused
    bulks. Counters are provided from the partially used bulks first, to
    help limit bulk fragmentation.

    The pool maintains a threshold, and strives to maintain the amount of
    available counters below it. The pool is increased in size when a
    counter acquisition request is made and there are no available
    counters, and it is decreased in size when the last counter in a bulk
    is released and there are more available counters than the threshold.
    All pool size changes are done in the context of the
    acquiring/releasing process.

    The value of the threshold is directly correlated to the amount of
    used counters the pool is providing, while constrained by a hard
    maximum, and is recalculated every time a bulk is allocated/freed.
    This ensures that the pool only consumes large amounts of memory for
    available counters if the pool is being used heavily. When fully
    populated and at the hard maximum, the buffer of available counters
    consumes ~40 [MB].

    Signed-off-by: Gavi Teitz
    Reviewed-by: Vlad Buslov
    Signed-off-by: Saeed Mahameed

    Gavi Teitz
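
The pool behaviour described above (three bulk lists, partially used bulks served first, growth on demand, shrinking above a threshold) can be sketched as a small Python model. Everything here is illustrative: the class names, the tiny `BULK_LEN`, and the `MAX_THRESHOLD` and `USED_RATIO` constants are assumptions, and the threshold is computed on demand rather than cached when a bulk is allocated or freed.

```python
BULK_LEN = 4          # counters per bulk (tiny, for illustration)
MAX_THRESHOLD = 8     # assumed hard maximum of idle counters
USED_RATIO = 2        # assumed: threshold = used / USED_RATIO


class Bulk:
    def __init__(self, base):
        self.free = list(range(base, base + BULK_LEN))


class CounterPool:
    def __init__(self):
        self.full, self.partial, self.unused = [], [], []
        self.used = 0
        self.available = 0
        self.next_base = 0
        self.owner = {}  # counter id -> bulk it belongs to

    def threshold(self):
        return min(MAX_THRESHOLD, self.used // USED_RATIO)

    def acquire(self):
        if self.partial:                      # partially used bulks first
            bulk = self.partial[0]
        elif self.unused:
            bulk = self.unused.pop()
            self.partial.append(bulk)
        else:
            bulk = Bulk(self.next_base)       # stands in for the FW command
            self.next_base += BULK_LEN
            self.available += BULK_LEN
            self.partial.append(bulk)
        counter = bulk.free.pop()
        self.owner[counter] = bulk
        self.used += 1
        self.available -= 1
        if not bulk.free:
            self.partial.remove(bulk)
            self.full.append(bulk)
        return counter

    def release(self, counter):
        bulk = self.owner.pop(counter)
        if not bulk.free:                     # bulk was on the full list
            self.full.remove(bulk)
            self.partial.append(bulk)
        bulk.free.append(counter)
        self.used -= 1
        self.available += 1
        if len(bulk.free) == BULK_LEN:        # last counter of the bulk
            self.partial.remove(bulk)
            if self.available > self.threshold():
                self.available -= BULK_LEN    # bulk is freed
            else:
                self.unused.append(bulk)
```

Serving from partially used bulks first keeps counters packed into as few bulks as possible, which is what makes whole bulks become free and eligible for release.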
     
  • Towards introducing the ability to allocate bulks of flow counters,
    refactor the flow counter bulk query process, removing functions and
    structs whose names indicated being used for flow counter bulk
    allocation FW commands, despite them actually only being used to
    support bulk querying, and migrate their functionality to correctly
    named functions in their natural location, fs_counters.c.

    Additionally, optimize the bulk query process by:
    * Extracting the memory used for the query to mlx5_fc_stats so
    that it is only allocated once, and not for each bulk query.
    * Querying all the counters in one function call.

    Signed-off-by: Gavi Teitz
    Reviewed-by: Vlad Buslov
    Signed-off-by: Saeed Mahameed

    Gavi Teitz
     

05 Jul, 2019

1 commit

  • Misc updates from mlx5-next branch:

    1) Add the required HW definitions and structures for upcoming TLS
    support.
    2) Add support for MCQI and MCQS hardware registers for fw version query.
    3) Added hardware bits and structures definitions for sub-functions
    4) Small code cleanup and improvement for PF pci driver.
    5) Bluefield (ECPF) updates and refactoring for better E-Switch
    management on ECPF embedded CPU NIC:
    5.1) Consolidate querying eswitch number of VFs
    5.2) Register event handler at the correct E-Switch init stage
    5.3) Setup PF's inline mode and vlan pop when the ECPF is the
    E-Switch manager (the host PF is basically a VF).
    5.4) Handle Vport UC address changes in switchdev mode.

    6) Cleanup the rep and netdev reference when unloading IB rep.

    Signed-off-by: Saeed Mahameed


    Saeed Mahameed
     

04 Jul, 2019

2 commits

  • Instead of MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
    In a subsequent patch, mlx5_eswitch_get_total_vports() accounts for SF
    vports as well.
    Expanding the MLX5_TOTAL_VPORTS macro would require exposing SF internals to
    the more generic vport.h header file. Such exposure is not desired.
    Hence mlx5_eswitch_get_total_vports() is introduced.

    Given that mlx5_eswitch_get_total_vports() API wants to work on const
    mlx5_core_dev*, change its helper functions also to accept const *dev.

    Signed-off-by: Parav Pandit
    Signed-off-by: Saeed Mahameed

    Parav Pandit
     
  • Expose the API to register for ANY event, mlx5_ib will be able to use
    this functionality for its needs.

    Signed-off-by: Yishai Hadas
    Acked-by: Saeed Mahameed
    Signed-off-by: Leon Romanovsky

    Yishai Hadas
     

02 Jul, 2019

3 commits

  • When enabling SR-IOV, the PCI core already checks whether SR-IOV is
    already enabled and, if so, returns a failure error code.
    Hence, remove the duplicate check from the mlx5_core driver.

    While at it, make mlx5_device_disable_sriov() perform cleanup of VFs in
    the reverse order of mlx5_device_enable_sriov().

    Signed-off-by: Parav Pandit
    Signed-off-by: Saeed Mahameed

    Parav Pandit
     
  • Rename mlx5_pci_dev_type to mlx5_coredev_type to distinguish different mlx5
    device types.

    mlx5_coredev_type represents mlx5_core_dev instance type. Hence keep
    mlx5_coredev_type in mlx5_core_dev structure.

    Signed-off-by: Huy Nguyen
    Signed-off-by: Vu Pham
    Signed-off-by: Parav Pandit
    Reviewed-by: Parav Pandit
    Signed-off-by: Saeed Mahameed

    Huy Nguyen
     
  • Given a fw component index, the MCQI register allows us to query
    this component's information (e.g. its version and capabilities).

    Given a fw component index, the MCQS register allows us to query the
    status of a fw component, including its type and state
    (e.g. PRESET/IN_USE).
    It can be used to find the index of a component of a specific type, by
    sequentially increasing the component index, and querying each time the
    type of the returned component.
    If max component index is reached, 'last_index_flag' is set by the HCA.

    These registers' description was added to query the running and pending
    fw version of the HCA.

    Signed-off-by: Shay Agroskin
    Signed-off-by: Saeed Mahameed

    Shay Agroskin
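
The sequential scan described above, which looks for a component of a given type until 'last_index_flag' is set, can be sketched as follows. This is a minimal Python illustration: `find_component_index` and the `query_mcqs` callable are hypothetical stand-ins for the register access, not driver functions.

```python
def find_component_index(query_mcqs, wanted_type):
    """Scan component indices until a component of `wanted_type` appears.

    `query_mcqs(index)` is a hypothetical callable standing in for the
    register read; it returns (component_type, last_index_flag).
    Returns the matching index, or None if the last index is reached
    without a match.
    """
    index = 0
    while True:
        component_type, last_index_flag = query_mcqs(index)
        if component_type == wanted_type:
            return index
        if last_index_flag:
            return None
        index += 1
```

The type is checked before the flag so that a match on the very last component is still reported.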
     

29 Jun, 2019

1 commit


25 Jun, 2019

1 commit

  • The lock protecting the data structure does not need to be an rwlock. The
    only read access to the lock is in an error path, and if that's limiting
    your scalability, you have bigger performance problems.

    Eliminate mlx5_mkey_table in favour of using the xarray directly.
    reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
    be called in interrupt context.

    This also fixes a minor bug where SRCU locking was being used on the radix
    tree read side, when RCU was needed too.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Saeed Mahameed

    Matthew Wilcox
     

14 Jun, 2019

5 commits

  • Report devlink health on FW fatal issues via fw_fatal_reporter. The
    driver recovery flow for FW fatal errors is now handled by devlink
    health.

    With recovery controlled by devlink health, the user can cancel
    auto-recovery for a debug session and run it manually.

    Call mlx5_enter_error_state() before calling devlink_health_report() to
    ensure entering device error state even if auto-recovery is off.

    Signed-off-by: Moshe Shemesh
    Signed-off-by: Saeed Mahameed

    Moshe Shemesh
     
  • Create mlx5_devlink_health_reporter for fw fatal reporter.
    The fw fatal reporter is added in addition to the fw reporter and
    implements the recover callback.
    The point of having two reporters for FW issues is that we
    don't want to run FW recovery on every issue, only on fatal ones.

    Signed-off-by: Moshe Shemesh
    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed

    Moshe Shemesh
     
  • Use devlink_health_report() to report any symptom of a FW issue, such as
    a FW counter miss or a new health syndrome.
    FW issues are detected in mlx5 during poll_health, which is called in
    timer atomic context, so the health work queue is used to schedule the
    reports.

    Signed-off-by: Moshe Shemesh
    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed

    Moshe Shemesh
     
  • Create mlx5_devlink_health_reporter for FW reporter. The FW reporter
    implements devlink_health_reporter diagnose callback.

    The fw reporter diagnose command can be triggered at any time by the
    user to check the current fw status.
    In healthy status, it returns a clear syndrome. Otherwise it
    returns the syndrome and a description of the error type.

    Command example and output on healthy status:
    $ devlink health diagnose pci/0000:82:00.0 reporter fw
    Syndrome: 0

    Command example and output on non healthy status:
    $ devlink health diagnose pci/0000:82:00.0 reporter fw
    Syndrome: 8 Description: unrecoverable hardware error

    Signed-off-by: Moshe Shemesh
    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Saeed Mahameed

    Moshe Shemesh
     
  • If a FW assert is considered fatal, indicated by a new bit in the health
    buffer, reset the FW. After the reset go through the normal recovery
    flow. Only one PF needs to issue the reset, so an attempt is made to
    prevent the 2nd function from also issuing the reset.
    It's not an error if that happens, it just slows recovery.

    Signed-off-by: Feras Daoud
    Signed-off-by: Alex Vesker
    Signed-off-by: Moshe Shemesh
    Signed-off-by: Daniel Jurgens
    Signed-off-by: Saeed Mahameed

    Feras Daoud