07 Sep, 2017

1 commit

  • Pull networking updates from David Miller:

    1) Support ipv6 checksum offload in sunvnet driver, from Shannon
    Nelson.

    2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
    Dumazet.

    3) Allow generic XDP to work on virtual devices, from John Fastabend.

    4) Add bpf device maps and XDP_REDIRECT, which can be used to build
    arbitrary switching frameworks using XDP. From John Fastabend.

    5) Remove UFO offloads from the tree, gave us little other than bugs.

    6) Remove the IPSEC flow cache, from Florian Westphal.

    7) Support ipv6 route offload in mlxsw driver.

    8) Support VF representors in bnxt_en, from Sathya Perla.

    9) Add support for forward error correction modes to ethtool, from
    Vidya Sagar Ravipati.

    10) Add time filter for packet scheduler action dumping, from Jamal Hadi
    Salim.

    11) Extend the zerocopy sendmsg() used by virtio and tap to regular
    sockets via MSG_ZEROCOPY. From Willem de Bruijn.

    12) Significantly rework value tracking in the BPF verifier, from Edward
    Cree.

    13) Add new jump instructions to eBPF, from Daniel Borkmann.

    14) Rework rtnetlink plumbing so that operations can be run without
    taking the RTNL semaphore. From Florian Westphal.

    15) Support XDP in tap driver, from Jason Wang.

    16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.

    17) Add Huawei hinic ethernet driver.

    18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
    Delalande.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
    i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
    i40e: avoid NVM acquire deadlock during NVM update
    drivers: net: xgene: Remove return statement from void function
    drivers: net: xgene: Configure tx/rx delay for ACPI
    drivers: net: xgene: Read tx/rx delay for ACPI
    rocker: fix kcalloc parameter order
    rds: Fix non-atomic operation on shared flag variable
    net: sched: don't use GFP_KERNEL under spin lock
    vhost_net: correctly check tx avail during rx busy polling
    net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
    rxrpc: Make service connection lookup always check for retry
    net: stmmac: Delete dead code for MDIO registration
    gianfar: Fix Tx flow control deactivation
    cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
    cxgb4: Fix pause frame count in t4_get_port_stats
    cxgb4: fix memory leak
    tun: rename generic_xdp to skb_xdp
    tun: reserve extra headroom only when XDP is set
    net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
    net: dsa: bcm_sf2: Advertise number of egress queues
    ...

    Linus Torvalds
     

04 Sep, 2017

1 commit

  • Pull rdma updates from Doug Ledford:
    "This is a big pull request.

    Of note is that I'm sending you the new ioctl API for the rdma
    subsystem. We put it up on linux-api@, but didn't get much response.
    The API is complex, but it solves two different problems in one go:

    1) The bi-directional nature of the RDMA file write calls, which
    created the security hole we had to handle (and for which the fix
    is now causing problems for systems in production: we were a bit
    overzealous in the fix, and the ability to open a device, then
    fork, then create new queue pairs on the device and use them is
    broken).

    2) The bloat caused by different vendors implementing extensions to
    the base verbs API. Each vendor's hardware is slightly different,
    and the hardware might be suitable for one extension but not
    another.

    By the time we add generic extensions for all the different ways
    that the different hardware can offload things, the API becomes
    bloated. Things like our completion structs have started to exceed
    a cache line in size because of all the elements needed to support
    this. That in turn shows up heavily in the performance graphs with
    a noticeable drop in performance on 100 Gigabit links as our
    completion structs go from occupying one cache line to 1+.

    This API makes things like the completion structs modular in a
    very similar way to netlink so that your structs can only include
    the items needed for the offloads/features you are actually using
    on a given queue pair. In that way we support everything, but only
    use what we need, and our structs stay smaller.

    The ioctl API is better explained by the posting on linux-api@ than I
    can explain it here, so I'll just leave it at that.

    The rest of the pull request is typical stuff.

    Updates for 4.14 kernel merge window

    - Lots of hfi1 driver updates (mixed with a few qib and core updates
    as well)

    - rxe updates

    - various mlx updates

    - Set default roce type to RoCEv2

    - Several larger fixes for bnxt_re that were too big for -rc

    - Several larger fixes for qedr that, likewise, were too big for -rc

    - Misc core changes

    - Make the hns_roce driver compilable on arches other than aarch64 so
    we can more easily debug build issues related to it

    - Add rdma-netlink infrastructure updates

    - Add automatic IRQ affinity infrastructure

    - Add 32bit lid support

    - Lots of misc fixes across the subsystem from random people

    - Autoloading of RDMA netlink modules

    - PCI pool cleanups from Romain Perier

    - mlx5 driver feature additions and fixes

    - Hardware tag matching feature

    - Fix sleeping in atomic when resolving roce ah

    - Add experimental ioctl interface as posted to linux-api@"

    * tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits)
    IB/core: Expose ioctl interface through experimental Kconfig
    IB/core: Assign root to all drivers
    IB/core: Add completion queue (cq) object actions
    IB/core: Add legacy driver's user-data
    IB/core: Export ioctl enum types to user-space
    IB/core: Explicitly destroy an object while keeping uobject
    IB/core: Add macros for declaring methods and attributes
    IB/core: Add uverbs merge trees functionality
    IB/core: Add DEVICE object and root tree structure
    IB/core: Declare an object instead of declaring only type attributes
    IB/core: Add new ioctl interface
    RDMA/vmw_pvrdma: Fix a signedness
    RDMA/vmw_pvrdma: Report network header type in WC
    IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()
    IB/cm: Fix sleeping in atomic when RoCE is used
    IB/core: Add support to finalize objects in one transaction
    IB/core: Add a generic way to execute an operation on a uobject
    Documentation: Hardware tag matching
    IB/mlx5: Support IB_SRQT_TM
    net/mlx5: Add XRQ support
    ...

    Linus Torvalds
     

30 Aug, 2017

2 commits

  • Add support for notifying the FW of the new port MAC when a port MAC
    change is requested by the user. The FW requires this info because OEM
    management tools read it directly from the NIC FW.
    Check the device capability bit to verify that the FW supports user MAC.
    If the FW does support it, use the set_port command to notify the FW of
    the new MAC.
    The feature is relevant only to the PF port MAC.

    Signed-off-by: Moshe Shemesh
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Moshe Shemesh
     
  • In order to avoid temporary large structs on the stack,
    allocate them dynamically.

    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Tal Alon
    Signed-off-by: Tariq Toukan
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Eran Ben Elisha
     

03 Aug, 2017

1 commit

  • Currently when WoL is supported but disabled, ethtool reports:
    "Supports Wake-on: d".
    Fix the indication of WoL support, so that the indication
    remains "g" all the time if the NIC supports WoL.

    Tested:
    As expected, when the NIC supports WoL, ethtool reports:
    Supports Wake-on: g
    Wake-on: d
    When the NIC doesn't support WoL, ethtool reports:
    Supports Wake-on: d
    Wake-on: d

    Fixes: 14c07b1358ed ("mlx4: Wake on LAN support")
    Signed-off-by: Inbar Karmy
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Inbar Karmy
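The reporting rule from the WoL commit above can be sketched in a few lines of user-space C. This is an illustration only, not the driver's code: `sketch_wolinfo`, `sketch_get_wol` and `SKETCH_WAKE_MAGIC` are hypothetical names, with the constant mirroring the value of the `WAKE_MAGIC` bit ("g") in `<linux/ethtool.h>`. A value of 0 is what ethtool prints as "d".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for WAKE_MAGIC ("g") from <linux/ethtool.h>. */
#define SKETCH_WAKE_MAGIC 0x20u

/* Hypothetical, reduced version of the two fields ethtool prints. */
struct sketch_wolinfo {
    uint32_t supported; /* capability: printed as "Supports Wake-on" */
    uint32_t wolopts;   /* current setting: printed as "Wake-on"     */
};

/*
 * Capability and current setting are reported independently:
 * "supported" depends only on what the NIC can do, while "wolopts"
 * additionally depends on whether WoL is currently enabled.
 */
static struct sketch_wolinfo sketch_get_wol(bool nic_supports_wol,
                                            bool wol_enabled)
{
    struct sketch_wolinfo wol = { 0, 0 };

    /* WoL cannot be enabled on a NIC that does not support it. */
    assert(nic_supports_wol || !wol_enabled);

    if (nic_supports_wol) {
        wol.supported = SKETCH_WAKE_MAGIC;
        if (wol_enabled)
            wol.wolopts = SKETCH_WAKE_MAGIC;
    }
    return wol;
}
```

With this shape, a NIC that supports WoL but has it disabled reports "g" as supported and "d" as the current setting, which matches the tested output quoted in the commit message.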
     

24 Jul, 2017

1 commit

  • Adding visibility of resource usage of QPs, CQs and counters used by
    virtual functions. This feature will be used to give the PF administrator
    more data while debugging VF status. Usage info was added to ALLOC_RES
    command, to notify the PF if the resource which is being reserved or
    allocated for the VF will be used by kernel driver or by user verbs.

    Updated reservation and allocation functions of QP, CQ and counter with
    additional usage parameter.

    Signed-off-by: Moshe Shemesh
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Moshe Shemesh
     

18 Jul, 2017

1 commit

  • The caller to the driver now marks GFP_NOIO allocations with the help
    of memalloc_noio-* calls. This makes it redundant to pass gfp flags
    down to the driver, since they can only be GFP_KERNEL.

    The patch removes the gfp flags argument and updates all driver paths.

    Signed-off-by: Leon Romanovsky
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Leon Romanovsky
     

05 Jun, 2017

1 commit

  • Our previous patch (cited below) introduced a regression
    for RAW Eth QPs.

    Fix it by checking if the QP number provided by user-space
    exists, hence allowing steering rules to be added for valid
    QPs only.

    Fixes: 89c557687a32 ("net/mlx4_en: Avoid adding steering rules with invalid ring")
    Reported-by: Or Gerlitz
    Signed-off-by: Talat Batheesh
    Signed-off-by: Tariq Toukan
    Acked-by: Or Gerlitz
    Reviewed-by: Leon Romanovsky
    Signed-off-by: David S. Miller

    Talat Batheesh
     

09 May, 2017

1 commit


22 Apr, 2017

1 commit

  • On some environments, such as certain SR-IOV VF configurations, RoCE
    isn't supported for mlx4 Ethernet ports. Currently the driver will
    not open IB device on that port.

    This is problematic since we do want user-space RAW Ethernet QP
    functionality to remain in place. To that end, enhance the relevant
    driver flows so that we do create a device instance in that case.

    Signed-off-by: Majd Dibbiny
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Majd Dibbiny
     

17 Mar, 2017

1 commit

  • Some Hypervisors detach VFs from VMs by instantly causing an FLR event
    to be generated for a VF.

    In the mlx4 case, this will cause that VF's comm channel to be disabled
    before the VM has an opportunity to invoke the VF device's "shutdown"
    method.

    For such Hypervisors, there is a race condition between the VF's
    shutdown method and its internal-error detection/reset thread.

    The internal-error detection/reset thread (which runs every 5 seconds) also
    detects a disabled comm channel. If the internal-error detection/reset
    flow wins the race, we still get delays (while that flow tries repeatedly
    to detect comm-channel recovery).

    The cited commit fixed the command timeout problem when the
    internal-error detection/reset flow loses the race.

    This commit avoids the unneeded delays when the internal-error
    detection/reset flow wins.

    Fixes: d585df1c5ccf ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown")
    Signed-off-by: Jack Morgenstein
    Reported-by: Simon Xiao
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Jack Morgenstein
     

02 Mar, 2017

1 commit

  • Bitwise & was obviously intended here.

    Fixes: 745d8ae4622c ("net/mlx4: Spoofcheck and zero MAC can't coexist")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Dan Carpenter
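The class of bug behind the one-line fix above is easy to reproduce in isolation. The sketch below is a generic illustration, not the code touched by the commit: the function names and the flag value used in the example are hypothetical. It shows why a logical `&&` silently passes where a bitwise `&` was intended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Buggy form: logical AND only asks "are both operands non-zero?",
 * so any non-zero flags value passes, whichever bits are set.
 */
static bool sketch_flag_set_buggy(uint32_t flags, uint32_t mask)
{
    assert(mask != 0);
    return flags && mask;
}

/* Fixed form: bitwise AND tests the specific bit(s) in mask. */
static bool sketch_flag_set_fixed(uint32_t flags, uint32_t mask)
{
    assert(mask != 0);
    return (flags & mask) != 0;
}
```

For example, with `flags = 0x1` and a hypothetical flag bit `mask = 0x4`, the buggy form returns true even though the bit is clear, while the fixed form returns false.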
     

23 Feb, 2017

1 commit


31 Jan, 2017

2 commits


30 Dec, 2016

1 commit

  • Demoting simple flow steering rule priority (for DPDK) was achieved by
    wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF
    as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper
    function for the PF and all VFs.

    In function mlx4_ib_create_flow(), this change caused the main rule
    creation for the PF to be wrapped, while it left the associated
    tunnel steering rule creation unwrapped for the PF.

    This mismatch caused rule deletion failures in mlx4_ib_destroy_flow()
    for the PF when the detach wrapper function did not find the associated
    tunnel-steering rule (since creation of that rule for the PF did not
    go through the wrapper function).

    Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native"
    (so that the PF invocation does not go through the wrapper), and perform
    the required priority demotion for the PF in the mlx4_ib_create_flow()
    code path.

    Fixes: 48564135cba8 ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules")
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Jack Morgenstein
     

25 Dec, 2016

1 commit


29 Nov, 2016

1 commit

  • This reverts commit 9d76931180557270796f9631e2c79b9c7bb3c9fb.

    Using unregister_netdev at shutdown flow prevents calling
    the netdev's ndos or trying to access its freed resources.

    This fixes crashes like the following:
    Call Trace:
    [] dev_get_phys_port_id+0x1e/0x30
    [] rtnl_fill_ifinfo+0x4be/0xff0
    [] rtmsg_ifinfo_build_skb+0x73/0xe0
    [] rtmsg_ifinfo.part.27+0x16/0x50
    [] rtmsg_ifinfo+0x18/0x20
    [] netdev_state_change+0x46/0x50
    [] linkwatch_do_dev+0x38/0x50
    [] __linkwatch_run_queue+0xf5/0x170
    [] linkwatch_event+0x25/0x30
    [] process_one_work+0x152/0x400
    [] worker_thread+0x125/0x4b0
    [] ? rescuer_thread+0x350/0x350
    [] kthread+0xca/0xe0
    [] ? kthread_park+0x60/0x60
    [] ret_from_fork+0x25/0x30

    Fixes: 9d7693118055 ("net/mlx4_en: Avoid unregister_netdev at shutdown flow")
    Signed-off-by: Tariq Toukan
    Reported-by: Sebastian Ott
    Reported-by: Steve Wise
    Cc: Jiri Pirko
    Signed-off-by: David S. Miller

    Tariq Toukan
     

30 Oct, 2016

1 commit

  • Currently the interrupt test that is part of the ethtool selftest runs
    the check over all interrupt vectors of the device.
    In the mlx4_en package, some of the interrupt vectors are uninitialized
    since mlx4_ib doesn't exist. This causes the NOP FW command to time out.
    Change the logic to test the current port's interrupt vectors only.

    Signed-off-by: Eugenia Emantayev
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Eugenia Emantayev
     

10 Oct, 2016

1 commit

  • Pull main rdma updates from Doug Ledford:
    "This is the main pull request for the rdma stack this release. The
    code has been through 0day and I had it tagged for linux-next testing
    for a couple days.

    Summary:

    - updates to mlx5

    - updates to mlx4 (two conflicts, both minor and easily resolved)

    - updates to iw_cxgb4 (one conflict, not so obvious to resolve,
    proper resolution is to keep the code in cxgb4_main.c as it is in
    Linus' tree as attach_uld was refactored and moved into
    cxgb4_uld.c)

    - improvements to uAPI (moved vendor specific API elements to uAPI
    area)

    - add hns-roce driver and hns and hns-roce ACPI reset support

    - conversion of all rdma code away from deprecated
    create_singlethread_workqueue

    - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
    staging)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
    staging/lustre: Disable InfiniBand support
    iw_cxgb4: add fast-path for small REG_MR operations
    cxgb4: advertise support for FR_NSMR_TPTE_WR
    IB/core: correctly handle rdma_rw_init_mrs() failure
    IB/srp: Fix infinite loop when FMR sg[0].offset != 0
    IB/srp: Remove an unused argument
    IB/core: Improve ib_map_mr_sg() documentation
    IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
    IB/mthca: Move user vendor structures
    IB/nes: Move user vendor structures
    IB/ocrdma: Move user vendor structures
    IB/mlx4: Move user vendor structures
    IB/cxgb4: Move user vendor structures
    IB/cxgb3: Move user vendor structures
    IB/mlx5: Move and decouple user vendor structures
    IB/{core,hw}: Add constant for node_desc
    ipoib: Make ipoib_warn ratelimited
    IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
    IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
    IB/ipoib: Remove deprecated create_singlethread_workqueue
    ...

    Linus Torvalds
     

08 Oct, 2016

1 commit

  • In MLX qp packets, the LRH (built by the driver) has both a VL field
    and an SL field. When building a QP1 packet, the VL field should
    reflect the SLtoVL mapping and not arbitrarily contain zero (as is
    done now). This bug causes credit problems in IB switches at
    high rates of QP1 packets.

    The fix is to cache the SL to VL mapping in the driver, and look up
    the VL mapped to the SL provided in the send request when sending
    QP1 packets.

    For FW versions which support generating a port_management_config_change
    event with subtype sl-to-vl-table-change, the driver uses that event
    to update its sl-to-vl mapping cache. Otherwise, the driver snoops
    incoming SMP mads to update the cache.

    There remains the case where the FW is running in secure-host mode
    (so no QP0 packets are delivered to the driver), and the FW does not
    generate the sl2vl mapping change event. To support this case, the
    driver updates (via querying the FW) its sl2vl mapping cache when
    running in secure-host mode when it receives either a Port Up event
    or a client-reregister event (where the port is still up, but there
    may have been an opensm failover).
    OpenSM modifies the sl2vl mapping before Port Up and Client-reregister
    events occur, so if there is a mapping change the driver's cache will
    be properly updated.

    Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Jack Morgenstein
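The cached lookup described in the SL-to-VL commit above can be sketched as follows. This is a minimal illustration, not the driver's implementation: the function name is hypothetical, and the table layout (16 SLs, one 4-bit VL each, two entries per byte with the even SL in the high nibble) is an assumption made for the example rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_SLS 16

/*
 * Look up the VL for a given SL in a cached 8-byte table.
 * Assumed packing: byte i holds SL 2*i in its high nibble and
 * SL 2*i + 1 in its low nibble.
 */
static uint8_t sketch_sl_to_vl(const uint8_t table[SKETCH_NUM_SLS / 2],
                               uint8_t sl)
{
    uint8_t packed;

    assert(sl < SKETCH_NUM_SLS);
    packed = table[sl / 2];
    return (sl % 2 == 0) ? (uint8_t)(packed >> 4) : (uint8_t)(packed & 0x0f);
}
```

When building a QP1 packet, the send path would call such a lookup with the SL from the send request instead of writing a zero VL into the LRH; the cache itself is refreshed by whichever of the events listed in the commit message is available.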
     

24 Sep, 2016

2 commits

  • Move the vf to VST 802.1ad mode (mlx4 VST QinQ mode) by setting vf vlan
    protocol to 802.1ad.
    VST 802.1ad mode in mlx4 is used for STAG strip/insertion by the PF, while
    the CTAG is set by the VF.
    Read current vlan protocol as part of the vf configuration state.

    Upon setting vf vlan protocol to 802.1ad, we use a mechanism of handshake
    to verify that both the vf and the pf driver version support it.
    The handshake uses the command QUERY_FUNC_CAP:
    - The vf sets a pre-defined support bit in input modifier.
    - A pf that supports the feature sends the request to the vf through a
    pre-defined field in the output mailbox.
    - In case vf does not support the feature, the pf will fail the control
    command (in this case, IP link tool command to set the vf vlan
    protocol to 802.1ad).

    No change in VST 802.1Q mode.

    Signed-off-by: Moshe Shemesh
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Moshe Shemesh
     
  • Check device capability to support VF vlan protocol 802.1ad mode.
    Add vport attribute vlan protocol.
    Init vport vlan protocol by default to 802.1Q.
    Add update QP support for VF vlan protocol 802.1ad.
    Add func capability vlan_offload_disable to disable all
    vlan HW acceleration on VF while the VF is set to VF vlan protocol
    802.1ad mode.
    No change in VF vlan protocol 802.1Q (VST) mode.

    Signed-off-by: Moshe Shemesh
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Moshe Shemesh
     

05 Aug, 2016

1 commit

  • Pull base rdma updates from Doug Ledford:
    "Round one of 4.8 code: while this is mostly normal, there is a new
    driver in here (the driver was hosted outside the kernel for several
    years and is actually a fairly mature and well coded driver). It
    amounts to 13,000 of the 16,000 lines of added code in here.

    Summary:

    - Updates/fixes for iw_cxgb4 driver
    - Updates/fixes for mlx5 driver
    - Add flow steering and RSS API
    - Add hardware stats to mlx4 and mlx5 drivers
    - Add firmware version API for RDMA driver use
    - Add the rxe driver (this is a software RoCE driver that makes any
    Ethernet device a RoCE device)
    - Fixes for i40iw driver
    - Support for send only multicast joins in the cma layer
    - Other minor fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
    Soft RoCE driver
    IB/core: Support for CMA multicast join flags
    IB/sa: Add cached attribute containing SM information to SA port
    IB/uverbs: Fix race between uverbs_close and remove_one
    IB/mthca: Clean up error unwind flow in mthca_reset()
    IB/mthca: NULL arg to pci_dev_put is OK
    IB/hfi1: NULL arg to sc_return_credits is OK
    IB/mlx4: Add diagnostic hardware counters
    net/mlx4: Query performance and diagnostics counters
    net/mlx4: Add diagnostic counters capability bit
    Use smaller 512 byte messages for portmapper messages
    IB/ipoib: Report SG feature regardless of HW UD CSUM capability
    IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
    IB/hfi1: Disable by default
    IB/rdmavt: Disable by default
    IB/mlx5: Fix port counter ID association to QP offset
    IB/mlx5: Fix iteration overrun in GSI qps
    i40iw: Add NULL check for puda buffer
    i40iw: Change dup_ack_thresh to u8
    i40iw: Remove unnecessary check for moving CQ head
    ...

    Linus Torvalds
     

04 Aug, 2016

3 commits

  • Expose IB diagnostic hardware counters.
    The counters count IB events and are applicable for IB and RoCE.

    The counters can be divided into two groups, per device and per port.
    Device counters are always exposed.
    Port counters are exposed only if the firmware supports per port counters.

    rq_num_dup and sq_num_to are only exposed if we have firmware support
    for them; if we do, we expose them per device and per port.
    rq_num_udsdprd and num_cqovf are device only counters.

    rq - denotes responder.
    sq - denotes requester.

    |-----------------------|---------------------------------------|
    | Name                  | Description                           |
    |-----------------------|---------------------------------------|
    |rq_num_lle             | Number of local length errors         |
    |-----------------------|---------------------------------------|
    |sq_num_lle             | Number of local length errors         |
    |-----------------------|---------------------------------------|
    |rq_num_lqpoe           | Number of local QP operation errors   |
    |-----------------------|---------------------------------------|
    |sq_num_lqpoe           | Number of local QP operation errors   |
    |-----------------------|---------------------------------------|
    |rq_num_lpe             | Number of local protection errors     |
    |-----------------------|---------------------------------------|
    |sq_num_lpe             | Number of local protection errors     |
    |-----------------------|---------------------------------------|
    |rq_num_wrfe            | Number of CQEs with error             |
    |-----------------------|---------------------------------------|
    |sq_num_wrfe            | Number of CQEs with error             |
    |-----------------------|---------------------------------------|
    |sq_num_mwbe            | Number of Memory Window bind errors   |
    |-----------------------|---------------------------------------|
    |sq_num_bre             | Number of bad response errors         |
    |-----------------------|---------------------------------------|
    |sq_num_rire            | Number of Remote Invalid request      |
    |                       | errors                                |
    |-----------------------|---------------------------------------|
    |rq_num_rire            | Number of Remote Invalid request      |
    |                       | errors                                |
    |-----------------------|---------------------------------------|
    |sq_num_rae             | Number of remote access errors        |
    |-----------------------|---------------------------------------|
    |rq_num_rae             | Number of remote access errors        |
    |-----------------------|---------------------------------------|
    |sq_num_roe             | Number of remote operation errors     |
    |-----------------------|---------------------------------------|
    |sq_num_tree            | Number of transport retries exceeded  |
    |                       | errors                                |
    |-----------------------|---------------------------------------|
    |sq_num_rree            | Number of RNR NAK retries exceeded    |
    |                       | errors                                |
    |-----------------------|---------------------------------------|
    |rq_num_rnr             | Number of RNR NAKs sent               |
    |-----------------------|---------------------------------------|
    |sq_num_rnr             | Number of RNR NAKs received           |
    |-----------------------|---------------------------------------|
    |rq_num_oos             | Number of Out of Sequence requests    |
    |                       | received                              |
    |-----------------------|---------------------------------------|
    |sq_num_oos             | Number of Out of Sequence NAKs        |
    |                       | received                              |
    |-----------------------|---------------------------------------|
    |rq_num_udsdprd         | Number of UD packets silently         |
    |                       | discarded on the Receive Queue due to |
    |                       | lack of receive descriptor            |
    |-----------------------|---------------------------------------|
    |rq_num_dup             | Number of duplicate requests received |
    |-----------------------|---------------------------------------|
    |sq_num_to              | Number of time out received           |
    |-----------------------|---------------------------------------|
    |num_cqovf              | Number of CQ overflows                |
    |-----------------------|---------------------------------------|

    Signed-off-by: Mark Bloch
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Mark Bloch
     
  • Add a function to query diagnostics counters from the firmware.

    Signed-off-by: Mark Bloch
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Mark Bloch
     
  • Add a bit that indicates if the firmware supports per port
    diagnostic counters.

    Signed-off-by: Mark Bloch
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Mark Bloch
     

20 Jul, 2016

1 commit


30 Jun, 2016

1 commit


24 Jun, 2016

1 commit


23 Jun, 2016

1 commit

  • This allows a clean shutdown, even if some netdev clients do not
    release their reference from this netdev. It is enough to release
    the HW resources only as the kernel is shutting down.

    Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
    Signed-off-by: Eran Ben Elisha
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Eran Ben Elisha
     

06 May, 2016

1 commit

  • The dma_alloc_coherent() function returns a virtual address which can
    be used for coherent access to the underlying memory. On some
    architectures, like arm64, undefined behavior results if this memory is
    also accessed via virtual mappings that are not coherent. Because of
    their undefined nature, operations like virt_to_page() return garbage
    when passed virtual addresses obtained from dma_alloc_coherent(). Any
    subsequent mappings via vmap() of the garbage page values are unusable
    and result in bad things like bus errors (synchronous aborts in ARM64
    speak).

    The mlx4 driver contains code that does the equivalent of
    vmap(virt_to_page(dma_alloc_coherent)), which results in an oops when
    the device is opened.

    Prevent the Ethernet driver from running this problematic code by
    forcing it to allocate contiguous memory. The InfiniBand driver first
    tries to allocate contiguous memory and, on failure, falls back to
    working with fragmented memory.

    Signed-off-by: Haggai Abramovsky
    Signed-off-by: Yishai Hadas
    Reported-by: David Daney
    Tested-by: Sinan Kaya
    Signed-off-by: David S. Miller

    Haggai Abramovsky
     

22 Apr, 2016

1 commit

  • Maintain the PCI status and provide wrappers for enabling and disabling
    the PCI device. Performing either action more than once without its
    opposite in between results in warning logs.

    This occurred when EEH hotplugged the device, causing a warning for
    disabling an already disabled device.

    Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
    Signed-off-by: Daniel Jurgens
    Signed-off-by: Yishai Hadas
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Daniel Jurgens
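The wrapper idea in the PCI status commit above can be sketched in plain C. This is an illustration under stated assumptions, not the driver's code: all names are hypothetical, the counters stand in for the real PCI enable/disable calls, and the locking a real driver would need around the state is omitted.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_pci_status { SKETCH_PCI_DISABLED, SKETCH_PCI_ENABLED };

/* Hypothetical device state; the counters stand in for hardware calls. */
struct sketch_dev {
    enum sketch_pci_status status;
    int hw_enable_calls;
    int hw_disable_calls;
};

/* Enable only if currently disabled, so a repeated call is a no-op. */
static void sketch_pci_enable(struct sketch_dev *dev)
{
    assert(dev != NULL);
    if (dev->status == SKETCH_PCI_DISABLED) {
        dev->hw_enable_calls++; /* the real enable call would go here */
        dev->status = SKETCH_PCI_ENABLED;
    }
}

/* Disable only if currently enabled, so a repeated call is a no-op. */
static void sketch_pci_disable(struct sketch_dev *dev)
{
    assert(dev != NULL);
    if (dev->status == SKETCH_PCI_ENABLED) {
        dev->hw_disable_calls++; /* the real disable call would go here */
        dev->status = SKETCH_PCI_DISABLED;
    }
}
```

Because the tracked status gates each call, an error-recovery path that disables an already disabled device no longer reaches the underlying call a second time, which is the situation that produced the warning.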
     

20 Mar, 2016

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support more Realtek wireless chips, from Jes Sorenson.

    2) New BPF types for per-cpu hash and array maps, from Alexei
    Starovoitov.

    3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

    4) Allow the use of SO_REUSEPORT in order to do per-thread processing
    of incoming TCP/UDP connections. The muxing can be done using a
    BPF program which hashes the incoming packet. From Craig Gallek.

    5) Add a multiplexer for TCP streams, to provide a message-based
    interface. BPF programs can be used to determine the message
    boundaries. From Tom Herbert.

    6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

    7) Avoid factorial complexity when taking down an inetdev interface
    with lots of configured addresses. We were doing things like
    traversing the entire address list for each address removed, and
    flushing the entire netfilter conntrack table for every address as
    well.

    8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

    9) Allow offloading u32 classifiers to hardware, and implement for
    ixgbe, from John Fastabend.

    10) Allow configuring IRQ coalescing parameters on a per-queue basis,
    from Kan Liang.

    11) Extend ethtool so that larger link mode masks can be supported.
    From David Decotigny.

    12) Introduce devlink, which can be used to configure port link types
    (ethernet vs Infiniband, etc.), port splitting, and switch device
    level attributes as a whole. From Jiri Pirko.

    13) Hardware offload support for flower classifiers, from Amir Vadai.

    14) Add "Local Checksum Offload". Basically, for a tunneled packet
    the checksum of the outer header is 'constant' (because with the
    checksum field filled into the inner protocol header, the payload
    of the outer frame checksums to 'zero'), and we can take advantage
    of that in various ways. From Edward Cree"
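The "constant" claim in item 14 can be checked with a small worked example. The sketch below is user-space C written for illustration, not kernel code, and all function names are hypothetical. It assumes a standard 16-bit ones'-complement Internet checksum over big-endian words and short buffers (the 32-bit accumulator is not guarded against overflow on very long inputs). Once the inner checksum field is filled in, the sum over the inner L4 segment equals the complement of the inner pseudo-header sum, whatever the payload is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t sketch_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones'-complement sum of len bytes as big-endian words, seeded with init. */
static uint16_t sketch_csum16(const uint8_t *buf, size_t len, uint16_t init)
{
    uint32_t sum = init;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8; /* pad odd byte with zero */
    return sketch_fold(sum);
}

/*
 * Fill the inner L4 checksum field at csum_off: zero the field, sum the
 * segment seeded with the pseudo-header sum, and store the complement.
 */
static void sketch_fill_inner_csum(uint8_t *seg, size_t len,
                                   size_t csum_off, uint16_t pseudo_sum)
{
    uint16_t csum;

    assert(csum_off % 2 == 0 && csum_off + 1 < len);
    seg[csum_off] = 0;
    seg[csum_off + 1] = 0;
    csum = (uint16_t)~sketch_csum16(seg, len, pseudo_sum);
    seg[csum_off] = (uint8_t)(csum >> 8);
    seg[csum_off + 1] = (uint8_t)(csum & 0xff);
}
```

After `sketch_fill_inner_csum()`, calling `sketch_csum16(seg, len, 0)` yields the complement of `pseudo_sum` (up to the usual +0/-0 ambiguity of ones'-complement arithmetic). That is why the outer checksum can be derived from header fields alone, without summing the payload again.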

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
    bonding: fix bond_get_stats()
    net: bcmgenet: fix dma api length mismatch
    net/mlx4_core: Fix backward compatibility on VFs
    phy: mdio-thunder: Fix some Kconfig typos
    lan78xx: add ndo_get_stats64
    lan78xx: handle statistics counter rollover
    RDS: TCP: Remove unused constant
    RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
    net: smc911x: convert pxa dma to dmaengine
    team: remove duplicate set of flag IFF_MULTICAST
    bonding: remove duplicate set of flag IFF_MULTICAST
    net: fix a comment typo
    ethernet: micrel: fix some error codes
    ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
    bpf, dst: add and use dst_tclassid helper
    bpf: make skb->tc_classid also readable
    net: mvneta: bm: clarify dependencies
    cls_bpf: reset class and reuse major in da
    ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
    ldmvsw: Add ldmvsw.c driver code
    ...

    Linus Torvalds
     

02 Mar, 2016

1 commit

  • Implement newly introduced devlink interface. Add devlink port instances
    for every port and set the port types accordingly.

    Signed-off-by: Jiri Pirko
    v2->v3:
    -add dev param to devlink_register (api change)
    Signed-off-by: David S. Miller

    Jiri Pirko
     

01 Mar, 2016

1 commit

  • Add support for receiving multicast/unicast traffic with
    the don't trap rule.

    Sniffing these packets requires a flow steering rule of type NORMAL
    at priority 0 with flag IB_FLOW_ATTR_FLAGS_DONT_TRAP set.
    Choosing between multicast or unicast is done via ethernet L2 dest_mac
    mask and value:
    - If the mask is all zeros, both unicast and multicast are set.
    - If the mask is non-zero, only a mask with the multicast bit set to 1
    and all other bits 0 is supported; the MAC value then selects whether
    the rule is multicast or unicast.

    If the mask has the multicast bit set along with other bits, it is a
    request for a specific multicast or unicast address. This is not
    supported: a rule receives either all multicast or all unicast.

    Only when the limitations below are met will the registered QP receive
    the requested traffic type; other QPs can still receive the same traffic
    if they are registered for it. If the limitations are not met, an error
    is returned.
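
    The mask rules above can be sketched as follows. This is an
    illustrative model only, not the mlx4 driver code; the function and
    constant names are hypothetical, and the only outside fact assumed is
    that the Ethernet multicast bit is the least significant bit of the
    first address octet.

```python
# Mask with only the multicast (I/G) bit set: 01:00:00:00:00:00.
MULTICAST_BIT_MASK = bytes([0x01, 0, 0, 0, 0, 0])


def classify_sniffer_rule(dest_mac_mask: bytes, dest_mac: bytes) -> str:
    """Return which traffic a don't-trap rule would select, per the text above."""
    if len(dest_mac_mask) != 6 or len(dest_mac) != 6:
        raise ValueError("MAC mask and value must be 6 bytes each")
    if dest_mac_mask == bytes(6):
        # All-zero mask: both unicast and multicast.
        return "unicast+multicast"
    if dest_mac_mask == MULTICAST_BIT_MASK:
        # Only the multicast bit is masked; the MAC value picks the type.
        return "multicast" if dest_mac[0] & 0x01 else "unicast"
    # Any other mask asks for a specific address, which is not supported.
    raise ValueError("unsupported mask: only all-zero or multicast-bit-only")


all_traffic = classify_sniffer_rule(bytes(6), bytes(6))
mcast_only = classify_sniffer_rule(MULTICAST_BIT_MASK, bytes.fromhex("010000000000"))
ucast_only = classify_sniffer_rule(MULTICAST_BIT_MASK, bytes(6))
```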

    Limitations:
    - Rule must be with priority 0.
    - A0 mode is not supported.
    - Sniffer QP cannot appear in any other flow steering rule.

    Signed-off-by: Marina Varshaver
    Reviewed-by: Matan Barak
    Reviewed-by: Yishai Hadas
    Signed-off-by: Doug Ledford

    Marina Varshaver
     

17 Feb, 2016

1 commit

  • problem description:

    The current code sets UAR page size equal to system page size.
    The ConnectX-3 and ConnectX-3 Pro HWs require a minimum of 128 UAR pages.
    The mlx4 kernel drivers are not loaded if there are fewer than 128 UAR pages.

    solution:

    Always set the UAR page size to 4KB. This allows more UAR pages if the OS
    has a PAGE_SIZE larger than 4KB. For example, the PowerPC kernel uses a
    64KB system page size; with a 4MB UAR region, there are 4MB/2/64KB = 32
    UARs (half for UAR, half for BlueFlame). This does not meet the minimum
    128 UAR pages requirement. With a 4KB UAR page, there are 4MB/2/4KB = 512
    UARs, which meets the minimum requirement.
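
    The arithmetic in the example above can be checked with a few lines.
    This is a sketch that mirrors the numbers in the text only; the
    function name and constants are hypothetical and not taken from the
    driver.

```python
KB = 1024
MB = 1024 * KB
MIN_UARS = 128  # minimum required by ConnectX-3 / ConnectX-3 Pro, per the text


def uar_count(uar_region_size: int, uar_page_size: int) -> int:
    """Number of UARs when half the region is UAR and half is BlueFlame."""
    return uar_region_size // 2 // uar_page_size


# 64KB UAR page (PowerPC system page size) versus a fixed 4KB UAR page.
uars_64k = uar_count(4 * MB, 64 * KB)
uars_4k = uar_count(4 * MB, 4 * KB)

meets_minimum_64k = uars_64k >= MIN_UARS
meets_minimum_4k = uars_4k >= MIN_UARS
```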

    Note that only the code in mlx4_core that deals with firmware knows that
    the uar page size is 4KB. Code that deals with the usr page in cq and qp
    context (mlx4_ib, mlx4_en and part of mlx4_core) still assumes that the
    uar page size equals the system page size.

    Note that with this implementation, on a kernel with a 64KB system page
    size, there are 16 uars per system page but only one uar is used. The
    other 15 uars are ignored because of the above assumption.

    Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size
    to 4KB and mlx4_core code in virtual OS will obtain the uar page size from
    firmware.

    Regarding backward compatibility in SR-IOV, if the hypervisor has this new
    code, the virtual OS must be updated. If the hypervisor has the old code
    and the virtual OS has this new code, the new code will be backward
    compatible with the old code. If the uar size is big enough, the new code
    in the VF continues to work with a 64KB uar page size (on a PowerPC
    kernel). If the uar size does not meet the 128 uars requirement, the new
    code is not loaded in the VF and prints the same error message as the old
    code in the hypervisor.

    Signed-off-by: Huy Nguyen
    Reviewed-by: Yishai Hadas
    Signed-off-by: David S. Miller

    Huy Nguyen
     

24 Jan, 2016

1 commit

  • Pull rdma updates from Doug Ledford:
    "Initial roundup of 4.5 merge window patches

    - Remove usage of ib_query_device and instead store attributes in
    ib_device struct

    - Move iopoll out of block and into lib, rename to irqpoll, and use
    in several places in the rdma stack as our new completion queue
    polling library mechanism. Update the other block drivers that
    already used iopoll to use the new mechanism too.

    - Replace the per-entry GID table locks with a single GID table lock

    - IPoIB multicast cleanup

    - Cleanups to the IB MR facility

    - Add support for 64bit extended IB counters

    - Fix for netlink oops while parsing RDMA nl messages

    - RoCEv2 support for the core IB code

    - mlx4 RoCEv2 support

    - mlx5 RoCEv2 support

    - Cross Channel support for mlx5

    - Timestamp support for mlx5

    - Atomic support for mlx5

    - Raw QP support for mlx5

    - MAINTAINERS update for mlx4/mlx5

    - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates

    - Add support for remote invalidate to the iSER driver (pushed
    through the RDMA tree due to dependencies, acknowledged by nab)

    - Update to NFSoRDMA (pushed through the RDMA tree due to
    dependencies, acknowledged by Bruce)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
    IB/mlx5: Unify CQ create flags check
    IB/mlx5: Expose Raw Packet QP to user space consumers
    {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
    IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
    IB/mlx5: Add Raw Packet QP query functionality
    IB/mlx5: Add create and destroy functionality for Raw Packet QP
    IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
    IB/mlx5: Allocate a Transport Domain for each ucontext
    net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
    net/mlx5_core: Add RQ and SQ event handling
    net/mlx5_core: Export transport objects
    IB/mlx5: Expose CQE version to user-space
    IB/mlx5: Add CQE version 1 support to user QPs and SRQs
    IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
    IB/sa: Fix netlink local service GFP crash
    IB/srpt: Remove redundant wc array
    IB/qib: Improve ipoib UD performance
    IB/mlx4: Advertise RoCE v2 support
    IB/mlx4: Create and use another QP1 for RoCEv2
    IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
    ...

    Linus Torvalds
     

20 Jan, 2016

2 commits