02 Jul, 2019

13 commits

  • Add two ptp_ops: init and fini, to initialize and finalize the PTP
    subsystem. Call as appropriate from mlxsw_sp_init() and _fini().

    Lay the groundwork for Spectrum-1 support. On Spectrum-1, the received
    timestamped packets and their corresponding timestamps arrive
    independently, and need to be matched up. Introduce the related data types
    and add to struct mlxsw_sp_ptp_state the hash table that will keep the
    unmatched entries.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
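
The matching described above can be sketched in user space. This is only an illustration under stated assumptions: the key fields (`port`, `seq_id`), every type and function name, and the fixed-size linear-scan array (standing in for the hash table of unmatched entries) are hypothetical and not taken from the mlxsw driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UNMATCHED_MAX 16

/* Hypothetical key identifying one timestamped packet. */
struct match_key {
    uint16_t port;
    uint16_t seq_id;
};

/* One half of a pair that is still waiting for its counterpart. */
struct unmatched {
    bool in_use;
    bool is_timestamp;  /* true: a timestamp waits; false: a packet waits */
    struct match_key key;
    uint64_t timestamp; /* meaningful only when is_timestamp is true */
};

/* Linear-scan array used here in place of a real hash table. */
static struct unmatched table[UNMATCHED_MAX];

/* Find a waiting entry of the opposite kind with the same key, or NULL. */
static struct unmatched *find_counterpart(struct match_key key, bool is_timestamp)
{
    for (size_t i = 0; i < UNMATCHED_MAX; i++) {
        struct unmatched *e = &table[i];

        if (e->in_use && e->is_timestamp != is_timestamp &&
            e->key.port == key.port && e->key.seq_id == key.seq_id)
            return e;
    }
    return NULL;
}

/* Store one half until the other half arrives. Returns false when full. */
static bool park(struct match_key key, bool is_timestamp, uint64_t timestamp)
{
    for (size_t i = 0; i < UNMATCHED_MAX; i++) {
        if (!table[i].in_use) {
            table[i] = (struct unmatched){ .in_use = true,
                                           .is_timestamp = is_timestamp,
                                           .key = key,
                                           .timestamp = timestamp };
            return true;
        }
    }
    return false;
}

/* A packet arrived. Returns true and fills *ts if its timestamp was waiting. */
static bool packet_arrived(struct match_key key, uint64_t *ts)
{
    struct unmatched *e = find_counterpart(key, false);

    if (e) {
        *ts = e->timestamp;
        e->in_use = false;
        return true;
    }
    park(key, false, 0);
    return false;
}

/* A timestamp arrived. Returns true if the matching packet was waiting. */
static bool timestamp_arrived(struct match_key key, uint64_t ts)
{
    struct unmatched *e = find_counterpart(key, true);

    if (e) {
        e->in_use = false;
        return true;
    }
    park(key, true, ts);
    return false;
}
```

Whichever half arrives first is parked; the second arrival finds it and completes the pair. A real implementation would also need to expire entries whose counterpart never arrives, which this sketch omits.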
     
  • On Spectrum-1, timestamps are delivered separately from the packets, and
    need to be paired up. Therefore, at some point after mlxsw_sp_port_xmit()
    is invoked, it is necessary to involve the chip-specific driver code to
    allow it to do the necessary bookkeeping and matching.

    On Spectrum-2, timestamps are delivered in CQE. For that reason,
    position the point of driver involvement into mlxsw_pci_cqe_sdq_handle()
    to make it hopefully easier to extend for Spectrum-2 in the future.

    To tell the driver what port the packet was sent on, keep tx_info
    in SKB control buffer.

    Introduce a new driver core interface mlxsw_core_ptp_transmitted(), a
    driver callback ptp_transmitted, and a PTP op transmitted. The callee is
    responsible for taking care of releasing the SKB passed to the new
    interfaces, and correspondingly have the new stub callbacks just call
    dev_kfree_skb_any().

    Follow-up patches will introduce the actual content into
    mlxsw_sp1_ptp_transmitted() in particular.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • The SKB control buffer is useful (and used) for bookkeeping of information
    related to that SKB. Add helpers so that the mlxsw driver(s) can safely use
    the buffer as well. The structure is currently empty; individual users will
    add members to it as necessary.

    Note that SKB allocation functions already clear the buffer, so the cleanup
    is only necessary when ndo_start_xmit is called.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
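
The helper pattern can be sketched outside the kernel as follows. Everything here is a stand-in chosen for illustration: `struct fake_skb` models only the 48-byte control buffer of `struct sk_buff`, and `struct drv_skb_cb`, its `local_port` member, `drv_skb_cb()` and `drv_skb_cb_clear()` are hypothetical names, not the helpers added by the commit (whose structure starts out empty).

```c
#include <string.h>

/* Stand-in for struct sk_buff: only the 48-byte control buffer matters here. */
struct fake_skb {
    _Alignas(8) unsigned char cb[48];
};

/* Hypothetical driver-private layout kept in the control buffer. */
struct drv_skb_cb {
    unsigned short local_port; /* illustrative member only */
};

/* Compile-time check that the private layout fits into the buffer. */
_Static_assert(sizeof(struct drv_skb_cb) <= sizeof(((struct fake_skb *)0)->cb),
               "driver cb does not fit into the control buffer");

/* Typed accessor: view the control buffer as the driver's structure
 * (the same cast pattern kernel drivers use on skb->cb). */
static struct drv_skb_cb *drv_skb_cb(struct fake_skb *skb)
{
    return (struct drv_skb_cb *)skb->cb;
}

/* Wipe whatever a previous owner left in the buffer; intended to be
 * called at the start of the transmit path. */
static void drv_skb_cb_clear(struct fake_skb *skb)
{
    memset(drv_skb_cb(skb), 0, sizeof(struct drv_skb_cb));
}
```

Going through a single typed accessor keeps every user of the buffer agreeing on one layout, and the static assertion catches a structure that outgrows the buffer at build time.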
     
  • When configured, the Spectrum hardware can recognize PTP packets and
    trap them to the CPU using dedicated traps, PTP0 and PTP1.

    One reason to get PTP packets under dedicated traps is to have a
    separate policer suitable for the amount of PTP traffic expected when
    the switch is operated as a boundary clock. For this, add two new trap
    groups, MLXSW_REG_HTGT_TRAP_GROUP_SP_PTP0 and _PTP1, and associate the
    two PTP traps with these two groups.

    In the driver, specifically for Spectrum-1, event PTP packets will need
    to be paired up with their timestamps. Those arrive through a different
    set of traps, added later in the patch set. To support this future use,
    introduce a new PTP op, ptp_receive.

    It is possible to configure which PTP messages should be trapped under
    which PTP trap. On Spectrum systems, we will use PTP0 for event
    packets (which need timestamping), and PTP1 for control packets (which
    do not). Thus configure PTP0 trap with a custom callback that defers to
    the ptp_receive op.

    Additionally, L2 PTP packets are actually trapped through the LLDP trap,
    not through any of the PTP traps. So treat the LLDP trap the same way as
    the PTP0 trap. Unlike PTP traps, which are currently still disabled,
    LLDP trap is active. Correspondingly, have all the implementations of
    the ptp_receive op return true, which the handler treats as a signal to
    forward the packet immediately.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • On Spectrum-1, timestamps for PTP packets are delivered through queues
    of ingress and egress timestamps. There are two event traps
    corresponding to activity on each of those queues. This mechanism is
    absent on Spectrum-2, and therefore the traps should only be registered
    on Spectrum-1.

    Carry a chip-specific listener array in mlxsw_sp->listeners and
    listeners_count. Register listeners from that array in
    mlxsw_sp_traps_init(). Add a new listener array for Spectrum-1 traps and
    configure the newly-added mlxsw_sp->listeners with this array.

    The listener array is empty for now; the events will be added in a later
    patch.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • On Spectrum-1, timestamps for PTP packets are delivered through queues
    of ingress and egress timestamps. There are two event traps
    corresponding to activity on each of those queues. This mechanism is
    absent on Spectrum-2, and therefore the traps should only be registered
    on Spectrum-1.

    Extract out of mlxsw_sp_traps_init() a generic helper,
    mlxsw_sp_traps_register(), and likewise with _unregister(). The new helpers
    will later be called with Spectrum-1-specific traps.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • This register serves to configure global parameters of certain
    monitoring operations. The following patches will use it to configure
    that when PTP timestamps are delivered through the PTP FIFO traps, the
    FIFO in question is cleared as well.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • The MTPPTR is used for reading the per-port PTP timestamp FIFO.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • This register is used for configuring under which trap to deliver PTP
    packets depending on the type of the packet.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • This register serves for configuration of which PTP messages should be
    timestamped. This is a global configuration, despite the register name.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Kernel pktgen can already specify the UDP destination port of the
    packets it sends (e.g. pgset "udp_dst_min 9").

    However, none of the sample scripts has an option to do this.

    This commit adds the DST_PORT option to specify the target port(s) in the script.

    -p : ($DST_PORT) destination PORT range (e.g. 433-444) is also allowed

    Signed-off-by: Daniel T. Lee
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Daniel T. Lee
     
  • This commit adds port parsing and port validation helper functions to
    parse a single port or a range of ports from a given string
    (e.g. 1234, 443-444).

    The helpers will be used before setting the target port(s) in samples/pktgen.

    Signed-off-by: Daniel T. Lee
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Daniel T. Lee
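
The parsing and validation logic can be illustrated with a small C sketch. The helpers in samples/pktgen are shell functions; the code below is not their implementation, and the names `parse_one()` and `parse_ports()` as well as the accepted range of 0 to 65535 are assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse one decimal port at *s and advance *s past it. */
static bool parse_one(const char **s, unsigned int *out)
{
    char *end;
    unsigned long v;

    if (**s < '0' || **s > '9') /* reject signs, spaces and empty input */
        return false;
    errno = 0;
    v = strtoul(*s, &end, 10);
    if (errno != 0 || v > 65535)
        return false;
    *out = (unsigned int)v;
    *s = end;
    return true;
}

/* Accept "N" or "MIN-MAX"; on success *min <= *max holds. */
static bool parse_ports(const char *s, unsigned int *min, unsigned int *max)
{
    if (!parse_one(&s, min))
        return false;
    *max = *min; /* a single port is a range of length one */
    if (*s == '-') {
        s++;
        if (!parse_one(&s, max))
            return false;
    }
    return *s == '\0' && *min <= *max;
}
```

For example, `parse_ports("443-444", &lo, &hi)` succeeds with `lo == 443` and `hi == 444`, while a reversed range such as `"444-443"` or trailing garbage such as `"12x"` is rejected.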
     
  • Extend flowlabel_reflect bitmask to allow conditional
    reflection of incoming flowlabels in echo replies.

    Note that this takes precedence over auto flowlabels.

    Add flowlabel_reflect enum to replace hard-coded
    values.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
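
A bitmask of this kind is typically tested as in the sketch below. The enumerator names and values are assumptions made for illustration and should be checked against the kernel sources; `reply_flowlabel()` is a hypothetical helper showing reflection taking precedence over an automatically generated label.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed names and values, for illustration only. */
enum reflect_bits {
    REFLECT_ESTABLISHED         = 1,
    REFLECT_TCP_RESET           = 2,
    REFLECT_ICMPV6_ECHO_REPLIES = 4,
};

/* True when the echo-reply bit is set in the configured mask. */
static bool reflect_in_echo_reply(unsigned int mask)
{
    return (mask & REFLECT_ICMPV6_ECHO_REPLIES) != 0;
}

/* Reflection, when enabled, wins over the automatically chosen label. */
static uint32_t reply_flowlabel(unsigned int mask, uint32_t incoming,
                                uint32_t auto_label)
{
    if (reflect_in_echo_reply(mask))
        return incoming;
    return auto_label;
}
```

Because each behavior has its own bit, existing settings keep their meaning when a new bit is added, and named enumerators make the tests readable compared with bare numbers.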
     

01 Jul, 2019

2 commits

  • Saeed Mahameed says:

    ====================
    mlx5e-updates-2019-06-28

    This series adds some misc updates for mlx5e driver

    1) Allow adding the same mac more than once in MPFS table
    2) Move to HW checksumming advertising
    3) Report netdevice MPLS features
    4) Correct physical port name of the PF representor
    5) Reduce stack usage in mlx5_eswitch_termtbl_create
    6) Refresh TIR improvement for representors
    7) Expose same physical switch_id for all representors
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Jeff Kirsher says:

    ====================
    Intel Wired LAN Driver Updates 2019-06-28

    This series contains a smorgasbord of updates to many of the Intel
    drivers.

    Gustavo A. R. Silva updates the ice and iavf drivers to use the
    struct_size() helper where possible.

    Miguel increases the pause and refresh time for flow control in the
    e1000e driver during reset for certain devices.

    Dann Frazier fixes a potential NULL pointer dereference in ixgbe driver
    when using non-IPSec enabled devices.

    Colin Ian King fixes a potential overflow during a shift in the ixgbe
    driver. Also fixes a potential NULL pointer dereference in the iavf
    driver by adding a check.

    Venkatesh Srinivas converts the e1000 driver to use dma_wmb() instead of
    wmb() for doorbell writes to avoid SFENCEs in the transmit and receive
    paths.

    Arjan updates the e1000e driver to improve boot time by over 100 msec by
    reducing the usleep ranges during system startup.

    Artem updates the igb driver register dump in ethtool: he first prepares
    the register dump for future additions of registers, then
    adds the RR2DCDELAY register to the dump. When dealing with
    time-sensitive networks, this register is helpful in determining your
    latency from the device to the ring.

    Alex fixes the ixgbevf driver to use the current cached link state,
    rather than trying to re-check the value from the PF.

    Harshitha adds support for MACVLAN offloads in i40e by using channels as
    MACVLAN interfaces.

    Detlev Casanova updates the e1000e driver to use delayed work instead of
    timers to run the watchdog.

    Vitaly fixes an issue in e1000e, where when disconnecting and
    reconnecting the physical cable connection, the NIC enters a DMoff
    state. This state causes a mismatch in link and duplexing, so check the
    PCIm function state and perform a PHY reset when in this state to
    resolve the issue.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

30 Jun, 2019

10 commits

  • DMA_API_HOWTO.txt includes an example explaining when
    dma_sync_single_for_device() is not needed, and that example matches
    our use case. The buffer isn't changed by the CPU and direction is
    DMA_FROM_DEVICE, so we can remove the call to
    dma_sync_single_for_device().

    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller

    Heiner Kallweit
     
  • Documentation/DMA-API-HOWTO.txt states:
    By default, the kernel assumes that your device can address 32-bits of
    DMA addressing. For a 64-bit capable device, this needs to be increased,
    and for a device with limitations, it needs to be decreased.

    Therefore we don't need the 32 Bit DMA fallback configuration and can
    remove it.

    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller

    Heiner Kallweit
     
  • The VLAN tag is stored in the descriptor in network byte order.
    Using swab16 works on little endian host systems only. Better play safe
    and use ntohs or htons respectively.

    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller

    Heiner Kallweit
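
The difference between an unconditional swap and a byte-order conversion can be shown in user space. This is a sketch, not the driver's code: `swap16()` stands in for the kernel's swab16(), and the two `read_tag_*()` helpers are hypothetical names.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Unconditional 16-bit byte swap, the same effect as the kernel's swab16(). */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Read a 16-bit field that the device stored in network byte order.
 * ntohs() is a no-op on big-endian hosts and a swap on little-endian ones. */
static uint16_t read_tag_portable(const unsigned char *field)
{
    uint16_t raw;

    memcpy(&raw, field, sizeof(raw));
    return ntohs(raw);
}

/* Same read with an unconditional swap: correct on little-endian hosts
 * only, because a big-endian host needs no swap at all. */
static uint16_t read_tag_swab(const unsigned char *field)
{
    uint16_t raw;

    memcpy(&raw, field, sizeof(raw));
    return swap16(raw);
}
```

With the bytes `0x01 0x23` in memory, `read_tag_portable()` yields `0x0123` on any host, whereas `read_tag_swab()` yields that value only where the host is little endian.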
     
  • running the script on systems without netdevsim now prints:

    SKIP: ipsec_offload can't load netdevsim

    instead of error message & failed status.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • Nikolay Aleksandrov says:

    ====================
    em_ipt: add support for addrtype

    We would like to be able to use the addrtype from tc for ACL rules and
    em_ipt seems the best place to add support for the already existing xt
    match. The biggest issue is that addrtype revision 1 (with ipv6 support)
    is NFPROTO_UNSPEC and currently em_ipt can't differentiate between v4/v6
    if such xt match is used because it passes the match's family instead of
    the packet one. The first 3 patches make em_ipt match only on IP
    traffic (currently both policy and addrtype recognize such traffic
    only) and make it pass the actual packet's protocol instead of the xt
    match family when it's unspecified. They also add support for NFPROTO_UNSPEC
    xt matches. The last patch allows adding addrtype rules via em_ipt.
    We need to keep the user-specified nfproto for dumping in order to be
    compatible with libxtables; we cannot dump NFPROTO_UNSPEC as the nfproto
    or we'll get an error from libxtables, so the nfproto is limited to
    ipv4/ipv6 in patch 03 and is recorded.

    v3: don't use the user nfproto for matching, only for dumping, more
    information is available in the commit message in patch 03
    v2: change patch 02 to set the nfproto only when unspecified and drop
    patch 04 from v1 (Eyal Birger)
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Allow em_ipt to use addrtype for matching. Restrict the use only to
    revision 1, which has IPv6 support. Since it's an NFPROTO_UNSPEC xt match,
    we use the user-specified nfproto for matching; if it's unspecified,
    both v4/v6 will be matched by the rule.

    v2: no changes, was patch 5 in v1

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • If we dump NFPROTO_UNSPEC as the nfproto, user-space libxtables can't
    handle it and exits with an error like:
    "libxtables: unhandled NFPROTO in xtables_set_nfproto"
    To avoid the error, return the user-specified nfproto. If we
    don't record it, then the match family is used, which can be
    NFPROTO_UNSPEC. Even if we add support to mask NFPROTO_UNSPEC in
    iproute2, we have to be compatible with older versions, which would
    also be allowed to add NFPROTO_UNSPEC matches (e.g. addrtype after the
    last patch).

    v3: don't use the user nfproto for matching, only for dumping the rule,
    also don't allow the nfproto to be unspecified (explained above)
    v2: adjust changes to missing patch, was patch 04 in v1

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Set the family based on the packet if it's unspecified otherwise
    protocol-neutral matches will have wrong information (e.g. NFPROTO_UNSPEC).
    In preparation for using NFPROTO_UNSPEC xt matches.

    v2: set the nfproto only when unspecified

    Suggested-by: Eyal Birger
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Restrict matching only to ip/ipv6 traffic and make sure we can use the
    headers, otherwise matches will be attempted on any protocol which can
    be unexpected by the xt matches. Currently policy supports only ipv4/6.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • This patch adds vlan offload support for the HINIC driver.

    Signed-off-by: Xue Chaojing
    Signed-off-by: David S. Miller

    Xue Chaojing
     

29 Jun, 2019

15 commits

  • After changing the parent_id to be the same for both NICs of the same
    hardware device, netdev_port_same_parent_id now returns true for
    more cases (all the lower devices in the hierarchy are on the same
    hardware device).

    If merged eswitch isn't enabled, these cases aren't supported, so disallow
    them.

    Signed-off-by: Paul Blakey
    Reviewed-by: Roi Dayan
    Signed-off-by: Saeed Mahameed

    Paul Blakey
     
  • Report system_image_guid as the E-Switch switch_id. This ensures that
    when a NIC contains multiple PCI functions and has merged eswitch
    capability, all representors from multiple PFs publish the same
    switch_id.

    Signed-off-by: Paul Blakey
    Reviewed-by: Parav Pandit
    Reviewed-by: Roi Dayan
    Signed-off-by: Saeed Mahameed

    Paul Blakey
     
  • Refreshing TIRs is done in order to update the TIRs with the current
    state of SQs in the transport domain, so that the TIRs can filter out
    undesired self-loopback packets based on the source SQ of the packet.

    Representor TIRs will only receive packets that originate from their
    associated vport, due to dedicated steering, and therefore will never
    receive self-loopback packets, whose source vport will be the vport of
    the E-Switch manager, and therefore not the vport associated with the
    representor. As such, it is not necessary to refresh the representors'
    TIRs, since self-loopback packets can't reach them.

    Since representors only exist in switchdev mode, and there is no
    scenario in which a representor will exist in the transport domain
    alongside a non-representor, it is not necessary to refresh the
    transport domain's TIRs upon changing the state of a representor's
    queues. Therefore, do not refresh TIRs upon such a change. Achieve
    this by adding an update_rx callback to the mlx5e_profile, which
    refreshes TIRs for non-representors and does nothing for representors,
    and replace instances of mlx5e_refresh_tirs() upon changing the state
    of the queues with update_rx().

    Signed-off-by: Gavi Teitz
    Reviewed-by: Roi Dayan
    Reviewed-by: Tariq Toukan
    Signed-off-by: Saeed Mahameed

    Gavi Teitz
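
The per-profile callback described above follows a common function-pointer pattern, sketched here with hypothetical, trimmed-down types. None of the names below are the mlx5e definitions, and the counter merely stands in for the real TIR refresh.

```c
struct priv; /* forward declaration */

/* Hypothetical profile: one hook per device flavour. */
struct profile {
    int (*update_rx)(struct priv *priv);
};

struct priv {
    const struct profile *profile;
    unsigned int tir_refreshes; /* counts refreshes, for illustration only */
};

/* Regular netdev: TIRs must be refreshed after a queue state change. */
static int nic_update_rx(struct priv *priv)
{
    priv->tir_refreshes++; /* stands in for the real TIR refresh */
    return 0;
}

/* Representor: self-loopback packets cannot reach it, so do nothing. */
static int rep_update_rx(struct priv *priv)
{
    (void)priv;
    return 0;
}

static const struct profile nic_profile = { .update_rx = nic_update_rx };
static const struct profile rep_profile = { .update_rx = rep_update_rx };

/* Common path: callers invoke the hook instead of refreshing directly. */
static int queues_state_changed(struct priv *priv)
{
    return priv->profile->update_rx(priv);
}
```

The common code stays free of representor checks; each profile decides for itself whether a refresh is needed.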
     
  • Putting an empty 'mlx5_flow_spec' structure on the stack is a bit
    wasteful and causes a warning on 32-bit architectures when building
    with clang -fsanitize-coverage:

    drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create':
    drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

    Since the structure is never written to, we can statically allocate
    it to avoid the stack usage. To be on the safe side, mark all
    subsequent function arguments that we pass it into as 'const'
    as well.

    Fixes: 10caabdaad5a ("net/mlx5e: Use termination table for VLAN push actions")
    Signed-off-by: Arnd Bergmann
    Acked-by: Saeed Mahameed
    Acked-by: Mark Bloch
    Signed-off-by: Saeed Mahameed

    Arnd Bergmann
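
The technique of moving a never-written structure from the stack into static storage can be sketched as follows. `struct big_spec`, its size and both functions are hypothetical and only illustrate the idea; they are not the mlx5 code.

```c
#include <stddef.h>

/* Hypothetical large, read-only specification; the size is illustrative. */
struct big_spec {
    unsigned char match_criteria[512];
    unsigned char match_value[512];
};

/* The consumer takes a pointer to const, so a shared read-only object
 * can be passed safely. */
static size_t count_set_bytes(const struct big_spec *spec)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(spec->match_criteria); i++)
        n += spec->match_criteria[i] != 0;
    return n;
}

static size_t create_table(void)
{
    /* Zero-initialised and kept in static storage rather than on the
     * stack, so this function's frame does not grow by
     * sizeof(struct big_spec). */
    static const struct big_spec empty_spec = { { 0 }, { 0 } };

    return count_set_bytes(&empty_spec);
}
```

Marking the parameter `const` is what makes sharing one static object safe: the compiler rejects any callee that tries to modify it.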
     
  • Consider PCI and non PCI device types while setting device name
    in get_drvinfo() callback using existing generic device.

    Signed-off-by: Parav Pandit
    Reviewed-by: Vu Pham
    Signed-off-by: Saeed Mahameed

    Parav Pandit
     
  • Currently the PF's phys_port_name is pfNvf-1, because the vport number
    of the PF vport is 65535.
    Correct the PF's phys_port_name to the agreed-upon name, pfN.

    Signed-off-by: Parav Pandit
    Reviewed-by: Vu Pham
    Signed-off-by: Saeed Mahameed

    Parav Pandit
     
  • Set supported device features in the netdevice MPLS features mask.
    This will enable HW checksumming and TSO for MPLS tagged traffic.

    Signed-off-by: Ariel Levkovich
    Signed-off-by: Saeed Mahameed

    Ariel Levkovich
     
  • This patch changes the way the driver advertises its checksum offload
    capabilities within the net device features bit mask.

    Instead of advertising protocol specific checksumming capabilities
    which are limited today to IPv4 and IPv6, we move to reporting
    generic HW checksumming capabilities.

    This will allow the network stack to let mlx5 device offload checksum
    for cases where the IP header is encapsulated within another protocol
    and the skb->protocol doesn't indicate one of the IP versions protocol,
    specifically in the case of MPLS label encapsulating the IP header and
    the skb->protocol indicates MPLS ethertype rather than IP.

    Moving the HW_CSUM reporting is required in the basic net device hw
    features mask and also in the extensions (vlan and encapsulation
    features) since the extensions are always multiplied by the basic
    features set during the packet's traversal through the stack's tx flow.

    Signed-off-by: Ariel Levkovich
    Signed-off-by: Saeed Mahameed

    Ariel Levkovich
     
  • Remove the limitation preventing adding a vport's MAC address to the
    Multi-Physical Function Switch (MPFS) more than once per E-switch, as
    there is no difference in the MPFS if an address is being used by an
    E-switch more than once.

    This allows the E-switch to have multiple vports with the same MAC
    address, allowing vports to be classified by VLAN id instead of by MAC
    if desired.

    Signed-off-by: Gavi Teitz
    Signed-off-by: Saeed Mahameed

    Gavi Teitz
     
  • Unify and isolate the error handling flow in mlx5_mpfs_add_mac(),
    removing code duplication.

    Signed-off-by: Gavi Teitz
    Signed-off-by: Saeed Mahameed

    Gavi Teitz
     
  • Misc updates from mlx5-next branch:

    1) E-Switch vport metadata support for source vport matching
    2) Convert mkey_table to XArray
    3) Shared IRQs and to use single IRQ for all async EQs

    Signed-off-by: Saeed Mahameed

    Saeed Mahameed
     
  • Due to commit 5d8682588605 ("[misc] mei: me: allow runtime
    pm for platform with D0i3"), when the cable is disconnected and
    reconnected, the NIC enters the DMoff state. This caused a wrong
    link indication and a duplex mismatch. This bug is described in:
    https://bugzilla.redhat.com/show_bug.cgi?id=1689436

    Checking PCIm function state and performing PHY reset after a
    timeout in watchdog task solves this issue.

    Signed-off-by: Vitaly Lifshits
    Acked-by: Sasha Neftin
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Vitaly Lifshits
     
  • Use delayed work instead of timers to run the watchdog of the e1000e
    driver.

    Simplify the code with one less middle function.

    Signed-off-by: Detlev Casanova
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Detlev Casanova
     
  • This patch enables macvlan offloads for i40e. The idea is to use
    channels as macvlan interfaces. The channels are VSIs of
    type VMDQ. When the first macvlan is created, the maximum number of
    channels possible are created. From then on, as a macvlan interface
    is created, a macvlan filter is added to these already created
    channels (VSIs).

    This patch utilizes subordinate device traffic classes to make queue
    groups (channels) available for an upper device like a macvlan.

    Steps to configure macvlan offloads:
    1. ethtool -K ethx l2-fwd-offload on
    2. ip link add link ethx name macvlan1 type macvlan
    3. ip addr add <address> dev macvlan1
    4. ip link set macvlan1 up

    Signed-off-by: Harshitha Ramamurthy
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher

    Harshitha Ramamurthy
     
  • Change the ethtool link settings call to just read the cached state out of
    the adapter structure instead of trying to recheck the value from the PF.
    Doing this should prevent excessive reading of the mailbox.

    Signed-off-by: Alexander Duyck
    Reviewed-by: "Guilherme G. Piccoli"
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher

    Alexander Duyck