30 Jun, 2017

18 commits

  • Checks are added to the existing sockex3 and test_map_in_map tests.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • This patch allows userspace to do BPF_MAP_LOOKUP_ELEM on
    BPF_MAP_TYPE_PROG_ARRAY,
    BPF_MAP_TYPE_ARRAY_OF_MAPS and
    BPF_MAP_TYPE_HASH_OF_MAPS.

    The lookup returns a prog-id or map-id to the userspace.
    The userspace can then use the BPF_PROG_GET_FD_BY_ID
    or BPF_MAP_GET_FD_BY_ID to get a fd.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Martin KaFai Lau
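A rough userspace sketch of the lookup-then-get-fd flow described above (not code from the patch). It calls the bpf(2) syscall directly through syscall(2) and assumes UAPI headers new enough to define BPF_PROG_GET_FD_BY_ID. The helper names sys_bpf and fd_from_array_slot are hypothetical. The key is assumed to be a __u32 array index, so it covers BPF_MAP_TYPE_PROG_ARRAY and BPF_MAP_TYPE_ARRAY_OF_MAPS but not BPF_MAP_TYPE_HASH_OF_MAPS, whose key type is map-specific. The GET_FD_BY_ID commands normally require CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: glibc provides no bpf() function, so use syscall(2). */
static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/*
 * Hypothetical helper: look up slot `index` of a prog array or array-of-maps
 * (the lookup yields a prog-id or map-id) and turn that ID into a new fd.
 * Returns the fd, or -1 with errno set.
 */
static int fd_from_array_slot(int map_fd, uint32_t index, int is_prog_array)
{
	union bpf_attr attr;
	uint32_t id = 0;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = (uint64_t)(uintptr_t)&index;
	attr.value = (uint64_t)(uintptr_t)&id;
	if (sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr)) < 0)
		return -1;

	memset(&attr, 0, sizeof(attr));
	if (is_prog_array) {
		attr.prog_id = id;
		return (int)sys_bpf(BPF_PROG_GET_FD_BY_ID, &attr, sizeof(attr));
	}
	attr.map_id = id;
	return (int)sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr));
}
```

On success the caller owns the returned fd and should close(2) it when done.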
     
  • Version 3.70a of the DesignWare MAC has additional DMA registers, so
    add those to the ethtool DMA register dump:
    Offset 9 - Receive Interrupt Watchdog Timer Register
    Offset 10 - AXI Bus Mode Register
    Offset 11 - AHB or AXI Status Register
    Offset 22 - HW Feature Register

    Signed-off-by: Thor Thayer
    Acked-by: Giuseppe Cavallaro
    Signed-off-by: David S. Miller

    Thor Thayer
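A minimal sketch of how such a dump might be assembled, assuming the DMA registers are consecutive 32-bit words so that "Offset N" means word index N (byte offset 4 * N) from the DMA block base. DMA_REG_COUNT, dma_reg_is_370a_only() and dump_dma_regs() are hypothetical names, and reporting 0 for registers absent on older cores is an illustrative choice, not necessarily what the stmmac driver does.

```c
#include <stdint.h>

/* Hypothetical: dump word indices 0..22, i.e. up to the HW Feature Register. */
#define DMA_REG_COUNT 23

/* Hypothetical helper: registers listed above as new in version 3.70a. */
static int dma_reg_is_370a_only(unsigned int idx)
{
	switch (idx) {
	case 9:  /* Receive Interrupt Watchdog Timer Register */
	case 10: /* AXI Bus Mode Register */
	case 11: /* AHB or AXI Status Register */
	case 22: /* HW Feature Register */
		return 1;
	default:
		return 0;
	}
}

/*
 * Hypothetical helper: copy the DMA register block into out[].
 * dma_base[i] is the 32-bit register at byte offset 4 * i. Registers that
 * only exist on 3.70a are reported as 0 on older cores instead of read.
 */
static void dump_dma_regs(const volatile uint32_t *dma_base, int core_is_370a,
			  uint32_t out[DMA_REG_COUNT])
{
	for (unsigned int i = 0; i < DMA_REG_COUNT; i++)
		out[i] = (dma_reg_is_370a_only(i) && !core_is_370a) ? 0 : dma_base[i];
}
```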
     
  • Saeed Mahameed says:

    ====================
    mlx5-updates-2017-06-27 (Innova IPsec offload support)

    This patchset adds support for the Innova IPSec network interface card.

    About Innova device:
    --------------------
    Innova is a network card with a ConnectX chip and an FPGA chip as a
    bump-on-the-wire.

                  Internal
    +----------+  Link        +-----------------+
    |          +--------------+ FPGA            |     +------+
    | ConnectX |              |           Shell +-----+ QSFP |
    |          +--------------+ +-------+       |     | Port |
    +----------+  I2C         | |  SBU  |       |     +------+
                              | +-------+       |
                              +--+----------+---+
                                 |          |
                              +--+--+   +---+---+
                              | DDR |   | Flash |
                              +-----+   +-------+

    The FPGA synthesized logic is loaded from dedicated flash storage and has
    access to its own dedicated DDR RAM.
    The ConnectX chip firmware programs the FPGA by accessing its configuration
    space over either the slow internal I2C link or the high-speed internal link.

    The FPGA logic is divided into a "Shell" and a "Sandbox Unit" (SBU).
    The mlx5_core driver (with CONFIG_MLX5_FPGA) handles all shell functionality,
    while other components may handle the various SBU functionalities.

    The driver opens high-speed reliable communication channels with the shell and
    the SBU over the internal link.
    These channels may be used for high-bandwidth configuration or for SBU-specific
    out-of-band data paths.

    About Innova IPSec device:
    --------------------------
    Innova IPSec is a network card that allows offloading IPSec cryptography operations
    from the host CPU to the NIC. It is an Innova card with an IPSec SBU.
    The hardware keeps the database of IPSec Security Associations (SADB) in the FPGA's
    DDR memory.

                  Internal
    +----------+  Link        +-----------------+
    |          +--------------+ FPGA            |     +------+
    | ConnectX |              |           Shell +-----+ QSFP |
    |          +--------------+ +-------+       |     | Port |
    +----------+ Internal I2C | | IPSec |       |     +------+
                              | |  SBU  |       |
                              | +-------+       |
                              +--+----------+---+
                                 |          |
                              +--+--+   +---+---+
                              | DDR |   |       |
                              |     |   | Flash |
                              |SADB |   |       |
                              +-----+   +-------+

    Modes and ciphers:
    Currently the following modes and ciphers are supported:
    IPv4 and IPv6
    ESP tunnel and transport modes
    AES 128 and 256 bit encryption, with GCM authentication (RFC4106)

    IV is generated using seqiv, in sync with Linux's geniv.

    More modes and ciphers may be added later.

    Notes:
    In the future similar functionality will be included in a single-chip NIC.

    About the driver:
    -----------------
    Patches 1-4 prepare some existing driver code for the new feature:
    * Add support for reserved GIDs in the hardware GID table
    * Allow multiple modules to enable hardware RoCE support independently
    Patches 5-6 define structs and helper functions for QP work-queues.
    Patches 7-11 add various FPGA-related features required for Innova IPSec.
    Patch 12 adds an abstraction layer for Mellanox IPSec-offload-capable devices.
    Patches 13-16 add IPSec offload support to the mlx5 netdevice.

    This driver services the new IPSec offload API introduced in commit
    d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")

    Configuration Path:
    If an Innova IPSec device is detected, the mlx5e netdevice gets the new
    NETIF_F_HW_ESP feature and the xdo callbacks, indicating ESP offload
    capabilities, and also the matching TX checksum and GSO features.

    The driver configures offloaded Security Associations (SAs) by sending
    an ADD_SA or DEL_SA message to the IPSec SBU, which updates the SADB in DDR.
    These messages and their responses are sent over a high-speed channel.
    Counters for ethtool are retrieved by the driver from the SBU.

    Data path:
    On the receive path, the SBU decrypts ESP packets that match the offloaded SADB,
    but keeps them encapsulated.
    The SBU injects metadata (a Mellanox-owned ethertype) indicating that crypto offload
    has taken place, the SA with which it was done, and the authentication result.

    The ConnectX chip performs RX checksum offload on the packet, and RSS using the
    ESP SPI value. The driver detects the special ethertype, and attaches a struct
    secpath to the RX SKB, including flags to indicate that crypto offload took place,
    the authentication result, and which xfrm_state was used for decryption, in the
    olen and ovec members. The RX SKB may have useful CHECKSUM_COMPLETE. A separate
    patchset will add support for that in the xfrm stack.

    On the transmit path, the stack encapsulates the packet but does not encrypt it, and
    indicates in the SKB's secpath that crypto offload is to be performed and the SA
    to use to do so.
    The driver avoids performing crypto offload for ESP fragments and for packets with
    IP options, as the SBU cannot currently do that. For eligible packets, the driver
    prepends a special ethertype with metadata instructing the hardware to perform crypto offload.
    The stack builds regular (non-GSO) SKBs so that they contain a placeholder for the ESP trailer.
    The driver trims it off, because the SBU automatically appends the trailer for offloaded packets.
    The ConnectX chip performs TX checksum offload on inner UDP or TCP packets,
    and GSO for TCP packets (duplicating the prepended metadata).
    The segmented packets then undergo encryption in the SBU before going on the wire.

    Performance:
    We measured a single TCP stream on an Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz.
    Using AES-NI with ESP GSO we get a constant 4.1 Gbps.
    Using crypto offload we get a constant 18 Gbps.

    Note that these numbers require CHECKSUM_COMPLETE support in XFRM, which is submitted separately.

    - Ilan Tayari
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
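The transmit-path rules in the cover letter above can be illustrated with standalone C; this is a sketch, not the mlx5 driver's code. It shows an IPv4 eligibility check that rejects IP options and fragments, and the ESP trailer length a driver would trim before handing the packet to hardware that appends its own trailer. It assumes the RFC 4303 trailer layout (padding, a Pad Length byte, a Next Header byte, then the ICV) and a caller-supplied ICV length (16 bytes in the common RFC 4106 AES-GCM setup). IPv6 is omitted, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define IPPROTO_ESP_NUM 50 /* IANA protocol number for ESP */

/*
 * Hypothetical helper: return 1 if the raw IPv4 header at iph (len bytes,
 * network byte order) carries ESP, has no IP options and is not a fragment.
 */
static int ipv4_esp_offload_eligible(const uint8_t *iph, size_t len)
{
	if (len < 20 || (iph[0] >> 4) != 4)
		return 0;
	if ((iph[0] & 0x0f) != 5)          /* IHL other than 5 words: IP options */
		return 0;
	uint16_t frag = (uint16_t)((iph[6] << 8) | iph[7]);
	if (frag & 0x3fff)                 /* MF flag or nonzero fragment offset */
		return 0;
	return iph[9] == IPPROTO_ESP_NUM;
}

/*
 * Hypothetical helper: for a packet ending in padding | pad_len | next_hdr |
 * ICV, return how many trailing bytes to trim, or 0 if malformed.
 */
static size_t esp_trailer_len(const uint8_t *pkt, size_t len, size_t icv_len)
{
	if (len < icv_len + 2)
		return 0;
	size_t trailer = (size_t)pkt[len - icv_len - 2] + 2 + icv_len;
	return trailer <= len ? trailer : 0;
}
```

With the DF bit set and protocol 50 the eligibility check passes; setting the MF bit or an IHL of 6 makes it fail.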
     
  • Ivan Khoronzhuk says:

    ====================
    net: fix sw timestamping for non PTP packets

    This series contains several timestamping corrections for the cpsw and
    netcp drivers, which are based on the same cpts module.

    Based on net/next
    ====================

    Reviewed-by: Grygorii Strashko
    Signed-off-by: David S. Miller

    David S. Miller
     
  • There is a cpts function to check whether a packet can be timestamped by
    the cpts. It seems that ptp_classify_raw() covers all the cases handled by
    its "case" labels.

    Signed-off-by: Ivan Khoronzhuk
    Signed-off-by: David S. Miller

    Ivan Khoronzhuk
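For illustration only, a much-simplified PTP classifier in the spirit of the decision ptp_classify_raw() makes. It is not that function, which is implemented as a classic BPF program and also handles VLAN tags and IPv6. This sketch recognizes PTP directly over Ethernet (EtherType 0x88F7) and PTP over IPv4/UDP (destination port 319 for event messages, 320 for general messages). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: return 1 if the Ethernet frame looks like PTP. */
static int frame_is_ptp(const uint8_t *frame, size_t len)
{
	if (len < 14)
		return 0;
	uint16_t ethertype = rd16(frame + 12);
	if (ethertype == 0x88f7)              /* PTP directly over Ethernet (L2) */
		return 1;
	if (ethertype != 0x0800)              /* only IPv4 is handled here */
		return 0;

	const uint8_t *ip = frame + 14;
	size_t ip_len = len - 14;
	if (ip_len < 20 || ip[9] != 17)       /* need an IPv4 header carrying UDP */
		return 0;
	if (rd16(ip + 6) & 0x1fff)            /* non-first fragment: no UDP header */
		return 0;
	size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
	if (ihl < 20 || ip_len < ihl + 8)     /* IP header plus full UDP header */
		return 0;
	uint16_t dport = rd16(ip + ihl + 2);
	return dport == 319 || dport == 320;  /* PTP event / general ports */
}
```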
     
  • The cpts can currently timestamp only PTP packets, so the driver
    cannot mark every packet as though it is going to be timestamped
    just because h/w timestamping for the given skb is enabled with
    SKBTX_HW_TSTAMP. Doing so prevents sw timestamping: as a result, an
    outgoing non-PTP packet is not timestamped at all when h/w
    timestamping is enabled. Fix this by setting SKBTX_IN_PROGRESS
    only for PTP packets.

    Signed-off-by: Ivan Khoronzhuk
    Signed-off-by: David S. Miller

    Ivan Khoronzhuk
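A toy model of the fix, assuming local flag bits that stand in for the kernel's SKBTX_HW_TSTAMP, SKBTX_SW_TSTAMP and SKBTX_IN_PROGRESS. It shows the driver claiming the hardware timestamp only for PTP packets, and the generic software TX timestamp being taken only while SKBTX_IN_PROGRESS is clear. Both function names are hypothetical.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's skb tx_flags bits. */
enum {
	TX_HW_TSTAMP   = 1 << 0, /* user asked for a hardware timestamp */
	TX_SW_TSTAMP   = 1 << 1, /* user asked for a software timestamp */
	TX_IN_PROGRESS = 1 << 2, /* driver promised to deliver the HW timestamp */
};

/* Hypothetical helper, fixed behaviour: claim the HW timestamp only for PTP. */
static uint8_t driver_tx_flags(uint8_t flags, int is_ptp)
{
	if ((flags & TX_HW_TSTAMP) && is_ptp)
		flags |= TX_IN_PROGRESS;
	return flags;
}

/* Hypothetical helper: SW timestamp is taken only if no HW one is pending. */
static int sw_timestamp_taken(uint8_t flags)
{
	return (flags & TX_SW_TSTAMP) && !(flags & TX_IN_PROGRESS);
}
```

With both timestamp types requested, a non-PTP packet now still gets its software timestamp, while a PTP packet defers to the cpts.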
     
  • Move the sw timestamp function next to the channel submit function.

    Signed-off-by: Ivan Khoronzhuk
    Signed-off-by: David S. Miller

    Ivan Khoronzhuk
     
  • Using netdev_<level>(netdev, "%s: ...", netdev->name) duplicates the
    name in the output. Remove those uses.

    Miscellanea:

    o Use the netif_<level> convenience macros at the same time

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
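To see why the name is duplicated: the netdev_<level>() helpers already prefix each message with the device name, so adding "%s: " and netdev->name to the format string prints it twice. The sketch below imitates that prefixing with snprintf(); fake_netdev_msg() is hypothetical, and its bare "name: " prefix is a simplification of the kernel's real prefix.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper imitating netdev_<level>(): "<name>: " + message. */
static int fake_netdev_msg(char *buf, size_t size, const char *name,
			   const char *fmt, ...)
{
	va_list ap;
	int n = snprintf(buf, size, "%s: ", name);

	if (n < 0 || (size_t)n >= size)
		return n;
	va_start(ap, fmt);
	int m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}
```

Passing "%s: link up" with "eth0" yields "eth0: eth0: link up"; passing just "link up" yields "eth0: link up".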
     
  • Trivial fix to spelling mistake in mlx4_dbg debug message

    Signed-off-by: Colin Ian King
    Acked-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Colin Ian King
     
  • Trivial fix to spelling mistake in netif_info message

    Signed-off-by: Colin Ian King
    Signed-off-by: David S. Miller

    Colin Ian King
     
  • Since the PHY used is internal, simply set phy-mode as internal.

    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin
     
  • Since the PHY used is internal, simply set phy-mode as internal.

    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin
     
  • Since the PHY used is internal, simply set phy-mode as internal.

    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin
     
  • Since the PHY used is internal, simply set phy-mode as internal.

    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin
     
  • Since the PHY used is internal, simply set phy-mode as internal.

    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin
     
  • The current way to determine whether the PHY is internal is to compare the
    DT phy-mode with emac_variant/internal_phy.
    But this would break a possible future SoC where an external PHY uses
    the same PHY mode as the internal one.

    Using phy-mode = "internal" allows an external PHY to use the same mode
    as the internal one.

    Reported-by: André Przywara
    Signed-off-by: Corentin Labbe
    Signed-off-by: David S. Miller

    LABBE Corentin
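A sketch contrasting the two detection strategies, using hypothetical names (struct emac_variant_sketch and its internal_phy_mode field) rather than the dwmac-sun8i definitions. The old check infers "internal" from a match between the DT phy-mode and the mode the SoC's internal PHY uses, so an external PHY wired in that same mode is misdetected; the new check relies only on an explicit phy-mode = "internal".

```c
#include <string.h>

/* Hypothetical per-SoC data: the mode its internal PHY uses, or NULL. */
struct emac_variant_sketch {
	const char *internal_phy_mode;
};

/* Old approach: any PHY whose mode matches the internal one is "internal". */
static int phy_is_internal_old(const struct emac_variant_sketch *v,
			       const char *dt_phy_mode)
{
	return v->internal_phy_mode &&
	       strcmp(v->internal_phy_mode, dt_phy_mode) == 0;
}

/* New approach: the device tree states it explicitly. */
static int phy_is_internal_new(const char *dt_phy_mode)
{
	return strcmp(dt_phy_mode, "internal") == 0;
}
```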
     
  • The bond_options.c file contains multiple netdev_info statements that clutter kernel output.
    This patch replaces all netdev_info with netdev_dbg and adds a netdev_dbg statement for the
    packets per slave parameter. Also fixes misalignment at line 467.

    Suggested-by: Joe Perches
    Signed-off-by: Michael J Dilmore
    Signed-off-by: David S. Miller

    Michael Dilmore
     

28 Jun, 2017

21 commits

  • Jakub Kicinski says:

    ====================
    nfp: get_phys_port_name for representors and SR-IOV reorder

    This series starts by making the error message shown when FW cannot be
    located easier to understand. Then I move some functions from PCI probe files
    into library code (nfpcore) where they belong, and remove one function
    which is never used.

    The next few patches equip representors with an nfp_port structure and make
    their NDOs fully shared (not defined in apps), thanks to which we can
    easily determine which netdevs are NFP's by comparing the NDO pointers.

    The 10th patch makes use of the shared NDOs and nfp_ports to deliver a
    netdev-type-independent .ndo_get_phys_port_name() implementation.

    Patches 11 and 12 reorder the nfp_app SR-IOV callbacks relative to enabling
    SR-IOV VFs. Unfortunately, due to how the PCI subsystem works, we can't
    guarantee being able to disable SR-IOV at exit, or that it will be
    disabled when we first probe... We must therefore make sure FW is
    able to deal with being loaded while SR-IOV is already on.

    Patch 13 fixes a potential deadlock when enabling SR-IOV happens at
    the same time as port state refresh. Note that this can't happen
    at this point, since Flower doesn't refresh ports... but lockdep
    doesn't know about such details and we will have to deal with this
    sooner or later anyway.

    Last but not least, a new Kconfig option is added to make sure those who
    don't care about flower offloads have a way of not including the
    code in their kernels. Thanks to nfp_app separation this costs us
    a single ifdef and excluding flower files from the build.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Give users an option not to build the flower-offload related code.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Since we grab pf->lock around pci_enable_sriov(), we can no longer
    safely queue work that may also grab that lock onto the system workqueue:
    pci_enable_sriov() will flush the system workqueue as part of waiting
    for VF probing.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • We previously assumed that the app callback could be guaranteed to be
    executed before SR-IOV is actually enabled. Given that we can't
    guarantee that SR-IOV will be disabled during probe or that we
    will be able to disable it on remove, we should reorder the callbacks.
    We should also call the app's sriov_enable if SR-IOV was enabled
    during probe.

    Application FW must be able to disable VFs internally and not depend
    on them being removed at PCIe level.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • We assumed that at probe time the number of enabled VFs would be 0.
    This doesn't have to be the case, for example if the previous driver left
    SR-IOV enabled because some VFs were assigned. Read the number of
    enabled VFs and fail probe if it's above the current FW's limit.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
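A minimal sketch of the probe-time check described above. In a real driver the current count would come from something like pci_num_vf(); here it is a plain parameter, and the function name and -EINVAL error convention are hypothetical.

```c
#include <errno.h>

/*
 * Hypothetical helper: SR-IOV may already be enabled at probe (for example,
 * left on by a previous driver because VFs were assigned), so take the
 * current VF count instead of assuming 0, and refuse to probe if the
 * firmware cannot support that many VFs.
 */
static int check_preexisting_vfs(int num_vfs_enabled, int fw_max_vfs,
				 int *sriov_vfs_out)
{
	if (num_vfs_enabled < 0 || num_vfs_enabled > fw_max_vfs)
		return -EINVAL;
	*sriov_vfs_out = num_vfs_enabled;   /* remember what is already on */
	return 0;
}
```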
     
  • Make nfp_port_get_phys_port_name() support new port types and
    wire it up to representors' struct net_device_ops.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Based on struct net_device_ops, figure out whether a netdev is an nfp_repr.
    Use this knowledge to convert netdev directly to nfp_port.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
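The identification trick can be sketched in plain C: since every representor shares one net_device_ops instance, comparing the ops pointer identifies them, after which a container_of-style cast reaches the embedding structure. The names echo the nfp driver, but the struct layouts are simplified stand-ins, and the use of container_of (instead of the kernel's netdev_priv() private area) is an assumption made to keep the example self-contained.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel and nfp structures. */
struct net_device_ops { int (*ndo_open)(void *dev); };
struct net_device { const struct net_device_ops *netdev_ops; };
struct nfp_port { int port_id; };
struct nfp_repr { struct net_device netdev; struct nfp_port *port; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The single ops table shared by all representors. */
static const struct net_device_ops nfp_repr_netdev_ops = { NULL };

static int nfp_netdev_is_nfp_repr(const struct net_device *netdev)
{
	return netdev->netdev_ops == &nfp_repr_netdev_ops;
}

/* Hypothetical helper: map a netdev to its nfp_port, or NULL if not a repr. */
static struct nfp_port *port_from_netdev(struct net_device *netdev)
{
	if (!nfp_netdev_is_nfp_repr(netdev))
		return NULL;
	return container_of(netdev, struct nfp_repr, netdev)->port;
}
```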
     
  • Apps shouldn't declare their own struct net_device_ops for
    representors, this makes sharing code harder. Add necessary
    nfp_app callbacks and move the definition of representors'
    struct net_device_ops to common code.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Thanks to the fact that all representors will now have an nfp_port,
    we can depend on information there to provide an app-independent
    .ndo_get_stats64().

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • nfp_port is an abstraction which is supposed to allow us to share
    code between different netdev types (vNIC vs repr). Spawn ports
    for PFs and VFs to enable this sharing.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Add a cleanup callback for undoing what app init callback did.
    Make flower allocate its private structure on init and free
    it from the new callback.

    While at it remember to set the app pointer to NULL on the
    error path to avoid any races while probe path unwinds.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Remove unused nfp_cpp_area_check_range() function.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Move most of the helper for mapping RTsyms from nfp_net_main.c
    to nfpcore. Use the new helper directly for mapping MAC statistics,
    since they don't need to include the PCIe interface ID in the symbol
    name.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • nfp_net_map_area() is a helper for mapping areas of NFP memory
    defined in nfp_net_main.c. Move it to nfpcore to allow reuse
    and rename it accordingly. Create an additional helper,
    nfp_cpp_area_alloc_acquire(), the opposite of the already existing
    nfp_cpp_area_release_free().

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • We support application FW being either loaded automatically at
    boot from flash or (more commonly) by the driver from disk.
    If FW is not found on disk and nothing is preloaded users are
    faced with this unintuitive error:

    nfp 0000:04:00.0: nfp: Failed to find PF symbol _pf0_net_bar0

    We can do better. Since we rely on the symbol table being present,
    check early whether it can be read correctly from the device
    and, if not, print a more informative message.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Paolo Abeni says:

    ====================
    ipv6: udp: exploit dev_scratch helpers

    When bringing in the recent cache optimization for the UDP protocol, I forgot
    to leverage the newly introduced scratch area helpers in the UDPv6 code path.
    As a result, the UDPv6 implementation suffers an unnecessary performance
    penalty when compared to v4.

    This series aims to bring UDPv6 back on an equal footing with v4.
    The first patch moves the shared helpers to the common include files, while
    the second uses them in the UDPv6 code.

    This gives 5-8% performance improvement for a system under flood with small
    UDPv6 packets. The performance delta is less than the one reported on the
    original patch set because the UDPv6 code path already leveraged some of the
    optimizations.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Commit b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
    leveraged the scratch area helpers for UDP v4, but I forgot to
    update the IPv6 code path accordingly.

    This change extends the scratch area usage to the IPv6 code, syncing
    the two implementations and giving some performance benefit.
    IPv6 is again almost on the same level as IPv4, performance-wide.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
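A rough illustration of the idea behind the scratch area helpers: on 64-bit builds a few per-skb values needed on dequeue are packed into one scratch word so that the dequeue path can avoid extra cache misses. The struct approximates the kernel's struct udp_dev_scratch of that era (truesize, len, is_linear, csum_unnecessary); the exact fields, modelling the word as a uint64_t, and the scratch_store()/scratch_load() helpers are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Approximation of the values cached in the scratch word (64-bit layout). */
struct udp_dev_scratch_sketch {
	uint32_t truesize;
	uint16_t len;
	bool is_linear;
	bool csum_unnecessary;
};

_Static_assert(sizeof(struct udp_dev_scratch_sketch) <= sizeof(uint64_t),
	       "scratch values must fit in one 64-bit word");

/* Hypothetical helper: pack the values into a single word at enqueue time. */
static uint64_t scratch_store(uint32_t truesize, uint16_t len,
			      bool is_linear, bool csum_unnecessary)
{
	struct udp_dev_scratch_sketch s = { truesize, len, is_linear,
					    csum_unnecessary };
	uint64_t word = 0;

	memcpy(&word, &s, sizeof(s));
	return word;
}

/* Hypothetical helper: unpack them at dequeue time from that one word. */
static struct udp_dev_scratch_sketch scratch_load(uint64_t word)
{
	struct udp_dev_scratch_sketch s;

	memcpy(&s, &word, sizeof(s));
	return s;
}
```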
     
  • So that they can be later used by the IPv6 code, too.
    Also lift the comments a bit.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • If icsk_ulp_ops is unset, do_tcp_getsockopt() dereferences a NULL pointer.
    Add a NULL pointer check.

    BUG: KASAN: null-ptr-deref in copy_to_user include/linux/uaccess.h:168 [inline]
    BUG: KASAN: null-ptr-deref in do_tcp_getsockopt.isra.33+0x24f/0x1e30 net/ipv4/tcp.c:3057
    Read of size 4 at addr 0000000000000020 by task syz-executor1/15452

    Signed-off-by: Dave Watson
    Reported-by: "Levin, Alexander (Sasha Levin)"
    Signed-off-by: David S. Miller

    Dave Watson
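A sketch of the shape of such a fix, using simplified stand-ins rather than the kernel's types: when no upper-layer protocol is attached, report an empty result instead of dereferencing a NULL ops pointer. The struct and function names and ULP_NAME_MAX are hypothetical, and a real getsockopt handler copies to userspace with put_user()/copy_to_user() rather than memcpy().

```c
#include <stddef.h>
#include <string.h>

#define ULP_NAME_MAX 16 /* hypothetical stand-in for the ULP name limit */

struct ulp_ops_sketch { char name[ULP_NAME_MAX]; };
struct sock_sketch { const struct ulp_ops_sketch *ulp_ops; };

/*
 * Hypothetical helper: copy the ULP name into buf (at most *optlen bytes).
 * If no ULP is attached, set *optlen to 0 instead of dereferencing NULL.
 */
static int get_ulp_name(const struct sock_sketch *sk, char *buf, size_t *optlen)
{
	size_t len = *optlen < ULP_NAME_MAX ? *optlen : ULP_NAME_MAX;

	if (!sk->ulp_ops) {          /* the added NULL check */
		*optlen = 0;
		return 0;
	}
	memcpy(buf, sk->ulp_ops->name, len);
	*optlen = len;
	return 0;
}
```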
     
  • The access to the wrong variable could lead to a NULL dereference and
    possibly other invalid memory reads in vxlan newlink/changelink requests
    with an IFLA_MTU attribute.

    Fixes: a985343ba906 "vxlan: refactor verification and application of configuration"
    Signed-off-by: Matthias Schiffer
    Signed-off-by: David S. Miller

    Matthias Schiffer
     
  • It dates back to 2.1.16 and has been obsolete since 2.1.68, when the current
    rule system was introduced.

    Signed-off-by: Vincent Bernat
    Signed-off-by: David S. Miller

    Vincent Bernat
     

27 Jun, 2017

1 commit