25 Jun, 2017

1 commit

  • Switches and modern SR-IOV enabled NICs may multiplex traffic from port
    representors and control messages over a single set of hardware queues.
    Control messages and muxed traffic may need ordered delivery.

    Those requirements make it hard to comfortably use the TC infrastructure
    today unless we have a way of attaching metadata to skbs at the upper
    device. Because a single set of queues is used for many netdevs, reliably
    stopping the TC/sched queues of all of them is impossible, so the lower
    device has to resort to returning NETDEV_TX_BUSY and usually has to take
    extra locks on the fast path.

    This patch attempts to enable port/representor devices to attach metadata
    carrying a port id to skbs. This way representors can be queueless and
    all queuing can be performed at the lower netdev in the usual way.

    Traffic arriving on the port/representor interfaces will have metadata
    attached and will subsequently be queued to the lower device for
    transmission. The lower device should recognize the metadata and translate
    it to a HW-specific format, which is most likely either a special header
    inserted before the network headers or descriptor/metadata fields.

    Metadata is associated with the lower device by storing the netdev pointer
    along with the port id, so that if TC decides to redirect or mirror, the
    new netdev will not try to interpret it.

    This is mostly for SR-IOV devices since switches don't have lower netdevs
    today.
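
    A minimal sketch of the idea follows; the names are illustrative and may
    not match the final dst_metadata changes exactly:

        /* Metadata the representor attaches to the skb before handing it
         * to the lower device (illustrative names only). */
        struct hw_port_info {
                struct net_device *lower_dev;   /* owner of the real HW queues */
                u32 port_id;                    /* which port/representor sent it */
        };

        /* Lower-device xmit path: honour the metadata only if it was meant
         * for this netdev, so a TC redirect/mirror to another device is
         * simply ignored rather than misinterpreted. */
        static bool port_meta_for_us(const struct hw_port_info *info,
                                     const struct net_device *lower_dev)
        {
                return info->lower_dev == lower_dev;
        }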

    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

24 Jun, 2017

34 commits

  • Florian Fainelli says:

    ====================
    net: phy: Support "internal" PHY interface

    This makes the "internal" phy-mode property generally available and
    documented, which allows us to remove some custom parsing code we had in
    bcmgenet and bcm_sf2, both of which used that specific value.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The PHY library now supports an "internal" phy-mode, making our custom
    parsing code unnecessary.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • The PHY library now supports an "internal" phy-mode, making our custom
    parsing code unnecessary.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Now that the Device Tree binding has been updated, update the PHY
    library phy_interface_t and phy_modes to support the "internal" PHY
    interface type.
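
    A hedged sketch of what the additions look like; the exact enum position
    and the surrounding modes in mainline may differ:

        typedef enum {
                PHY_INTERFACE_MODE_NA,
                PHY_INTERFACE_MODE_INTERNAL,    /* new: MAC-internal PHY */
                /* ... the existing MII/GMII/RGMII/... modes ... */
                PHY_INTERFACE_MODE_MAX,
        } phy_interface_t;

        static inline const char *phy_modes(phy_interface_t interface)
        {
                switch (interface) {
                case PHY_INTERFACE_MODE_INTERNAL:
                        return "internal";      /* matches the DT phy-mode string */
                /* ... other modes elided ... */
                default:
                        return "unknown";
                }
        }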

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • A number of Ethernet MACs have internal Ethernet PHYs, and the internal
    wiring makes it necessary to expose this knowledge through the standard
    'phy-mode' property.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Saeed Mahameed says:

    ====================
    mlx5-updates-2017-06-23

    This series provides some updates to the mlx5 core and netdevice drivers.

    Three patches from Tariq introduce a page reuse mechanism in the
    non-Striding RQ RX datapath: the RX descriptor is allowed to reuse its
    allocated page as much as possible, until the page is fully consumed.
    RX page reuse reduces the stress on the page allocator and improves RX
    performance, especially at high speeds (100Gb/s).

    The next four patches of the series, from Or, allow offloading tc flower
    matching on ttl/hoplimit and header rewrite of hoplimit.

    The rest of the series, from Yotam and Or, enhances mlx5 to support FW
    flashing through the mlxfw module, in a manner similar to the mlxsw
    driver. Currently only ethtool-based flashing is implemented; both Eth
    and IB ports are supported.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Buffer group mappings can be obtained using the FW_PARAMS command on newer
    firmware.

    Since some of the bg_maps are obtained in atomic context, create another
    helper, t4_query_params_ns(), that won't sleep while awaiting mailbox
    command completion.

    Signed-off-by: Casey Leedom
    Signed-off-by: Arjun Vynipadath
    Signed-off-by: Ganesh Goudar
    Signed-off-by: David S. Miller

    Arjun Vynipadath
     
  • We were using t4_get_mps_bg_map() both in t4_get_port_stats(), to determine
    which MPS Buffer Groups to report statistics on for a given port, and in
    t4_sge_alloc_rxq(), to provide a TP Ingress Channel Congestion Map. For
    T4/T5 these are actually the same values (because they are ~somewhat~
    related), but for T6 they should return different values: T6 has Port 0
    associated with MPS Buffer Group 0 (with MPS Buffer Group 1 silently
    cascading off) and Port 1 associated with MPS Buffer Group 2 (with 3
    cascading off).
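
    Purely as an illustration of the T6 mapping described above (not the
    driver's actual helper), the port-to-buffer-group relation could be
    expressed as:

        /* Illustrative only: T6 port N uses MPS Buffer Group 2*N, with
         * BG 2*N+1 silently cascading off it. */
        static unsigned int t6_get_mps_bg_map(unsigned int port)
        {
                return 1U << (port * 2);        /* port 0 -> BG 0, port 1 -> BG 2 */
        }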

    Based on the original work by Casey Leedom
    Signed-off-by: Arjun Vynipadath
    Signed-off-by: Ganesh Goudar
    Signed-off-by: David S. Miller

    Arjun Vynipadath
     
  • The copy_to_user() function returns the number of bytes it could not copy,
    but we want to return -EFAULT here.
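
    The pattern being fixed, sketched for illustration (not the exact tls
    getsockopt code):

        if (copy_to_user(optval, &crypto_info, sizeof(crypto_info)))
                return -EFAULT;                 /* right: report the fault */

        /* wrong: copy_to_user() returns the number of bytes NOT copied, so
         * returning its value leaks a positive byte count to the caller:
         *
         *     rc = copy_to_user(optval, &crypto_info, sizeof(crypto_info));
         *     return rc;
         */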

    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Signed-off-by: Dan Carpenter
    Acked-by: Dave Watson
    Signed-off-by: David S. Miller

    Dan Carpenter
     
  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2017-06-23

    1) Use memdup_user to simplify xfrm_user_policy.
    From Geliang Tang.

    2) Make xfrm_dev_register static to silence a sparse warning.
    From Wei Yongjun.

    3) Use crypto_memneq to check the ICV in the AH protocol.
    From Sabrina Dubroca.

    4) Remove some unused variables in esp6.
    From Stephen Hemminger.

    5) Extend XFRM MIGRATE to allow changing the UDP encapsulation port.
    From Antony Antony.

    6) Include the UDP encapsulation port in km_migrate announcements.
    From Antony Antony.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Netanel Belgazal says:

    ====================
    net: update ena ethernet driver to version 1.2.0

    This patchset contains some new features/improvements that were added
    to the ENA driver to increase its robustness and are based on experience
    from wide ENA deployment.

    Change log:

    V2:
    * Remove patch that adds inline to a C-file static function (contradicts coding style).
    * Remove patch that moves MTU parameter validation into ena_change_mtu() instead of
    relying on the network stack.
    * Use upper_32_bits()/lower_32_bits() instead of casting.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • The rx drop counter is reported by the device in the keep-alive event.
    Update the driver's counter with the device counter.

    Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • In ena_com_mem_addr_set(), use upper_32_bits()/lower_32_bits() to split
    the DMA address into its lower 32 bits and higher 16 bits.
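
    Roughly the resulting pattern; the field names are illustrative, not the
    exact ena_common_mem_addr layout:

        ena_addr->mem_addr_low  = lower_32_bits(addr);
        ena_addr->mem_addr_high = (u16)upper_32_bits(addr);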

    Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • The current driver tries to allocate as many MSI-X vectors as there are
    negotiated IO queues (plus another MSI-X vector for management).
    If pci_alloc_irq_vectors() fails, the driver aborts the probe
    and the ENA network device is never brought up.

    With this patch, the driver instead reduces the number of IO queues to
    the number of allocated MSI-X vectors (minus one for management) rather
    than failing probe().
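
    A hedged sketch of the fallback; the constants and variable names are
    illustrative:

        rc = pci_alloc_irq_vectors(pdev, ENA_MIN_MSIX_VEC, msix_vecs,
                                   PCI_IRQ_MSIX);
        if (rc < 0)
                return rc;                      /* nothing usable at all */

        if (rc < msix_vecs)                     /* got fewer than requested */
                adapter->num_queues = rc - 1;   /* keep one for management */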

    Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • The ENA driver posts Rx buffers through the Rx submission queue
    for the ENA device to fill them with received packets.
    Each Rx buffer is marked with a req_id in the Rx descriptor.

    Newer ENA devices may consume the posted Rx buffers out of order,
    and as a result the corresponding Rx completion queue will have Rx
    completion descriptors with non-contiguous req_ids.

    With this change the driver holds two rings.
    The first ring (called free_rx_ids) is a mapping ring.
    It holds all the unused request ids.
    The values in this ring range from 0 to ring_size - 1.

    When the driver wants to allocate a new Rx buffer it takes the head of
    free_rx_ids and uses its value as the index into the rx_buffer_info ring.
    The req_id is also written to the Rx descriptor.

    Upon Rx completion, the driver takes the req_id from the completion
    descriptor and uses it as an index into rx_buffer_info.
    The req_id is then returned to the free_rx_ids ring.

    This patch also adds statistics that report when the driver receives an
    out-of-range or unused req_id.

    Note:
    free_rx_ids is only accessible from the napi handler, so no locking is
    required.
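
    A small, self-contained model of the recycling scheme; the names and the
    power-of-two ring size are illustrative, not the driver's actual
    structures:

        #include <stdint.h>
        #include <stdio.h>

        #define RING_SIZE 8                     /* power of two for the mask below */

        static uint16_t free_rx_ids[RING_SIZE]; /* pool of unused req_ids */
        static unsigned int next_free;          /* next id to hand out */
        static unsigned int next_to_return;     /* where completed ids go back */

        static uint16_t alloc_req_id(void)      /* written into the Rx descriptor */
        {
                return free_rx_ids[next_free++ & (RING_SIZE - 1)];
        }

        static void free_req_id(uint16_t id)    /* called on Rx completion */
        {
                free_rx_ids[next_to_return++ & (RING_SIZE - 1)] = id;
        }

        int main(void)
        {
                for (uint16_t i = 0; i < RING_SIZE; i++)
                        free_rx_ids[i] = i;     /* initially 0 .. ring_size - 1 */
                next_to_return = RING_SIZE;

                for (int i = 0; i < RING_SIZE; i++)
                        alloc_req_id();         /* all buffers posted */
                free_req_id(5);                 /* completions arrive out of order */
                free_req_id(2);
                printf("reused req_ids: %u %u\n", alloc_req_id(), alloc_req_id());
                return 0;                       /* prints "reused req_ids: 5 2" */
        }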

    Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • For each device reset, report to the device the cause of the reset.

    Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • Instead of using:
    memset(ptr, 0x0, sizeof(struct ...))
    use:
    memset(ptr, 0x0, sizeof(*ptr))

    Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • With this patch, the ENA device can update the driver with the desired
    timeout values. These values are part of the "hardware hints", which are
    transmitted to the driver as an asynchronous event through the ENA async
    event notification queue.

    In case the ENA device does not support this capability,
    the driver will use its own default values.

    Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • Return -EOPNOTSUPP instead of -EPERM.

    Signed-off-by: Netanel Belgazal
    Signed-off-by: David S. Miller

    Netanel Belgazal
     
  • KASAN reports an out-of-bounds access in proc_dostring() coming from
    proc_tcp_available_ulp() because, in case the TCP ULP list is empty,
    the buffer allocated for the response will not have anything printed
    into it. Set the first byte to zero to avoid strlen() going
    out-of-bounds.
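
    Illustratively, the fix amounts to making the buffer a valid empty string
    before the (possibly empty) list is rendered into it; the identifiers
    below are approximate:

        *buf = '\0';    /* keeps a later strlen() in bounds if the list is empty */
        list_for_each_entry(ulp_ops, &tcp_ulp_list, list)
                offs += snprintf(buf + offs, maxlen - offs, "%s ", ulp_ops->name);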

    Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Commit 31fd85816dbe ("bpf: permits narrower load from bpf program
    context fields") permits narrower loads for certain ctx fields.
    The commit, however, generates masking even if the prog-specific ctx
    conversion already produces a result with a narrower size.

    For example, for __sk_buff->protocol, the ctx conversion
    loads the data into a register with a 2-byte load.
    A narrower 2-byte load should not generate masking.
    For __sk_buff->vlan_present, the conversion function
    sets the result to either 0 or 1, essentially a byte.
    A narrower 2-byte or 1-byte load should not generate masking.

    To avoid unnecessary masking, prog-specific *_is_valid_access
    now passes converted_op_size back to the verifier, which indicates
    the valid data width after the perceived future conversion.
    Based on this information, the verifier is able to avoid
    unnecessary masking.

    Since we want more information back from prog-specific
    *_is_valid_access checking, all of them are packed into
    one data structure for more clarity.

    Acked-by: Daniel Borkmann
    Signed-off-by: Yonghong Song
    Signed-off-by: David S. Miller

    Yonghong Song
     
  • The functions dwmac4_dma_init_rx_chan, dwmac4_dma_init_tx_chan and
    dwmac4_dma_init_channel do not need to be in global scope, so make
    them static.

    Cleans up sparse warnings:
    "symbol 'dwmac4_dma_init_rx_chan' was not declared. Should it be static?"
    "symbol 'dwmac4_dma_init_tx_chan' was not declared. Should it be static?"
    "symbol 'dwmac4_dma_init_channel' was not declared. Should it be static?"

    Signed-off-by: Colin Ian King
    Signed-off-by: David S. Miller

    Colin Ian King
     
  • Jakub Kicinski says:

    ====================
    xdp: offload mode

    While we discuss the representors.. :)

    This set adds an XDP flag for forcing offload and an attachment mode
    for reporting to user space that a program has been offloaded. The
    nfp driver is modified to make use of the new flags, but also to
    adhere to the DRV_MODE flag, which should disable the HW offload.

    The intended driver behaviour is:
                 DRV mode    offload
    no flags     yes         attempted
    DRV_MODE     yes         no
    HW_MODE      no          yes

    Here 'yes' means required: an error will be returned if setup fails.
    'Attempted' means the offload will only happen automatically if the HW is
    capable and offloading the program causes no change in system behaviour
    (e.g. maps don't have to be bound).

    Thanks to loading the program both into the driver and the HW by default,
    we can fall back to driver mode without disruption in case the user later
    replaces the program with one which cannot be offloaded.

    Note that the NFP driver currently claims XDP offload support but
    lacks most basic features like direct packet access.

    The only change compared to the RFC is fixing the double bpf_prog_put()
    which Daniel spotted (patch 5).
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Make use of just added XDP_ATTACHED_HW.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Extend the XDP_ATTACHED_* values to include an offloaded mode.
    Let drivers report whether a program is installed in the driver
    or the HW by changing the prog_attached field from bool to
    u8 (the type of the netlink attribute).

    Exploit the fact that the value of XDP_ATTACHED_DRV is 1;
    since all drivers currently assign the mode with a double negation:
    mode = !!xdp_prog;
    no drivers have to be modified.
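
    For reference, the UAPI values this relies on (abridged; the key point is
    that XDP_ATTACHED_NONE is 0 and XDP_ATTACHED_DRV is 1):

        enum {
                XDP_ATTACHED_NONE = 0,          /* !!NULL -> 0 */
                XDP_ATTACHED_DRV,               /* !!prog -> 1 */
                XDP_ATTACHED_SKB,
                XDP_ATTACHED_HW,                /* newly added offloaded mode */
        };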

    Signed-off-by: Jakub Kicinski
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Respect the XDP_FLAGS_HW_MODE flag. When it's set, install the program
    on the NIC and skip enabling XDP in the driver.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • The xdp_prog member of the adapter's data path structure is used
    for XDP in driver mode. In case an XDP program is loaded in
    HW-only mode, we need to store it somewhere else. Add a new XDP
    prog pointer in the main structure and use that when we need to
    know whether any XDP program is loaded, not only a driver-mode
    one. Only release our reference on adapter free, instead of
    immediately after netdev unregister, to allow offload to be disabled
    first.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • DRV_MODE means that user space wants the program to be run in
    the driver. Do not try to offload. Only offload if no mode
    flags have been specified.

    Remember what the mode is when the program is installed and refuse
    new setup requests if there is already a program loaded in a
    different mode. This should leave it open for us to later implement
    simultaneous loading of two programs - one in the drv path and
    another on the NIC.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • In preparation for the XDP offload flags, move the driver setup into
    a function; otherwise the number of conditions in one function
    would make it slightly hard to follow. The offload handler may
    now be called with a NULL prog, even if no offload is currently
    active, but that's fine; the offload code can handle that.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Add an installation-time flag for requesting that the program
    be installed only if it can be offloaded to HW.

    Internally a new command for ndo_xdp is added; this way we avoid
    putting checks into drivers, since they all return -EINVAL on
    an unknown command.

    Signed-off-by: Jakub Kicinski
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Pass XDP flags to the xdp ndo. This will allow drivers to look
    at the mode flags and make decisions about offload.

    Signed-off-by: Jakub Kicinski
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

23 Jun, 2017

5 commits

  • Michael reported a UDP breakage caused by the commit b65ac44674dd
    ("udp: try to avoid 2 cache miss on dequeue").
    The function __first_packet_length() can update the checksum bits
    of the pending skb, making the scratch area out-of-sync, and
    setting skb->csum, if the skb was previously in need of checksum
    validation.

    On a later recvmsg() for such an skb, checksum validation will be
    invoked again - due to the wrong udp_skb_csum_unnecessary()
    value - and will fail, causing the valid skb to be dropped.

    This change addresses the issue by refreshing the scratch area in
    __first_packet_length() after the possible checksum update.

    Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
    Reported-by: Michael Ellerman
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Very similar to commit dd99e425be23 ("udp: prefetch
    rmem_alloc in udp_queue_rcv_skb()"), this allows saving a cache
    miss when the BH is the bottleneck for UDP over IPv6 packet
    processing, e.g. for small packets when a single RX NIC ingress
    queue is in use.

    Performance under flood is unaffected when multiple NIC RX queues
    are used, but when a single NIC RX queue is in use, this
    gives a ~8% performance improvement.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Thomas Petazzoni says:

    ====================
    net: mvpp2: misc improvements

    Here are a few patches making various small improvements/refactoring
    in the mvpp2 driver. They are based on today's net-next.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When all a function does is call another function with the exact same
    arguments, in the exact same order, you know it's time to remove said
    function, which is exactly what this commit does.

    Signed-off-by: Thomas Petazzoni
    Signed-off-by: David S. Miller

    Thomas Petazzoni
     
  • This function is not used in the driver; remove it.

    Signed-off-by: Thomas Petazzoni
    Signed-off-by: David S. Miller

    Thomas Petazzoni