18 Feb, 2016

1 commit


12 Feb, 2016

39 commits

  • Edward Cree says:

    ====================
    Local Checksum Offload

    Re-tested VxLAN; everything else is unchanged from v4.

    Changes from v4:
    * Rebased series to fix conflicts with vxlan/vxlan6 merge.

    Changes from v3:
    * Fixed inverted checksum values introduced in v3.
    * Don't mangle zero checksums in GRE.
    * Clear skb->encapsulation in iptunnel_handle_offloads when not using
    CHECKSUM_PARTIAL, lest drivers incorrectly interpret that as a request
    for inner checksum offload.

    Changes from v2:
    * Added support for IPv4 GRE.
    * Split out 'always set up for checksum offload' into its own patch.
    * Removed csum_help from iptunnel_handle_offloads.
    * Rewrote LCO callers to only fold once.
    * Simplified nocheck handling.

    Changes from v1:
    * Enabled support in more encapsulation protocols.
    I think it now covers everything except GRE.
    * Wrote up some documentation covering TX checksum offload, LCO and RCO.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller

    Edward Cree
     
  • All users now pass false, so we can remove it, and remove the code that
    was conditional upon it.

    Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller

    Edward Cree
     
  • Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller

    Edward Cree
     
  • Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller

    Edward Cree
     
  • Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller

    Edward Cree
     
  • The only protocol affected at present is Geneve.

    Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller

    Edward Cree
     
  • If the dst device doesn't support it, it'll get fixed up later anyway
    by validate_xmit_skb(). Also, this allows us to take advantage of LCO
    to avoid summing the payload multiple times.

    Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller

    Edward Cree
     
  • The arithmetic properties of the ones-complement checksum mean that a
    correctly checksummed inner packet, including its checksum, has a ones
    complement sum depending only on whatever value was used to initialise
    the checksum field before checksumming (in the case of TCP and UDP,
    this is the ones complement sum of the pseudo header, complemented).
    Consequently, if we are going to offload the inner checksum with
    CHECKSUM_PARTIAL, we can compute the outer checksum based only on
    the packet data not covered by the inner checksum, and on the
    initial value of the inner checksum field.
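
    The arithmetic above can be checked in a few lines of userspace C
    (a toy sketch, not the kernel's checksum helpers; all names here
    are illustrative):

```c
#include <stdint.h>
#include <stddef.h>

/* Fold a 32-bit accumulator into a 16-bit ones-complement sum. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones-complement sum of a buffer of 16-bit words. */
static uint16_t csum16(const uint16_t *w, size_t nwords)
{
    uint32_t sum = 0;
    while (nwords--)
        sum += *w++;
    return csum_fold32(sum);
}

/* Complete a checksum the way offload hardware would: the field at
 * index ck already holds the initial value (for TCP/UDP, the
 * complemented pseudo-header sum); sum everything, complement, store. */
static void ocsum_complete(uint16_t *pkt, size_t nwords, size_t ck)
{
    pkt[ck] = (uint16_t)~csum16(pkt, nwords);
}
```

    Once completed, the whole-packet ones-complement sum equals the
    complement of the initial field value regardless of payload, which
    is exactly what lets LCO derive the outer checksum without
    re-summing the payload.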

    Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller

    Edward Cree
     
  • Eric Dumazet says:

    ====================
    tcp/dccp: better use of ephemeral ports

    Big servers have bloated bind tables, making it very hard for
    ephemeral port allocations to succeed without special
    containers/namespace tricks.

    This patch series extends the strategy added in commit 07f4c90062f8
    ("tcp/dccp: try to not exhaust ip_local_port_range in connect()").

    Since ports used by connect() are likely to be shared among them,
    we give a hint to both bind() and connect() to keep the crowds
    separated if possible.

    Of course, if on a specific host an application needs to allocate
    ~30000 ports using bind(), it will still be able to do so. The same
    goes for ~30000 connect() calls to a unique 2-tuple (dst addr, dst
    port).

    The new implementation is also more friendly to softirqs and
    reschedules.

    v2: rebase after TCP SO_REUSEPORT changes
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Implement the strategy used in __inet_hash_connect(), but in the
    opposite way:

    Try to find a candidate using odd ports, then fall back to even
    ports.

    We no longer disable BH for the whole traversal, but one bucket at
    a time. We also use cond_resched() to yield the cpu to other tasks
    if needed.

    I removed one indentation level and tried to mirror the loop we
    have in __inet_hash_connect() and its variable names to ease code
    maintenance.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range
    in connect()"), I added a very simple heuristic, so that we got better
    chances to use even ports, and allow bind() users to have more available
    slots.

    It gave nice results, but with more than 200,000 TCP sessions on a typical
    server, the ~30,000 ephemeral ports are still a rare resource.

    I chose to go a step further, by looking at all even ports and, if
    none was available, falling back to odd ports.

    The companion patch does the same in bind(), but in the opposite
    way.

    I've seen exec times of up to 30ms on busy servers, so I no longer
    disable BH for the whole traversal, but only for each hash bucket.
    I also call cond_resched() to be gentle to other tasks.
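
    The even/odd split can be sketched in userspace C (illustrative
    names, not the kernel code; in_use() stands in for the real
    bind/established hash-table lookups):

```c
#include <stdbool.h>

/* connect() prefers even ports and falls back to odd ones, while
 * bind() does the opposite, keeping the two crowds separated. */
static int pick_port(int lo, int hi, bool even_first,
                     bool (*in_use)(int port))
{
    for (int pass = 0; pass < 2; pass++) {
        int parity = even_first ? pass : 1 - pass; /* 0 = even pass */
        for (int p = lo; p <= hi; p++) {
            if ((p & 1) != parity)
                continue;
            if (!in_use(p))
                return p;
        }
    }
    return -1; /* whole range exhausted */
}

/* Demo predicate: pretend every port below 103 is already taken. */
static bool demo_in_use(int port)
{
    return port < 103;
}
```

    With the demo predicate, the even-first (connect) caller and the
    odd-first (bind) caller land on different free ports, so neither
    exhausts the other's preferred half of the range.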

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Jesper Dangaard Brouer says:

    ====================
    net: mitigating kmem_cache free slowpath

    This patchset is the first real use-case for kmem_cache bulk
    _free_. The use of bulk _alloc_ is NOT included in this patchset.
    The full use-case has previously been posted here [1].

    The bulk free side has the largest benefit for the network stack
    use-case, because the network stack hits the kmem_cache/SLUB
    slowpath when freeing SKBs, due to the amount of outstanding SKBs.
    This is solved by using the new API kmem_cache_free_bulk().

    Introduce a new API, napi_consume_skb(), that hides/handles bulk
    freeing for the caller. Drivers simply need to use this call when
    freeing SKBs in NAPI context, e.g. replacing their calls to
    dev_kfree_skb() / dev_consume_skb_any().

    Driver ixgbe is the first user of this new API.

    [1] http://thread.gmane.org/gmane.linux.network/384302/focus=397373
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • There is an opportunity to bulk free SKBs during reclaiming of
    resources after DMA transmit completes in ixgbe_clean_tx_irq. Thus,
    bulk freeing at this point does not introduce any added latency.

    Simply use napi_consume_skb(), which was recently introduced. The
    napi_budget parameter is needed by napi_consume_skb() to detect if
    it is called from netpoll.

    Benchmarking IPv4-forwarding, on CPU i7-4790K @4.2GHz (no turbo boost)
    Single CPU/flow numbers: before: 1982144 pps -> after : 2064446 pps
    Improvement: +82302 pps, -20 nanosec, +4.1%
    (SLUB and GCC version 5.1.1 20150618 (Red Hat 5.1.1-4))

    Joint work with Alexander Duyck.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • The network stack defers freeing SKBs in case the free happens in
    IRQ context or when IRQs are disabled. This happens in
    __dev_kfree_skb_irq(), which puts SKBs freed during IRQ onto the
    softirq completion queue (softnet_data.completion_queue).

    These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ
    in function net_tx_action(). Take advantage of this and use the SKB
    defer and flush API, as we are already in softirq context.

    For modern drivers this rarely happens, although most drivers do
    call dev_kfree_skb_any(), which detects the situation and calls
    __dev_kfree_skb_irq() when needed. This is because netpoll can call
    from IRQ context.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • It was discovered that the network stack was hitting the
    kmem_cache/SLUB slowpath when freeing SKBs. Doing bulk free with
    kmem_cache_free_bulk() can speed up this slowpath.

    NAPI context is a bit special; let's take advantage of that for
    bulk freeing SKBs.

    In NAPI context we are running in softirq, which gives us certain
    protection. A softirq can run on several CPUs at once, but the
    important part is that a softirq will never preempt another softirq
    running on the same CPU. This gives us the opportunity to access
    per-cpu variables in softirq context.

    Extend napi_alloc_cache (which before only contained a
    page_frag_cache) to be a struct with a small array-based stack for
    holding SKBs. Introduce an SKB defer and flush API for accessing
    it.

    Introduce napi_consume_skb() as a replacement for e.g.
    dev_consume_skb_any() when running in NAPI context. A small trick
    to handle/detect whether we are called from netpoll is to see if
    the budget is 0. In that case, we need to invoke
    dev_consume_skb_irq().
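
    The defer-and-flush idea can be sketched in userspace C (all names
    are illustrative stand-ins; the real code frees through
    kmem_cache_free_bulk() in net/core/skbuff.c):

```c
#include <stddef.h>

#define NAPI_SKB_CACHE_SIZE 64  /* illustrative size for this sketch */

struct sk_buff;  /* opaque placeholder for the real SKB */

static int bulk_free_calls;  /* counts stand-in bulk-free invocations */

/* Stand-in for kmem_cache_free_bulk(): frees n SKBs in one slab call. */
static void fake_free_bulk(size_t n, struct sk_buff **skbs)
{
    (void)n;
    (void)skbs;
    bulk_free_calls++;
}

/* Per-cpu defer stack; safe without locking because a softirq never
 * preempts another softirq on the same CPU. */
struct napi_skb_cache {
    size_t count;
    struct sk_buff *stack[NAPI_SKB_CACHE_SIZE];
};

/* Defer one SKB; flush the whole stack in one bulk call when full. */
static void napi_defer_skb(struct napi_skb_cache *c, struct sk_buff *skb)
{
    c->stack[c->count++] = skb;
    if (c->count == NAPI_SKB_CACHE_SIZE) {
        fake_free_bulk(c->count, c->stack);
        c->count = 0;
    }
}
```

    Many per-SKB frees collapse into a handful of bulk slab calls,
    which is where the slowpath mitigation comes from.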

    Joint work with Alexander Duyck.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Nikolay Aleksandrov says:

    ====================
    virtio_net: better ethtool setting validation

    This small set is a follow-up for the recent patches that added
    ethtool get/set settings. Patch 1 changes the speed validation
    routine to check that the speed is between 0 and INT_MAX (or
    SPEED_UNKNOWN), and patch 2 adds port validation to virtio_net
    along with a better validation comment.

    This set is on top of Michael's patch which explains that speeds from 0
    to INT_MAX are valid:
    http://patchwork.ozlabs.org/patch/578911/
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We should validate the port setting that we got from the user and
    check that it is what we set it to (PORT_OTHER). Also add an
    explanation that ignoring advertising is fine as long as we don't
    have autonegotiation.

    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Devices these days can have any speed, and as was recently pointed
    out, any speed from 0 to INT_MAX is valid, so adjust the speed
    validation to accept such values.
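
    The adjusted check amounts to something like the following
    userspace sketch (illustrative, not the exact driver code):

```c
#include <stdbool.h>
#include <stdint.h>
#include <limits.h>

#define SPEED_UNKNOWN (-1)  /* as in the ethtool uapi */

/* Any speed from 0 to INT_MAX is considered valid, and so is the
 * special SPEED_UNKNOWN value; everything in between is rejected. */
static bool speed_is_valid(uint32_t speed)
{
    return speed <= INT_MAX || speed == (uint32_t)SPEED_UNKNOWN;
}
```

    This replaces an enumerated whitelist of known speeds with a simple
    range check, so future link speeds need no driver change.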

    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Andrew F. Davis says:

    ====================
    net: phy: dp83848: Add support for TI TLK10x Ethernet PHYs

    This series is [0] split into its logical components.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Andrew F. Davis
    Signed-off-by: David S. Miller

    Andrew F. Davis
     
  • The TI TLK10x Ethernet PHYs are similar in the interrupt-relevant
    registers and so are compatible with the DP83848x devices already
    supported.

    Signed-off-by: Andrew F. Davis
    Signed-off-by: David S. Miller

    Andrew F. Davis
     
  • Reorganize the code by moving the desired interrupt mask definition
    out of the function. Also rearrange the enable/disable interrupt
    function to prevent accidental overwriting of values in registers.

    Signed-off-by: Andrew F. Davis
    Signed-off-by: David S. Miller

    Andrew F. Davis
     
  • After acquiring National Semiconductor, TI appears to have changed
    the Vendor Model Number for the DP83848C PHYs; add this new ID to
    the supported IDs.

    Signed-off-by: Andrew F. Davis
    Signed-off-by: David S. Miller

    Andrew F. Davis
     
  • Add a helper macro for defining dp83848 compatible phy devices.
    Update copyright info.

    Signed-off-by: Andrew F. Davis
    Signed-off-by: David S. Miller

    Andrew F. Davis
     
  • The EVB (virtual bridge) functionality should be disabled on older
    BE3 and Lancer chips if SR-IOV is disabled in the NIC's BIOS. This
    setting is identified by a zero value of total VFs reported by the
    card. The GET_HSW_CONFIG command cannot be used, as it is not
    supported by these older chips' firmware.

    v2: added the comment

    Cc: Sathya Perla
    Cc: Ajit Khaparde
    Cc: Padmanabh Ratnakar
    Cc: Sriharsha Basavapatna
    Cc: Somnath Kotur
    Signed-off-by: Ivan Vecera
    Acked-by: Sathya Perla
    Signed-off-by: David S. Miller

    Ivan Vecera
     
  • Helmut Buchsbaum says:

    ====================
    Add support for MICREL KSZ8795CLX 5-port switch

    This patch series refactors the spi-ks8995 driver to finally add support
    for the MICREL KSZ8795CLX. Additionally support for controlling a GPIO
    line for resetting the switch is added.

    Helmut

    Changes since v2:
    - use GPIO_ACTIVE_LOW according to Andrew's remark.
    - use ePAPR compliant node name in example, thanks to Sergei for
    pointing this out
    Changes since v1:
    - removed initializing registers from Device Tree following Florian's
    advice
    - fixed GPIO handling for reset according to Andrew's remark.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Helmut Buchsbaum
    Signed-off-by: David S. Miller

    Helmut Buchsbaum
     
  • Add support for MICREL KSZ8795CLX Integrated 5-Port, 10-/100-Managed
    Ethernet Switch with Gigabit GMII/RGMII and MII/RMII interfaces.

    Signed-off-by: Helmut Buchsbaum
    Signed-off-by: David S. Miller

    Helmut Buchsbaum
     
  • Prepare creating SPI reads and writes for other switch families.
    The KS8995 family uses a straightforward SPI command sequence. To
    be able to support the KSZ8795 family, which uses a different one,
    make the SPI command creation chip-variant dependent.

    Signed-off-by: Helmut Buchsbaum
    Signed-off-by: David S. Miller

    Helmut Buchsbaum
     
  • When using device tree it is no longer possible to reset the PHY at
    board level. Furthermore, doing it in the driver allows powering
    down the switch when it is no longer used.

    The patch introduces a new optional property "reset-gpios" denoting
    an appropriate GPIO handle.

    Signed-off-by: Helmut Buchsbaum
    Signed-off-by: David S. Miller

    Helmut Buchsbaum
     
  • Since the chip variant is now determined by spi_device_id, verify
    family and chip id and determine the revision id.

    Signed-off-by: Helmut Buchsbaum
    Signed-off-by: David S. Miller

    Helmut Buchsbaum
     
  • Refactor to use an spi_device_id table to facilitate easy
    extensibility.

    Signed-off-by: Helmut Buchsbaum
    Signed-off-by: David S. Miller

    Helmut Buchsbaum
     
  • Sunil Goutham says:

    ====================
    net: thunderx: Setting IRQ affinity hints and other optimizations

    This patch series contains changes to
    - Add support for the virtual function's IRQ affinity hint
    - Replace napi_schedule() with napi_schedule_irqoff()
    - Reduce page allocation overhead by allocating pages of higher
    order when the page size is 4KB
    - Add a couple of stats which help in debugging
    - Make some miscellaneous changes to the BGX driver

    Changes from v1:
    - As suggested changed MAC address invalid log message
    to dev_err() instead of dev_warn().
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Allocate higher-order pages when the page size is small; this will
    reduce the number of calls to the page allocator and the wastage of
    memory.
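
    The saving can be sketched with simple arithmetic in C
    (illustrative names and sizes, not the driver's code):

```c
#include <stddef.h>

/* With a 4KB base page split into 2KB receive buffers, an order-n
 * allocation yields (BASE_PAGE_SZ << n) / RCV_BUF_SZ buffers, so each
 * allocator call goes 2^n times as far. */
#define BASE_PAGE_SZ 4096
#define RCV_BUF_SZ   2048

static size_t bufs_per_alloc(unsigned int order)
{
    return ((size_t)BASE_PAGE_SZ << order) / RCV_BUF_SZ;
}

/* Page-allocator calls needed to carve out nbufs receive buffers. */
static size_t alloc_calls(size_t nbufs, unsigned int order)
{
    size_t per = bufs_per_alloc(order);
    return (nbufs + per - 1) / per;
}
```

    An order-2 allocation cuts the number of allocator calls for a full
    receive ring by a factor of four compared to order-0 pages.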

    Signed-off-by: Sunil Goutham
    Signed-off-by: David S. Miller

    Sunil Goutham
     
  • Signed-off-by: Robert Richter
    Signed-off-by: Sunil Goutham
    Signed-off-by: David S. Miller

    Robert Richter
     
  • In the case of OF device tree, the firmware information is attached to
    the BGX device structure in the standard manner, so use the firmware
    iterators and accessors where possible.

    Signed-off-by: David Daney
    Signed-off-by: Sunil Goutham
    Signed-off-by: David S. Miller

    David Daney
     
  • This affinity hint can be used by the user space irqbalance tool to
    set the preferred CPU mask for irqs registered by this VF.
    Irqbalance needs to be in 'exact' mode to set irq affinity the same
    as indicated by the affinity hint.

    Signed-off-by: Sunil Goutham
    Signed-off-by: David S. Miller

    Sunil Goutham
     
  • napi_schedule() is being called from hard irq context, hence switch
    to napi_schedule_irqoff(), which avoids unneeded calls to
    local_irq_save() and local_irq_restore().

    Signed-off-by: Sunil Goutham
    Signed-off-by: David S. Miller

    Sunil Goutham