22 Nov, 2014

1 commit

  • Not sure what I was thinking, but doing anything after
    releasing a refcount is suicidal and/or embarrassing.

    By the time we set skb->fclone to SKB_FCLONE_FREE, another CPU
    could have released the last reference and freed the whole skb.

    We potentially corrupt memory or trap if CONFIG_DEBUG_PAGEALLOC is set.

    Reported-by: Chris Mason
    Fixes: ce1a4ea3f1258 ("net: avoid one atomic operation in skb_clone()")
    Signed-off-by: Eric Dumazet
    Cc: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Eric Dumazet
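
    A minimal user-space C sketch of the rule behind this fix (not the kernel
    code itself): once our reference is dropped, the object may already be
    gone, so any bookkeeping must happen while the reference is still held, or
    only on the path that provably released the last reference.

    #include <stdatomic.h>
    #include <stdlib.h>

    struct obj {
        atomic_int refcnt;
        int state;
    };

    /* Buggy pattern (the class of bug fixed above): the object is touched
     * after our reference was dropped, so another CPU may already have
     * freed it. */
    void put_obj_buggy(struct obj *o)
    {
        if (atomic_fetch_sub(&o->refcnt, 1) == 1)
            free(o);
        else
            o->state = 0;   /* use-after-free window: we no longer own o */
    }

    /* Safe pattern: update state while our reference is still held, then
     * drop it; only the CPU that releases the last reference frees. */
    void put_obj_fixed(struct obj *o)
    {
        o->state = 0;       /* still protected by our reference */
        if (atomic_fetch_sub(&o->refcnt, 1) == 1)
            free(o);
    }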
     

01 Nov, 2014

1 commit

  • If a driver supports reading EEPROM but no EEPROM is installed in the system,
    the driver's get_eeprom_len function returns 0. ethtool will subsequently
    try to read that zero-length EEPROM anyway. If the driver does not support
    EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver
    does support EEPROM access but no EEPROM is installed, the operation will
    return -EINVAL. Return -EOPNOTSUPP in both cases for consistency.

    Signed-off-by: Guenter Roeck
    Tested-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Guenter Roeck
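
    A hedged sketch of the check described in this entry, assuming kernel
    build context; the function name and surrounding structure are
    illustrative, not the exact net/core/ethtool.c code.

    #include <linux/errno.h>
    #include <linux/ethtool.h>
    #include <linux/netdevice.h>

    /* Illustrative only: return -EOPNOTSUPP both when the driver has no
     * EEPROM ops and when it reports a zero-length EEPROM. */
    static int get_eeprom_sketch(struct net_device *dev)
    {
        const struct ethtool_ops *ops = dev->ethtool_ops;

        if (!ops->get_eeprom || !ops->get_eeprom_len)
            return -EOPNOTSUPP;         /* no EEPROM support at all */

        if (ops->get_eeprom_len(dev) == 0)
            return -EOPNOTSUPP;         /* supported, but no EEPROM installed */

        /* ... perform the actual read via ops->get_eeprom() ... */
        return 0;
    }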
     

27 Oct, 2014

1 commit


23 Oct, 2014

1 commit

  • The crafted header start address comes from a driver-supplied buffer,
    which one can reasonably expect to be aligned on a 4-byte boundary.
    However, at the moment the TSO helper API is only used by Ethernet
    drivers, so the TCP header ends up aligned to only a 2-byte boundary
    from the header start address.

    Signed-off-by: Karl Beldan
    Cc: Ezequiel Garcia
    Signed-off-by: David S. Miller

    Karl Beldan
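
    A hedged sketch of the implication, assuming kernel build context: with a
    14-byte Ethernet header, the TCP header starts at a 2-byte-aligned offset
    even when the buffer itself is 4-byte aligned, so 32-bit fields such as
    the sequence number should be written through the unaligned accessors.
    The helper name below is illustrative.

    #include <linux/tcp.h>
    #include <asm/unaligned.h>

    /* Illustrative only: write a 32-bit field of a possibly misaligned
     * TCP header without relying on natural alignment. */
    static void tso_write_seq_sketch(struct tcphdr *th, u32 seq)
    {
        put_unaligned_be32(seq, &th->seq);  /* safe on strict-alignment CPUs */
    }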
     

21 Oct, 2014

1 commit


19 Oct, 2014

1 commit

  • Pull networking fixes from David Miller:

    1) Include fixes for netrom and dsa (Fabian Frederick and Florian
    Fainelli)

    2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.

    3) Several SKB use-after-free fixes (vxlan, openvswitch, ip_tunnel,
    fou), from Li RongQing.

    4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.

    5) Use after free in virtio_net, from Michael S Tsirkin.

    6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
    Shelar.

    7) ISDN gigaset and capi bug fixes from Tilman Schmidt.

    8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.

    9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.

    10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
    Wang and Eric Dumazet.

    11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
    chunks, from Daniel Borkmann.

    12) Don't call sock_kfree_s() with NULL pointers, this function also has
    the side effect of adjusting the socket memory usage. From Cong Wang.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
    bna: fix skb->truesize underestimation
    net: dsa: add includes for ethtool and phy_fixed definitions
    openvswitch: Set flow-key members.
    netrom: use linux/uaccess.h
    dsa: Fix conversion from host device to mii bus
    tipc: fix bug in bundled buffer reception
    ipv6: introduce tcp_v6_iif()
    sfc: add support for skb->xmit_more
    r8152: return -EBUSY for runtime suspend
    ipv4: fix a potential use after free in fou.c
    ipv4: fix a potential use after free in ip_tunnel_core.c
    hyperv: Add handling of IP header with option field in netvsc_set_hash()
    openvswitch: Create right mask with disabled megaflows
    vxlan: fix a free after use
    openvswitch: fix a use after free
    ipv4: dst_entry leak in ip_send_unicast_reply()
    ipv4: clean up cookie_v4_check()
    ipv4: share tcp_v4_save_options() with cookie_v4_check()
    ipv4: call __ip_options_echo() in cookie_v4_check()
    atm: simplify lanai.c by using module_pci_driver
    ...

    Linus Torvalds
     

16 Oct, 2014

1 commit

  • Add ndo_gso_check, which a device can define to indicate whether it
    is capable of doing GSO on a packet. This function would be called from
    the stack to determine whether software GSO needs to be done. A
    driver should populate this function if it advertises GSO types for
    which there are combinations that it wouldn't be able to handle. For
    instance, a device that performs UDP tunneling might only implement
    support for transparent Ethernet bridging type of inner packets
    or might have limitations on lengths of inner headers.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
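
    A hedged sketch of what a driver-side implementation might look like,
    assuming kernel build context; the 64-byte inner-header limit and the
    foo_ prefix are illustrative, not taken from any real driver.

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* Illustrative only: refuse GSO for encapsulated packets whose inner
     * headers sit deeper than the hardware can parse. */
    static bool foo_gso_check(struct sk_buff *skb, struct net_device *dev)
    {
        if (skb->encapsulation &&
            skb_inner_transport_header(skb) - skb_transport_header(skb) > 64)
            return false;   /* fall back to software GSO */

        return true;
    }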
     

15 Oct, 2014

2 commits

  • Pull percpu consistent-ops changes from Tejun Heo:
    "Way back, before the current percpu allocator was implemented, static
    and dynamic percpu memory areas were allocated and handled separately
    and had their own accessors. The distinction has been gone for many
    years now; however, the now duplicate two sets of accessors remained
    with the pointer based ones - this_cpu_*() - evolving various other
    operations over time. During the process, we also accumulated other
    inconsistent operations.

    This pull request contains Christoph's patches to clean up the
    duplicate accessor situation. __get_cpu_var() uses are replaced with
    this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

    Unfortunately, the former sometimes is tricky thanks to C being a bit
    messy with the distinction between lvalues and pointers, which led to
    a rather ugly solution for cpumask_var_t involving the introduction of
    this_cpu_cpumask_var_ptr().

    This converts most of the uses but not all. Christoph will follow up
    with the remaining conversions in this merge window and hopefully
    remove the obsolete accessors"

    * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
    irqchip: Properly fetch the per cpu offset
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
    ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
    Revert "powerpc: Replace __get_cpu_var uses"
    percpu: Remove __this_cpu_ptr
    clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
    sparc: Replace __get_cpu_var uses
    avr32: Replace __get_cpu_var with __this_cpu_write
    blackfin: Replace __get_cpu_var uses
    tile: Use this_cpu_ptr() for hardware counters
    tile: Replace __get_cpu_var uses
    powerpc: Replace __get_cpu_var uses
    alpha: Replace __get_cpu_var
    ia64: Replace __get_cpu_var uses
    s390: cio driver &__get_cpu_var replacements
    s390: Replace __get_cpu_var uses
    mips: Replace __get_cpu_var uses
    MIPS: Replace __get_cpu_var uses in FPU emulator.
    arm: Replace __this_cpu_ptr with raw_cpu_ptr
    ...

    Linus Torvalds
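
    A hedged before/after sketch of the accessor conversion described in the
    pull message, assuming kernel build context; the per-cpu variable is
    illustrative.

    #include <linux/percpu.h>

    static DEFINE_PER_CPU(int, my_counter);

    static void bump_counter(void)
    {
        /* old style, removed by this series:
         *     __get_cpu_var(my_counter)++;
         *
         * pointer-based replacement: */
        (*this_cpu_ptr(&my_counter))++;
    }

    static void bump_counter_alt(void)
    {
        /* for simple scalar updates the operation form is preferred,
         * as it is safe against preemption on its own: */
        this_cpu_inc(my_counter);
    }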
     
  • Unlike normal kfree() it is never right to call sock_kfree_s() with
    a NULL pointer, because sock_kfree_s() also has the side effect of
    discharging the memory from the sockets quota.

    Signed-off-by: David S. Miller

    David S. Miller
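
    A hedged sketch of the caller-side rule this enforces, assuming kernel
    build context; the wrapper below is illustrative.

    #include <net/sock.h>

    /* Illustrative only: sock_kfree_s() also returns the bytes to
     * sk->sk_omem_alloc, so it must only be called for an allocation
     * that actually happened. */
    static void release_opt_sketch(struct sock *sk, void *opt, int size)
    {
        if (opt)
            sock_kfree_s(sk, opt, size);    /* never pass NULL here */
    }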
     

11 Oct, 2014

3 commits

  • It is illegal to use atomic_set(&page->_count, ...) even if we 'own'
    the page. Other entities in the kernel need to use get_page_unless_zero()
    to get a reference to the page before testing page properties, so we could
    lose a refcount increment.

    The only case where it is valid is when page->_count is 0.

    Fixes: 540eb7bf0bbed ("net: Update alloc frag to reduce get/put page usage and recycle pages")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
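
    A minimal user-space C sketch of why a blind store on a shared refcount
    is racy while an atomic add is not; this illustrates the principle only,
    not the kernel page-refcount code.

    #include <stdatomic.h>

    static atomic_int refcount;

    /* Buggy: if another CPU took a reference (atomic increment) between
     * our decision and this store, that reference is silently lost. */
    void refill_buggy(int newrefs)
    {
        atomic_store(&refcount, newrefs);
    }

    /* Safe: concurrent increments are preserved. */
    void refill_fixed(int extra)
    {
        atomic_fetch_add(&refcount, extra);
    }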
     
  • This patch addresses a kernel unaligned access bug seen on a sparc64 system
    with an igb adapter. Specifically, __skb_flow_get_ports() was returning a
    be32 pointer whose value was then read and returned directly.

    In order to prevent this, it is actually easier to simply not populate the
    ports or address values when an skb is not present. In this case the
    assumption is that the data isn't needed, and rather than slow down the
    faster aligned accesses by forcing them onto the unaligned path on
    architectures that don't support efficient unaligned access, it makes more
    sense to simply switch off the bits that were copying the source and
    destination address/port for the case where we only care about the protocol
    types and lengths, which are normally 16-bit fields anyway.

    Reported-by: David S. Miller
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • 1. sk_run_filter has been renamed; sk_filter() now uses SK_RUN_FILTER.
    2. Remove wrong comments about storing an intermediate value.
    3. Replace sk_run_filter with __bpf_prog_run in check_load_and_stores'
    comments.

    Cc: Alexei Starovoitov
    Signed-off-by: Li RongQing
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Li RongQing
     

10 Oct, 2014

1 commit

  • This patch fixes the following warnings:
    Warning(.//net/core/skbuff.c:4142): No description found for parameter 'header_len'
    Warning(.//net/core/skbuff.c:4142): No description found for parameter 'data_len'
    Warning(.//net/core/skbuff.c:4142): No description found for parameter 'max_page_order'
    Warning(.//net/core/skbuff.c:4142): No description found for parameter 'errcode'
    Warning(.//net/core/skbuff.c:4142): No description found for parameter 'gfp_mask'

    Actually the descriptions exist, but they are missing the "@" in front.

    This problem started to happen when the following commit was merged
    into Linus's tree during the 3.18-rc1 merge window:
    commit 2e4e44107176d552f8bb1bb76053e850e3809841
    net: add alloc_skb_with_frags() helper

    Signed-off-by: Masanari Iida
    Signed-off-by: David S. Miller

    Masanari Iida
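
    A hedged sketch of the kernel-doc form the fix restores: each parameter
    description must start with "@name:" for the documentation tooling to
    pick it up. The wording of the descriptions and the prototype below are
    illustrative, not copied from the tree.

    #include <linux/skbuff.h>

    /**
     * alloc_skb_with_frags - allocate an skb with page frags
     * @header_len: size of the linear part
     * @data_len: needed length in frags
     * @max_page_order: maximum page order desired
     * @errcode: pointer to an error code, if any
     * @gfp_mask: allocation mask
     */
    struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
                                         unsigned long data_len,
                                         int max_page_order,
                                         int *errcode,
                                         gfp_t gfp_mask);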
     

09 Oct, 2014

1 commit

  • Pull networking updates from David Miller:
    "Most notable changes in here:

    1) By far the biggest accomplishment, thanks to a large range of
    contributors, is the addition of multi-send for transmit. This is
    the result of discussions back in Chicago, and the hard work of
    several individuals.

    Now, when the ->ndo_start_xmit() method of a driver sees
    skb->xmit_more as true, it can choose to defer the doorbell
    telling the driver to start processing the new TX queue entries.

    skb->xmit_more means that the generic networking is guaranteed to
    call the driver immediately with another SKB to send.

    There is logic added to the qdisc layer to dequeue multiple
    packets at a time, and the handling of mis-predicted offloads in
    software is now done with no locks held.

    Finally, pktgen is extended to have a "burst" parameter that can
    be used to test a multi-send implementation.

    Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
    virtio_net

    Adding support is almost trivial, so expect more drivers to
    support this optimization soon.

    I want to thank, in no particular or implied order, Jesper
    Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
    Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
    David Tat, Hannes Frederic Sowa, and Rusty Russell.

    2) PTP and timestamping support in bnx2x, from Michal Kalderon.

    3) Allow adjusting the rx_copybreak threshold for a driver via
    ethtool, and add rx_copybreak support to enic driver. From
    Govindarajulu Varadarajan.

    4) Significant enhancements to the generic PHY layer and the bcm7xxx
    driver in particular (EEE support, auto power down, etc.) from
    Florian Fainelli.

    5) Allow raw buffers to be used for flow dissection, allowing drivers
    to determine the optimal "linear pull" size for devices that DMA
    into pools of pages. The objective is to get exactly the
    necessary amount of headers into the linear SKB area pre-pulled,
    but no more. The new interface drivers use is eth_get_headlen().
    From WANG Cong, with driver conversions (several had their own
    by-hand duplicated implementations) by Alexander Duyck and Eric
    Dumazet.

    6) Support checksumming more smoothly and efficiently for
    encapsulations, and add "foo over UDP" facility. From Tom
    Herbert.

    7) Add Broadcom SF2 switch driver to DSA layer, from Florian
    Fainelli.

    8) eBPF now can load programs via a system call and has an extensive
    testsuite. Alexei Starovoitov and Daniel Borkmann.

    9) Major overhaul of the packet scheduler to use RCU in several major
    areas such as the classifiers and rate estimators. From John
    Fastabend.

    10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
    Duyck.

    11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
    Dumazet.

    12) Add Datacenter TCP congestion control algorithm support, From
    Florian Westphal.

    13) Reorganize sk_buff so that __copy_skb_header() is significantly
    faster. From Eric Dumazet"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
    netlabel: directly return netlbl_unlabel_genl_init()
    net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
    net: description of dma_cookie cause make xmldocs warning
    cxgb4: clean up a type issue
    cxgb4: potential shift wrapping bug
    i40e: skb->xmit_more support
    net: fs_enet: Add NAPI TX
    net: fs_enet: Remove non NAPI RX
    r8169:add support for RTL8168EP
    net_sched: copy exts->type in tcf_exts_change()
    wimax: convert printk to pr_foo()
    af_unix: remove 0 assignment on static
    ipv6: Do not warn for informational ICMP messages, regardless of type.
    Update Intel Ethernet Driver maintainers list
    bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
    tipc: fix bug in multicast congestion handling
    net: better IFF_XMIT_DST_RELEASE support
    net/mlx4_en: remove NETDEV_TX_BUSY
    3c59x: fix bad split of cpu_to_le32(pci_map_single())
    net: bcmgenet: fix Tx ring priority programming
    ...

    Linus Torvalds
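
    A hedged sketch of the xmit_more pattern summarized in item 1 above,
    assuming kernel build context; the foo_ helpers are placeholders for
    hardware-specific code, not a real driver's functions.

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    static void foo_post_descriptor(struct net_device *dev, struct sk_buff *skb)
    {
        /* hardware-specific: fill a TX descriptor for skb */
    }

    static void foo_ring_doorbell(struct net_device *dev)
    {
        /* hardware-specific: single MMIO write covering the whole burst */
    }

    static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
    {
        struct netdev_queue *txq = skb_get_tx_queue(dev, skb);

        foo_post_descriptor(dev, skb);

        /* Only ring the doorbell when no further skb is guaranteed to
         * follow, or when the queue is stopping anyway. */
        if (!skb->xmit_more || netif_xmit_stopped(txq))
            foo_ring_doorbell(dev);

        return NETDEV_TX_OK;
    }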
     

08 Oct, 2014

2 commits

  • Pull dmaengine updates from Dan Williams:
    "Even though this has fixes marked for -stable, given the size and the
    needed conflict resolutions this is 3.18-rc1/merge-window material.

    These patches have been languishing in my tree for a long while. The
    fact that I do not have the time to do proper/prompt maintenance of
    this tree is a primary factor in the decision to step down as
    dmaengine maintainer. That and the fact that the bulk of drivers/dma/
    activity is going through Vinod these days.

    The net_dma removal has not been in -next. It has developed simple
    conflicts against mainline and net-next (for-3.18).

    Continuing thanks to Vinod for staying on top of drivers/dma/.

    Summary:

    1/ Step down as dmaengine maintainer see commit 08223d80df38
    "dmaengine maintainer update"

    2/ Removal of net_dma, as it has been marked 'broken' since 3.13
    (commit 77873803363c "net_dma: mark broken"), without reports of
    performance regression.

    3/ Miscellaneous fixes"

    * tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
    net: make tcp_cleanup_rbuf private
    net_dma: revert 'copied_early'
    net_dma: simple removal
    dmaengine maintainer update
    dmatest: prevent memory leakage on error path in thread
    ioat: Use time_before_jiffies()
    dmaengine: fix xor sources continuation
    dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup()
    dma: mv_xor: Remove all callers of mv_xor_slot_cleanup()
    dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call
    ioat: Use pci_enable_msix_exact() instead of pci_enable_msix()
    drivers: dma: Include appropriate header file in dca.c
    drivers: dma: Mark functions as static in dma_v3.c
    dma: mv_xor: Add DMA API error checks
    ioat/dca: Use dev_is_pci() to check whether it is pci device

    Linus Torvalds
     
  • Testing xmit_more support with netperf and connected UDP sockets,
    I found strange dst refcount false sharing.

    Current handling of IFF_XMIT_DST_RELEASE is not optimal.

    Dropping the dst in validate_xmit_skb() is certainly too late in case
    the packet was queued by CPU X but dequeued by CPU Y.

    The logical point to take care of drop/force is in __dev_queue_xmit(),
    before even taking the qdisc lock.

    As Julian Anastasov pointed out, the need for skb_dst() might come from
    some packet schedulers or classifiers.

    This patch adds a new helper to cleanly express the needs of various
    drivers or qdiscs/classifiers.

    Drivers that need skb_dst() in their ndo_start_xmit() should call the
    following helper in their setup instead of the prior:

    dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
    ->
    netif_keep_dst(dev);

    Instead of using a single bit, we use two bits, one being
    eventually rebuilt in bonding/team drivers.

    The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
    rebuilt in bonding/team. Eventually, we could add something
    smarter later.

    Signed-off-by: Eric Dumazet
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Oct, 2014

3 commits


06 Oct, 2014

3 commits

  • Use the new ethtool [sg]et_tunable() to set tx_copybreak (inline threshold)

    Signed-off-by: Eric Dumazet
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Eric Dumazet
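
    A hedged sketch of the driver side of this interface, assuming kernel
    build context; the foo_priv structure and field name are illustrative.

    #include <linux/errno.h>
    #include <linux/ethtool.h>
    #include <linux/netdevice.h>

    struct foo_priv {
        u32 inline_thold;       /* illustrative tx copybreak threshold */
    };

    static int foo_set_tunable(struct net_device *dev,
                               const struct ethtool_tunable *tuna,
                               const void *data)
    {
        struct foo_priv *priv = netdev_priv(dev);

        switch (tuna->id) {
        case ETHTOOL_TX_COPYBREAK:
            priv->inline_thold = *(const u32 *)data;
            return 0;
        default:
            return -EOPNOTSUPP;
        }
    }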
     
  • It's unfortunate that we have to walk the skb list again to find the tail
    after segmentation, even though the data is probably hot in CPU caches.

    skb_segment() can store the tail of the list into segs->prev,
    and validate_xmit_skb_list() can immediately get the tail.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains another batch with Netfilter/IPVS updates
    for net-next, they are:

    1) Add abstracted ICMP codes to the nf_tables reject expression. We
    introduce four reasons to reject using ICMP that overlap in IPv4
    and IPv6 from the semantic point of view. This should simplify the
    maintenance of dual stack rule-sets through the inet table.

    2) Move nf_send_reset() functions from header files to per-family
    nf_reject modules, suggested by Patrick McHardy.

    3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the
    code now that br_netfilter can be modularized. Convert remaining spots
    in the network stack code.

    4) Use rcu_barrier() in the nf_tables module removal path to ensure that
    we don't leave objects that are still pending release via
    call_rcu (which would likely result in a crash).

    5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad)
    idea was to probe the word size based on the xtables match/target info
    size, but this assumption is wrong when you have to dump the information
    back to userspace.

    6) Allow filtering from prerouting and postrouting in the nf_tables bridge.
    In order to emulate the ebtables NAT chains (which are actually simple
    filter chains with no special semantics), we now support filtering from
    these hooks too.

    7) Add explicit module dependency between xt_physdev and br_netfilter.
    This provides a way to detect if the user needs br_netfilter from
    the configuration path. This should reduce the breakage of the
    br_netfilter modularization.

    8) Cleanup coding style in ip_vs.h, from Simon Horman.

    9) Fix crash in the recently added nf_tables masq expression. We have
    to register/unregister the notifiers to clean up the conntrack table
    entries from the module init/exit path, not from the rule addition /
    deletion path. From Arturo Borrero.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Oct, 2014

1 commit

  • SKB_FCLONE_UNAVAILABLE has an overloaded meaning depending on the type of skb.
    1: If skb is allocated from head_cache, it indicates fclone is not available.
    2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates
    it is available to be used.

    To avoid confusion for case 2 above, this patch replaces
    SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone
    companion skbs, this indicates it is free for use.

    SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and
    cannot / will not have a companion fclone.

    Signed-off-by: Vijay Subramanian
    Signed-off-by: David S. Miller

    Vijay Subramanian
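
    A hedged sketch of the resulting state meanings; the exact enum in
    include/linux/skbuff.h may order or name things slightly differently.

    enum {
        SKB_FCLONE_UNAVAILABLE, /* skb from head_cache: no companion fclone */
        SKB_FCLONE_ORIG,        /* the original skb of an fclone pair */
        SKB_FCLONE_CLONE,       /* companion fclone currently in use */
        SKB_FCLONE_FREE,        /* companion fclone free for reuse */
    };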
     

04 Oct, 2014

2 commits

  • skb_gro_receive() is only called from tcp_gro_receive() which is
    not in a module.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Validation of an skb can be pretty expensive:

    GSO segmentation and/or checksum computations.

    We can do this without holding qdisc lock, so that other cpus
    can queue additional packets.

    The trick is that requeued packets were already validated, so we carry
    a boolean so that sch_direct_xmit() can validate a fresh skb list,
    or directly use an old one.

    Tested on a 40Gb NIC (8 TX queues) with 200 concurrent flows on a
    48-thread host.

    Turning TSO on or off had no effect on throughput, only a few more CPU
    cycles. Lock contention on the qdisc lock disappeared.

    Same if disabling TX checksum offload.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Oct, 2014

2 commits


02 Oct, 2014

3 commits

  • This patch demonstrates the effect of delaying update of HW tailptr.
    (based on earlier patch by Jesper)

    burst=1 is the default. It sends one packet with xmit_more=false
    burst=2 sends one packet with xmit_more=true and
    2nd copy of the same packet with xmit_more=false
    burst=3 sends two copies of the same packet with xmit_more=true and
    3rd copy with xmit_more=false

    Performance with ixgbe (usec 30):
    burst=1 tx:9.2 Mpps
    burst=2 tx:13.5 Mpps
    burst=3 tx:14.5 Mpps full 10G line rate

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Eric Dumazet
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Fast clone cloning can actually avoid an atomic_inc(), if we
    guarantee the prior clone_ref value is 1.

    This requires a change to kfree_skbmem(), to perform the
    atomic_dec_and_test() on clone_ref before setting fclone to
    SKB_FCLONE_UNAVAILABLE.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Let's use a proper structure to clearly document and implement
    skb fast clones.

    Then, we can more easily experiment with alternative layouts.

    This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm,
    to stop leaking of implementation details.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Sep, 2014

5 commits

  • After the previous patches to simplify qstats, the qstats can be
    made per-CPU with a packed union in the Qdisc struct.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • This removes the use of qstats->qlen variable from the classifiers
    and makes it an explicit argument to gnet_stats_copy_queue().

    The qlen represents the qdisc queue length and is packed into
    the qstats at the last moment before passnig to user space. By
    handling it explicitely we avoid, in the percpu stats case, having
    to figure out which per_cpu variable to put it in.

    It would probably be best to remove it from qstats completely
    but qstats is a user space ABI and can't be broken. A future
    patch could make an internal only qstats structure that would
    avoid having to allocate an additional u32 variable on the
    Qdisc struct. This would make the qstats struct 128 bits instead
    of 128+32.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • In order to run qdiscs without locking, statistics and estimators
    need to be handled correctly.

    To resolve bstats, make the statistics per-CPU. And because this is
    only needed for qdiscs that run without locks, which will not be the
    case for most qdiscs in the near future, only create percpu
    stats when a qdisc sets the TCQ_F_CPUSTATS flag.

    Next because estimators use the bstats to calculate packets per
    second and bytes per second the estimator code paths are updated
    to use the per cpu statistics.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • In commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")
    I added a regression for linear skbs, which traditionally force GRO
    to use the frag_list fallback.

    Erez Shitrit found that at most two segments were aggregated and
    the "if (skb_gro_len(p) != pinfo->gso_size)" test was failing.

    This is because pinfo at this spot still points to the last skb in the
    chain, instead of the first one, where we find the correct gso_size
    information.

    Signed-off-by: Eric Dumazet
    Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb")
    Reported-by: Erez Shitrit
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • With proliferation of bit fields in sk_buff, __copy_skb_header() became
    quite expensive, showing as the most expensive function in a GSO
    workload.

    __copy_skb_header() performance is also critical for non GSO TCP
    operations, as it is used from skb_clone()

    This patch carefully moves all the fields that are not copied into a
    separate zone: cloned, nohdr, fclone, peeked, head_frag, xmit_more.

    Then I moved all the other copied fields into a section delimited by
    headers_start[0]/headers_end[0] so that we can use a single memcpy()
    call, inlined by the compiler using long word loads/stores.

    I also tried to make all copies in the natural orders of sk_buff,
    to help hardware prefetching.

    I made sure sk_buff size did not change.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
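
    A hedged sketch of the headers_start/headers_end technique, assuming
    kernel build context; the struct is a cut-down illustration, not the real
    sk_buff layout.

    #include <linux/stddef.h>
    #include <linux/string.h>
    #include <linux/types.h>

    struct skb_sketch {
        /* fields NOT copied (cloned, nohdr, fclone, ...) live outside
         * the markers */
        __u32 headers_start[0];
        __u32 priority;
        __u16 protocol;
        /* ... every other copied field ... */
        __u32 headers_end[0];
    };

    static void copy_header_sketch(struct skb_sketch *new_skb,
                                   const struct skb_sketch *old_skb)
    {
        /* one memcpy covers everything between the two markers */
        memcpy(&new_skb->headers_start, &old_skb->headers_start,
               offsetof(struct skb_sketch, headers_end) -
               offsetof(struct skb_sketch, headers_start));
    }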
     

28 Sep, 2014

1 commit

  • Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
    and there is no plan to fix it.

    This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
    Reverting the remainder of the net_dma induced changes is deferred to
    subsequent patches.

    Marked for stable due to Roman's report of a memory leak in
    dma_pin_iovec_pages():

    https://lkml.org/lkml/2014/9/3/177

    Cc: Dave Jiang
    Cc: Vinod Koul
    Cc: David Whipple
    Cc: Alexander Duyck
    Cc:
    Reported-by: Roman Gushchin
    Acked-by: David S. Miller
    Signed-off-by: Dan Williams

    Dan Williams
     

27 Sep, 2014

4 commits

  • Cache skb_shinfo(skb) in a variable to avoid computing it multiple
    times.

    Reorganize the tests to remove one indentation level.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • csum_partial() is a generic function which is not optimised for small,
    fixed-length calculations, and its use requires storing the "from" and "to"
    values in memory while we already have them available in registers. This
    also has an impact, especially on RISC processors. In the same spirit as the
    change done by Eric Dumazet on csum_replace2(), this patch rewrites
    inet_proto_csum_replace4() taking RFC 1624 into account.

    I spotted during a NATted TCP transfer that csum_partial() is one of the top 5
    consuming functions (around 8%), and that the second user of csum_partial()
    is inet_proto_csum_replace4().

    Signed-off-by: Christophe Leroy
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    LEROY Christophe
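
    A hedged sketch of the RFC 1624 incremental update being applied, assuming
    kernel build context: instead of re-summing buffers with csum_partial(),
    the new checksum is derived from the old one and the changed 32-bit value,
    HC' = ~(~HC + ~m + m'). This mirrors the approach; consult the actual
    net/core/utils.c for the real code.

    #include <net/checksum.h>

    static void csum_replace4_sketch(__sum16 *sum, __be32 from, __be32 to)
    {
        __wsum tmp = csum_sub(~csum_unfold(*sum), (__force __wsum)from);

        *sum = csum_fold(csum_add(tmp, (__force __wsum)to));
    }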
     
  • While profiling TCP stack, I noticed one useless atomic operation
    in tcp_sendmsg(), caused by skb_header_release().

    It turns out all current skb_header_release() users have a fresh skb
    that no other user can see, so we can avoid one atomic operation.

    Introduce __skb_header_release() to clearly document this.

    This gave me a 1.5 % improvement on TCP_RR workload.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
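
    A hedged sketch of what such a helper looks like, assuming kernel build
    context; consult include/linux/skbuff.h for the real definition.

    #include <linux/skbuff.h>

    /* For a freshly allocated skb that nobody else can see, dataref can be
     * set with a plain store instead of an atomic read-modify-write. */
    static inline void skb_header_release_sketch(struct sk_buff *skb)
    {
        skb->nohdr = 1;
        atomic_set(&skb_shinfo(skb)->dataref, 1 + (1 << SKB_DATAREF_SHIFT));
    }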
     
  • No caller or macro uses the return value, so make all
    the functions return void.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches