07 Oct, 2012
1 commit
-
Over time, the skb recycling infrastructure got little interest and
many bugs. Generic rx path skb allocation now uses page
fragments for efficient GRO / TCP coalescing, and recycling
a tx skb for the rx path is not worth the pain.

The last identified bug is that fat skbs can be recycled
and can end up using high order pages after a few iterations.

With help from Maxime Bizon, who pointed out that commit
87151b8689d (net: allow pskb_expand_head() to get maximum tailroom)
introduced this regression for recycled skbs.

Instead of fixing this bug, let's remove skb recycling.

Drivers wanting really hot skbs should use build_skb() anyway,
to allocate/populate the sk_buff right before netif_receive_skb().

Signed-off-by: Eric Dumazet
Cc: Maxime Bizon
Signed-off-by: David S. Miller
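For reference, a minimal sketch of the build_skb() pattern recommended above; the function name, headroom handling and buffer layout are illustrative assumptions, not code from any particular driver:

    #include <linux/etherdevice.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* RX path sketch: the driver already owns a page-fragment buffer of
     * frag_size bytes with the received frame at buf + NET_SKB_PAD;
     * build_skb() wraps it in an sk_buff right before delivery, instead
     * of recycling an old tx skb.
     */
    static void rx_one_frame(struct net_device *dev, void *buf,
                             unsigned int frag_size, unsigned int len)
    {
        struct sk_buff *skb = build_skb(buf, frag_size);

        if (unlikely(!skb))
            return;                       /* caller keeps ownership of buf */

        skb_reserve(skb, NET_SKB_PAD);    /* skip the headroom the HW left */
        skb_put(skb, len);                /* frame payload length */
        skb->protocol = eth_type_trans(skb, dev);
        netif_receive_skb(skb);
    }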
02 Oct, 2012
1 commit
-
Commit ec47ea824774 (skb: Add inline helper for getting the skb end offset from
head) introduced the helper function skb_end_offset();
we should make use of it.

Signed-off-by: Weiping Pan
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
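In code terms (a small before/after sketch; the helper's exact definition varies with CONFIG_NET_SKBUFF_DATA_USES_OFFSET):

    #include <linux/skbuff.h>

    /* Both return the offset of skb->end from skb->head, i.e. the size of
     * the head buffer usable for data plus struct skb_shared_info.
     */
    static unsigned int head_buffer_size_old(const struct sk_buff *skb)
    {
        return skb_end_pointer(skb) - skb->head;    /* open-coded */
    }

    static unsigned int head_buffer_size_new(const struct sk_buff *skb)
    {
        return skb_end_offset(skb);                 /* the new helper */
    }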
29 Sep, 2012
1 commit
-
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c

The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.

qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller
28 Sep, 2012
1 commit
-
We currently use per-cpu order-0 pages in __netdev_alloc_frag()
to deliver fragments used by __netdev_alloc_skb().

Depending on the NIC driver and on the arch being 32 or 64 bit, this allows a page
to be split into several fragments (between 1 and 8), assuming PAGE_SIZE=4096.

Switching to bigger pages (32768 bytes for the PAGE_SIZE=4096 case) allows:

- Better filling of space (the ending hole overhead is less of an issue)
- Fewer calls to the page allocator and fewer accesses to page->_count
- Possible future changes to struct skb_shared_info without major
  performance impact

This patch implements a transparent fallback to smaller
pages in case of memory pressure.

It also uses a standard "struct page_frag" instead of a custom one.

Signed-off-by: Eric Dumazet
Cc: Alexander Duyck
Cc: Benjamin LaHaise
Signed-off-by: David S. Miller
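A simplified sketch of the refill logic described above; the order, GFP flags and function name are assumptions for illustration, not the upstream implementation:

    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <linux/mm_types.h>

    #define FRAG_PAGE_ORDER 3       /* 32768 bytes when PAGE_SIZE == 4096 */

    /* Refill a frag cache: try an order-3 compound page first, and fall
     * back transparently to a single order-0 page under memory pressure.
     */
    static bool refill_frag_cache(struct page_frag *pf, gfp_t gfp)
    {
        pf->page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN |
                               __GFP_NORETRY, FRAG_PAGE_ORDER);
        pf->size = PAGE_SIZE << FRAG_PAGE_ORDER;
        if (!pf->page) {
            pf->page = alloc_pages(gfp, 0);    /* order-0 fallback */
            pf->size = PAGE_SIZE;
        }
        pf->offset = 0;
        return pf->page != NULL;
    }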
25 Sep, 2012
1 commit
-
We currently use a per-socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

It's done to increase the probability of coalescing small write()s into
single segments in skbs still in the write queue (not yet sent).

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page.

It's also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit the
page allocator more than wanted.

This patch adds a per-task frag allocator and uses bigger pages, if
available (up to 32768 bytes per frag; that's an order-3 page on x86).
An automatic fallback is done in case of memory pressure.

This increases TCP stream performance by 20% on the loopback device,
but also benefits other network devices, since 8x fewer frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

It's possible some SG-enabled hardware can't cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment into sub-fragments, since some arches have PAGE_SIZE=65536.

Successfully tested on various ethernet devices
(ixgbe, igb, bnx2x, tg3, mellanox mlx4).

Signed-off-by: Eric Dumazet
Cc: Ben Hutchings
Cc: Vijay Subramanian
Cc: Alexander Duyck
Tested-by: Vijay Subramanian
Signed-off-by: David S. Miller
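A rough sketch of the selection rule this describes; the field and helper names follow the description, but treat the exact condition as an assumption:

    #include <linux/sched.h>
    #include <net/sock.h>

    /* Contexts that are allowed to sleep share one cache hanging off the
     * task (current->task_frag); others fall back to a per-socket cache,
     * so the common tcp_sendmsg() path no longer needs one page per socket.
     */
    static struct page_frag *pick_frag_cache(struct sock *sk)
    {
        if (sk->sk_allocation & __GFP_WAIT)
            return &current->task_frag;
        return &sk->sk_frag;
    }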
20 Sep, 2012
1 commit
-
It should be the skb that is not cloned.
Signed-off-by: Li RongQing
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Aug, 2012
1 commit
-
Change the skb allocation API to indicate RX usage and use this to fall
back to the PFMEMALLOC reserve when needed. SKBs allocated from the
reserve are tagged in skb->pfmemalloc. If an SKB is allocated from the
reserve and the socket is later found to be unrelated to page reclaim, the
packet is dropped so that the memory remains available for page reclaim.
Network protocols are expected to recover from this packet loss.

[a.p.zijlstra@chello.nl: Ideas taken from various patches]
[davem@davemloft.net: Use static branches, coding style corrections]
[sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build]
Signed-off-by: Mel Gorman
Acked-by: David S. Miller
Cc: Neil Brown
Cc: Peter Zijlstra
Cc: Mike Christie
Cc: Eric B Munson
Cc: Eric Dumazet
Cc: Sebastian Andrzej Siewior
Cc: Mel Gorman
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
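In outline, as a hedged sketch only (the real patch plumbs the flag through internal SKB_ALLOC_RX paths and drops in sk_filter(); the helpers below are simplified):

    #include <linux/gfp.h>
    #include <linux/skbuff.h>
    #include <net/sock.h>

    /* RX allocation may dip into the PFMEMALLOC reserve; skbs served from
     * the reserve come back with skb->pfmemalloc set.
     */
    static struct sk_buff *rx_alloc(struct net_device *dev, unsigned int len)
    {
        return __netdev_alloc_skb(dev, len, GFP_ATOMIC | __GFP_MEMALLOC);
    }

    /* On delivery, packets from the reserve are dropped unless the socket
     * is itself helping memory reclaim (SOCK_MEMALLOC), keeping the
     * reserve available for page reclaim traffic.
     */
    static bool must_drop(const struct sk_buff *skb, const struct sock *sk)
    {
        return skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC);
    }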
23 Jul, 2012
2 commits
-
Export skb_copy_ubufs so that modules can orphan frags.
Signed-off-by: Michael S. Tsirkin
Signed-off-by: David S. Miller
-
Reduce code duplication a bit using the new helper.
Signed-off-by: Michael S. Tsirkin
Signed-off-by: David S. Miller
20 Jul, 2012
1 commit
-
Conflicts:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
19 Jul, 2012
1 commit
-
Use correct allocation flags during copy of user space fragments
to the kernel. Also "improve" a couple of for loops.

Signed-off-by: Krishna Kumar
Signed-off-by: David S. Miller
16 Jul, 2012
1 commit
-
A few drivers use GFP_DMA allocations, and netdev_alloc_frag()
doesn't allocate pages in the DMA zone.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
13 Jul, 2012
1 commit
-
This patch is meant to help improve performance by reducing the number of
locked operations required to allocate a frag on x86 and other platforms.
This is accomplished by using atomic_set operations on the page count
instead of calling get_page and put_page. It is based on work originally
provided by Eric Dumazet.

In addition, it also helps to reduce memory overhead when using TCP. This
is done by recycling the page if the only holder of the frame is the
netdev_alloc_frag call itself. This can occur when skb heads are stolen by
either GRO or TCP and the driver providing the packets is using paged frags
to store all of the data for the packets.

Cc: Eric Dumazet
Signed-off-by: Alexander Duyck
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
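The reference-batching idea as a simplified sketch; the structure, constant and recycling condition are illustrative, and the page reference counter was still named _count in kernels of that era:

    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <linux/mm_types.h>

    #define PAGECNT_BIAS    128     /* illustrative batch; must exceed frags per page */

    struct frag_cache {
        struct page_frag frag;
        unsigned int pagecnt_bias;          /* references we still hold */
    };

    static void *alloc_frag(struct frag_cache *nc, unsigned int fragsz)
    {
        struct page *page = nc->frag.page;

        if (page && nc->frag.offset + fragsz > PAGE_SIZE) {
            /* Page exhausted: if we hold every remaining reference,
             * reuse it in place instead of going back to the allocator.
             */
            if (atomic_read(&page->_count) == nc->pagecnt_bias) {
                atomic_set(&page->_count, PAGECNT_BIAS);
                nc->pagecnt_bias = PAGECNT_BIAS;
                nc->frag.offset = 0;
            } else {
                /* someone else still holds parts of it: drop our batch */
                if (atomic_sub_and_test(nc->pagecnt_bias, &page->_count))
                    __free_page(page);
                page = nc->frag.page = NULL;
            }
        }

        if (!page) {
            page = alloc_page(GFP_ATOMIC | __GFP_NOWARN);
            if (!page)
                return NULL;
            nc->frag.page = page;
            nc->frag.offset = 0;
            /* one atomic_set buys a whole batch of references */
            atomic_set(&page->_count, PAGECNT_BIAS);
            nc->pagecnt_bias = PAGECNT_BIAS;
        }

        nc->pagecnt_bias--;                 /* hand one reference to the caller */
        nc->frag.offset += fragsz;
        return page_address(page) + nc->frag.offset - fragsz;
    }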
11 Jul, 2012
1 commit
-
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller
05 Jul, 2012
1 commit
04 Jul, 2012
1 commit
-
Pull block bits from Jens Axboe:
"As vacation is coming up, thought I'd better get rid of my pending
changes in my for-linus branch for this iteration. It contains:

 - Two patches for mtip32xx. Killing a non-compliant sysfs interface
   and moving it to debugfs, where it belongs.

 - A few patches from Asias. Two legit bug fixes, and one killing an
   interface that is no longer in use.

 - A patch from Jan, making the annoying partition ioctl warning a bit
   less annoying, by restricting it to !CAP_SYS_RAWIO only.

 - Three bug fixes for drbd from Lars Ellenberg.

 - A fix for an old regression for umem, it hasn't really worked since
   the plugging scheme was changed in 3.0.

 - A few fixes from Tejun.

 - A splice fix from Eric Dumazet, fixing an issue with pipe
   resizing."

* 'for-linus' of git://git.kernel.dk/linux-block:
scsi: Silence unnecessary warnings about ioctl to partition
block: Drop dead function blk_abort_queue()
block: Mitigate lock unbalance caused by lock switching
block: Avoid missed wakeup in request waitqueue
umem: fix up unplugging
splice: fix racy pipe->buffers uses
drbd: fix null pointer dereference with on-congestion policy when diskless
drbd: fix list corruption by failing but already aborted reads
drbd: fix access of unallocated pages and kernel panic
xen/blkfront: Add WARN to deal with misbehaving backends.
blkcg: drop local variable @q from blkg_destroy()
mtip32xx: Create debugfs entries for troubleshooting
mtip32xx: Remove 'registers' and 'flags' from sysfs
blkcg: fix blkg_alloc() failure path
block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
block: fix return value on cfq_init() failure
mtip32xx: Remove version.h header file inclusion
xen/blkback: Copy id field when doing BLKIF_DISCARD.
14 Jun, 2012
1 commit
-
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe().

Commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes)
added the capability to adjust pipe->buffers.

The problem is that some paths don't hold the pipe mutex and assume
pipe->buffers doesn't change for their duration.

Fix this by adding an nr_pages_max field to struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.

Reported-by: Dave Jones
Signed-off-by: Eric Dumazet
Cc: Jens Axboe
Cc: Alexander Viro
Cc: Tom Herbert
Cc: stable # 2.6.35
Tested-by: Dave Jones
Signed-off-by: Jens Axboe
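Illustrative initialization showing where the new nr_pages_max field fits; the helper is made up for the example:

    #include <linux/splice.h>

    /* Callers now record the capacity of their on-stack arrays in the
     * descriptor itself, so a concurrently resized pipe->buffers can no
     * longer be mistaken for the array length.
     */
    static void init_spd(struct splice_pipe_desc *spd, struct page **pages,
                         struct partial_page *partial, unsigned int capacity)
    {
        spd->pages = pages;
        spd->partial = partial;
        spd->nr_pages = 0;
        spd->nr_pages_max = capacity;   /* new field added by this fix */
    }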
13 Jun, 2012
1 commit
-
Conflicts:
MAINTAINERS
drivers/net/wireless/iwlwifi/pcie/trans.c

The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.

The MAINTAINERS conflict was merely overlapping changes; one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.

Signed-off-by: David S. Miller
09 Jun, 2012
1 commit
-
Fix kernel-doc warnings in net/core:
Warning(net/core/skbuff.c:3368): No description found for parameter 'delta_truesize'
Warning(net/core/filter.c:628): No description found for parameter 'pfp'
Warning(net/core/filter.c:628): Excess function parameter 'sk' description in 'sk_unattached_filter_create'

Signed-off-by: Randy Dunlap
Signed-off-by: David S. Miller
08 Jun, 2012
1 commit
-
__alloc_skb() now extends tailroom to allow the use of padding added
by the heap allocator.

Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller
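Roughly what using the heap allocator's padding means, as a sketch; the real change lives inside __alloc_skb() itself:

    #include <linux/skbuff.h>
    #include <linux/slab.h>

    /* kmalloc() often rounds the request up to a slab size class; asking
     * ksize() afterwards lets the skb claim that slack as extra tailroom
     * instead of wasting it.
     */
    static unsigned int usable_head_size(unsigned int requested, void **datap)
    {
        void *data = kmalloc(requested, GFP_ATOMIC);

        if (!data)
            return 0;

        *datap = data;
        /* space left for packet data once skb_shared_info is accounted for */
        return SKB_WITH_OVERHEAD(ksize(data));
    }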
20 May, 2012
1 commit
-
Move the protocol-independent part of tcp_try_coalesce() to
skb_try_coalesce().

skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly
to build optimized skbs (fewer sk_buffs, and possibly fewer 'headers').

skb_try_coalesce() is zero copy, unless the copy can fit in the
destination header (it's a rare case).

kfree_skb_partial() is also moved to net/core/skbuff.c and exported,
because IPv6 will need it in the patch (ipv6: use skb coalescing in
reassembly).

Signed-off-by: Eric Dumazet
Cc: Alexander Duyck
Signed-off-by: David S. Miller
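A usage sketch of the exported pair; the surrounding reassembly bookkeeping is illustrative:

    #include <linux/skbuff.h>

    /* Try to merge "from" into "to" without copying; on success only the
     * truesize delta needs accounting and "from" can be (partially) freed.
     */
    static bool merge_segment(struct sk_buff *to, struct sk_buff *from,
                              int *truesize_delta)
    {
        bool fragstolen;
        int delta;

        if (!skb_try_coalesce(to, from, &fragstolen, &delta))
            return false;       /* caller keeps "from" as a separate skb */

        *truesize_delta += delta;
        kfree_skb_partial(from, fragstolen);
        return true;
    }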
19 May, 2012
1 commit
-
Fix two issues introduced in commit a1c7fff7e18f5
(net: netdev_alloc_skb() use build_skb()):

- Must be IRQ safe (non-NAPI drivers can use it)
- Must not leak the frag if build_skb() fails to allocate the sk_buff

This patch introduces netdev_alloc_frag() for drivers willing to
use build_skb() instead of the __netdev_alloc_skb() variants.

Factorize code so that:
__dev_alloc_skb() is a wrapper around __netdev_alloc_skb(), and
dev_alloc_skb() a wrapper around netdev_alloc_skb().

Use the __GFP_COLD flag.

Almost all network drivers now benefit from the skb->head_frag
infrastructure.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
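A sketch of how a driver could combine the new helper with build_skb() while honoring the two rules above (IRQ-safe allocation, no leaked frag); sizing and error handling are simplified:

    #include <linux/mm.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    static struct sk_buff *rx_alloc_with_frag(struct net_device *dev,
                                              unsigned int size)
    {
        unsigned int frag_size = SKB_DATA_ALIGN(size) +
                                 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
        void *data = netdev_alloc_frag(frag_size);   /* IRQ safe */
        struct sk_buff *skb;

        if (!data)
            return NULL;

        skb = build_skb(data, frag_size);
        if (!skb) {
            put_page(virt_to_head_page(data));        /* don't leak the frag */
            return NULL;
        }
        skb->dev = dev;
        return skb;
    }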
18 May, 2012
1 commit
-
netdev_alloc_skb() is used by network drivers in their RX path to
allocate an skb to receive an incoming frame.

With the recent skb->head_frag infrastructure, it makes sense to change
netdev_alloc_skb() to use build_skb() and a frag allocator.

This permits a zero copy splice(socket->pipe), and better GRO or TCP
coalescing.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
17 May, 2012
1 commit
-
Use the current logging style.
This enables use of dynamic debugging as well.
Convert printk(KERN_ to pr_.
Add pr_fmt. Remove embedded prefixes, use
%s, __func__ instead.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
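The pattern, for reference (a generic before/after; the message text is made up):

    /* pr_fmt() must be defined before any include so every pr_*() call in
     * the file picks up the same prefix; pr_debug() also becomes a dynamic
     * debug site.
     */
    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

    #include <linux/printk.h>

    static void report_oversized(unsigned int len)
    {
        /* old style: printk(KERN_ERR "skbuff: %s: len %u\n", __func__, len); */
        pr_err("%s: len %u\n", __func__, len);
    }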
16 May, 2012
1 commit
-
Standardize the net core ratelimited logging functions.
Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
07 May, 2012
3 commits
-
With the recent changes to how we compute the skb truesize, it occurs to me
we are probably going to have a lot of calls to skb_end_pointer(skb) -
skb->head. Instead of open-coding that all over the place, it makes
more sense to make it a separate inline, skb_end_offset(skb), so that
we can return the correct value without gcc having to do all the
optimization to cancel out skb->head - skb->head.

Signed-off-by: Alexander Duyck
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Since there is now only one spot that actually uses "fastpath", there isn't
much point in carrying it. Instead we can just use a check for skb_cloned()
to verify whether we can perform the fast-path free for the head or not.

Signed-off-by: Alexander Duyck
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
-
The fast-path for pskb_expand_head contains a check where the size plus the
unaligned size of skb_shared_info is compared against the size of the data
buffer. This code path has two issues. First is the fact that after the
recent changes by Eric Dumazet to __alloc_skb and build_skb the shared info
is always placed in the optimal spot for a buffer size making this check
unnecessary. The second issue is the fact that the check doesn't take into
account the aligned size of shared info. As a result the code burns cycles
doing a memcpy with nothing actually being shifted.

Signed-off-by: Alexander Duyck
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
04 May, 2012
2 commits
-
This patch adds support for a skb_head_is_locked helper function. It is
meant to be used any time we are considering transferring the head from
skb->head to a paged frag. If the head is locked it means we cannot remove
the head from the skb so it must be copied or we must take the skb as a
whole.

Signed-off-by: Alexander Duyck
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
-
GRO is very optimistic in skb truesize estimates, only taking into
account the used part of fragments.

Be conservative, and use a more precise computation, so that bloated GRO
skbs can be collapsed eventually.

Signed-off-by: Eric Dumazet
Cc: Alexander Duyck
Cc: Jeff Kirsher
Acked-by: Alexander Duyck
Signed-off-by: David S. Miller
03 May, 2012
1 commit
-
This change is meant to prevent stealing the skb->head to use as a page in
the event that the skb->head was cloned. This allows the other clones to
track each other via shinfo->dataref.

Without this we end up with two methods for tracking the reference count,
one being dataref, the other being the page count. As a result it becomes
difficult to track how many references there are to skb->head.

Signed-off-by: Alexander Duyck
Cc: Eric Dumazet
Cc: Jeff Kirsher
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
01 May, 2012
3 commits
-
__skb_splice_bits() can check whether the skb to be spliced has its skb->head
mapped to a page fragment, instead of a kmalloc() area.

If so, we can avoid a copy of the skb head and get a reference on the
underlying page.

Signed-off-by: Eric Dumazet
underlying page.Signed-off-by: Eric Dumazet
Cc: Ilpo Järvinen
Cc: Herbert Xu
Cc: Maciej Żenczykowski
Cc: Neal Cardwell
Cc: Tom Herbert
Cc: Jeff Kirsher
Cc: Ben Hutchings
Cc: Matt Carlson
Cc: Michael Chan
Signed-off-by: David S. Miller
-
GRO can check whether the skb to be merged has its skb->head mapped to a page
fragment, instead of a kmalloc() area.

If so, we 'upgrade' skb->head into a fragment in itself.

This avoids the frag_list fallback, and permits building a true GRO skb
(one sk_buff and up to 16 fragments), using less memory.

This reduces the number of cache misses when the user makes its copy, since a
single sk_buff is fetched.

This is a followup to the patch "net: allow skb->head to be a page fragment".

Signed-off-by: Eric Dumazet
Cc: Ilpo Järvinen
Cc: Herbert Xu
Cc: Maciej Żenczykowski
Cc: Neal Cardwell
Cc: Tom Herbert
Cc: Jeff Kirsher
Cc: Ben Hutchings
Cc: Matt Carlson
Cc: Michael Chan
Signed-off-by: David S. Miller
-
skb->head is currently allocated from kmalloc(). This is convenient but
has the drawback that the data cannot be converted to a page fragment if
needed.

We have three spots where it hurts:

1) GRO aggregation

When a linear skb must be appended to another skb, GRO uses the
frag_list fallback, which is very inefficient since we keep all struct sk_buff
around. So drivers enabling GRO but delivering linear skbs to the network
stack aren't enabling full GRO power.

2) splice(socket -> pipe)

We must copy the linear part to a page fragment.
This kind of defeats splice()'s purpose (the zero copy claim).

3) TCP coalescing

Recently introduced, this permits grouping several contiguous segments
into a single skb. This shortens queue lengths, saves kernel memory,
and greatly reduces the probability of TCP collapses. This coalescing
doesn't work on linear skbs (or we would need to copy data, which would be
too slow).

Given all these issues, the following patch introduces the possibility
of having skb->head be a fragment in itself. We use a new skb flag,
skb->head_frag, to carry this information.

build_skb() is changed to accept a frag_size argument. Drivers willing
to provide a page fragment instead of kmalloc() data will set a non-zero
value, set to the fragment size.

Then, in situations where we need to convert the skb head into a frag in itself,
we can check if skb->head_frag is set and avoid the copies or various
fallbacks we have.

This means drivers currently using frags could be updated to avoid the
current skb->head allocation and reduce their memory footprint (aka skb
truesize); that's 512 or 1024 bytes saved per skb. This also makes
bpf/netfilter faster, since the 'first frag' will be part of the skb linear
part, with no need to copy data.

Signed-off-by: Eric Dumazet
Cc: Ilpo Järvinen
Cc: Herbert Xu
Cc: Maciej Żenczykowski
Cc: Neal Cardwell
Cc: Tom Herbert
Cc: Jeff Kirsher
Cc: Ben Hutchings
Cc: Matt Carlson
Cc: Michael Chan
Signed-off-by: David S. Miller
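The two call styles this introduces, in sketch form; the wrapper names are made up:

    #include <linux/skbuff.h>

    /* frag_size == 0: data was kmalloc()ed, skb->head stays a kmalloc area. */
    static struct sk_buff *wrap_kmalloc_head(void *data)
    {
        return build_skb(data, 0);
    }

    /* frag_size != 0: data is a page fragment of that size; the resulting
     * skb has skb->head_frag set, so GRO / splice / TCP coalescing can
     * later treat the head as a real page fragment instead of copying it.
     */
    static struct sk_buff *wrap_frag_head(void *data, unsigned int frag_size)
    {
        return build_skb(data, frag_size);
    }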
24 Apr, 2012
3 commits
-
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Signed-off-by: David S. Miller
-
Commit 35f3d14db (pipe: add support for shrinking and growing pipes)
added a slowdown for splice(socket -> pipe), as we might grow the spd
used in skb_splice_bits() for each skb we process in the splice() syscall.

It's not needed, since skb lengths are capped. The default on-stack arrays
are more than enough.

Use MAX_SKB_FRAGS instead of PIPE_DEF_BUFFERS to describe the reasonable
limit per skb.

Add coalescing support to help splicing of GRO skbs built from linear
skbs (linked into frag_list).

Signed-off-by: Eric Dumazet
Cc: Jens Axboe
Cc: Tom Herbert
Signed-off-by: David S. Miller
22 Apr, 2012
1 commit
-
splice() from socket to pipe needs the linear_to_page() helper to transfer
the skb header to part of a page.

We can reset the offset in the current sk->sk_sndmsg_page if we are the
last user of the page.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
20 Apr, 2012
1 commit
-
When we need to clone an skb, we don't drop a packet.
Call consume_skb() so as not to confuse dropwatch.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
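The distinction, illustrated; the surrounding clone logic is a made-up example:

    #include <linux/skbuff.h>

    static struct sk_buff *take_clone(struct sk_buff *skb, gfp_t gfp)
    {
        struct sk_buff *clone = skb_clone(skb, gfp);

        if (clone)
            consume_skb(skb);   /* intentional free: dropwatch stays quiet */
        else
            kfree_skb(skb);     /* a real drop: let drop monitors see it */

        return clone;
    }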
16 Apr, 2012
1 commit
-
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller