16 Feb, 2013

2 commits

  • This patch adds a GRE protocol offload handler so that
    skb_gso_segment() can segment GRE packets.
    An SKB GSO CB is added to keep track of the total header length so
    that skb_segment() can push the entire header; e.g. in the case of
    GRE, skb_segment() needs to push the inner and outer headers onto
    every segment.
    A new NETIF_F_GRE_GSO feature is added for devices that support HW
    GRE TSO offload. Currently no device supports it, so GRE GSO always
    falls back to software GSO (a sketch of that fallback follows this
    list).

    [ Compute pkt_len before ip_local_out() invocation. -DaveM ]

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • This function will be used in next GRE_GSO patch. This patch does
    not change any functionality.

    Signed-off-by: Pravin B Shelar
    Acked-by: Eric Dumazet

    Pravin B Shelar
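
    A minimal sketch, not taken from the patch, of the software fallback
    the first commit above describes. It assumes the 3.9-era GSO helpers
    (skb_gso_segment(), netif_skb_features()); the function name and the
    surrounding transmit logic are illustrative only.

        #include <linux/err.h>
        #include <linux/netdevice.h>
        #include <linux/skbuff.h>

        /* Illustrative: segment a GRE GSO skb in software when the device
         * does not advertise NETIF_F_GRE_GSO. */
        static int gre_gso_xmit_fallback(struct sk_buff *skb)
        {
                netdev_features_t features = netif_skb_features(skb);
                struct sk_buff *segs;

                if (!skb_is_gso(skb) || (features & NETIF_F_GRE_GSO))
                        return 0;   /* nothing to do, or hardware handles it */

                /* skb_gso_segment() uses the GSO CB to replay the full
                 * (inner + outer) header onto every resulting segment. */
                segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
                if (IS_ERR(segs))
                        return PTR_ERR(segs);

                consume_skb(skb);
                /* ... hand each skb on the 'segs' list to the device ... */
                return 0;
        }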
     

14 Feb, 2013

1 commit

  • Patch cef401de7be8c4e (net: fix possible wrong checksum
    generation) fixed wrong checksum calculation but broke TSO by
    defining a new GSO type without a matching netdev feature;
    net_gso_ok() would not allow hardware checksum/segmentation
    offload of such packets without that feature.

    The following patch fixes both TSO and the wrong checksum. It uses
    the same logic Eric Dumazet used: a new flag, SKBTX_SHARED_FRAG,
    is set if at least one frag can be modified by the user, but the
    flag is kept in the skb shared info tx_flags rather than in
    gso_type.

    tx_flags is a better fit than gso_type, since an skb can have
    shared frags without being a GSO packet. It does not tie
    SHARED_FRAG to GSO, so there is no need to define a netdev feature
    for this.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
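
    A minimal sketch of the tx_flags convention described above, assuming
    a 3.9-era skbuff.h; the helper names here are illustrative and need
    not match the ones the patch actually adds.

        #include <linux/netdevice.h>
        #include <linux/skbuff.h>

        /* Illustrative: mark an skb whose page frags may still be written
         * by user space (zero-copy / sendfile pages). */
        static void mark_frags_shared(struct sk_buff *skb)
        {
                skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
        }

        /* Illustrative: a software checksum path copies the data first when
         * the frags are shared, so a later user write cannot corrupt the
         * checksum that is about to be computed. */
        static int checksum_safely(struct sk_buff *skb)
        {
                if ((skb_shinfo(skb)->tx_flags & SKBTX_SHARED_FRAG) &&
                    skb_linearize(skb))
                        return -ENOMEM;
                return skb_checksum_help(skb);
        }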
     

09 Feb, 2013

1 commit

  • In order to address the fact that some devices cannot support the full
    32K frag size, we need to have the value accessible somewhere so that we
    can compare it against what the device can support. As such I am moving
    the values out of skbuff.c and into skbuff.h.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

28 Jan, 2013

1 commit

  • Pravin Shelar mentioned that GSO could potentially generate
    wrong TX checksum if skb has fragments that are overwritten
    by the user between the checksum computation and transmit.

    He suggested linearizing skbs, but this extra copy can be
    avoided for normal tcp skbs cooked by tcp_sendmsg().

    This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
    in skb_shinfo(skb)->gso_type if at least one frag can be
    modified by the user.

    Typical sources of such possible overwrites are {vm}splice(),
    sendfile(), and macvtap/tun/virtio_net drivers.

    Tested:

    $ netperf -H 7.7.8.84
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
    7.7.8.84 () port 0 AF_INET
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    87380  16384   16384    10.00    3959.52

    $ netperf -H 7.7.8.84 -t TCP_SENDFILE
    TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
    port 0 AF_INET
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    87380  16384   16384    10.00    3216.80

    Performance of the SENDFILE is impacted by the extra allocation and
    copy, and because we use order-0 pages, while the TCP_STREAM uses
    bigger pages.

    Reported-by: Pravin Shelar
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Jan, 2013

1 commit

  • We have skb_mac_header_was_set() helper to tell if mac_header
    was set on a skb. We would like the same for transport_header.

    __netif_receive_skb() doesn't reset the transport header if already
    set by GRO layer.

    Note that network stacks usually reset the transport header anyway,
    after pulling the network header, so this change only allows
    a followup patch to have more precise qdisc pkt_len computation
    for GSO packets at ingress side.

    Signed-off-by: Eric Dumazet
    Cc: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Eric Dumazet
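
    A minimal sketch of the receive-path use described above, assuming
    the new helper mirrors skb_mac_header_was_set(); this is an
    illustration, not the actual hunk from the patch.

        #include <linux/skbuff.h>

        /* Illustrative: only reset the transport header when an earlier
         * layer (e.g. GRO) has not already set it. */
        static void ingress_set_headers(struct sk_buff *skb)
        {
                skb_reset_network_header(skb);
                if (!skb_transport_header_was_set(skb))
                        skb_reset_transport_header(skb);
        }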
     

09 Dec, 2012

1 commit

  • This patch adds support in the kernel for offloading in the NIC Tx and Rx
    checksumming for encapsulated packets (such as VXLAN and IP GRE).

    For Tx encapsulation offload, the driver will need to set the right bits
    in netdev->hw_enc_features. The protocol driver will have to set the
    skb->encapsulation bit and populate the inner headers, so the NIC driver will
    use those inner headers to calculate the csum in hardware.

    For Rx encapsulation offload, the driver will again need to set the
    skb->encapsulation flag and set skb->ip_summed to CHECKSUM_UNNECESSARY.
    In that case the protocol driver should push the decapsulated packet up
    to the stack, again with CHECKSUM_UNNECESSARY. In either case, the
    protocol driver should set the skb->encapsulation flag back to zero.
    Finally the protocol driver should have the NETIF_F_RXCSUM flag set in
    its features.

    Signed-off-by: Joseph Gasparakis
    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Joseph Gasparakis
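
    A minimal sketch of the two sides described above, assuming the
    3.8-era inner-header helpers; the functions and the feature selection
    are illustrative, not taken from the patch.

        #include <linux/netdevice.h>
        #include <linux/skbuff.h>

        /* Illustrative, tunnel (protocol) driver side: mark the skb as
         * encapsulated and record the inner header offsets while the
         * headers still describe the inner frame, before the outer
         * headers are pushed. */
        static void tunnel_mark_inner_headers(struct sk_buff *skb)
        {
                skb->encapsulation = 1;
                skb_reset_inner_headers(skb);
        }

        /* Illustrative, NIC driver side (probe time): advertise what the
         * hardware can checksum inside an encapsulated frame. */
        static void nic_advertise_enc_features(struct net_device *netdev)
        {
                netdev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_SG;
        }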
     

03 Nov, 2012

2 commits

  • Orphaning frags for zero copy skbs needs to allocate data in atomic
    context, so it has a chance to fail. If it does, we currently discard
    the skb, which is safe, but we don't report anything to the caller,
    so it cannot recover by e.g. disabling zero copy.

    Add an API to free skb reporting such errors: this is used
    by tun in case orphaning frags fails.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • Even if skb is marked for zero copy, net core might still decide
    to copy it later which is somewhat slower than a copy in user context:
    besides copying the data we need to pin/unpin the pages.

    Add a parameter reporting such cases through zero copy callback:
    if this happens a lot, device can take this into account
    and switch to copying in user context.

    This patch updates all users but ignores the passed value for now:
    it will be used by follow-up patches.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
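
    A minimal sketch of the callback contract after the second commit
    above; the queue structure and counter are illustrative, and the
    ubuf_info layout is assumed from the 3.7-era skbuff.h.

        #include <linux/skbuff.h>

        struct my_zc_queue {                /* illustrative per-device state */
                unsigned long zcopy_misses;
        };

        /* Illustrative: the completion callback now learns whether the data
         * really left the host zero-copy. A device seeing too many copied
         * completions could stop requesting zero copy. */
        static void my_zerocopy_complete(struct ubuf_info *ubuf,
                                         bool zerocopy_success)
        {
                struct my_zc_queue *q = ubuf->ctx;  /* stored at submit time */

                if (!zerocopy_success)
                        q->zcopy_misses++;
                /* ... unpin the user pages and complete the request ... */
        }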
     

01 Nov, 2012

1 commit

  • Network device drivers can communicate a Toeplitz hash in skb->rxhash,
    but devices differ in their hashing capabilities. All compute a 5-tuple
    hash for TCP over IPv4, but for other connection-oriented protocols,
    they may compute only a 3-tuple. This breaks RPS load balancing, e.g.,
    for TCP over IPv6 flows. Additionally, for GRE and other tunnels,
    the kernel computes a 5-tuple hash over the inner packet if possible,
    but devices do not.

    This patch recomputes the rxhash in software in all cases where it
    cannot be certain that a 5-tuple was computed. Device drivers can avoid
    recomputation by setting the skb->l4_rxhash flag.

    Recomputing adds cycles to each packet when RPS is enabled or the
    packet arrives over a tunnel. A comparison of 200x TCP_STREAM between
    two servers running unmodified netnext with rxhash computation
    in hardware vs software (using ethtool -K eth0 rxhash [on|off]) shows
    how much time is spent in __skb_get_rxhash in this worst case:

    0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash
    0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash
    0.05% swapper [kernel.kallsyms] [k] __skb_get_rxhash

    With 200x TCP_RR it increases to

    0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash
    0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash
    0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash

    I considered having the patch explicitly skip recomputation when it
    knows that it will not improve the hash (TCP over IPv4), but that
    conditional complicates the code without saving many cycles in
    practice, because it has to take place after the flow dissector.

    Signed-off-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Willem de Bruijn
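
    A minimal sketch of the trust-the-driver rule described above,
    assuming the 3.7-era __skb_get_rxhash(); it is close to what the
    in-tree skb_get_rxhash() ends up doing, but is written here only as
    an illustration.

        #include <linux/skbuff.h>

        /* Illustrative: keep the device-provided hash only when the driver
         * marked it as a full l4 (5-tuple) hash; otherwise recompute the
         * hash in software over the inner-most flow. */
        static inline __u32 reliable_rxhash(struct sk_buff *skb)
        {
                if (!skb->l4_rxhash)
                        skb->rxhash = __skb_get_rxhash(skb);
                return skb->rxhash;
        }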
     

07 Oct, 2012

1 commit

  • Over time, the skb recycling infrastructure got little interest and
    many bugs. Generic rx path skb allocation now uses page
    fragments for efficient GRO / TCP coalescing, and recycling
    a tx skb for the rx path is not worth the pain.

    The last identified bug is that fat skbs can be recycled
    and can end up using high-order pages after a few iterations.

    With help from Maxime Bizon, who pointed out that commit
    87151b8689d (net: allow pskb_expand_head() to get maximum tailroom)
    introduced this regression for recycled skbs.

    Instead of fixing this bug, let's remove skb recycling.

    Drivers wanting really hot skbs should use build_skb() anyway,
    to allocate/populate the sk_buff right before netif_receive_skb().

    Signed-off-by: Eric Dumazet
    Cc: Maxime Bizon
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Aug, 2012

1 commit


01 Aug, 2012

3 commits

  • The skb->pfmemalloc flag gets set to true iff the PFMEMALLOC reserves
    were used during the slab allocation of data in __alloc_skb. If
    page splitting is used, it is possible that pages will be allocated from
    the PFMEMALLOC reserve without propagating this information to the skb.
    This patch propagates page->pfmemalloc from pages allocated for fragments
    to the skb.

    It works by reintroducing and expanding the skb_alloc_page() API to take
    an skb. If the page was allocated from pfmemalloc reserves, the flag is
    automatically copied. If the driver allocates the page before the skb, it
    should call skb_propagate_pfmemalloc() after the skb is allocated to
    ensure the flag is copied properly (a sketch of that call follows this
    list).

    Failure to do so is not critical. The resulting driver may perform slower
    if it is used for swap-over-NBD or swap-over-NFS but it should not result
    in failure.

    [davem@davemloft.net: API rename and consistency]
    Signed-off-by: Mel Gorman
    Acked-by: David S. Miller
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Mike Christie
    Cc: Eric B Munson
    Cc: Eric Dumazet
    Cc: Sebastian Andrzej Siewior
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The skb->pfmemalloc flag gets set to true iff the PFMEMALLOC reserves
    were used during the slab allocation of data in __alloc_skb. If the
    packet is fragmented, it is possible that pages will be allocated from the
    PFMEMALLOC reserve without propagating this information to the skb. This
    patch propagates page->pfmemalloc from pages allocated for fragments to
    the skb.

    Signed-off-by: Mel Gorman
    Acked-by: David S. Miller
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Mike Christie
    Cc: Eric B Munson
    Cc: Eric Dumazet
    Cc: Sebastian Andrzej Siewior
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Change the skb allocation API to indicate RX usage and use this to fall
    back to the PFMEMALLOC reserve when needed. SKBs allocated from the
    reserve are tagged in skb->pfmemalloc. If an SKB is allocated from the
    reserve and the socket is later found to be unrelated to page reclaim, the
    packet is dropped so that the memory remains available for page reclaim.
    Network protocols are expected to recover from this packet loss.

    [a.p.zijlstra@chello.nl: Ideas taken from various patches]
    [davem@davemloft.net: Use static branches, coding style corrections]
    [sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build]
    Signed-off-by: Mel Gorman
    Acked-by: David S. Miller
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Mike Christie
    Cc: Eric B Munson
    Cc: Eric Dumazet
    Cc: Sebastian Andrzej Siewior
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
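
    A minimal sketch of the rule from the first commit above (the driver
    allocates the page before the skb); the function is illustrative and
    the length/truesize accounting is elided.

        #include <linux/skbuff.h>

        /* Illustrative: once both the page and the skb exist, copy the
         * page's pfmemalloc state onto the skb so the stack knows the
         * memory came from the emergency reserves. */
        static void rx_attach_page(struct sk_buff *skb, struct page *page,
                                   unsigned int len)
        {
                skb_fill_page_desc(skb, 0, page, 0, len);
                /* ... update skb->len, skb->data_len, skb->truesize ... */
                skb_propagate_pfmemalloc(page, skb);
        }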
     

23 Jul, 2012

1 commit

  • Many places do

        if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY))
                skb_copy_ubufs(skb, gfp_mask);

    to copy and invoke frag destructors if necessary.
    Add an inline helper for this.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
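
    A minimal sketch of the inline helper described above; the name here
    is illustrative and may differ from the one actually added.

        #include <linux/skbuff.h>

        /* Illustrative: hide the tx_flags test so callers stop
         * open-coding it. */
        static inline int orphan_zerocopy_frags(struct sk_buff *skb,
                                                gfp_t gfp_mask)
        {
                if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)))
                        return 0;
                return skb_copy_ubufs(skb, gfp_mask);
        }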
     

16 Jun, 2012

1 commit

  • Orphaning the skb in dev_hard_start_xmit() makes bonding behavior
    unfriendly for applications sending big UDP bursts: once packets
    pass the bonding device and reach the real device, they might hit a full
    qdisc and be dropped. Without orphaning, the sender is automatically
    throttled because sk->sk_wmem_alloc reaches sk->sk_sndbuf (assuming
    sk_sndbuf is not too big).

    We could try to defer the orphaning adding another test in
    dev_hard_start_xmit(), but all this seems of little gain,
    now that BQL tends to make packets more likely to be parked
    in Qdisc queues instead of NIC TX ring, in cases where performance
    matters.

    Reverts commits :
    fc6055a5ba31 net: Introduce skb_orphan_try()
    87fd308cfc6b net: skb_tx_hash() fix relative to skb_orphan_try()
    and removes SKBTX_DRV_NEEDS_SK_REF flag

    Reported-and-bisected-by: Jean-Michel Hautbois
    Signed-off-by: Eric Dumazet
    Tested-by: Oliver Hartkopp
    Acked-by: Oliver Hartkopp
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 May, 2012

1 commit

  • At the beginning of __skb_cow, headroom gets set to a minimum of
    NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
    cloned and the headroom is just below NET_SKB_PAD, but still more than the
    amount requested by the caller.
    This was showing up frequently in my tests on VLAN tx, where
    vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).

    Locally generated packets should have enough headroom, and for forward
    paths, we already have NET_SKB_PAD bytes of headroom, so we don't need to
    add any extra space here.

    Signed-off-by: Felix Fietkau
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Felix Fietkau
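
    A minimal sketch of the headroom logic after the change described
    above, simplified from __skb_cow() and written against the 3.5-era
    helpers; treat it as an illustration, not the actual diff.

        #include <linux/kernel.h>
        #include <linux/skbuff.h>

        /* Illustrative: only reallocate when the caller's request really
         * does not fit (or the skb is cloned); do not round the request
         * up to NET_SKB_PAD before the comparison. */
        static int cow_headroom(struct sk_buff *skb, unsigned int headroom)
        {
                int delta = 0;

                if (headroom > skb_headroom(skb))
                        delta = headroom - skb_headroom(skb);

                if (delta || skb_cloned(skb))
                        return pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD),
                                                0, GFP_ATOMIC);
                return 0;
        }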
     

20 May, 2012

1 commit

  • Move the protocol-independent part of tcp_try_coalesce() to
    skb_try_coalesce().

    skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly,
    to build optimized skbs (fewer sk_buffs, and possibly fewer 'headers').

    skb_try_coalesce() is zero copy, unless the copy can fit in the
    destination header (it's a rare case).

    kfree_skb_partial() is also moved to net/core/skbuff.c and exported,
    because IPv6 will need it in patch (ipv6: use skb coalescing in
    reassembly).

    Signed-off-by: Eric Dumazet
    Cc: Alexander Duyck
    Signed-off-by: David S. Miller

    Eric Dumazet
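
    A minimal sketch of how a reassembly path can use the two exported
    helpers; the function and the truesize bookkeeping around it are
    illustrative.

        #include <linux/skbuff.h>

        /* Illustrative: try to merge 'from' into 'to' without copying;
         * on success, account only the truesize delta and free whatever
         * is left of 'from'. */
        static bool coalesce_or_keep(struct sk_buff *to, struct sk_buff *from,
                                     int *queued_truesize)
        {
                bool fragstolen;
                int delta;

                if (!skb_try_coalesce(to, from, &fragstolen, &delta))
                        return false;  /* caller queues 'from' normally */

                *queued_truesize += delta;
                kfree_skb_partial(from, fragstolen);
                return true;
        }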
     

19 May, 2012

1 commit

  • Fix two issues introduced in commit a1c7fff7e18f5
    ( net: netdev_alloc_skb() use build_skb() )

    - Must be IRQ safe (non NAPI drivers can use it)
    - Must not leak the frag if build_skb() fails to allocate sk_buff

    This patch introduces netdev_alloc_frag() for drivers willing to
    use build_skb() instead of __netdev_alloc_skb() variants.

    Factorize code so that :
    __dev_alloc_skb() is a wrapper around __netdev_alloc_skb(), and
    dev_alloc_skb() a wrapper around netdev_alloc_skb()

    Use __GFP_COLD flag.

    Almost all network drivers now benefit from skb->head_frag
    infrastructure.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
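
    A minimal sketch pairing the new netdev_alloc_frag() with
    build_skb(), as the commit suggests for drivers that want skb->head
    to live in a page fragment; the size constant and error handling are
    illustrative.

        #include <linux/mm.h>
        #include <linux/skbuff.h>

        #define RX_FRAG_SIZE 2048             /* illustrative fragment size */

        static struct sk_buff *rx_alloc_skb(void)
        {
                void *data = netdev_alloc_frag(RX_FRAG_SIZE); /* IRQ safe */
                struct sk_buff *skb;

                if (!data)
                        return NULL;

                skb = build_skb(data, RX_FRAG_SIZE);
                if (unlikely(!skb))
                        put_page(virt_to_head_page(data)); /* don't leak the frag */
                return skb;
        }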
     

08 May, 2012

1 commit

  • Conflicts:
    drivers/net/ethernet/intel/e1000e/param.c
    drivers/net/wireless/iwlwifi/iwl-agn-rx.c
    drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
    drivers/net/wireless/iwlwifi/iwl-trans.h

    Resolved the iwlwifi conflict with mainline using 3-way diff posted
    by John Linville and Stephen Rothwell. In 'net' we added a bug
    fix to make iwlwifi report a more accurate skb->truesize but this
    conflicted with RX path changes that happened meanwhile in net-next.

    In e1000e a conflict arose in the validation code for settings of
    adapter->itr. 'net-next' had more sophisticated logic so that
    logic was used.

    Signed-off-by: David S. Miller

    David S. Miller
     

07 May, 2012

1 commit

  • With the recent changes for how we compute the skb truesize, it occurs to
    me we are probably going to have a lot of calls to skb_end_pointer(skb) -
    skb->head. Instead of open-coding that all over the place, it makes more
    sense to provide a separate inline, skb_end_offset(skb), so that we can
    return the correct value without gcc having to do all the optimization to
    cancel out skb->head - skb->head.

    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
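
    A plausible shape of the helper described above (hedged: it simply
    follows NET_SKBUFF_DATA_USES_OFFSET the same way skb_end_pointer()
    does, and is renamed here to avoid clashing with the real inline).

        #include <linux/skbuff.h>

        #ifdef NET_SKBUFF_DATA_USES_OFFSET
        static inline unsigned int end_offset_sketch(const struct sk_buff *skb)
        {
                return skb->end;              /* 'end' is already an offset */
        }
        #else
        static inline unsigned int end_offset_sketch(const struct sk_buff *skb)
        {
                return skb->end - skb->head;  /* 'end' is a pointer */
        }
        #endif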
     

04 May, 2012

1 commit

  • This patch adds support for a skb_head_is_locked helper function. It is
    meant to be used any time we are considering transferring the head from
    skb->head to a paged frag. If the head is locked it means we cannot remove
    the head from the skb so it must be copied or we must take the skb as a
    whole.

    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
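
    A plausible implementation consistent with the description above
    (hedged, and renamed here so it does not clash with the real helper).

        #include <linux/skbuff.h>

        /* Illustrative: the head cannot be handed over as a page fragment
         * if it was kmalloc()ed (no head_frag) or if someone else still
         * references it (cloned). */
        static inline bool head_is_locked_sketch(const struct sk_buff *skb)
        {
                return !skb->head_frag || skb_cloned(skb);
        }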
     

01 May, 2012

4 commits

  • fix kernel doc typos in function names

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • remove useless casts and rename variables for less confusion.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • GRO can check if the skb to be merged has its skb->head mapped to a page
    fragment, instead of a kmalloc() area.

    We 'upgrade' skb->head into a fragment in itself.

    This avoids the frag_list fallback, and permits building a true GRO skb
    (one sk_buff and up to 16 fragments), using less memory.

    This reduces the number of cache misses when the user makes a copy, since
    a single sk_buff is fetched.

    This is a followup of patch "net: allow skb->head to be a page fragment"

    Signed-off-by: Eric Dumazet
    Cc: Ilpo Järvinen
    Cc: Herbert Xu
    Cc: Maciej Żenczykowski
    Cc: Neal Cardwell
    Cc: Tom Herbert
    Cc: Jeff Kirsher
    Cc: Ben Hutchings
    Cc: Matt Carlson
    Cc: Michael Chan
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • skb->head is currently allocated from kmalloc(). This is convenient but
    has the drawback that the data cannot be converted to a page fragment if
    needed.

    We have three spots where it hurts:

    1) GRO aggregation

    When a linear skb must be appended to another skb, GRO uses the
    frag_list fallback, very inefficient since we keep all struct sk_buff
    around. So drivers enabling GRO but delivering linear skbs to network
    stack aren't enabling full GRO power.

    2) splice(socket -> pipe).

    We must copy the linear part to a page fragment.
    This kind of defeats splice() purpose (zero copy claim)

    3) TCP coalescing.

    Recently introduced, this permits grouping several contiguous segments
    into a single skb. This shortens queue lengths, saves kernel memory,
    and greatly reduces the probability of TCP collapses. This coalescing
    doesn't work on linear skbs (or we would need to copy data, which would
    be too slow).

    Given all these issues, the following patch introduces the possibility
    of having skb->head be a fragment in itself. We use a new skb flag,
    skb->head_frag to carry this information.

    build_skb() is changed to accept a frag_size argument. Drivers willing
    to provide a page fragment instead of kmalloc() data will set a non-zero
    value, set to the fragment size.

    Then, on situations we need to convert the skb head to a frag in itself,
    we can check if skb->head_frag is set and avoid the copies or various
    fallbacks we have.

    This means drivers currently using frags could be updated to avoid the
    current skb->head allocation and reduce their memory footprint (aka skb
    truesize); that's 512 or 1024 bytes saved per skb. This also makes
    bpf/netfilter faster, since the 'first frag' will be part of the skb
    linear part, with no need to copy data.

    Signed-off-by: Eric Dumazet
    Cc: Ilpo Järvinen
    Cc: Herbert Xu
    Cc: Maciej Żenczykowski
    Cc: Neal Cardwell
    Cc: Tom Herbert
    Cc: Jeff Kirsher
    Cc: Ben Hutchings
    Cc: Matt Carlson
    Cc: Michael Chan
    Signed-off-by: David S. Miller

    Eric Dumazet
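
    A minimal sketch of the copy avoidance that skb->head_frag enables
    (per the last two commits above); the function is illustrative and
    the frag-offset bookkeeping of the real GRO/coalescing code is
    elided.

        #include <linux/mm.h>
        #include <linux/skbuff.h>

        /* Illustrative: when the head itself lives in a page fragment, a
         * consumer (GRO, splice, coalescing) can reference the backing
         * page instead of copying the linear data. */
        static struct page *grab_head_as_frag(struct sk_buff *skb)
        {
                struct page *page;

                if (!skb->head_frag || skb_cloned(skb))
                        return NULL;        /* must fall back to copying */

                page = virt_to_head_page(skb->head);
                get_page(page);             /* destination now holds a reference */
                return page;
        }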
     

24 Apr, 2012

1 commit


20 Apr, 2012

1 commit


14 Apr, 2012

1 commit

  • The skb struct ubuf_info callback gets passed struct ubuf_info
    itself, not the arg value as the field name and the function signature
    seem to imply. Rename the arg field to ctx to match usage,
    add documentation and change the callback argument type
    to make usage clear and to have compiler check correctness.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     

11 Apr, 2012

1 commit

  • Marc Merlin reported many order-1 allocation failures in the TX path on
    his wireless setup, which don't make any sense with an MTU=1500 network
    and non-SG-capable hardware.

    After investigation, it turns out TCP uses sk_stream_alloc_skb() and, by
    convention, uses skb_tailroom(skb) to know how many bytes of data payload
    can be put in the skb (for non-SG-capable devices).

    Note: these skbs used kmalloc-4096 (MTU=1500 + MAX_HEADER +
    sizeof(struct skb_shared_info) being above 2048).

    Later, the mac80211 layer needs to add some bytes at the tail of the skb
    (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and, since no more tailroom is
    available, has to call pskb_expand_head() and request order-1
    allocations.

    This patch changes sk_stream_alloc_skb() so that only
    sk->sk_prot->max_header bytes of headroom are reserved, and uses a new
    skb field, avail_size, to hold the data payload limit.

    This way, order-0 allocations done by the TCP stack can leave more than
    2 KB of tailroom and no more allocation is performed in the mac80211
    layer (or any layer needing some tailroom).

    avail_size is unioned with mark/dropcount, since mark will be set later
    in the IP stack for output packets. Therefore, the skb size is unchanged.

    Reported-by: Marc MERLIN
    Tested-by: Marc MERLIN
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Mar, 2012

2 commits

  • …m/linux/kernel/git/dhowells/linux-asm_system

    Pull "Disintegrate and delete asm/system.h" from David Howells:
    "Here are a bunch of patches to disintegrate asm/system.h into a set of
    separate bits to relieve the problem of circular inclusion
    dependencies.

    I've built all the working defconfigs from all the arches that I can
    and made sure that they don't break.

    The reason for these patches is that I recently encountered a circular
    dependency problem that came about when I produced some patches to
    optimise get_order() by rewriting it to use ilog2().

    This uses bitops - and on the SH arch asm/bitops.h drags in
    asm-generic/get_order.h by a circuitous route involving asm/system.h.

    The main difficulty seems to be asm/system.h. It holds a number of
    low level bits with no/few dependencies that are commonly used (eg.
    memory barriers) and a number of bits with more dependencies that
    aren't used in many places (eg. switch_to()).

    These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

    Move memory barriers here. This is already done for MIPS and Alpha.

    (2) asm/switch_to.h

    Move switch_to() and related stuff here.

    (3) asm/exec.h

    Move arch_align_stack() here. Other process execution related bits
    could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

    Move xchg() and cmpxchg() here as they're full word atomic ops and
    frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

    Move die() and related bits.

    (6) asm/auxvec.h

    Move AT_VECTOR_SIZE_ARCH here.

    Other arch headers are created as needed on a per-arch basis."

    Fixed up some conflicts from other header file cleanups and moving code
    around that has happened in the meantime, so David's testing is somewhat
    weakened by that. We'll find out anything that got broken and fix it..

    * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
    Delete all instances of asm/system.h
    Remove all #inclusions of asm/system.h
    Add #includes needed to permit the removal of asm/system.h
    Move all declarations of free_initmem() to linux/mm.h
    Disintegrate asm/system.h for OpenRISC
    Split arch_align_stack() out from asm-generic/system.h
    Split the switch_to() wrapper out of asm-generic/system.h
    Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
    Create asm-generic/barrier.h
    Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
    Disintegrate asm/system.h for Xtensa
    Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
    Disintegrate asm/system.h for Tile
    Disintegrate asm/system.h for Sparc
    Disintegrate asm/system.h for SH
    Disintegrate asm/system.h for Score
    Disintegrate asm/system.h for S390
    Disintegrate asm/system.h for PowerPC
    Disintegrate asm/system.h for PA-RISC
    Disintegrate asm/system.h for MN10300
    ...

    Linus Torvalds
     
  • Remove all #inclusions of asm/system.h preparatory to splitting and killing
    it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*.*\n!!' `grep -Irl '^#\s*include\s*' *`

    Signed-off-by: David Howells

    David Howells
     

28 Mar, 2012

1 commit

  • Pull networking fixes from David Miller:
    1) Name string overrun fix in gianfar driver from Joe Perches.

    2) VHOST bug fixes from Michael S. Tsirkin and Nadav Har'El

    3) Fix dependencies on xt_LOG netfilter module, from Pablo Neira Ayuso.

    4) Fix RCU locking in xt_CT, also from Pablo Neira Ayuso.

    5) Add a parameter to skb_add_rx_frag() so we can fix the truesize
    adjustments in the drivers that use it. The individual drivers
    aren't fixed by this commit, but will be dealt with using follow-on
    commits. From Eric Dumazet.

    6) Add some device IDs to qmi_wwan driver, from Andrew Bird.

    7) Fix a potential rcu_read_lock() imbalance in rt6_fill_node(). From
    Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: fix a potential rcu_read_lock() imbalance in rt6_fill_node()
    net: add a truesize parameter to skb_add_rx_frag()
    gianfar: Fix possible overrun and simplify interrupt name field creation
    USB: qmi_wwan: Add ZTE (Vodafone) K3570-Z and K3571-Z net interfaces
    USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces
    USB: qmi_wwan: Add ZTE (Vodafone) K3565-Z and K4505-Z net interfaces
    qlcnic: Bug fix for LRO
    netfilter: nf_conntrack: permanently attach timeout policy to conntrack
    netfilter: xt_CT: fix assignation of the generic protocol tracker
    netfilter: xt_CT: missing rcu_read_lock section in timeout assignment
    netfilter: cttimeout: fix dependency with l4protocol conntrack module
    netfilter: xt_LOG: use CONFIG_IP6_NF_IPTABLES instead of CONFIG_IPV6
    vhost: fix release path lockdep checks
    vhost: don't forget to schedule()
    tools/virtio: stub out strong barriers
    tools/virtio: add linux/hrtimer.h stub
    tools/virtio: add linux/module.h stub

    Linus Torvalds
     

26 Mar, 2012

1 commit

  • skb_add_rx_frag() API is misleading.

    Network skbs built with this helper can use uncharged kernel memory and
    eventually stress/crash machine in OOM.

    Add a 'truesize' parameter and then fix drivers in followup patches.

    Signed-off-by: Eric Dumazet
    Cc: Wey-Yi Guy
    Signed-off-by: David S. Miller

    Eric Dumazet
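
    A minimal sketch of a driver call after the API change, assuming the
    six-argument signature this commit introduces; the page handling
    around it is illustrative.

        #include <linux/mm.h>
        #include <linux/skbuff.h>

        /* Illustrative: tell the stack how much memory the fragment really
         * pins (here a whole page), not just how many bytes were received,
         * so socket memory accounting stays honest. */
        static void rx_add_page(struct sk_buff *skb, struct page *page,
                                unsigned int received_len)
        {
                skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, 0,
                                received_len, PAGE_SIZE);
        }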
     

25 Mar, 2012

1 commit

  • Pull cleanup from Paul Gortmaker:
    "The changes shown here are to unify linux's BUG support under the one
    file. Due to historical reasons, we have some BUG code
    in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in
    linux/kernel.h predates the addition of linux/bug.h, but old code in
    kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h
    was including to pseudo link them.

    This has caused confusion[1] and general yuck/WTF[2] reactions. Here
    is an example that violates the principle of least surprise:

    CC lib/string.o
    lib/string.c: In function 'strlcat':
    lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
    make[2]: *** [lib/string.o] Error 1
    $
    $ grep linux/bug.h lib/string.c
    #include <linux/bug.h>
    $

    We've included <linux/bug.h> for the BUG infrastructure and yet we
    still get a compile fail! [We've not included kernel.h for
    BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel
    development.

    With the above in mind, the goals of this changeset are:

    1) find and fix any include/*.h files that were relying on the
    implicit presence of BUG code.
    2) find and fix any C files that were consuming kernel.h and hence
    relying on implicitly getting some/all BUG code.
    3) Move the BUG related code living in kernel.h to <linux/bug.h>
    4) remove the asm/bug.h include from kernel.h to finally break the chain.

    During development, the order was more like 3-4, build-test, 1-2. But
    to ensure that git history for bisect doesn't get needless build
    failures introduced, the commits have been reorderd to fix the problem
    areas in advance.

    [1] https://lkml.org/lkml/2012/1/3/90
    [2] https://lkml.org/lkml/2012/1/17/414"

    Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
    and linux-next.

    * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    kernel.h: doesn't explicitly use bug.h, so don't include it.
    bug: consolidate BUILD_BUG_ON with other bug code
    BUG: headers with BUG/BUG_ON etc. need linux/bug.h
    bug.h: add include of it to various implicit C users
    lib: fix implicit users of kernel.h for TAINT_WARN
    spinlock: macroize assert_spin_locked to avoid bug.h dependency
    x86: relocate get/set debugreg fcns to include/asm/debugreg.

    Linus Torvalds
     

20 Mar, 2012

1 commit

  • As suggested by Ben, this adds the clarification on the usage of
    CHECKSUM_UNNECESSARY on the outgoing path. Also add the usage
    description of NETIF_F_FCOE_CRC and CHECKSUM_UNNECESSARY
    for the kernel FCoE protocol driver.

    This is a follow-up to the following:
    http://patchwork.ozlabs.org/patch/147315/

    Signed-off-by: Yi Zou
    Cc: Ben Hutchings
    Cc: Jeff Kirsher
    Cc: www.Open-FCoE.org
    Signed-off-by: David S. Miller

    Yi Zou
     

10 Mar, 2012

1 commit


05 Mar, 2012

1 commit

  • If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
    other BUG variant in a static inline (i.e. not in a #define) then
    that header really should be including <linux/bug.h> and not just
    expecting it to be implicitly present.

    We can make this change risk-free, since if the files using these
    headers didn't have exposure to linux/bug.h already, they would have
    been causing compile failures/warnings.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

27 Feb, 2012

1 commit