14 Feb, 2013

1 commit

  • Patch cef401de7be8c4e (net: fix possible wrong checksum
    generation) fixed the wrong checksum calculation, but it broke TSO
    by defining a new GSO type without a corresponding netdev feature;
    net_gso_ok() will not allow hardware checksum/segmentation offload
    of such packets without that feature.

    The following patch fixes both TSO and the wrong checksum. It uses
    the same logic that Eric Dumazet used: a new flag,
    SKBTX_SHARED_FRAG, is set if at least one frag can be modified by
    the user, but the flag is kept in the skb shared info tx_flags
    rather than in gso_type.

    tx_flags is a better fit than gso_type, since an skb can have a
    shared frag without being a GSO packet. This does not tie
    SHARED_FRAG to GSO, so there is no need to define a netdev feature
    for it.
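
    A minimal sketch of the resulting shape (the helper is the one this
    patch introduces; the exact bit position is illustrative):

        /* include/linux/skbuff.h */
        SKBTX_SHARED_FRAG = 1 << 5,     /* at least one frag may be
                                         * modified by the user */

        /* true if a frag may change underneath us, e.g. while a device
         * computes checksums over it */
        static inline bool skb_has_shared_frag(const struct sk_buff *skb)
        {
                return skb_is_nonlinear(skb) &&
                       skb_shinfo(skb)->tx_flags & SKBTX_SHARED_FRAG;
        }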

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller


09 Oct, 2012

1 commit

  • Add a new flag to remember when the route is via a gateway.
    We will use it to allow rt_gateway to contain the address of a
    directly connected host for the cases when DST_NOCACHE is used or
    when the NH exception caches a per-destination route without the
    DST_NOCACHE flag, i.e. when routes are not used for other
    destinations. This way we force neighbour resolution to work with
    the routed destination, while a different address can be used in
    the packet, a feature needed for IPVS-DR, where the original packet
    for a virtual IP is routed via the route to the real IP.
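
    A sketch of the resulting struct rtable fields (layout abridged to
    the two fields discussed here):

        struct rtable {
                /* ... */
                __u8    rt_uses_gateway; /* route resolves via a gateway */
                /* ... */
                __be32  rt_gateway;     /* neighbour address: the gateway,
                                         * or a directly connected host
                                         * for cached per-destination
                                         * routes */
                /* ... */
        };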

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller


25 Sep, 2012

1 commit

  • We currently use a per socket order-0 page cache for tcp_sendmsg()
    operations.

    This page is used to build fragments for skbs.

    This is done to increase the probability of coalescing small
    write()s into single segments in skbs still in the write queue (not
    yet sent).

    But it wastes a lot of memory for applications handling many mostly
    idle sockets, since each socket holds one page in
    sk->sk_sndmsg_page.

    It is also quite inefficient to build TSO 64KB packets, because we
    need about 16 pages per skb on arches where PAGE_SIZE = 4096, so we
    hit the page allocator more often than we would like.

    This patch adds a per-task frag allocator and uses bigger pages if
    available. An automatic fallback is done in case of memory pressure
    (up to 32768 bytes per frag, that's order-3 pages on x86).
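
    A condensed sketch of the refill logic (function name hypothetical;
    the real helper lives in net/core/sock.c):

        /* try order-3 (32KB) first, then fall back towards order-0 */
        static bool frag_refill_sketch(struct page_frag *pfrag, gfp_t gfp)
        {
                int order = 3;

                do {
                        gfp_t mask = gfp;

                        if (order)
                                mask |= __GFP_COMP | __GFP_NOWARN;
                        pfrag->page = alloc_pages(mask, order);
                        if (pfrag->page) {
                                pfrag->offset = 0;
                                pfrag->size = PAGE_SIZE << order;
                                return true;
                        }
                } while (--order >= 0);

                return false;   /* caller enters memory pressure handling */
        }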

    This increases TCP stream performance by 20% on the loopback
    device, but it also benefits other network devices, since 8x fewer
    frags are mapped on transmit and unmapped on tx completion.
    Alexander Duyck mentioned a probable performance win on systems
    with IOMMU enabled.

    It's possible some SG-enabled hardware can't cope with bigger
    fragments, but their ndo_start_xmit() should already handle this,
    splitting a fragment into sub-fragments, since some arches have
    PAGE_SIZE=65536.

    Successfully tested on various ethernet devices.
    (ixgbe, igb, bnx2x, tg3, mellanox mlx4)

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Cc: Vijay Subramanian
    Cc: Alexander Duyck
    Tested-by: Vijay Subramanian
    Signed-off-by: David S. Miller


27 Aug, 2012

1 commit

  • IPv4 conntrack defragments incoming packets at the PRE_ROUTING
    hook and (in the case of forwarded packets) refragments them at
    POST_ROUTING, independent of the IP_DF flag. Refragmentation uses
    the dst_mtu() of the local route without caring about the original
    fragment sizes, thereby breaking PMTUD.

    This patch fixes this by keeping track of the largest received
    fragment with IP_DF set and generating an ICMP fragmentation-
    required error during refragmentation if that size exceeds the MTU.
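
    An illustrative sketch of the idea (field and variable names
    hypothetical, not the literal patch):

        /* while collecting fragments: remember the largest DF fragment */
        if (ip_hdr(skb)->frag_off & htons(IP_DF))
                qp->max_df_size = max(qp->max_df_size, skb->len);

        /* at refragmentation: honor PMTUD instead of silently slicing
         * to the local route's mtu */
        if (max_df_size > dst_mtu(skb_dst(skb))) {
                icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                          htonl(dst_mtu(skb_dst(skb))));
                return -EMSGSIZE;
        }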

    Signed-off-by: Patrick McHardy
    Acked-by: Eric Dumazet
    Acked-by: David S. Miller


22 Aug, 2012

1 commit

  • Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
    memory in __ip_select_ident().

    It turns out that __ip_make_skb() called ip_select_ident() before
    properly initializing iph->daddr.

    This is a bug uncovered by commit 1d861aa4b3fb (inet: Minimize use of
    cached route inetpeer.)
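
    The fix is an ordering change in __ip_make_skb(); roughly:

        /* the addresses must be filled in before ident selection,
         * which hashes on iph->daddr */
        ip_copy_addrs(iph, fl4);
        ip_select_ident(iph, &rt->dst, sk);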

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131

    Reported-by: Christian Casteyde
    Signed-off-by: Eric Dumazet
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller


11 Aug, 2012

1 commit

  • ip_send_skb() can send orphaned skbs, so we must pass the net
    pointer explicitly to avoid a possible NULL dereference in the
    error path.
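
    The resulting prototype, with the net supplied by the caller:

        int ip_send_skb(struct net *net, struct sk_buff *skb);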

    Bug added by commit 3a7c384ffd57 (ipv4: tcp: unicast_sock should not
    land outside of TCP stack)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller


10 Aug, 2012

1 commit

  • commit be9f4a44e7d41cee (ipv4: tcp: remove per net tcp_sock) added
    a selinux regression, reported and bisected by John Stultz.

    selinux_ip_postroute_compat() expects to find a valid
    sk->sk_security pointer, but this field is NULL for unicast_sock.

    It turns out that unicast_sock is really a temporary construct for
    reusing part of the IP stack
    (ip_append_data()/ip_push_pending_frames()).

    The fact is that frames sent by ip_send_unicast_reply() should be
    orphaned so as not to fool LSM.
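
    The heart of the fix, in sketch form:

        skb_orphan(skb);        /* detach the fake unicast_sock so LSM
                                 * hooks never see its NULL sk_security */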

    Note IPv6 never had this problem, as tcp_v6_send_response()
    doesn't use a fake socket at all. I'll probably implement
    tcp_v4_send_response() to remove these unicast_sock in linux-3.7.

    Reported-by: John Stultz
    Bisected-by: John Stultz
    Signed-off-by: Eric Dumazet
    Cc: Paul Moore
    Cc: Eric Paris
    Cc: "Serge E. Hallyn"
    Signed-off-by: David S. Miller


07 Aug, 2012

1 commit

  • __neigh_create() returns either a pointer to struct neighbour or
    an ERR_PTR() value. But the caller expects it to return either a
    valid pointer or NULL. Replace the NULL check with an IS_ERR()
    check.
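
    Sketch of the corrected caller in the output path (error label
    hypothetical):

        neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
        if (IS_ERR(neigh))      /* was: if (neigh == NULL) */
                goto failure;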

    The bug was introduced in a263b3093641fb1ec377582c90986a7fd0625184
    ("ipv4: Make neigh lookups directly in output packet path.").

    Signed-off-by: Vasily Kulikov
    Signed-off-by: David S. Miller


23 Jul, 2012

2 commits

  • The ipv4 routing cache is non-deterministic, performance wise, and is
    subject to reasonably easy to launch denial of service attacks.

    The routing cache works great for well behaved traffic, and the world
    was a much friendlier place when the tradeoffs that led to the routing
    cache's design were considered.

    What it boils down to is that the performance of the routing cache
    is a product of the traffic patterns seen by a system rather than
    of the contents of the routing tables, and the former is
    controllable by external entities.

    Even for "well behaved" legitimate traffic, high volume sites can see
    hit rates in the routing cache of only ~%10.

    The general flow of this patch series is that first the routing cache
    is removed. We build a completely new rtable entry every lookup
    request.

    Next we make some simplifications due to the fact that removing the
    routing cache causes several members of struct rtable to become no
    longer necessary.

    Then we need to make some adjustments such that we can legally
    cache pre-constructed routes in the FIB nexthops. Firstly, we need
    to invalidate routes which are hit with nexthop exceptions.
    Secondly, we have to change the semantics of rt->rt_gateway such
    that zero means the destination is on-link and non-zero otherwise.

    Now that the preparations are ready, we start caching precomputed
    routes in the FIB nexthops. Output and input routes need different
    kinds of care when determining if we can legally do such caching or
    not. The details are in the commit log messages for those changes.

    The patch series then winds down with some more struct rtable
    simplifications and other tidy ups that remove unnecessary overhead.

    On a SPARC-T3 output route lookups are ~876 cycles. Input route
    lookups are ~1169 cycles with rpfilter disabled, and about ~1468
    cycles with rpfilter enabled.

    These measurements were taken with the kbench_mod test module in the
    net_test_tools GIT tree:

    git://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git

    That GIT tree also includes a udpflood tester tool and stresses
    route lookups on packet output.

    For example, on the same SPARC-T3 system we can run:

    time ./udpflood -l 10000000 10.2.2.11

    with routing cache:
    real 1m21.955s user 0m6.530s sys 1m15.390s

    without routing cache:
    real 1m31.678s user 0m6.520s sys 1m25.140s

    Performance undoubtedly can easily be improved further.

    For example fib_table_lookup() performs a lot of excessive
    computations with all the masking and shifting, some of it
    conditionalized to deal with edge cases.

    Also, Eric's no-ref optimization for input route lookups can be
    re-instated for the FIB nexthop caching code path. I would be really
    pleased if someone would work on that.

    In fact, anyone suitably motivated can just fire up perf on the
    loading of the net_test_tools benchmark kernel module. I spent much
    of my time going:

    bash# perf record insmod ./kbench_mod.ko dst=172.30.42.22 src=74.128.0.1 iif=2
    bash# perf report

    Thanks to helpful feedback from Joe Perches, Eric Dumazet, Ben
    Hutchings, and others.

    Signed-off-by: David S. Miller

  • Set unicast_sock uc_ttl to -1 so that we select the right ttl,
    instead of sending packets with a 0 ttl.
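
    For context, the helper that consumes uc_ttl (include/net/route.h);
    a negative value selects the per-route default:

        static inline int ip_select_ttl(struct inet_sock *inet,
                                        struct dst_entry *dst)
        {
                int ttl = inet->uc_ttl;

                if (ttl < 0)
                        ttl = ip4_dst_hoplimit(dst);
                return ttl;
        }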

    Bug added in commit be9f4a44e7d4 (ipv4: tcp: remove per net tcp_sock)

    Signed-off-by: Hiroaki SHIMODA
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller


21 Jul, 2012

1 commit

  • In order to allow prefixed routes, we have to adjust how rt_gateway
    is set and interpreted.

    The new interpretation is:

    1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr

    2) rt_gateway != 0, destination requires a nexthop gateway

    Abstract the fetching of the proper nexthop value using a new
    inline helper, rt_nexthop(), as suggested by Joe Perches.
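
    The helper encodes exactly these semantics:

        /* include/net/route.h */
        static inline __be32 rt_nexthop(const struct rtable *rt, __be32 daddr)
        {
                if (rt->rt_gateway)
                        return rt->rt_gateway;
                return daddr;
        }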

    Signed-off-by: David S. Miller
    Tested-by: Vijay Subramanian


20 Jul, 2012

1 commit

  • tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
    per network namespace.

    This leads to bad behavior on multiqueue NICs, because many cpus
    contend for the socket lock and, once the socket lock is acquired,
    extra false sharing on various socket fields slows down the
    operations.

    To better resist attacks, we use a percpu socket (sketched after
    the notes below). Each cpu can run without contention, using
    appropriate memory (local node).

    Additional features :

    1) We also mirror the queue_mapping of the incoming skb, so that
    answers use the same queue if possible.

    2) Setting the SOCK_USE_WRITE_QUEUE socket flag speeds up sock_wfree()

    3) We now limit the number of in-flight RST/ACK [1] packets per
    cpu, instead of per namespace, and we honor the sysctl_wmem_default
    limit dynamically. (Prior to this patch, the sysctl_wmem_default
    value was copied at boot time, so any further change would not
    affect the tcp_sock limit.)

    [1] These packets are only generated when no socket was matched for
    the incoming packet.
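
    A sketch of the per-cpu replacement for the shared socket
    (initializer abridged; uc_ttl = -1 per the ttl fix noted earlier in
    this log):

        static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
                .sk = {
                        .sk_refcnt      = ATOMIC_INIT(1),
                        .sk_allocation  = GFP_ATOMIC,
                        .sk_flags       = (1UL << SOCK_USE_WRITE_QUEUE),
                },
                .pmtudisc       = IP_PMTUDISC_WANT,
                .uc_ttl         = -1,
        };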

    Reported-by: Bill Sommerfeld
    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: David S. Miller


13 Jun, 2012

1 commit

  • Add dev_loopback_xmit() in order to deduplicate functions
    ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and
    ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c).

    I was about to reinvent the wheel when I noticed that
    ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what
    I need and are not IP-only functions, but they were not available
    for reuse elsewhere.

    ip6_dev_loopback_xmit() does not have the line
    "skb_dst_force(skb);", but I understand that this is harmless, and
    it should be in dev_loopback_xmit().
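
    The consolidated helper looks roughly like this (net/core/dev.c):

        int dev_loopback_xmit(struct sk_buff *skb)
        {
                skb_reset_mac_header(skb);
                __skb_pull(skb, skb_network_offset(skb));
                skb->pkt_type = PACKET_LOOPBACK;
                skb->ip_summed = CHECKSUM_UNNECESSARY;
                WARN_ON(!skb_dst(skb));
                skb_dst_force(skb);
                netif_rx_ni(skb);
                return 0;
        }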

    Signed-off-by: Michel Machado
    CC: "David S. Miller"
    CC: Alexey Kuznetsov
    CC: James Morris
    CC: Hideaki YOSHIFUJI
    CC: Patrick McHardy
    CC: Eric Dumazet
    CC: Jiri Pirko
    CC: "Michał Mirosław"
    CC: Ben Hutchings
    Signed-off-by: David S. Miller


02 Dec, 2011

1 commit

  • The gcc compiler is smart enough to use a single load/store if we
    memcpy(dptr, sptr, 8) on x86_64, regardless of
    CONFIG_CC_OPTIMIZE_FOR_SIZE.

    In the IP header, daddr immediately follows saddr, and this won't
    change in the future. We only need to make sure our flowi4
    (saddr, daddr) fields won't break the rule.
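
    The resulting helper, roughly (net/ipv4/ip_output.c); the
    BUILD_BUG_ON enforces the adjacency rule at compile time:

        static void ip_copy_addrs(struct iphdr *iph, const struct flowi4 *fl4)
        {
                BUILD_BUG_ON(offsetof(typeof(*fl4), daddr) !=
                             offsetof(typeof(*fl4), saddr) + sizeof(fl4->saddr));
                memcpy(&iph->saddr, &fl4->saddr,
                       sizeof(fl4->saddr) + sizeof(fl4->daddr));
        }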

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller


24 Oct, 2011

1 commit

  • There is a long-standing bug in the Linux TCP stack, concerning
    ACK messages sent on behalf of TIME_WAIT sockets.

    In the IP header of the ACK message, we choose to reflect the TOS
    field of the incoming message, and this might break some setups.

    Examples of things that were broken :
    - Routing using TOS as a selector
    - Firewalls
    - Traffic classification / shaping

    We now remember the inet tos field in the timewait structure and
    use it in ACK generation and route lookup.
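
    In sketch form (call sites approximate):

        tw->tw_tos = inet_sk(sk)->tos;  /* saved when entering TIME_WAIT */

        /* later, ACKs sent on behalf of the timewait socket pass
         * tw->tw_tos down for header construction and route lookup,
         * instead of reflecting the incoming packet's TOS */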

    Notes :
    - We still reflect the incoming TOS in RST messages.
    - We could extend MuraliRaja Muniraju's patch to report the TOS
    value in netlink messages for TIME_WAIT sockets.
    - A patch is needed for IPv6.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller


19 Oct, 2011

1 commit

  • To ease skb->truesize sanitization, it's better to be able to
    localize all references to skb frag sizes.

    Define accessors : skb_frag_size() to fetch a frag's size, and
    skb_frag_size_{set|add|sub}() to manipulate it.
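
    The accessors are trivial wrappers (include/linux/skbuff.h):

        static inline unsigned int skb_frag_size(const skb_frag_t *frag)
        {
                return frag->size;
        }

        static inline void skb_frag_size_set(skb_frag_t *frag, unsigned int size)
        {
                frag->size = size;
        }

        static inline void skb_frag_size_add(skb_frag_t *frag, int delta)
        {
                frag->size += delta;
        }

        static inline void skb_frag_size_sub(skb_frag_t *frag, int delta)
        {
                frag->size -= delta;
        }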

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller


03 Aug, 2011

1 commit

  • Gergely Kalman reported crashes in check_peer_redir().

    It appears commit f39925dbde778 (ipv4: Cache learned redirect
    information in inetpeer.) added a race, leading to a possible NULL
    ptr dereference.

    Since we can now change dst neighbour, we should make sure a reader can
    safely use a neighbour.

    Add RCU protection to dst neighbour, and make sure check_peer_redir()
    can be called safely by different cpus in parallel.
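
    The reader side then follows the usual RCU pattern; a sketch:

        rcu_read_lock();
        n = dst_get_neighbour(dst);     /* rcu_dereference() underneath */
        if (n) {
                /* n cannot be freed before rcu_read_unlock(), since
                 * neighbours are already freed via RCU */
        }
        rcu_read_unlock();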

    As neighbours are already freed after one RCU grace period, this
    patch should not add the typical RCU penalty (cache-cold effects).

    Many thanks to Gergely for providing a pretty report pointing to the
    bug.

    Reported-by: Gergely Kalman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller


22 Jul, 2011

1 commit

  • Because the ip fragment offset field counts 8-byte chunks, ip
    fragments other than the last must contain a multiple of 8 bytes of
    payload. ip_ufo_append_data wasn't respecting this constraint and,
    depending on the MTU and ip option sizes, could create malformed
    non-final fragments.
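
    The 8-byte rule as expressed in the non-UFO path of
    ip_append_data():

        maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;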

    Google-Bug-Id: 5009328
    Signed-off-by: Bill Sommerfeld
    Signed-off-by: David S. Miller


14 Jul, 2011

1 commit

  • Now that there is a one-to-one correspondence between neighbour
    and hh_cache entries, we no longer need:

    1) dynamic allocation
    2) attachment to dst->hh
    3) refcounting

    Initialization of the hh_cache entry is indicated by hh_len
    being non-zero, and such initialization is always done with
    the neighbour's lock held as a writer.
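
    In sketch form (layout abridged):

        struct neighbour {
                /* ... */
                struct hh_cache hh;     /* embedded; valid once
                                         * hh.hh_len != 0, written under
                                         * the neighbour's write lock */
                /* ... */
        };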

    Signed-off-by: David S. Miller


02 Jul, 2011

1 commit

  • We might call ip_ufo_append_data() for packets that will be
    IPsec-transformed later, but this function should be used only for
    real udp packets. So we check rt->dst.header_len, which is nonzero
    only for IPsec handling, and call ip_ufo_append_data() only if
    rt->dst.header_len is zero.
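
    The guard, in sketch form (exthdrlen is taken from
    rt->dst.header_len in ip_append_data()):

        bool use_ufo = transhdrlen &&
                       length + fragheaderlen <= mtu &&
                       (rt->dst.dev->features & NETIF_F_UFO) &&
                       !exthdrlen;      /* zero unless IPsec is involved */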

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller


28 Jun, 2011

2 commits

  • ip_append_data() builds packets based on the mtu from
    dst_mtu(rt->dst.path). With IPsec the effective mtu is lower,
    because we need to add the protocol headers and trailers later,
    when we do the IPsec transformations. So after the IPsec
    transformations the packet might be too big, which then leads to
    slow-path fragmentation. This patch fixes this by building the
    packets based on the lower IPsec mtu from dst_mtu(&rt->dst) and
    adapts the exthdr handling accordingly.
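
    The mtu selection, before and after (sketch):

        mtu = dst_mtu(rt->dst.path);    /* before: ignores IPsec overhead */
        mtu = dst_mtu(&rt->dst);        /* after: IPsec-adjusted dst */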

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

  • Git commit 59104f06 (ip: take care of last fragment in
    ip_append_data) added a check to see if we exceed the mtu when we
    add the trailer_len. However, the mtu is already reduced by the
    trailer length when the xfrm transformation bundles are set up. So
    IPsec packets of mtu size get fragmented, or, if the DF bit is set,
    the packets will not be sent even though they match the mtu
    perfectly well. This patch effectively reverts commit 59104f06.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller


10 Jun, 2011

1 commit

  • We assume that transhdrlen is positive on the first fragment,
    which is wrong for raw packets, so we don't add exthdrlen to the
    packet size for raw packets. This leads to a reallocation on IPsec,
    because we don't have enough headroom on the skb to place the IPsec
    headers. This patch fixes this by adding exthdrlen to the packet
    size whenever the send queue of the socket is empty. This issue was
    introduced with git commit 1470ddf7 (inet: Remove explicit write
    references to sk/inet in ip_append_data)
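
    Sketch of the fix in __ip_append_data(): key the header room off
    the queue state rather than off transhdrlen (which is 0 for raw
    sockets):

        skb = skb_peek_tail(queue);
        exthdrlen = !skb ? rt->dst.header_len : 0;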

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert