02 Oct, 2009
1 commit
-
This patch against v2.6.31 adds support for route lookup using sk_mark in some
more places. The benefits from this patch are the following.
First, SO_MARK option now has effect on UDP sockets too.
Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing
lookup correctly if TCP sockets with SO_MARK were used.
Signed-off-by: Atis Elsts
Acked-by: Eric Dumazet
15 Sep, 2009
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
...

Fixed up trivial conflicts:
- arch/x86/include/asm/socket.h
converted to in the x86 tree. The generic
header has the same new #define's, so that works out fine.
- drivers/net/tun.c
fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.

Noted in 'next' by Stephen Rothwell.
03 Sep, 2009
1 commit
-
Christoph Lameter pointed out that packet drops at qdisc level were not
accounted in SNMP counters. Only if the application sets IP_RECVERR are drops
reported to the user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but these are not needed to update system wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after a UDP tx flood
# netstat -s
...
IP:
1487048 outgoing packets dropped
...
Udp:
...
SndbufErrors: 1487048

send() syscalls do, however, still return an OK status, so as not to
break applications.

Note: the send() manual page explicitly says for the -ENOBUFS error:
"The output queue for a network interface was full.
This generally indicates that the interface has stopped sending,
but may be caused by transient congestion.
(Normally, this does not occur in Linux. Packets are just silently
dropped when a device queue overflows.)"

This is not true for IP_RECVERR enabled sockets: a send() syscall
that hits a qdisc drop returns an ENOBUFS error.

Many thanks to Christoph, David, and last but not least, Alexey!
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
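For reference, a short sketch of how an application opts in to the IP_RECVERR behaviour discussed above. The helper name `udp_socket_with_recverr` is hypothetical; only the socket(), setsockopt() and close() calls are standard.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a UDP socket with IP_RECVERR enabled.
 * Per the commit message, send() on such a socket can return ENOBUFS
 * when a qdisc drop occurs; without the option the drop is silent.
 * Returns the socket fd, or -errno on failure.
 */
static int udp_socket_with_recverr(void)
{
    int on = 1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -errno;

    if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on)) < 0) {
        int err = errno;   /* save errno before close() can change it */
        close(fd);
        return -err;
    }
    return fd;
}
```

With this patch applied, the SNMP counters shown by netstat -s are updated whether or not the socket was created this way; the option only controls whether the error is also reported to the application.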
28 Aug, 2009
1 commit
-
Add a check in ip_append_data() for NULL *rtp to prevent future bugs in
callers from being exploitable.
Signed-off-by: Julien Tinnes
Signed-off-by: Tavis Ormandy
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds
12 Jul, 2009
1 commit
-
After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
we do not take any more references on sk->sk_refcnt on outgoing packets.

I forgot to delete two __sock_put() from ip_push_pending_frames()
and ip6_push_pending_frames().
Reported-by: Emil S Tantilov
Signed-off-by: Eric Dumazet
Tested-by: Emil S Tantilov
Signed-off-by: David S. Miller
11 Jun, 2009
1 commit
-
One of the problems with sock memory accounting is that it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on the socket and might use a different
cpu than the transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in the top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement the initial offset and atomically check if any packets
are in flight.

skb_set_owner_w() doesn't call sock_hold() anymore.
sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc
reached 0 to perform the final freeing.

The drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
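The biased-counter idea can be modelled in user space with C11 atomics. This is only a sketch of the technique, not the kernel code: `struct model_sock` and the `model_*` functions are hypothetical stand-ins for the socket, sk_free(), skb_set_owner_w() and sock_wfree().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a socket's write-memory accounting. */
struct model_sock {
    atomic_int wmem_alloc;  /* bytes in flight, plus a bias of 1 until freed */
    bool freed;             /* set when the final free would run */
};

/* Start with the one-unit offset described in the commit message. */
static void model_sk_init(struct model_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);
    sk->freed = false;
}

/* Charge a packet to the socket; no separate refcount is taken. */
static void model_set_owner_w(struct model_sock *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge, and free if this drops the counter to 0. */
static void model_wfree(struct model_sock *sk, int truesize)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        sk->freed = true;
}

/* Owner releases the socket: remove the bias, free if nothing in flight. */
static void model_sk_free(struct model_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        sk->freed = true;
}
```

Whichever of model_sk_free() or the last model_wfree() runs last sees the counter reach zero and performs the free, which is why a wrong truesize on either side would either leak the socket or free it early.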
09 Jun, 2009
1 commit
-
Signed-off-by: David S. Miller
03 Jun, 2009
2 commits
-
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb
Delete skb->rtable field
Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
27 Apr, 2009
1 commit
-
The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and
OutMcastOctets:
http://tools.ietf.org/html/rfc4293
But it seems we don't track those in any way that is easy to separate from other
protocols. This patch adds those missing counters to the stats file. Tested
successfully by me.

With help from Eric Dumazet.
Signed-off-by: Neil Horman
Signed-off-by: David S. Miller
16 Feb, 2009
1 commit
-
Instructions for time stamping outgoing packets are taken from the
socket layer and later copied into the new skb.
Signed-off-by: Patrick Ohly
Signed-off-by: David S. Miller
25 Nov, 2008
2 commits
-
We can reduce pressure on the dst entry refcount that slows down the UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousands of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_push_pending_frames() steal the refcount its
callers had to take when filling inet->cork.dst.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
We can reduce pressure on the dst entry refcount that slows down the UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousands of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
03 Nov, 2008
1 commit
-
Signed-off-by: Jianjun Kong
Signed-off-by: David S. Miller
01 Oct, 2008
1 commit
-
Netfilter's ip_route_me_harder() tries to re-route packets either
generated or re-routed by Netfilter. This patch changes
ip_route_me_harder() to handle packets from non-locally-bound sockets
with IP_TRANSPARENT set as local and to set the appropriate flowi
flags when re-doing the routing lookup.
Signed-off-by: KOVACS Krisztian
Signed-off-by: David S. Miller
26 Jul, 2008
1 commit
-
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuitively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to the future.

I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller
17 Jul, 2008
1 commit
-
All the callers already have either the net itself, or the place
where to get it from.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
15 Jul, 2008
1 commit
-
This routine deals with ICMP statistics, but doesn't have a
struct net at hand, so add one.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
12 Jun, 2008
1 commit
-
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller
30 Apr, 2008
1 commit
-
Problem: ip_append_data() could wrongly generate a chained skb for
devices which support UFO. When sk_write_queue is not empty
(e.g. MSG_MORE), __instead__ of appending data into the next nr_frag
of the queued skb, a new chained skb is created.

I would normally assume a UFO device should get data in nr_frags and not
in frag_list. Later the udp4_hwcsum_outgoing() resets csum to NONE
and skb_gso_segment() has an oops.

Proposal:
1. Even if the length is less than mtu, employ ip_ufo_append_data()
and append data to the __existing__ skb in the sk_write_queue.
2. ip_ufo_append_data() is fixed due to a wrong manipulation of
peek-ing and later enqueue-ing of the same skb. Now, enqueuing is
always performed, because on error the further
ip_flush_pending_frames() would release the queued skb.
Signed-off-by: Kostya B
Acked-by: Herbert Xu
Signed-off-by: David S. Miller
26 Mar, 2008
1 commit
-
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki
25 Mar, 2008
2 commits
-
Signed-off-by: YOSHIFUJI Hideaki
-
Replace all the rest of the init_net with a proper net on the IP layer.
Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller
06 Mar, 2008
1 commit
-
(Anonymous) unions can help us to avoid ugly casts.
A common cast is the (struct rtable *)skb->dst one.
Defining a union like :
union {
    struct dst_entry *dst;
    struct rtable *rtable;
};
permits the use of skb->rtable instead.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
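A small stand-alone illustration of the cast versus the anonymous union. The structs here are simplified, hypothetical stand-ins (the real kernel structures have many more fields); the only property the sketch relies on is that the dst entry is the first member of the route structure.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct dst_entry {
    int refcnt;
};

struct rtable {
    struct dst_entry dst;      /* must be first so both pointers coincide */
    unsigned int rt_gateway;
};

struct model_skb {
    union {                    /* anonymous union: two views of one pointer */
        struct dst_entry *dst;
        struct rtable *rtable;
    };
};

/* Old style: explicit cast from the dst pointer. */
static unsigned int gw_with_cast(const struct model_skb *skb)
{
    return ((struct rtable *)skb->dst)->rt_gateway;
}

/* New style: the union member gives the same pointer without a cast. */
static unsigned int gw_with_union(const struct model_skb *skb)
{
    return skb->rtable->rt_gateway;
}
```

Both helpers read the same object; the union only removes the cast from the call sites.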
01 Feb, 2008
2 commits
-
A userspace program may wish to set the mark for each packet it sends
without using the netfilter MARK target. Changing the mark can be used
for mark based routing without netfilter or for packet filtering.

It requires CAP_NET_ADMIN capability.
Signed-off-by: Laszlo Attila Toth
Acked-by: Patrick McHardy
Signed-off-by: David S. Miller
-
When ip_fragment has to hit the slow path the value of skb->truesize
may go out of sync because we would have updated it without changing
the packet length. This violates the constraints on truesize.

This patch postpones the update of skb->truesize to prevent this.
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
29 Jan, 2008
6 commits
-
All needed API is done, the namespace is available when required from
the device on the DST entry from the incoming packet. So, just replace
init_net with proper namespace.

Other protocols will follow.
Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller
-
Needed to propagate it down to the ip_route_output_flow.
Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller
-
Needed to propagate it down to the __ip_route_output_key.
Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller
-
It seems that ip_build_xmit is no longer used in here and
ip_append_data is used.
Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller
-
The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.
Signed-off-by: Patrick McHardy
Acked-by: Herbert Xu
Signed-off-by: David S. Miller
-
Most callers of the LOCAL_OUT chain will set the IP packet length and
header checksum before doing so. They also share the same output
function dst_output.

This patch creates a new function called ip_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path once the same thing is done for IPv6.
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
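The two header fix-ups that callers used to repeat (total length and header checksum) can be sketched in user space as follows. This is not the kernel's ip_local_out(); `ipv4_header_checksum` and `model_finalize_header` are hypothetical names, and the checksum is the standard one's-complement sum from RFC 1071 computed over a raw header in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement checksum over an IPv4 header (RFC 1071).
 * `len` is the header length in bytes; IPv4 headers are always
 * a multiple of 4 bytes, so no odd trailing byte is handled here.
 */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

    /* Fold carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: write the total length (bytes 2-3) and the
 * header checksum (bytes 10-11) into a raw IPv4 header.
 */
static void model_finalize_header(uint8_t *hdr, size_t hdr_len,
                                  uint16_t total_len)
{
    hdr[2] = (uint8_t)(total_len >> 8);
    hdr[3] = (uint8_t)(total_len & 0xff);

    hdr[10] = 0;               /* checksum field is zero while summing */
    hdr[11] = 0;

    uint16_t csum = ipv4_header_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

Running the checksum again over a finalized header yields 0, which is the usual way a receiver validates it.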
23 Jan, 2008
2 commits
-
As it is ip_append_data only counts page fragments to the skb that
allocated it. As such it means that the first skb gets hit with a
4K charge even though it might have only used a fraction of it while
all subsequent skb's that use the same page get away with no charge
at all.

This bug was exposed by the UDP accounting patch.
[ The wmem_alloc bumping needs to be moved with the truesize,
noticed by Takahiro Yasui. -DaveM ]
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
-
And as noted by Takahiro Yasui, we thus need to bump the
sk->sk_wmem_alloc at this spot as well.
Signed-off-by: David S. Miller
07 Nov, 2007
1 commit
-
The ip_push_pending_frames and ip_flush_pending_frames do the
same things to flush the sock's cork. Move this into a separate
function and save ~80 bytes from the .text
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
24 Oct, 2007
1 commit
-
In some places, the result of skb_headroom() is compared to an unsigned
integer, and in others, the result is compared to a signed integer. Make
the comparisons consistent and correct.
Signed-off-by: Chuck Lever
Signed-off-by: David S. Miller
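The general C pitfall behind mixed signed/unsigned comparisons can be shown with a short sketch. It does not reproduce the kernel code touched by this patch; the function names are hypothetical, and the explicit cast in the first function spells out the conversion the compiler would otherwise apply implicitly.

```c
#include <stdbool.h>

/*
 * Mixed comparison: when an unsigned value is compared with an int,
 * the int is converted to unsigned first. The cast below makes that
 * implicit conversion visible: a negative `needed` becomes a huge
 * value, so the test is true for any realistic headroom.
 */
static bool needs_realloc_mixed(unsigned int headroom, int needed)
{
    return headroom < (unsigned int)needed;
}

/*
 * Consistent comparison: handle the sign explicitly before comparing
 * two unsigned quantities.
 */
static bool needs_realloc_checked(unsigned int headroom, int needed)
{
    return needed > 0 && headroom < (unsigned int)needed;
}
```

For example, with a headroom of 16 and a (bogus) requirement of -1, the mixed version asks for a reallocation while the checked version does not.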
16 Oct, 2007
1 commit
-
Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc
can generate tail calls for some of the netfilter hook okfn invocations,
so there is no need to inline the functions anymore. This caused huge
code bloat since we ended up with one inlined version and one out-of-line
version since we pass the address to nf_hook_slow.

Before:
   text    data     bss      dec    hex filename
8997385 1016524  524652 10538561 a0ce41 vmlinux

After:
   text    data     bss      dec    hex filename
8994009 1016524  524652 10535185 a0c111 vmlinux
-------------------------------------------------------
  -3376

All cases have been verified to generate tail-calls with and without
netfilter. The okfns in ipmr and xfrm4_input still remain inline because
gcc can't generate tail-calls for them.
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
11 Oct, 2007
2 commits
-
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
-
Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches "remove" (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types
listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for "Icmp" to get the named data
from new counters.
Signed-off-by: David L Stevens
Signed-off-by: David S. Miller
14 Aug, 2007
1 commit
-
This patch cleans up duplicate includes in
net/ipv4/
Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller