Eric Lee / smarc-fsl-linux-kernel

02 Oct, 2009

1 commit

914a9ab38 net: Use sk_mark for routing lookup in more places

This patch against v2.6.31 adds support for route lookup using sk_mark in some
more places. The benefits of this patch are the following.
First, the SO_MARK option now has an effect on UDP sockets too.
Second, it fixes ip_queue_xmit() and inet_sk_rebuild_header(), which could fail
to do the routing lookup correctly if TCP sockets with SO_MARK were used.

Signed-off-by: Atis Elsts
Acked-by: Eric Dumazet

Atis Elsts
2009-10-02 06:16:49 +0800

15 Sep, 2009

1 commit

d7e9660ad Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
...

Fixed up trivial conflicts:

- arch/x86/include/asm/socket.h

converted to use the generic header in the x86 tree. That
header has the same new #define's, so that works out fine.

- drivers/net/tun.c

fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.

Noted in 'next' by Stephen Rothwell.

Linus Torvalds
2009-09-15 01:37:28 +0800

03 Sep, 2009

1 commit

6ce9e7b5f ip: Report qdisc packet drops

Christoph Lameter pointed out that packet drops at qdisc level were not
accounted in SNMP counters. Only if the application sets IP_RECVERR are drops
reported to the user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but these are not needed to update system wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after a UDP tx flood:
# netstat -s
...
IP:
    1487048 outgoing packets dropped
...
Udp:
    ...
    SndbufErrors: 1487048

send() syscalls do, however, still return an OK status, so as not to
break applications.

Note: the send() manual page explicitly says, for the -ENOBUFS error:

"The output queue for a network interface was full.
This generally indicates that the interface has stopped sending,
but may be caused by transient congestion.
(Normally, this does not occur in Linux. Packets are just silently
dropped when a device queue overflows.) "

This is not true for IP_RECVERR-enabled sockets: a send() syscall
that hits a qdisc drop returns an ENOBUFS error.

Many thanks to Christoph, David, and last but not least, Alexey !

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-09-03 09:05:33 +0800

28 Aug, 2009

1 commit

788d908f2 ipv4: make ip_append_data() handle NULL routing table

Add a check in ip_append_data() for NULL *rtp to prevent future bugs in
callers from being exploitable.

Signed-off-by: Julien Tinnes
Signed-off-by: Tavis Ormandy
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds

Julien TINNES
2009-08-28 03:23:43 +0800

12 Jul, 2009

1 commit

e51a67a9c net: ip_push_pending_frames() fix

After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
we do not take any more references on sk->sk_refcnt on outgoing packets.

I forgot to delete two __sock_put() from ip_push_pending_frames()
and ip6_push_pending_frames().

Reported-by: Emil S Tantilov
Signed-off-by: Eric Dumazet
Tested-by: Emil S Tantilov
Signed-off-by: David S. Miller

Eric Dumazet
2009-07-12 11:26:21 +0800

11 Jun, 2009

1 commit

2b85a34e9 net: No more expensive sock_hold()/sock_put() on each tx

One of the problems with sock memory accounting is that it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement the initial offset and atomically check if any packets
are in flight.

skb_set_owner_w() doesn't call sock_hold() anymore.

sock_wfree() doesn't call sock_put() anymore, but checks whether sk_wmem_alloc
reached 0 to perform the final freeing.

The drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-06-11 17:55:43 +0800
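
The offset-by-one scheme described in this commit can be illustrated with a small user-space model using C11 atomics. This is only a sketch under stated assumptions: struct mock_sock and the mock_* functions are hypothetical stand-ins for struct sock, skb_set_owner_w(), sock_wfree() and sk_free(), and the real kernel code differs in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock, reduced to what the scheme needs. */
struct mock_sock {
    atomic_int wmem_alloc; /* in-flight tx bytes, plus 1 while the socket is alive */
    bool freed;            /* set by the final free (models __sk_free()) */
};

static void mock_sk_init(struct mock_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1); /* the initial offset of one unit */
    sk->freed = false;
}

static void mock_final_free(struct mock_sock *sk)
{
    assert(!sk->freed); /* a double free would indicate an accounting error */
    sk->freed = true;
}

/* Charge a packet to the socket; no extra reference is taken. */
static void mock_set_owner_w(struct mock_sock *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* tx completion: uncharge the packet and free the socket if nothing is left. */
static void mock_wfree(struct mock_sock *sk, int truesize)
{
    /* atomic_fetch_sub returns the old value; old == truesize means new == 0 */
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        mock_final_free(sk);
}

/* Owner releases the socket: drop the initial offset, free if no tx in flight. */
static void mock_sk_free(struct mock_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        mock_final_free(sk);
}
```

In this model, calling mock_sk_free() while a packet is still charged leaves the socket alive, and the later mock_wfree() for that packet performs the final free. It also shows why a wrong truesize is dangerous: the counter would either never reach zero or reach it too early.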

09 Jun, 2009

1 commit

d7fcf1a5c ipv4: Use frag list abstraction interfaces.

Signed-off-by: David S. Miller

David S. Miller
2009-06-09 15:19:37 +0800

03 Jun, 2009

2 commits

adf30907d net: skb->dst accessors

Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of:
        dst_release(skb->dst)
        skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-06-03 17:51:04 +0800
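
The three accessors described above can be modelled in user space as follows. This is a hedged sketch, not the kernel implementation: the mock_* types, the dst_private field and the plain integer refcount are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_dst_entry {
    int refcnt;
};

struct mock_sk_buff {
    struct mock_dst_entry *dst_private; /* only touched through the accessors */
};

/* Get the dst attached to the skb. */
static struct mock_dst_entry *mock_skb_dst(const struct mock_sk_buff *skb)
{
    return skb->dst_private;
}

/* Attach a dst to the skb; the caller is assumed to already hold a reference. */
static void mock_skb_dst_set(struct mock_sk_buff *skb, struct mock_dst_entry *dst)
{
    skb->dst_private = dst;
}

/* Models dst_release(): drop one reference, tolerating NULL. */
static void mock_dst_release(struct mock_dst_entry *dst)
{
    if (dst) {
        assert(dst->refcnt > 0);
        dst->refcnt--;
    }
}

/* Replaces the open-coded "release, then set to NULL" pair. */
static void mock_skb_dst_drop(struct mock_sk_buff *skb)
{
    mock_dst_release(mock_skb_dst(skb));
    mock_skb_dst_set(skb, NULL);
}
```

Funnelling every access through accessors is what allows the field itself to be removed or change representation later without touching the callers.
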
511c3f92a net: skb->rtable accessor

Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb

Delete skb->rtable field

Setting rtable is not allowed, just set dst instead as rtable is an alias.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-06-03 17:51:02 +0800

27 Apr, 2009

1 commit

edf391ff1 snmp: add missing counters for RFC 4293

The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and
OutMcastOctets:
http://tools.ietf.org/html/rfc4293
But it seems we don't track those in any way that is easy to separate from other
protocols. This patch adds those missing counters to the stats file. Tested
successfully by me.

With help from Eric Dumazet.

Signed-off-by: Neil Horman
Signed-off-by: David S. Miller

Neil Horman
2009-04-27 17:45:02 +0800

16 Feb, 2009

1 commit

51f31cabe ip: support for TX timestamps on UDP and RAW sockets

Instructions for time stamping outgoing packets are taken from the
socket layer and later copied into the new skb.

Signed-off-by: Patrick Ohly
Signed-off-by: David S. Miller

Patrick Ohly
2009-02-16 14:43:38 +0800

25 Nov, 2008

2 commits

a21bba945 net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames()

We can reduce pressure on the dst entry refcount that slows down the UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to media gateways, especially big ones, handling
thousands of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_push_pending_frames() steal the refcount its
callers had to take when filling inet->cork.dst.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2008-11-25 08:07:50 +0800
2e77d89b2 net: avoid a pair of dst_hold()/dst_release() in ip_append_data()

We can reduce pressure on the dst entry refcount that slows down the UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to media gateways, especially big ones, handling
thousands of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2008-11-25 07:52:46 +0800

03 Nov, 2008

1 commit

d9319100c net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c

Signed-off-by: Jianjun Kong
Signed-off-by: David S. Miller

Jianjun Kong
2008-11-03 16:23:42 +0800

01 Oct, 2008

1 commit

86b08d867 ipv4: Make Netfilter's ip_route_me_harder() non-local address compatible

Netfilter's ip_route_me_harder() tries to re-route packets either
generated or re-routed by Netfilter. This patch changes
ip_route_me_harder() to handle packets from non-locally-bound sockets
with IP_TRANSPARENT set as local and to set the appropriate flowi
flags when re-doing the routing lookup.

Signed-off-by: KOVACS Krisztian
Signed-off-by: David S. Miller

KOVACS Krisztian
2008-10-01 22:44:42 +0800

26 Jul, 2008

1 commit

547b792ca net: convert BUG_TRAP to generic WARN_ON

Removes a legacy reinvent-the-wheel type thing. The generic
machinery integrates much better with automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuitively, BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON(), though some might actually be
promoted to BUG_ON(), but I left that to the future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2008-07-26 12:43:18 +0800

17 Jul, 2008

1 commit

5e38e2704 mib: add net to IP_INC_STATS

All the callers already have either the net itself, or the place
where to get it from.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-07-17 11:19:49 +0800

15 Jul, 2008

1 commit

0388b0042 icmp: add struct net argument to icmp_out_count

This routine deals with ICMP statistics, but doesn't have a
struct net at hand, so add one.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-07-15 14:05:13 +0800

12 Jun, 2008

1 commit

0b0408299 net: remove CVS keywords

This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller

Adrian Bunk
2008-06-12 12:00:38 +0800

30 Apr, 2008

1 commit

be9164e76 [IPv4] UFO: prevent generation of chained skb destined to UFO device

Problem: ip_append_data() could wrongly generate a chained skb for
devices which support UFO. When sk_write_queue is not empty
(e.g. MSG_MORE), __instead__ of appending data into the next nr_frag
of the queued skb, a new chained skb is created.

I would normally assume a UFO device should get data in nr_frags and not
in frag_list. Later, udp4_hwcsum_outgoing() resets csum to NONE
and skb_gso_segment() oopses.

Proposal:
1. Even if the length is less than the MTU, employ ip_ufo_append_data()
and append data to the __existing__ skb in the sk_write_queue.

2. ip_ufo_append_data() is fixed due to a wrong manipulation of
peeking and later enqueueing of the same skb. Now, enqueueing is
always performed, because on error the subsequent
ip_flush_pending_frames() would release the queued skb.

Signed-off-by: Kostya B
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Kostya B
2008-04-30 13:36:30 +0800

26 Mar, 2008

1 commit

3b1e0a655 [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.

Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki

YOSHIFUJI Hideaki
2008-03-26 03:39:55 +0800

25 Mar, 2008

2 commits

c8cdaf998 [IPV4,IPV6]: Share cork.rt between IPv4 and IPv6.

Signed-off-by: YOSHIFUJI Hideaki

YOSHIFUJI Hideaki
2008-03-25 09:23:59 +0800
cb84663e4 [NETNS]: Process IP layer in the context of the correct namespace.

Replace all the rest of the init_net with a proper net on the IP layer.

Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller

Denis V. Lunev
2008-03-25 06:31:00 +0800

06 Mar, 2008

1 commit

ee6b96730 [IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts

(Anonymous) unions can help us to avoid ugly casts.

A common cast is the (struct rtable *)skb->dst one.

Defining a union like:
    union {
        struct dst_entry *dst;
        struct rtable *rtable;
    };
permits using skb->rtable instead.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2008-03-06 10:30:47 +0800
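
The aliasing idea in this commit can be sketched in user space with a C11 anonymous union. The mock_* types and the mock_skb_gateway() helper are hypothetical and heavily simplified; the sketch assumes, as the cast being replaced also does, that the dst entry is the first member of the route structure.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_dst_entry {
    int refcnt;
};

struct mock_rtable {
    struct mock_dst_entry dst; /* must stay first for the alias to be valid */
    unsigned int rt_gateway;
};

/* The alias only works if a pointer to the rtable is also a pointer to its dst. */
_Static_assert(offsetof(struct mock_rtable, dst) == 0,
               "dst must be the first member of mock_rtable");

struct mock_sk_buff {
    union {
        struct mock_dst_entry *dst;  /* generic view */
        struct mock_rtable *rtable;  /* IPv4 view of the same pointer */
    };
};

/* No cast needed: read the route through the typed alias. */
static unsigned int mock_skb_gateway(const struct mock_sk_buff *skb)
{
    return skb->rtable->rt_gateway;
}
```

Code that stores the pointer through the dst member can then read it back through rtable, which is equivalent to the (struct rtable *) cast but leaves the type relationship visible in the structure definition.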

01 Feb, 2008

2 commits

4a19ec580 [NET]: Introducing socket mark socket option.

A userspace program may wish to set the mark for each packet it sends
without using the netfilter MARK target. Changing the mark can be used
for mark based routing without netfilter or for packet filtering.

It requires CAP_NET_ADMIN capability.

Signed-off-by: Laszlo Attila Toth
Acked-by: Patrick McHardy
Signed-off-by: David S. Miller

Laszlo Attila Toth
2008-02-01 11:27:19 +0800
29ffe1a5c [INET]: Prevent out-of-sync truesize on ip_fragment slow path

When ip_fragment has to hit the slow path the value of skb->truesize
may go out of sync because we would have updated it without changing
the packet length. This violates the constraints on truesize.

This patch postpones the update of skb->truesize to prevent this.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2008-02-01 11:27:07 +0800

29 Jan, 2008

6 commits

dde1bc0e6 [NETNS]: Add namespace for ICMP replying code.

All needed API is done, the namespace is available when required from
the device on the DST entry from the incoming packet. So, just replace
init_net with proper namespace.

Other protocols will follow.

Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller

Denis V. Lunev
2008-01-29 07:11:13 +0800
f206351a5 [NETNS]: Add namespace parameter to ip_route_output_key.

Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller

Denis V. Lunev
2008-01-29 07:11:07 +0800
f1b050bf7 [NETNS]: Add namespace parameter to ip_route_output_flow.

Needed to propagate it down to the __ip_route_output_key.

Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller

Denis V. Lunev
2008-01-29 07:11:06 +0800
a067d9ac3 [NET]: Remove obsolete comment

It seems that ip_build_xmit is no longer used in here and
ip_append_data is used.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2008-01-29 07:00:45 +0800
6e23ae2a4 [NETFILTER]: Introduce NF_INET_ hook values

The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.

Signed-off-by: Patrick McHardy
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Patrick McHardy
2008-01-29 06:53:55 +0800
c439cb2e4 [IPV4]: Add ip_local_out

Most callers of the LOCAL_OUT chain will set the IP packet length and
header checksum before doing so. They also share the same output
function dst_output.

This patch creates a new function called ip_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path once the same thing is done for IPv6.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2008-01-29 06:53:47 +0800
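
The two header fix-ups that this commit says callers perform before output (set the total length, then compute the header checksum) can be illustrated in user space. This is a sketch only: finalize_ipv4_header() is a hypothetical helper, the header is handled as raw bytes in network order, and the checksum follows the standard Internet checksum (RFC 1071). It does not model the netfilter hook or dst_output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum over an IPv4 header given as raw bytes. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0); /* IPv4 header length is a multiple of 4 bytes */
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16) /* fold the carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: write the total length (bytes 2-3), then recompute
 * the header checksum (bytes 10-11) with the checksum field zeroed first.
 */
static void finalize_ipv4_header(uint8_t *hdr, size_t hdr_len, uint16_t total_len)
{
    uint16_t csum;

    hdr[2] = (uint8_t)(total_len >> 8);
    hdr[3] = (uint8_t)(total_len & 0xff);
    hdr[10] = 0;
    hdr[11] = 0;
    csum = ipv4_header_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

A useful property for checking the result: running ipv4_header_checksum() over a header whose checksum field is already correct yields 0.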

23 Jan, 2008

2 commits

f945fa7ad [INET]: Fix truesize setting in ip_append_data

As it is, ip_append_data only counts page fragments to the skb that
allocated it. As such it means that the first skb gets hit with a
4K charge even though it might have only used a fraction of it, while
all subsequent skb's that use the same page get away with no charge
at all.

This bug was exposed by the UDP accounting patch.

[ The wmem_alloc bumping needs to be moved with the truesize,
noticed by Takahiro Yasui. -DaveM ]

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2008-01-23 19:11:43 +0800
1e34a11d5 [IPV4]: Add missing skb->truesize increment in ip_append_page().

And as noted by Takahiro Yasui, we thus need to bump the
sk->sk_wmem_alloc at this spot as well.

Signed-off-by: David S. Miller

David S. Miller
2008-01-23 19:11:40 +0800

07 Nov, 2007

1 commit

429f08e95 [IPV4]: Consolidate the ip cork destruction in ip_output.c

The ip_push_pending_frames and ip_flush_pending_frames do the
same things to flush the sock's cork. Move this into a separate
function and save ~80 bytes from the .text

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2007-11-07 20:08:25 +0800

24 Oct, 2007

1 commit

c2636b4d9 [NET]: Treat the sign of the result of skb_headroom() consistently

In some places, the result of skb_headroom() is compared to an unsigned
integer, and in others, the result is compared to a signed integer. Make
the comparisons consistent and correct.

Signed-off-by: Chuck Lever
Signed-off-by: David S. Miller

Chuck Lever
2007-10-24 12:27:55 +0800

16 Oct, 2007

1 commit

861d04860 [IPV4]: Uninline netfilter okfns

Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc
can generate tail calls for some of the netfilter hook okfn invocations,
so there is no need to inline the functions anymore. This caused huge
code bloat since we ended up with one inlined version and one out-of-line
version since we pass the address to nf_hook_slow.

Before:
   text    data     bss      dec     hex filename
8997385 1016524  524652 10538561  a0ce41 vmlinux

After:
   text    data     bss      dec     hex filename
8994009 1016524  524652 10535185  a0c111 vmlinux
-------------------------------------------------------
  -3376

All cases have been verified to generate tail-calls with and without
netfilter. The okfns in ipmr and xfrm4_input still remain inline because
gcc can't generate tail-calls for them.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2007-10-16 03:26:35 +0800

11 Oct, 2007

2 commits

3b04ddde0 [NET]: Move hardware header operations out of netdevice.

Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2007-10-11 07:52:52 +0800
96793b482 [IPV4]: Add ICMPMsgStats MIB (RFC 4293)

Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches "remove" (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types
listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for "Icmp" to get the named data
from new counters.

Signed-off-by: David L Stevens
Signed-off-by: David S. Miller

David L Stevens
2007-10-11 07:51:28 +0800

14 Aug, 2007

1 commit

f49f9967b [IPV4]: Clean up duplicate includes in net/ipv4/

This patch cleans up duplicate includes in
net/ipv4/

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller

Jesper Juhl
2007-08-14 13:52:02 +0800