Eric Lee / smarc-fsl-linux-kernel

18 Mar, 2013

1 commit

1a2c6181c tcp: Remove TCPCT

TCPCT uses option-number 253, which is reserved for experimental use and
should not be used in production environments.
Further, TCPCT does not fully implement RFC 6013.

As a nice side-effect, removing TCPCT increases TCP's performance for
very short flows:

Benchmark: apache-benchmark with -c 100 -n 100000, sending HTTP requests
for files of 1KB size.

before this patch:
average (among 7 runs) of 20845.5 Requests/Second
after:
average (among 7 runs) of 21403.6 Requests/Second
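
To put the two averages in proportion, the relative gain can be worked out
with a small helper. This is only an illustrative sketch; percent_gain() is
a hypothetical name and is not part of the patch.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

With the figures quoted above, percent_gain(20845.5, 21403.6) comes out at
roughly 2.7, i.e. a gain of about 2.7% in requests per second.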

Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2013-03-18 02:35:13 +0800

15 Dec, 2012

1 commit

e337e24d6 inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock

If in either of the above functions inet_csk_route_child_sock() or
__inet_inherit_port() fails, the newsk will not be freed:

unreferenced object 0xffff88022e8a92c0 (size 1592):
comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
hex dump (first 32 bytes):
0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................
02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x21/0x3e
[] kmem_cache_alloc+0xb5/0xc5
[] sk_prot_alloc.isra.53+0x2b/0xcd
[] sk_clone_lock+0x16/0x21e
[] inet_csk_clone_lock+0x10/0x7b
[] tcp_create_openreq_child+0x21/0x481
[] tcp_v4_syn_recv_sock+0x3a/0x23b
[] tcp_check_req+0x29f/0x416
[] tcp_v4_do_rcv+0x161/0x2bc
[] tcp_v4_rcv+0x6c9/0x701
[] ip_local_deliver_finish+0x70/0xc4
[] ip_local_deliver+0x4e/0x7f
[] ip_rcv_finish+0x1fc/0x233
[] ip_rcv+0x217/0x267
[] __netif_receive_skb+0x49e/0x553
[] netif_receive_skb+0x50/0x82

This happens because sk_clone_lock() initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.

This is fixed by forcing a call to tcp_done(), which ends up in
inet_csk_destroy_sock() and does the final sock_put(). tcp_done() is
necessary because it ends up doing all the cleanup on xfrm, memcg,
cookie_values,...

Before calling tcp_done(), we have to set the socket to SOCK_DEAD, to
force it to enter inet_csk_destroy_sock(). To avoid the warning in
inet_csk_destroy_sock(), inet_num has to be set to 0.
As inet_csk_destroy_sock() decrements orphan_count, we first have to
increment it.

Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().
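
The reference-count reasoning above can be illustrated with a toy model in
userspace C. Everything here (struct toy_sock, toy_put(), the two error-path
helpers) is hypothetical and heavily simplified; it only mirrors the point
that an object created with two references is not released by a single put.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a socket; NOT the kernel's struct sock. */
struct toy_sock {
    int refcnt;
    bool freed;
};

/* Mirrors the observation above: a cloned socket starts with 2 references. */
static void toy_clone_init(struct toy_sock *sk)
{
    sk->refcnt = 2;
    sk->freed = false;
}

/* Drop one reference; mark the object freed when the last one is gone. */
static void toy_put(struct toy_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        sk->freed = true;
}

/* Old error path: a single put, so one reference stays behind (leak). */
static bool toy_error_path_old(struct toy_sock *sk)
{
    toy_put(sk);
    return sk->freed;
}

/* Fixed error path, simplified: the error path drops its reference and a
 * destroy step (standing in for tcp_done()) does the final put. */
static bool toy_error_path_fixed(struct toy_sock *sk)
{
    toy_put(sk);
    toy_put(sk);
    return sk->freed;
}
```

In this model toy_error_path_old() leaves refcnt at 1 and the object is
never freed, while toy_error_path_fixed() brings it down to 0.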

A similar approach is taken for dccp by calling dccp_done().

This bug has been in the kernel since 093d282321 (tproxy: fix hash locking
issue when using port redirection in __inet_inherit_port()), i.e. since
version 2.6.37.

Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2012-12-15 02:14:07 +0800

24 Jul, 2012

1 commit

92101b3b2 ipv4: Prepare for change of rt->rt_iif encoding.

Use inet_iif() consistently, and for TCP record the input interface of
cached RX dst in inet sock.

rt->rt_iif is going to be encoded differently, so that we can
legitimately cache input routes in the FIB info more aggressively.

When the input interface is "use SKB device index" the rt->rt_iif will
be set to zero.

This forces us to move the TCP RX dst cache installation into the ipv4
specific code. That is where it belongs anyway: route caching for ipv6
is pointless at the moment, since the cached dst is not yet inspected in
the ipv6 input paths.

Also, remove the unlikely() on dst->obsolete: all ipv4 dsts have
obsolete set to a non-zero value to force invocation of the check
callback.

Signed-off-by: David S. Miller

David S. Miller
2012-07-24 07:36:26 +0800

21 Jul, 2012

1 commit

ba3f7f04e ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.

Signed-off-by: David S. Miller

David S. Miller
2012-07-21 04:36:54 +0800

17 Jul, 2012

1 commit

6700c2709 net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()

This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller

David S. Miller
2012-07-17 18:29:28 +0800

16 Jul, 2012

1 commit

80d0a69fc ipv4: Add helper inet_csk_update_pmtu().

This abstracts away the call to dst_ops->update_pmtu() so that we can
transparently handle the fact that, in the future, the dst itself can
be invalidated by the PMTU update (when we have non-host routes cached
in sockets).

So we try to rebuild the socket cached route after the method
invocation if necessary.

This isn't used by SCTP because it needs to cache dsts per-transport,
and thus will need its own local version of this helper.

Signed-off-by: David S. Miller

David S. Miller
2012-07-16 18:28:06 +0800

12 Jul, 2012

2 commits

1ed5c48f2 net: Remove checks for dst_ops->redirect being NULL.

No longer necessary.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 15:41:25 +0800

55be7a9c6 ipv4: Add redirect support to all protocol icmp error handlers.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 12:27:49 +0800

23 Jun, 2012

1 commit

7586eceb0 ipv4: tcp: dont cache output dst for syncookies

Don't cache output dst for syncookies, as this adds pressure on IP route
cache and rcu subsystem for no gain.

Signed-off-by: Eric Dumazet
Cc: Hans Schillstrom
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Eric Dumazet
2012-06-23 12:47:33 +0800

15 Apr, 2012

1 commit

c72e11833 inet: makes syn_ack_timeout mandatory

There are two struct request_sock_ops providers, tcp and dccp.

inet_csk_reqsk_queue_prune() can avoid testing syn_ack_timeout being
NULL if we make it mandatory, i.e. non-NULL for both providers.

Signed-off-by: Eric Dumazet
Cc: Gerrit Renker
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller

Eric Dumazet
2012-04-15 03:24:26 +0800

04 Mar, 2012

1 commit

f541fb7e2 dccp: fix bug in sequence number validation during connection setup

This fixes a bug in the sequence number validation during the initial handshake.

The code did not treat the initial sequence numbers ISS and ISR as read-only and
did not keep state for GSR and GSS as required by the specification. This causes
problems with retransmissions during the initial handshake, causing the
budding connection to be reset.

This patch now treats ISS/ISR as read-only and tracks GSS/GSR as required.

Signed-off-by: Samuel Jero
Signed-off-by: Gerrit Renker

Samuel Jero
2012-03-04 00:02:52 +0800

02 Dec, 2011

1 commit

898f73585 dccp: Evaluate ip_hdr() only once in dccp_v4_route_skb().

This also works around a bogus gcc warning generated by an
upcoming patch from Eric Dumazet that rearranges the layout
of struct flowi4.

Signed-off-by: David S. Miller

David S. Miller
2011-12-02 02:28:34 +0800

22 Nov, 2011

1 commit

525c6465d dccp: fix error propagation in dccp_v4_connect

The errcode is not updated when ip_route_newports() fails.

Signed-off-by: RongQing.Li
Signed-off-by: David S. Miller

RongQing.Li
2011-11-22 05:45:26 +0800

04 Nov, 2011

1 commit

918eb3996 net: add missing bh_unlock_sock() calls

Simon Kirby reported lockdep warnings and the following messages:

[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

Problem comes from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

If inet_csk_route_child_sock() returns NULL, we should release socket
lock before freeing it.

Another lock imbalance exists if __inet_inherit_port() returns an error.
This has been the case since commit 093d282321da (tproxy: fix hash locking
issue when using port redirection in __inet_inherit_port()), so a backport
is also needed for >= 2.6.37 kernels.
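
The kind of imbalance described above can be sketched with a toy lock-depth
counter in userspace C. All names are hypothetical, and the model is
simplified: it unlocks on success as well, which is not how the kernel code
is structured. It only shows why an early return on the error path leaves
the count one higher on exit than on entry.

```c
#include <assert.h>

/* Stand-in for lock bookkeeping such as preempt_count. */
static int toy_lock_depth;

static void toy_lock(void)
{
    toy_lock_depth++;
}

static void toy_unlock(void)
{
    assert(toy_lock_depth > 0);
    toy_lock_depth--;
}

/* Buggy variant: `route_ok` simulates the route lookup result.
 * On failure it returns with the lock still held. */
static int toy_create_child_buggy(int route_ok)
{
    toy_lock();
    if (!route_ok)
        return -1;      /* missing unlock */
    toy_unlock();
    return 0;
}

/* Fixed variant: release the lock before bailing out. */
static int toy_create_child_fixed(int route_ok)
{
    toy_lock();
    if (!route_ok) {
        toy_unlock();
        return -1;
    }
    toy_unlock();
    return 0;
}
```

After a failed call, the buggy variant leaves toy_lock_depth at 1, while
the fixed variant returns it to 0.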

Reported-by: Simon Kirby
Signed-off-by: Eric Dumazet
Tested-by: Eric Dumazet
CC: Balazs Scheidler
CC: KOVACS Krisztian
Reviewed-by: Thomas Gleixner
Tested-by: Simon Kirby
Signed-off-by: David S. Miller

Eric Dumazet
2011-11-04 06:06:18 +0800

07 Aug, 2011

1 commit

6e5714eaf net: Compute protocol sequence numbers and fragment IDs using MD5.

Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is in line with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation. So the periodic
regeneration and 8-bit counter have been removed. We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 48 bits) and that is fixed here as well.
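
The general shape of such a generator (a keyed hash over the connection
4-tuple plus a clock, as in RFC1948) can be sketched in userspace C. This is
an assumption-laden illustration, not the kernel code: the names are
hypothetical, and FNV-1a is used purely as a compact, NON-cryptographic
stand-in for MD5 so that the example stays self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical connection identifier. */
struct toy_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Fold one 32-bit word into an FNV-1a state, byte by byte.
 * Stand-in only: a real generator would use a cryptographic hash. */
static uint32_t toy_mix(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Per-connection offset derived from the secret and the 4-tuple. */
static uint32_t toy_offset(const struct toy_tuple *t, uint32_t secret)
{
    uint32_t h = 2166136261u;

    h = toy_mix(h, secret);
    h = toy_mix(h, t->saddr);
    h = toy_mix(h, t->daddr);
    h = toy_mix(h, ((uint32_t)t->sport << 16) | t->dport);
    return h;
}

/* Full 32-bit initial sequence number: offset plus clock, modulo 2^32. */
static uint32_t toy_isn(const struct toy_tuple *t, uint32_t secret,
                        uint32_t clock)
{
    assert(t != 0);
    return toy_offset(t, secret) + clock;
}
```

All 32 bits of the result depend on the hash, and for a fixed tuple the
value still advances with the clock.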

Reported-by: Dan Kaminsky
Tested-by: Willy Tarreau
Signed-off-by: David S. Miller

David S. Miller
2011-08-07 09:33:19 +0800

19 May, 2011

1 commit

6bd023f3d ipv4: Make caller provide flowi4 key to inet_csk_route_req().

This way the caller can get at the fully resolved fl4->{daddr,saddr}
etc.

Signed-off-by: David S. Miller

David S. Miller
2011-05-19 06:32:03 +0800

09 May, 2011

2 commits

0e7344199 ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.

Operation order is now transposed, we first create the child
socket then we try to hook up the route.

Signed-off-by: David S. Miller

David S. Miller
2011-05-09 06:28:03 +0800

2c42758cf dccp: Use cork flow in dccp_v4_connect()

Since this is invoked from inet_stream_connect() the socket is locked
and therefore this usage is safe.

Signed-off-by: David S. Miller

David S. Miller
2011-05-09 04:18:53 +0800

04 May, 2011

1 commit

f1390160d dccp: Use flowi4->saddr in dccp_v4_connect()

Signed-off-by: David S. Miller

David S. Miller
2011-05-04 11:06:41 +0800

29 Apr, 2011

2 commits

91ab0b60a ipv4: Get route daddr from flow key in dccp_v4_connect().

Now that output route lookups update the flow with
destination address selection, we can fetch it from
fl4->daddr instead of rt->rt_dst

Signed-off-by: David S. Miller

David S. Miller
2011-04-29 14:49:30 +0800

f6d8bd051 inet: add RCU protection to inet->opt

We lack proper synchronization to manipulate inet->opt ip_options.

The problem is that ip_make_skb() calls ip_setup_cork(), and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options)
without any protection against another thread manipulating inet->opt.

Another thread can change the inet->opt pointer and free the old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We can't insert an rcu_head in struct ip_options since it's included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.

Signed-off-by: Eric Dumazet
Cc: Herbert Xu
Signed-off-by: David S. Miller

Eric Dumazet
2011-04-29 04:16:35 +0800

28 Apr, 2011

1 commit

2d7192d6c ipv4: Sanitize and simplify ip_route_{connect,newports}()

These functions are used together as a unit for route resolution
during connect(). They address the chicken-and-egg problem that
exists when ports need to be allocated during connect() processing,
yet such port allocations require addressing information from the
routing code.

It's currently more heavy handed than it needs to be, and in
particular we allocate and initialize a flow object twice.

Let the callers provide the on-stack flow object. That way we only
need to initialize it once in the ip_route_connect() call.

Later, if ip_route_newports() needs to do anything, it re-uses that
flow object as-is except for the ports which it updates before the
route re-lookup.
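
The calling pattern described above can be sketched in userspace C. The
struct and both helpers below are hypothetical simplifications, not the
kernel's flow structures or route-lookup functions; the sketch only shows a
caller-owned flow object that is initialized once and later has just its
ports updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified flow key owned by the caller (on its stack). */
struct toy_flow {
    uint32_t daddr, saddr;
    uint16_t dport, sport;
};

/* Counts full initializations, to show that only one happens. */
static int toy_flow_inits;

/* First lookup: fully initialize the caller's flow object. */
static void toy_route_connect(struct toy_flow *fl, uint32_t daddr,
                              uint32_t saddr, uint16_t dport, uint16_t sport)
{
    memset(fl, 0, sizeof(*fl));
    fl->daddr = daddr;
    fl->saddr = saddr;
    fl->dport = dport;
    fl->sport = sport;
    toy_flow_inits++;
}

/* After port allocation: re-use the flow as-is, touching only the ports.
 * Returns 1 if the ports changed (a re-lookup would be needed), else 0. */
static int toy_route_newports(struct toy_flow *fl, uint16_t dport,
                              uint16_t sport)
{
    assert(fl != NULL);
    if (fl->dport == dport && fl->sport == sport)
        return 0;
    fl->dport = dport;
    fl->sport = sport;
    return 1;
}
```

A caller would declare one struct toy_flow on its stack, pass it to
toy_route_connect() with source port 0, allocate a port, and then hand the
same object to toy_route_newports(); the addresses are left untouched.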

Also, describe why this set of facilities is needed and how it works
in a big comment.

Signed-off-by: David S. Miller
Reviewed-by: Eric Dumazet

David S. Miller
2011-04-28 04:59:04 +0800

13 Mar, 2011

4 commits

9cce96df5 net: Put fl4_* macros to struct flowi4 and use them again.

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:54 +0800

9d6ec9380 ipv4: Use flowi4 in public route lookup interfaces.

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:48 +0800

6281dcc94 net: Make flowi ports AF dependent.

Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*

This will let us create AF optimal flow instances.

It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:46 +0800

1d28f42c1 net: Put flowi_* prefix on AF independent members of struct flowi

I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.
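
The layout being aimed at can be sketched in plain C. The type and member
names below are hypothetical and much smaller than the real structures; the
sketch only shows a union of per-AF structs that all begin with the same
common part, so AF independent code can read the shared members without
knowing which variant is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members that do not depend on the address family. */
struct toy_flow_common {
    int     oif;
    uint8_t proto;
};

/* IPv4 variant: common part first, then AF specific members. */
struct toy_flow4 {
    struct toy_flow_common common;
    uint32_t daddr, saddr;
};

/* IPv6 variant: same common part first, larger addresses. */
struct toy_flow6 {
    struct toy_flow_common common;
    uint8_t daddr[16], saddr[16];
};

/* Generic flow: any variant, always starting with the common part. */
union toy_flow {
    struct toy_flow_common common;
    struct toy_flow4 ip4;
    struct toy_flow6 ip6;
};

/* AF independent accessor: valid whichever variant was filled in,
 * because every variant shares the same initial member. */
static uint8_t toy_flow_proto(const union toy_flow *fl)
{
    assert(fl != NULL);
    return fl->common.proto;
}
```

Code that only handles IPv4 can then work on a struct toy_flow4 directly,
which is smaller than the full union.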

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:44 +0800

03 Mar, 2011

1 commit

b23dd4fe4 ipv4: Make output route lookup return rtable directly.

Instead of on the stack.

Signed-off-by: David S. Miller

David S. Miller
2011-03-03 06:31:35 +0800

02 Mar, 2011

3 commits

273447b35 ipv4: Kill can_sleep arg to ip_route_output_flow()

This boolean state is now available in the flow flags.

Signed-off-by: David S. Miller

David S. Miller
2011-03-02 06:27:04 +0800

420d44daa ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"

Since that is what the current vague "flags" argument means.

Signed-off-by: David S. Miller

David S. Miller
2011-03-02 06:19:23 +0800

abdf7e723 ipv4: Can final ip_route_connect() arg to boolean "can_sleep".

Since that's what the current vague "flags" thing means.

Signed-off-by: David S. Miller

David S. Miller
2011-03-02 06:15:24 +0800

25 Feb, 2011

1 commit

dca8b089c ipv4: Rearrange how ip_route_newports() gets port keys.

ip_route_newports() is the only place in the entire kernel that
cares about the port members in the routing cache entry's lookup
flow key.

Therefore the only reason we store an entire flow inside of the
struct rtentry is for this one special case.

Rewrite ip_route_newports() such that:

1) The caller passes in the original port values, so we don't need
to use the rth->fl.fl_ip_{s,d}port values to remember them.

2) The lookup flow is constructed by hand instead of being copied
from the routing cache entry's flow.

Signed-off-by: David S. Miller

David S. Miller
2011-02-25 05:38:12 +0800

18 Nov, 2010

1 commit

5811662b1 net: use the macros defined for the members of flowi

Use the macros defined for the members of flowi to clean the code up.

Signed-off-by: Changli Gao
Signed-off-by: David S. Miller

Changli Gao
2010-11-18 04:27:45 +0800

21 Oct, 2010

1 commit

093d28232 tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()

When __inet_inherit_port() is called on a tproxy connection the wrong locks are
held for the inet_bind_bucket it is added to. __inet_inherit_port() made an
implicit assumption that the listener's port number (and thus its bind bucket)
is also the child's.
Unfortunately, if you're using the TPROXY target to redirect skbs to a
transparent proxy that assumption is not true anymore and things break.

This patch adds code to __inet_inherit_port() so that it can handle this case
by looking up or creating a new bind bucket for the child socket and updates
callers of __inet_inherit_port() to gracefully handle __inet_inherit_port()
failing.

Reported by, and original patch from, Stephen Buck.
See http://marc.info/?t=128169268200001&r=1&w=2 for the original discussion.

Signed-off-by: KOVACS Krisztian
Signed-off-by: Patrick McHardy

Balazs Scheidler
2010-10-21 19:06:43 +0800

11 Jun, 2010

1 commit

d8d1f30b9 net-next: remove useless union keyword

remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Changli Gao
2010-06-11 14:31:35 +0800

12 Apr, 2010

1 commit

bb2962461 inet: Remove unused send_check length argument

This patch removes the unused length argument from the send_check
function in struct inet_connection_sock_af_ops.

Signed-off-by: Herbert Xu
Tested-by: Yinghai
Signed-off-by: David S. Miller

Herbert Xu
2010-04-12 06:29:09 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

The percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

16 Mar, 2010

1 commit

d14a0ebda net-2.6 [Bug-Fix][dccp]: fix oops caused after failed initialisation

dccp: fix panic caused by failed initialisation

This fixes a kernel panic reported thanks to Andre Noll:

if DCCP is compiled into the kernel and any of the initialisation
steps in net/dccp/proto.c:dccp_init() fail, a subsequent attempt to create
a SOCK_DCCP socket will panic, since inet{,6}_create() are not prevented
from creating DCCP sockets.

This patch fixes the problem by propagating a failure in dccp_init() to
dccp_v{4,6}_init_net(), and from there to dccp_v{4,6}_init(), so that the
DCCP protocol is not made available if its initialisation fails.

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller

Gerrit Renker
2010-03-16 07:00:50 +0800

18 Jan, 2010

1 commit

2c8c1e729 net: spread __net_init, __net_exit

__net_init/__net_exit are apparently not going away, so use them
to full extent.

In some cases __net_init was removed, because it was called from
__net_exit code.

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2010-01-18 11:16:02 +0800

09 Dec, 2009

1 commit

9327f7053 tcp: Fix a connect() race with timewait sockets

The first patch changes __inet_hash_nolisten() and __inet6_hash()
to take a timewait parameter, so that the timewait socket can be unhashed
from ehash at the same time the new socket is inserted in the hash.

This makes sure the timewait socket won't be found by a concurrent
writer in __inet_check_established()

Reported-by: kapil dakhane
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-12-09 12:17:51 +0800

03 Dec, 2009

1 commit

e6b4d1136 TCPCT part 1a: add request_values parameter for sending SYNACK

Add optional function parameters associated with sending SYNACK.
These parameters are not needed after sending SYNACK, and are not
used for retransmission. Avoids extending struct tcp_request_sock,
and avoids allocating kernel memory.

Also affects DCCP as it uses common struct request_sock_ops,
but this parameter is currently reserved for future use.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

William Allen Simpson
2009-12-03 14:07:23 +0800