Eric Lee / smarc-fsl-linux-kernel

16 Dec, 2009

1 commit

bb5b7c112 tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.

It creates a regression, triggering badness for SYN_RECV
sockets, for example:

[19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
[19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
[19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32)
[19148.023496] MSR: 00029032 CR: 24002442 XER: 00000000
[19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000

This is likely caused by the change in the 'estab' parameter
passed to tcp_parse_options() when it is invoked by the functions
in net/ipv4/tcp_minisocks.c.

But even if that is fixed, the ->conn_request() changes made in
this patch series are fundamentally wrong. They try to use the
listening socket's 'dst' to probe the route settings. The
listening socket doesn't even have a route, and you can't
get the right route (the child request one) until much later,
after we set up all of the state, and it must be done by hand.

This stuff really isn't ready, so the best thing to do is a
full revert. This reverts the following commits:

f55017a93f1a74d50244b1254b9a2bd7ac9bbf7d
022c3f7d82f0f1c68018696f2f027b87b9bb45c2
1aba721eba1d84a2defce45b950272cee1e6c72a
cda42ebd67ee5fdf09d7057b5a4584d36fe8a335
345cda2fd695534be5a4494f1b59da9daed33663
dc343475ed062e13fc260acccaab91d7d80fd5b2
05eaade2782fb0c90d3034fd7a7d5a16266182bb
6a2a2d6bf8581216e08be15fcb563cfd6c430e1e

Signed-off-by: David S. Miller

David S. Miller
2009-12-16 12:56:42 +0800

03 Dec, 2009

3 commits

4957faade TCPCT part 1g: Responder Cookie => Initiator

Parse incoming TCP_COOKIE option(s).

Calculate TCP_COOKIE option.

Send optional data.

This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

http://thread.gmane.org/gmane.linux.network/102586

Requires:
TCPCT part 1a: add request_values parameter for sending SYNACK
TCPCT part 1b: generate Responder Cookie secret
TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1d: define TCP cookie option, extend existing struct's
TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1f: Initiator Cookie => Responder

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller

William Allen Simpson
2009-12-03 14:07:26 +0800
435cf559f TCPCT part 1d: define TCP cookie option, extend existing struct's

Data structures are carefully composed to require minimal additions.
For example, the struct tcp_options_received cookie_plus variable fits
between existing 16-bit and 8-bit variables, requiring no additional
space (taking alignment into consideration). There are no additions to
tcp_request_sock, and only 1 pointer in tcp_sock.
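To make the alignment argument concrete, here is a small C sketch with hypothetical field names (not the real struct tcp_options_received layout). It assumes a common ABI where uint32_t is 4-byte aligned: the first struct then already carries one byte of tail padding, so inserting an 8-bit member between the 16-bit and 8-bit fields does not change sizeof.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 4 + 2 + 1 bytes of data, padded to 8 bytes
 * on ABIs where uint32_t is 4-byte aligned. */
struct opts_before {
	uint32_t tsval;
	uint16_t mss_clamp;
	uint8_t  flags;
};

/* Same struct with a new 8-bit member between the 16-bit and 8-bit
 * fields; the members now fill what used to be tail padding. */
struct opts_after {
	uint32_t tsval;
	uint16_t mss_clamp;
	uint8_t  cookie_plus;	/* new member, hypothetical name */
	uint8_t  flags;
};

/* Number of bytes the new member added to the struct. */
size_t opts_growth(void)
{
	return sizeof(struct opts_after) - sizeof(struct opts_before);
}
```

On such ABIs both structs are 8 bytes and opts_growth() returns 0; on a platform with different alignment rules the numbers may differ.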

This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

http://thread.gmane.org/gmane.linux.network/102586

The principal difference is using a TCP option to carry the cookie nonce,
instead of a user configured offset in the data. This is more flexible and
less subject to user configuration error. Such a cookie option has been
suggested for many years, and is also useful without SYN data, allowing
several related concepts to use the same extension option.

"Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html

"Re: what a new TCP header might look like", May 12, 1998.
ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail

These functions will also be used in subsequent patches that implement
additional features.

Requires:
TCPCT part 1a: add request_values parameter for sending SYNACK
TCPCT part 1b: generate Responder Cookie secret
TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller

William Allen Simpson
2009-12-03 14:07:25 +0800
e6b4d1136 TCPCT part 1a: add request_values parameter for sending SYNACK

Add optional function parameters associated with sending SYNACK.
These parameters are not needed after sending SYNACK, and are not
used for retransmission. Avoids extending struct tcp_request_sock,
and avoids allocating kernel memory.

Also affects DCCP as it uses common struct request_sock_ops,
but this parameter is currently reserved for future use.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

William Allen Simpson
2009-12-03 14:07:23 +0800

22 Nov, 2009

1 commit

e994b7c90 tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL

That's extremely non-intuitive, noticed by William Allen Simpson.

And let's make the default be on, it's been suggested by a lot of
people so we'll give it a try.

Signed-off-by: David S. Miller

David S. Miller
2009-11-22 03:22:25 +0800

14 Nov, 2009

1 commit

bee7ca9ec net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED

Define two symbols needed in both kernel and user space.

Remove old (somewhat incorrect) kernel variant that wasn't used in
most cases. Default should apply to both RMSS and SMSS (RFC2581).

Replace numeric constants with defined symbols.
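As background for the IPv4 default, a minimal C sketch of where the commonly cited default MSS of 536 bytes comes from, assuming the RFC 1122 minimum datagram size of 576 bytes and option-less 20-byte IPv4 and TCP headers. The enum names are hypothetical and are not the symbols this commit defines.

```c
/* Assumed inputs (hypothetical names): RFC 1122 minimum datagram
 * every IPv4 host must accept, and header sizes without options. */
enum {
	MIN_REASSEMBLY_IPV4 = 576,
	IPV4_HDR_LEN = 20,
	TCP_HDR_LEN = 20,
};

/* Send MSS a TCP may assume when the peer sent no MSS option. */
unsigned int default_mss_ipv4(void)
{
	return MIN_REASSEMBLY_IPV4 - IPV4_HDR_LEN - TCP_HDR_LEN;
}
```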

Stand-alone patch, originally developed for TCPCT.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

William Allen Simpson
2009-11-14 12:38:48 +0800

05 Nov, 2009

1 commit

05eaade27 tcp: Do not call IPv4 specific func in tcp_check_req

Calling the IPv4-specific inet_csk_route_req in tcp_check_req
is a bad idea and crashes the machine on IPv6 connections, as reported
by Valdis Kletnieks.

Also, all we are really interested in is the timestamp
option in the header, so calling tcp_parse_options()
with the "estab" flag set to false is overkill, as
it tries to parse half a dozen other TCP options.

We know whether timestamp should be enabled or not
using data from request_sock.

Signed-off-by: Gilad Ben-Yossef
Tested-by: Valdis.Kletnieks@vt.edu
Signed-off-by: David S. Miller

Gilad Ben-Yossef
2009-11-05 15:24:14 +0800

29 Oct, 2009

2 commits

022c3f7d8 Allow tcp_parse_options to consult dst entry

We need tcp_parse_options to be aware of dst_entry to
take into account per-dst_entry TCP option settings.

Signed-off-by: Gilad Ben-Yossef
Signed-off-by: Ori Finkelman
Signed-off-by: Yony Amit
Signed-off-by: David S. Miller

Gilad Ben-Yossef
2009-10-29 16:28:41 +0800
f55017a93 Only parse time stamp TCP option in time wait sock

Since we only use tcp_parse_options here to check for the existence
of the TCP timestamp option in the header, it is better to call it with
the "established" flag on.

Signed-off-by: Gilad Ben-Yossef
Signed-off-by: Ori Finkelman
Signed-off-by: Yony Amit
Signed-off-by: David S. Miller

Gilad Ben-Yossef
2009-10-29 16:28:39 +0800

20 Oct, 2009

2 commits

d1b99ba41 tcp: accept socket after TCP_DEFER_ACCEPT period

Willy Tarreau and many other folks in recent years
were concerned about what happens when the TCP_DEFER_ACCEPT period
expires for clients which sent an ACK packet. They prefer that clients
which actively resend the ACK on our SYN-ACK retransmissions be
converted from open requests to sockets and queued to the
listener for accepting after the deferring period is finished.
The application server can then decide to wait longer for data
or to properly terminate the connection with FIN if read()
returns EAGAIN, which indicates that the socket was accepted after
the deferring period expired. This change can still have side effects
for applications that always expect to see data on the accepted
socket. Others can be prepared to work in both modes (with or
without the TCP_DEFER_ACCEPT period), and their data processing can
ignore the read=EAGAIN notification and allocate resources for
clients which proved to have no data to send during the deferring
period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as a flag (not
as a timeout) to wait for data will notice clients that didn't
send data for 3 seconds but that still resend ACKs.
Thanks to Willy Tarreau for the initial idea and to
Eric Dumazet for the review and testing the change.

Signed-off-by: Julian Anastasov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Julian Anastasov
2009-10-20 10:19:01 +0800
a1a2ad915 Revert "tcp: fix tcp_defer_accept to consider the timeout"

This reverts commit 6d01a026b7d3009a418326bdcf313503a314f1ea.

Julian Anastasov, Willy Tarreau and Eric Dumazet have come up
with a more correct way to deal with this.

Signed-off-by: David S. Miller

David S. Miller
2009-10-20 10:12:36 +0800

13 Oct, 2009

1 commit

6d01a026b tcp: fix tcp_defer_accept to consider the timeout

I was trying to use TCP_DEFER_ACCEPT and noticed that if the
client does not talk, the connection is never accepted and
remains in SYN_RECV state until the retransmits expire, where
it finally is deleted. This is bad when some firewall such as
netfilter sits between the client and the server because the
firewall sees the connection in ESTABLISHED state while the
server will finally silently drop it without sending an RST.

This behaviour contradicts the man page, which says it should
wait only for some time:

TCP_DEFER_ACCEPT (since Linux 2.4)
Allows a listener to be awakened only when data arrives
on the socket. Takes an integer value (seconds), this
can bound the maximum number of attempts TCP will
make to complete the connection. This option should not
be used in code intended to be portable.
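For reference, a minimal Linux user-space sketch of setting this option with setsockopt(2). The helper name is hypothetical; a real server would go on to bind() and listen() on the descriptor, which is omitted here.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_DEFER_ACCEPT
#define TCP_DEFER_ACCEPT 9	/* Linux value, in case libc hides it */
#endif

/* Create a TCP socket with TCP_DEFER_ACCEPT set to `secs` seconds.
 * Returns the descriptor, or -1 with errno set on failure. */
int tcp_socket_with_defer_accept(int secs)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
		       &secs, sizeof(secs)) < 0) {
		int saved = errno;	/* close() may clobber errno */

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```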

Also, looking at ipv4/tcp.c, a retransmit counter is correctly
computed:

	case TCP_DEFER_ACCEPT:
		icsk->icsk_accept_queue.rskq_defer_accept = 0;
		if (val > 0) {
			/* Translate value in seconds to number of
			 * retransmits */
			while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
			       val > ((TCP_TIMEOUT_INIT / HZ) <<
				      icsk->icsk_accept_queue.rskq_defer_accept))
				icsk->icsk_accept_queue.rskq_defer_accept++;
			icsk->icsk_accept_queue.rskq_defer_accept++;
		}
		break;

==> rskq_defer_accept is used as a counter of retransmits.
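The quoted loop can be lifted into a standalone C function to see how seconds map to the counter. This is a sketch: timeout_init_secs stands in for TCP_TIMEOUT_INIT / HZ (assumed to be 3 seconds on kernels of this era), and the shift is widened to 64 bits so the sketch cannot overflow a signed int.

```c
#include <assert.h>

/* Standalone mirror of the TCP_DEFER_ACCEPT conversion quoted above.
 * Returns the value that would be stored in rskq_defer_accept. */
int defer_secs_to_retrans(int val, int timeout_init_secs)
{
	int retrans = 0;

	assert(timeout_init_secs > 0);
	if (val > 0) {
		/* Stop at the first n (capped at 32) for which the
		 * backed-off timeout (timeout << n) is at least val,
		 * then add one, exactly as the kernel loop does. */
		while (retrans < 32 &&
		       (unsigned long long)val >
		       ((unsigned long long)timeout_init_secs << retrans))
			retrans++;
		retrans++;
	}
	return retrans;
}
```

With a 3-second initial timeout, 1 to 3 seconds map to 1, 4 to 6 seconds to 2, and 7 to 12 seconds to 3; zero or negative values clear the setting.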

But in tcp_minisocks.c, this counter is only checked. And in
fact, I have found no location which updates it. So I think
that what was intended was to decrease it in tcp_minisocks
whenever it is checked, which the trivial patch below does.

Signed-off-by: Willy Tarreau
Signed-off-by: David S. Miller

Willy Tarreau
2009-10-13 16:35:28 +0800

16 Sep, 2009

1 commit

657e9649e tcp: fix CONFIG_TCP_MD5SIG + CONFIG_PREEMPT timer BUG()

I recently came across a preemption imbalance detected by:

huh, entered ffffffff80644630 with preempt_count 00000102, exited with 00000101?
------------[ cut here ]------------
kernel BUG at /usr/src/linux/kernel/timer.c:664!
invalid opcode: 0000 [1] PREEMPT SMP

with ffffffff80644630 being inet_twdr_hangman().

This appeared after I enabled CONFIG_TCP_MD5SIG and played with it a
bit, so I looked at what might have caused it.

One thing that struck me as strange is tcp_twsk_destructor(), as it
calls tcp_put_md5sig_pool() -- which entails a put_cpu(), causing the
detected imbalance. Found on 2.6.23.9, but 2.6.31 is affected as well,
as far as I can tell.

Signed-off-by: Robert Varga
Signed-off-by: David S. Miller

Robert Varga
2009-09-16 14:49:21 +0800

15 Sep, 2009

1 commit

0b6a05c1d tcp: fix ssthresh u16 leftover

Once upon a time, snd_ssthresh was a 16-bit quantity.
...That has not been true for a long period of time. I ran across
some ancient compares which still seem to trust that legacy.
Put all that magic into a single place; I hopefully found all
of them.

Compile tested, though linking of allyesconfig is ridiculous
nowadays, it seems.
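A sketch of the kind of compare this commit is about, using hypothetical names: a check written as if snd_ssthresh were 16 bits treats any value of 0xffff or more as the initial "infinite" threshold, which misclassifies a legitimately large finite threshold once the field is 32 bits. Routing every caller through one helper that tests a single sentinel (assumed here to be 0x7fffffff) removes that leftover.

```c
#include <stdint.h>

/* Assumed sentinel meaning "no ssthresh estimate yet". */
#define SSTHRESH_INFINITE 0x7fffffffU

/* Legacy-style test that still assumes a 16-bit snd_ssthresh. */
int legacy_in_initial_slowstart(uint32_t snd_ssthresh)
{
	return snd_ssthresh >= 0xffff;
}

/* Single shared helper testing the 32-bit sentinel. */
int in_initial_slowstart(uint32_t snd_ssthresh)
{
	return snd_ssthresh >= SSTHRESH_INFINITE;
}
```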

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2009-09-15 16:30:10 +0800

03 Sep, 2009

1 commit

aa1330766 tcp: replace hard coded GFP_KERNEL with sk_allocation

This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

mount_root => nfs_root_data => tcp_close => lock sk_lock =>
tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.

But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.

CC: Arnaldo Carvalho de Melo
CC: David S. Miller
CC: Herbert Xu
Signed-off-by: Wu Fengguang
Signed-off-by: David S. Miller

Wu Fengguang
2009-09-03 14:45:45 +0800

29 Aug, 2009

1 commit

9a7030b76 tcp: Remove redundant copy of MD5 authentication key

Remove the copy of the MD5 authentication key from tcp_check_req().
This key has already been copied by tcp_v4_syn_recv_sock() or
tcp_v6_syn_recv_sock().

Signed-off-by: John Dykstra
Signed-off-by: David S. Miller

John Dykstra
2009-08-29 15:19:25 +0800

26 Jun, 2009

1 commit

1ac530b35 tcp: missing check ACK flag of received segment in FIN-WAIT-2 state

RFC 793 specifies that in the FIN-WAIT-2 state, if the ACK bit is off,
the segment is dropped and processing returns [page 72]. But this check
is missing in tcp_timewait_state_process(). This causes a segment with
the FIN flag set but no ACK to be handled in two different ways:

Case 1:
Node A Node B

(enter FIN-WAIT-2)
FIN -------------> discard
(move sk to tw list)

Case 2:
Node A Node B

(enter FIN-WAIT-2)
(move sk to tw list)
FIN ------------->
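The RFC 793 rule that the missing check enforces fits in a few lines of C. The struct and enum below are hypothetical stand-ins, not the kernel's struct tcphdr or its time-wait status codes, and only the ACK step of segment processing is shown.

```c
#include <stdbool.h>

/* Hypothetical minimal view of the TCP header flags. */
struct seg_flags {
	bool syn, ack, fin, rst;
};

enum seg_verdict { SEG_DROP, SEG_PROCESS };

/* RFC 793, p. 72 ("fifth, check the ACK field"): in FIN-WAIT-2 a
 * segment with the ACK bit off is dropped before its FIN bit is
 * examined, regardless of whether the socket is in time-wait form. */
enum seg_verdict fin_wait2_check_ack(struct seg_flags f)
{
	if (!f.ack)
		return SEG_DROP;
	return SEG_PROCESS;
}
```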

Signed-off-by: David S. Miller

Wei Yongjun
2009-06-26 11:03:15 +0800

16 Mar, 2009

1 commit

c887e6d2d tcp: consolidate paws check

Wow, it was quite tricky to merge that stream of negations
but I think I finally got it right:

check & replace_ts_recent:
(s32)(rcv_tsval - ts_recent) >= 0 => 0
(s32)(ts_recent - rcv_tsval) <= 0 => 0

discard:
(s32)(ts_recent - rcv_tsval) > TCP_PAWS_WINDOW => 1
(s32)(ts_recent - rcv_tsval) <= TCP_PAWS_WINDOW => 0

I toggled the return values of tcp_paws_check around since
the old encoding added yet-another negation making tracking
of truth-values really complicated.
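The comparisons above rely on modular arithmetic: the difference is computed on unsigned 32-bit timestamps and then read as a signed value, so ordering stays correct across wraparound. A minimal C sketch of the acceptance test, with a hypothetical function name and TCP_PAWS_WINDOW assumed to be 1:

```c
#include <stdint.h>

/* Assumed value of the kernel's TCP_PAWS_WINDOW. */
#define PAWS_WINDOW 1

/* Nonzero if rcv_tsval is acceptable relative to ts_recent, allowing
 * paws_win ticks of reordering. The uint32_t subtraction wraps modulo
 * 2^32; converting it to int32_t yields the signed distance (this
 * conversion is implementation-defined before C23, but GCC and Clang
 * define it as modular, which is what the kernel relies on). */
int paws_ts_ok(uint32_t ts_recent, uint32_t rcv_tsval, int32_t paws_win)
{
	return (int32_t)(ts_recent - rcv_tsval) <= paws_win;
}
```

For example, ts_recent = 0xfffffff0 and rcv_tsval = 0x10 give a distance of -32, so the wrapped, newer timestamp is accepted.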

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2009-03-16 11:09:52 +0800

03 Mar, 2009

1 commit

ee7537b63 tcp: tcp_init_wl / tcp_update_wl argument cleanup

The above functions from include/net/tcp.h have been defined with an
argument that they never use. The argument is 'u32 ack' which is never
used inside the function body, and thus it can be removed. The rest of
the patch involves the necessary changes to the function callers of the
above two functions.

Signed-off-by: Hantzis Fotis
Signed-off-by: David S. Miller

Hantzis Fotis
2009-03-03 14:42:02 +0800

02 Mar, 2009

1 commit

cabeccbd1 tcp: kill eff_sacks "cache", the sole user can calculate itself

Also fixes an insignificant bug that could cause a stale
SACK block to be sent (in some corner cases).

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2009-03-02 19:00:16 +0800

03 Nov, 2008

1 commit

5a5f3a8db net: clean up net/ipv4/ipip.c raw.c tcp.c tcp_minisocks.c tcp_yeah.c xfrm4_policy.c

Signed-off-by: Jianjun Kong
Signed-off-by: David S. Miller

Jianjun Kong
2008-11-03 16:24:34 +0800

08 Oct, 2008

1 commit

33f5f57ee tcp: kill pointless urg_mode

It all started from me noticing that this urgent check in
tcp_clean_rtx_queue is unnecessarily inside the loop. Then
I took a longer look at it and found out that the users of
urg_mode can trivially do without it, well almost; there was
one gotcha.

Bonus: those funny people who use urg with >= 2^31 write_seq -
snd_una could now rejoice too (that's the only purpose for the
between being there, otherwise a simple compare would have done
the thing). Not that I assume that the rest of the tcp code
happily lives with such mind-boggling numbers :-). Alas, it
turned out to be impossible to set wmem to such numbers anyway,
yes I really tried a big sendfile after setting some wmem but
nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf...
So I hacked a bit variable to long and found out that it seems
to work... :-)

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2008-10-08 05:43:06 +0800

08 Aug, 2008

1 commit

2aaab9a0c tcp: (whitespace only) fix confusing indentation

The indentation in part of tcp_minisocks makes it look like one of the if
statements is much more important than it actually is.

Signed-off-by: Adam Langley
Signed-off-by: David S. Miller

Adam Langley
2008-08-08 11:27:45 +0800

07 Aug, 2008

1 commit

6edafaaf6 tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup

If the following packet flow happens, the kernel will panic:

MachineA                      MachineB
SYN
---------------------->
                              SYN+ACK
<----------------------
ACK (bad seq)
---------------------->

When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr)
is finally called by tcp_v4_reqsk_send_ack(), but the first parameter (skb->sk) is
NULL at that moment, so the kernel panics.
This patch fixes this bug.

The oops output is as follows:
[ 302.812793] IP: [] tcp_v4_md5_do_lookup+0x12/0x42
[ 302.817075] Oops: 0000 [#1] SMP
[ 302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
[ 302.849946]
[ 302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
[ 302.855184] EIP: 0060:[] EFLAGS: 00010296 CPU: 0
[ 302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
[ 302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
[ 302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
[ 302.868333] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
[ 302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400
[ 302.883275] c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0
[ 302.890971] ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634
[ 302.900140] Call Trace:
[ 302.902392] [] tcp_v4_reqsk_send_ack+0x17/0x35
[ 302.907060] [] tcp_check_req+0x156/0x372
[ 302.910082] [] printk+0x14/0x18
[ 302.912868] [] tcp_v4_do_rcv+0x1d3/0x2bf
[ 302.917423] [] tcp_v4_rcv+0x563/0x5b9
[ 302.920453] [] ip_local_deliver_finish+0xe8/0x183
[ 302.923865] [] ip_rcv_finish+0x286/0x2a3
[ 302.928569] [] dev_alloc_skb+0x11/0x25
[ 302.931563] [] netif_receive_skb+0x2d6/0x33a
[ 302.934914] [] pcnet32_poll+0x333/0x680 [pcnet32]
[ 302.938735] [] net_rx_action+0x5c/0xfe
[ 302.941792] [] __do_softirq+0x5d/0xc1
[ 302.944788] [] __do_softirq+0x0/0xc1
[ 302.948999] [] do_softirq+0x55/0x88
[ 302.951870] [] handle_fasteoi_irq+0x0/0xa4
[ 302.954986] [] irq_exit+0x35/0x69
[ 302.959081] [] do_IRQ+0x99/0xae
[ 302.961896] [] common_interrupt+0x23/0x28
[ 302.966279] [] default_idle+0x2a/0x3d
[ 302.969212] [] cpu_idle+0xb2/0xd2
[ 302.972169] =======================
[ 302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6
[ 303.011610] EIP: [] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
[ 303.018360] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Gui Jianfeng
Signed-off-by: David S. Miller

Gui Jianfeng
2008-08-07 14:50:04 +0800

17 Jul, 2008

2 commits

de0744af1 mib: add net to NET_INC_STATS_BH

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-07-17 11:31:16 +0800
63231bddf mib: add net to TCP_INC_STATS_BH

Same as before - the sock is always there to get the net from,
but there are also some places with the net already saved on
the stack.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-07-17 11:22:25 +0800

14 Jun, 2008

1 commit

4ae127d1b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:

drivers/net/smc911x.c

David S. Miller
2008-06-14 11:52:39 +0800

13 Jun, 2008

1 commit

ec0a19662 tcp: Revert 'process defer accept as established' changes.

This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.

Ilpo Järvinen first spotted the locking problems. The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.

Fixing that is non-trivial at best: we can't simply grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock. The normal ordering is parent listening
socket --> child socket, but this code path would require the
reverse lock ordering.

Next is a problem noticed by Vitaliy Gusev, who noted:

----------------------------------------
>--- a/net/ipv4/tcp_timer.c
>+++ b/net/ipv4/tcp_timer.c
>@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> goto death;
> }
>
>+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
>+ tcp_send_active_reset(sk, GFP_ATOMIC);
>+ goto death;

Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------

Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:

----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------

So revert this thing for now.

Signed-off-by: David S. Miller

David S. Miller
2008-06-13 07:34:35 +0800

12 Jun, 2008

1 commit

0b0408299 net: remove CVS keywords

This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller

Adrian Bunk
2008-06-12 12:00:38 +0800

22 Mar, 2008

1 commit

ec3c0982a [TCP]: TCP_DEFER_ACCEPT updates - process as established

Change TCP_DEFER_ACCEPT implementation so that it transitions a
connection to ESTABLISHED after handshake is complete instead of
leaving it in SYN-RECV until some data arrives. Place connection in
accept queue when first data packet arrives from slow path.

Benefits:
- established connection is now reset if it never makes it
to the accept queue

- diagnostic state of established matches with the packet traces
showing completed handshake

- TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
enforced with reasonable accuracy instead of rounding up to next
exponential back-off of syn-ack retry.

Signed-off-by: Patrick McManus
Signed-off-by: David S. Miller

Patrick McManus
2008-03-22 07:33:01 +0800

04 Mar, 2008

1 commit

c6aefafb7 [TCP]: Add IPv6 support to TCP SYN cookies

Updated to incorporate Eric's suggestion of using a per cpu buffer
rather than allocating on the stack. Just a two line change, but will
resend in its entirety.

Signed-off-by: Glenn Griffin
Signed-off-by: YOSHIFUJI Hideaki

Glenn Griffin
2008-03-04 14:18:21 +0800

01 Mar, 2008

1 commit

fd80eb942 [INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack.

It looks like the dst parameter is in this API for historical
reasons. In fact, it is used only in the direct call to
tcp_v4_send_synack. So, create a wrapper for tcp_v4_send_synack
and remove dst from rtx_syn_ack.

Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller

Denis V. Lunev
2008-03-01 03:43:03 +0800

11 Oct, 2007

3 commits

e60402d0a [TCP]: Move sack_ok access to obviously named funcs & cleanup

Previously the code had IsReno/IsFack defined as macros that were
local to tcp_input.c, though the sack_ok field has users elsewhere too
for the same purpose. This changes them to static inlines, as
preferred according to the current coding style, and unifies the
access to sack_ok across multiple files. Magic bitops of sack_ok
for FACK and DSACK are also abstracted to functions with
appropriate names.

Note:
- One sack_ok = 1 remains, but that's self-explanatory, i.e., it
enables sack
- A couple of !IsReno cases are changed to tcp_is_sack
- There were no users for IsDSack => I dropped it
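A sketch of the macro-to-inline change described above, with hypothetical names and bit assignments (not the kernel's): typed static inline helpers replace a file-local macro such as a hypothetical IS_RENO(o) ((o)->sack_ok == 0) and hide the bit operations behind descriptive names.

```c
#include <stdint.h>

/* Hypothetical bit assignments for a sack_ok-style flag byte. */
#define SACK_SEEN     (1U << 0)
#define FACK_ENABLED  (1U << 1)

struct rx_opts {
	uint8_t sack_ok;
};

/* Typed, type-checked replacements for ad hoc macros; usable from
 * any file that includes the header that defines them. */
static inline int opts_is_sack(const struct rx_opts *o)
{
	return o->sack_ok != 0;
}

static inline int opts_is_fack(const struct rx_opts *o)
{
	return (o->sack_ok & FACK_ENABLED) != 0;
}

static inline void opts_enable_fack(struct rx_opts *o)
{
	o->sack_ok |= FACK_ENABLED;
}
```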

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-11 07:48:00 +0800
b5860bbac [TCP]: Tighten tcp_sock's belt, drop left_out

It is easily calculable when needed, and there are not that many
users after all.
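Dropping the cached field means deriving it where it is needed. A minimal C sketch, assuming left_out is the sum of SACKed and lost segments; the struct and helper names are hypothetical, and in_flight() is a simplified consumer that deliberately ignores retransmissions.

```c
#include <stdint.h>

/* Hypothetical subset of per-connection counters. There is no cached
 * left_out member: it is derived on demand instead. */
struct conn_counters {
	uint32_t packets_out;
	uint32_t sacked_out;
	uint32_t lost_out;
};

/* Segments that have left the network (SACKed or marked lost). */
static inline uint32_t left_out(const struct conn_counters *c)
{
	return c->sacked_out + c->lost_out;
}

/* Simplified consumer: packets still believed to be in flight,
 * ignoring retransmissions. */
static inline uint32_t in_flight(const struct conn_counters *c)
{
	return c->packets_out - left_out(c);
}
```

Computing the value on demand costs one addition per use but removes a field that every SACK and loss update would otherwise have to keep in sync.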

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-11 07:47:55 +0800
bdf1ee5d3 [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it

No other users exist for tcp_ecn.h. Very few things remain in
tcp.h; for most TCP ECN functions, the callers reside within a
single .c file, so the functions can be placed there.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-10-11 07:47:54 +0800

26 Apr, 2007

4 commits

aa8223c7b [SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th

Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller

Arnaldo Carvalho de Melo
2007-04-26 13:25:26 +0800
2de979bd7 [TCP]: whitespace cleanup

Add whitespace around keywords.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2007-04-26 13:24:13 +0800
9d729f72d [NET]: Convert xtime.tv_sec to get_seconds()

Where appropriate, convert references to xtime.tv_sec to the
get_seconds() helper function.

Signed-off-by: James Morris
Signed-off-by: David S. Miller

James Morris
2007-04-26 13:23:32 +0800
54287cc17 [TCP]: Keep copied_seq, rcv_wup and rcv_next together.

I noticed in an oprofile study a cache miss in tcp_rcv_established() when
reading copied_seq.

ffffffff80400a80 : /* tcp_rcv_established total: 4034293  2.0400 */

55493 0.0281 :ffffffff80400bc9: mov 0x4c8(%r12),%eax copied_seq
543103 0.2746 :ffffffff80400bd1: cmp 0x3e0(%r12),%eax rcv_nxt

if (tp->copied_seq == tp->rcv_nxt &&
    len - tcp_header_len <= tp->ucopy.len) {

In this function, the cache line 0x4c0 -> 0x500 is used only for reading
the 'copied_seq' field.

rcv_wup and copied_seq should be next to the rcv_nxt field, to lower the
number of active cache lines in hot paths (tcp_rcv_established(),
tcp_poll(), ...).

As you suggested, I changed tcp_create_openreq_child() so that these fields
are changed together, to avoid adding a new store buffer stall.

Patch is 64-bit friendly (no new hole because of alignment constraints).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2007-04-26 13:23:21 +0800

01 Mar, 2007

1 commit

a9948a7e1 [TCP]: Fix minisock tcp_create_openreq_child() typo.

On 2/28/07, KOVACS Krisztian wrote:
>
> Hi,
>
> While reading TCP minisock code I've found this suspiciously looking
> code fragment:
>
> - 8< -
> struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
> {
> struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);
>
> if (newsk != NULL) {
> const struct inet_request_sock *ireq = inet_rsk(req);
> struct tcp_request_sock *treq = tcp_rsk(req);
> struct inet_connection_sock *newicsk = inet_csk(sk);
> struct tcp_sock *newtp;
> - 8< -
>
> The above code initializes newicsk to inet_csk(sk), isn't that supposed
> to be inet_csk(newsk)? As far as I can tell this might leave
> icsk_ack.last_seg_size zero even if we do have received data.

Good catch!

David, please apply the attached patch.

Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller

Arnaldo Carvalho de Melo
2007-03-01 03:05:56 +0800