Eric Lee / smarc-fsl-linux-kernel

20 Dec, 2011

1 commit

3db1cd5c0 net: fix assignment of 0/1 to bool variables. ... Browse Code »

DaveM said:
Please, this kind of stuff rots forever and not using bool properly
drives me crazy.

Joe Perches gave me the spatch script:

@@
bool b;
@@
-b = 0
+b = false
@@
bool b;
@@
-b = 1
+b = true

I merely installed coccinelle, read the documentation and took credit.

Signed-off-by: Rusty Russell
Signed-off-by: David S. Miller

Rusty Russell
2011-12-20 11:27:29 +0800

13 Dec, 2011

1 commit

180d8cd94 foundations of per-cgroup memory pressure controlling. ... Browse Code »
43

This patch replaces all uses of struct sock fields' memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem to acessor
macros. Those macros can either receive a socket argument, or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where we don't have cgroups disabled.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:10 +0800

12 Dec, 2011

1 commit

dfd56b8b3 net: use IS_ENABLED(CONFIG_IPV6) ... Browse Code »

Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-12 07:25:16 +0800

25 Oct, 2011

1 commit

78d81d15b TCP: remove TCP_DEBUG ... Browse Code »

It was enabled by default and the messages guarded
by the define are useful.

Signed-off-by: Flavio Leitner
Signed-off-by: David S. Miller

Flavio Leitner
2011-10-25 05:36:08 +0800

21 Feb, 2011

1 commit

089c34827 tcp: Remove debug macro of TCP_CHECK_TIMER ... Browse Code »

Now, TCP_CHECK_TIMER is not used for debuging, it does nothing.
And, it has been there for several years, maybe 6 years.

Remove it to keep code clearer.

Signed-off-by: Shan Wei
Signed-off-by: David S. Miller

Shan Wei
2011-02-21 03:10:14 +0800

18 Oct, 2010

1 commit

c60ce4e26 tcp: use correct counters in CA_CWR state too ... Browse Code »

As CWR is stronger than CA_Disorder state, we can miscount
SACK/Reno failure into other timeouts. Not a bad problem as
it can happen only due to ECN, FRTO detecting spurious RTO
or xmit error which are the only callers of tcp_enter_cwr.
And even then losses and RTO must still follow thereafter
to actually end up into the relevant code paths.

Compile tested.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2010-10-18 04:46:33 +0800

05 Oct, 2010

1 commit

21a180cda Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 ... Browse Code »

Conflicts:
net/ipv4/Kconfig
net/ipv4/tcp_timer.c

David S. Miller
2010-10-05 02:56:38 +0800

29 Sep, 2010

1 commit

4d22f7d37 net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out() ... Browse Code »

Fixes kernel Bugzilla Bug 18952

This patch adds a syn_set parameter to the retransmits_timed_out()
routine and updates its callers. If not set, TCP_RTO_MIN is taken
as the calculation basis as before. If set, TCP_TIMEOUT_INIT is
used instead, so that sysctl_syn_retries represents the actual
amount of SYN retransmissions in case no SYNACKs are received when
establishing a new connection.

Signed-off-by: Damian Lukowski
Signed-off-by: David S. Miller

Damian Lukowski
2010-09-29 04:08:32 +0800

10 Sep, 2010

1 commit

e548833df Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 ... Browse Code »

Conflicts:
net/mac80211/main.c

David S. Miller
2010-09-10 13:27:33 +0800

31 Aug, 2010

1 commit

dca43c75e tcp: Add TCP_USER_TIMEOUT socket option. ... Browse Code »
43

This patch provides a "user timeout" support as described in RFC793. The
socket option is also needed for the the local half of RFC5482 "TCP User
Timeout Option".

TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int,
when > 0, to specify the maximum amount of time in ms that transmitted
data may remain unacknowledged before TCP will forcefully close the
corresponding connection and return ETIMEDOUT to the application. If
0 is given, TCP will continue to use the system default.

Increasing the user timeouts allows a TCP connection to survive extended
periods without end-to-end connectivity. Decreasing the user timeouts
allows applications to "fail fast" if so desired. Otherwise it may take
upto 20 minutes with the current system defaults in a normal WAN
environment.

The socket option can be made during any state of a TCP connection, but
is only effective during the synchronized states of a connection
(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, or LAST-ACK).
Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option,
TCP_USER_TIMEOUT will overtake keepalive to determine when to close a
connection due to keepalive failure.

The option does not change in anyway when TCP retransmits a packet, nor
when a keepalive probe will be sent.

This option, like many others, will be inherited by an acceptor from its
listener.

Signed-off-by: H.K. Jerry Chu
Signed-off-by: David S. Miller

Jerry Chu
2010-08-31 04:23:33 +0800

25 Aug, 2010

1 commit

ad1af0fed tcp: Combat per-cpu skew in orphan tests. ... Browse Code »

As reported by Anton Blanchard when we use
percpu_counter_read_positive() to make our orphan socket limit checks,
the check can be off by up to num_cpus_online() * batch (which is 32
by default) which on a 128 cpu machine can be as large as the default
orphan limit itself.

Fix this by doing the full expensive sum check if the optimized check
triggers.

Reported-by: Anton Blanchard
Signed-off-by: David S. Miller
Acked-by: Eric Dumazet

David S. Miller
2010-08-25 17:27:49 +0800

13 Jul, 2010

1 commit

4bc2f18ba net/ipv4: EXPORT_SYMBOL cleanups ... Browse Code »

CodingStyle cleanups

EXPORT_SYMBOL should immediately follow the symbol declaration.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-07-13 03:57:54 +0800

28 Apr, 2010

1 commit

6c37e5de4 TCP: avoid to send keepalive probes if receiving data ... Browse Code »

RFC 1122 says the following:
...
Keep-alive packets MUST only be sent when no data or
acknowledgement packets have been received for the
connection within an interval.
...

The acknowledgement packet is reseting the keepalive
timer but the data packet isn't. This patch fixes it by
checking the timestamp of the last received data packet
too when the keepalive timer expires.

Signed-off-by: Flavio Leitner
Signed-off-by: Eric Dumazet
Acked-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Flavio Leitner
2010-04-28 03:53:25 +0800

13 Apr, 2010

1 commit

b6c6712a4 net: sk_dst_cache RCUification ... Browse Code »

With latest CONFIG_PROVE_RCU stuff, I felt more comfortable to make this
work.

sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock)

This rwlock is readlocked for a very small amount of time, and dst
entries are already freed after RCU grace period. This calls for RCU
again :)

This patch converts sk_dst_lock to a spinlock, and use RCU for readers.

__sk_dst_get() is supposed to be called with rcu_read_lock() or if
socket locked by user, so use appropriate rcu_dereference_check()
condition (rcu_read_lock_held() || sock_owned_by_user(sk))

This patch avoids two atomic ops per tx packet on UDP connected sockets,
for example, and permits sk_dst_lock to be much less dirtied.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-04-13 16:41:33 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

08 Mar, 2010

1 commit

318ae2edc Merge branch 'for-next' into for-linus ... Browse Code »

Conflicts:
Documentation/filesystems/proc.txt
arch/arm/mach-u300/include/mach/debug-macro.S
drivers/net/qlge/qlge_ethtool.c
drivers/net/qlge/qlge_main.c
drivers/net/typhoon.c

Jiri Kosina
2010-03-08 23:55:37 +0800

19 Feb, 2010

1 commit

36e31b0af net: TCP thin linear timeouts ... Browse Code »

This patch will make TCP use only linear timeouts if the
stream is thin. This will help to avoid the very high latencies
that thin stream suffer because of exponential backoff. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin. A maximum of 6 linear
timeouts is tried before exponential backoff is resumed.

Signed-off-by: Andreas Petlund
Signed-off-by: David S. Miller

Andreas Petlund
2010-02-19 07:43:08 +0800

09 Feb, 2010

1 commit

3ad2f3fbb tree-wide: Assorted spelling fixes ... Browse Code »

In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.

Signed-off-by: Daniel Mack
Cc: Joe Perches
Cc: Junio C Hamano
Signed-off-by: Jiri Kosina

Daniel Mack
2010-02-09 18:13:56 +0800

18 Jan, 2010

1 commit

72659ecce tcp: account SYN-ACK timeouts & retransmissions ... Browse Code »

Currently we don't increment SYN-ACK timeouts & retransmissions
although we do increment the same stats for SYN. We seem to have lost
the SYN-ACK accounting with the introduction of tcp_syn_recv_timer
(commit 2248761e in the netdev-vger-cvs tree).

This patch fixes this issue. In the process we also rename the v4/v6
syn/ack retransmit functions for clarity. We also add a new
request_socket operations (syn_ack_timeout) so we can keep code in
inet_connection_sock.c protocol agnostic.

Signed-off-by: Octavian Purdila
Signed-off-by: David S. Miller

Octavian Purdila
2010-01-18 11:09:39 +0800

09 Dec, 2009

1 commit

2f7de5710 tcp: Stalling connections: Move timeout calculation routine ... Browse Code »

This patch moves retransmits_timed_out() from include/net/tcp.h
to tcp_timer.c, where it is used.

Reported-by: Frederic Leroy
Signed-off-by: Damian Lukowski
Acked-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Damian Lukowski
2009-12-09 12:56:11 +0800

21 Oct, 2009

1 commit

ea94ff3b5 net: Fix for dst_negative_advice ... Browse Code »

dst_negative_advice() should check for changed dst and reset
sk_tx_queue_mapping accordingly. Pass sock to the callers of
dst_negative_advice.

(sk_reset_txq is defined just for use by dst_negative_advice. The
only way I could find to get around this is to move dst_negative_()
from dst.h to dst.c, include sock.h in dst.c, etc)

Signed-off-by: Krishna Kumar
Signed-off-by: David S. Miller

Krishna Kumar
2009-10-21 09:55:46 +0800

19 Oct, 2009

1 commit

c720c7e83 inet: rename some inet_sock fields ... Browse Code »

In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfert fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-10-19 09:52:53 +0800

01 Sep, 2009

2 commits

6fa12c850 Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value. ... Browse Code »

RFC 1122 specifies two threshold values R1 and R2 for connection timeouts,
which may represent a number of allowed retransmissions or a timeout value.
Currently linux uses sysctl_tcp_retries{1,2} to specify the thresholds
in number of allowed retransmissions.

For any desired threshold R2 (by means of time) one can specify tcp_retries2
(by means of number of retransmissions) such that TCP will not time out
earlier than R2. This is the case, because the RTO schedule follows a fixed
pattern, namely exponential backoff.

However, the RTO behaviour is not predictable any more if RTO backoffs can be
reverted, as it is the case in the draft
"Make TCP more Robust to Long Connectivity Disruptions"
(http://tools.ietf.org/html/draft-zimmermann-tcp-lcd).

In the worst case TCP would time out a connection after 3.2 seconds, if the
initial RTO equaled MIN_RTO and each backoff has been reverted.

This patch introduces a function retransmits_timed_out(N),
which calculates the timeout of a TCP connection, assuming an initial
RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions.

Whenever timeout decisions are made by comparing the retransmission counter
to some value N, this function can be used, instead.

The meaning of tcp_retries2 will be changed, as many more RTO retransmissions
can occur than the value indicates. However, it yields a timeout which is
similar to the one of an unpatched, exponentially backing off TCP in the same
scenario. As no application could rely on an RTO greater than MIN_RTO, there
should be no risk of a regression.

Signed-off-by: Damian Lukowski
Acked-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Damian Lukowski
2009-09-01 17:45:47 +0800
f1ecd5d9e Revert Backoff [v3]: Revert RTO on ICMP destination unreachable ... Browse Code »

Here, an ICMP host/network unreachable message, whose payload fits to
TCP's SND.UNA, is taken as an indication that the RTO retransmission has
not been lost due to congestion, but because of a route failure
somewhere along the path.
With true congestion, a router won't trigger such a message and the
patched TCP will operate as standard TCP.

This patch reverts one RTO backoff, if an ICMP host/network unreachable
message, whose payload fits to TCP's SND.UNA, arrives.
Based on the new RTO, the retransmission timer is reset to reflect the
remaining time, or - if the revert clocked out the timer - a retransmission
is sent out immediately.
Backoffs are only reverted, if TCP is in RTO loss recovery, i.e. if
there have been retransmissions and reversible backoffs, already.

Changes from v2:
1) Renaming of skb in tcp_v4_err() moved to another patch.
2) Reintroduced tcp_bound_rto() and __tcp_set_rto().
3) Fixed code comments.

Signed-off-by: Damian Lukowski
Acked-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Damian Lukowski
2009-09-01 17:45:42 +0800

29 Aug, 2009

1 commit

df19a6267 tcp: keepalive cleanups ... Browse Code »

Introduce keepalive_probes(tp) helper, and use it, like
keepalive_time_when(tp) and keepalive_intvl_when(tp)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-08-29 14:48:54 +0800

02 Mar, 2009

1 commit

bc079e9ed tcp: cleanup ca_state mess in tcp_timer ... Browse Code »

Redundant checks made indentation impossible to follow.
However, it might be useful to make this ca_state+is_sack
indexed array.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2009-03-02 19:00:13 +0800

19 Dec, 2008

1 commit

6086ebca1 tcp: Stop scaring users with "treason uncloaked!" ... Browse Code »

The original message was unhelpful and extremely alarming to our poor
users, despite its charm. Make it less frightening.

Signed-off-by: Matt Mackall
Signed-off-by: David S. Miller

Matt Mackall
2008-12-19 14:27:42 +0800

26 Nov, 2008

1 commit

dd24c0019 net: Use a percpu_counter for orphan_count ... Browse Code »

Instead of using one atomic_t per protocol, use a percpu_counter
for "orphan_count", to reduce cache line contention on
heavy duty network servers.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2008-11-26 13:17:14 +0800

03 Nov, 2008

1 commit

fd3f8c4cb net: clean up net/ipv4/ip_fragment.c tcp_timer.c ip_input.c ... Browse Code »

Signed-off-by: Jianjun Kong
Signed-off-by: David S. Miller

Jianjun Kong
2008-11-03 18:47:38 +0800

31 Oct, 2008

1 commit

673d57e72 net: replace NIPQUAD() in net/ipv4/ net/ipv6/ ... Browse Code »

Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison
Signed-off-by: David S. Miller

Harvey Harrison
2008-10-31 15:53:57 +0800

30 Oct, 2008

1 commit

5b095d989 net: replace %p6 with %pI6 ... Browse Code »

Signed-off-by: Harvey Harrison
Signed-off-by: David S. Miller

Harvey Harrison
2008-10-30 03:52:50 +0800

29 Oct, 2008

1 commit

0c6ce78ab net: replace uses of NIP6_FMT with %p6 ... Browse Code »

Signed-off-by: Harvey Harrison
Signed-off-by: David S. Miller

Harvey Harrison
2008-10-29 14:02:31 +0800

08 Oct, 2008

1 commit

c57943a1c net: wrap sk->sk_backlog_rcv() ... Browse Code »

Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the
generic sk_backlog_rcv behaviour.

Signed-off-by: Peter Zijlstra
Signed-off-by: David S. Miller

Peter Zijlstra
2008-10-08 05:18:42 +0800

26 Jul, 2008

1 commit

547b792ca net: convert BUG_TRAP to generic WARN_ON ... Browse Code »

Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2008-07-26 12:43:18 +0800

17 Jul, 2008

1 commit

de0744af1 mib: add net to NET_INC_STATS_BH ... Browse Code »

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-07-17 11:31:16 +0800

03 Jul, 2008

1 commit

40b215e59 tcp: de-bloat a bit with factoring NET_INC_STATS_BH out ... Browse Code »

There are some places in TCP that select one MIB index to
bump snmp statistics like this:

if ()
NET_INC_STATS_BH();
else if ()
NET_INC_STATS_BH();
...
else
NET_INC_STATS_BH();

or in a more tricky but still similar way.

On the other hand, this NET_INC_STATS_BH is a camouflaged
increment of percpu variable, which is not that small.

Factoring those cases out de-bloats 235 bytes on non-preemptible
i386 config and drives parts of the code into 80 columns.

add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-235 (-235)
function old new delta
tcp_fastretrans_alert 1437 1424 -13
tcp_dsack_set 137 124 -13
tcp_xmit_retransmit_queue 690 676 -14
tcp_try_undo_recovery 283 265 -18
tcp_sacktag_write_queue 1550 1515 -35
tcp_update_reordering 162 106 -56
tcp_retransmit_timer 990 904 -86

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2008-07-03 16:05:41 +0800

14 Jun, 2008

1 commit

4ae127d1b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 ... Browse Code »

Conflicts:

drivers/net/smc911x.c

David S. Miller
2008-06-14 11:52:39 +0800

13 Jun, 2008

1 commit

ec0a19662 tcp: Revert 'process defer accept as established' changes. ... Browse Code »

This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.

Ilpo Järvinen first spotted the locking problems. The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.

Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock. The normal ordering is parent listening
socket --> child socket, but this code path would require the
reverse lock ordering.

Next is a problem noticed by Vitaliy Gusev, he noted:

----------------------------------------
>--- a/net/ipv4/tcp_timer.c
>+++ b/net/ipv4/tcp_timer.c
>@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> goto death;
> }
>
>+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
>+ tcp_send_active_reset(sk, GFP_ATOMIC);
>+ goto death;

Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------

Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:

----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------

So revert this thing for now.

Signed-off-by: David S. Miller

David S. Miller
2008-06-13 07:34:35 +0800

12 Jun, 2008

1 commit

0b0408299 net: remove CVS keywords ... Browse Code »

This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller

Adrian Bunk
2008-06-12 12:00:38 +0800

14 Apr, 2008

1 commit

569508c96 [TCP]: Format addresses appropriately in debug messages. ... Browse Code »

Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

YOSHIFUJI Hideaki
2008-04-14 19:09:36 +0800