Eric Lee / smarc-fsl-linux-kernel

18 Jul, 2007

1 commit

16751347a [TCP]: remove unused argument to cong_avoid op

None of the existing TCP congestion controls use the rtt value passed
in the ca_ops->cong_avoid interface, which is lucky because seq_rtt
could have been -1 when handling a duplicate ack.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2007-07-18 16:46:58 +0800

31 May, 2007

1 commit

e4fd5da39 [TCP]: Consolidate checking for tcp orphan count being too big.

tcp_out_of_resources() and tcp_close() perform the
same check on the number of orphan sockets. Move this
code into a common place.

Signed-off-by: Pavel Emelianov
Signed-off-by: David S. Miller

Pavel Emelianov
2007-05-31 16:23:34 +0800

03 May, 2007

1 commit

0ec96822d [TCP]: Use S+L catcher only with SACK for now

TCP has a transitional state when SACK is not in use during
which this invariant is temporarily broken. Without SACK,
tcp_clean_rtx_queue does not decrement sacked_out. Therefore
calls to tcp_sync_left_out before sacked_out is again
corrected by tcp_fastretrans_alert can trigger this trap as
sacked_out still has a couple of segments that are already out
of window.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-05-03 18:30:34 +0800

30 Apr, 2007

2 commits

d551e4541 [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent

This is a corner case where a new data segment smaller than MSS
is waiting in the send queue. For F-RTO to work correctly, a
new data segment must be sent at a certain point or F-RTO cannot
be used at all. RFC4138 allows overriding of Nagle at that
point.

Implementation uses frto_counter states 2 and 3 to distinguish
when Nagle override is needed.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-04-30 15:58:16 +0800
34588b4c0 [TCP]: Catch skb with S+L bugs earlier

SACKED_ACKED and LOST are mutually exclusive with SACK, thus
having their sum larger than packets_out is a bug with SACK.
Eventually these bugs trigger traps in tcp_clean_rtx_queue
with SACK, but it's much more informative to do this here.

Non-SACK TCP, however, could get more than packets_out duplicate
ACKs which each increment sacked_out, so it makes sense to do
this kind of limiting for non-SACK TCP but not for SACK-enabled
TCP. Perhaps the author had the opposite in mind but accidentally
got the logic the wrong way around? Anyway, the sacked_out
incrementer code for non-SACK already deals with this issue before
calling sync_left_out, so this trapping can be done
unconditionally.
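
The check described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the struct and function names (ex_tcp_counters, ex_sl_invariant_holds, ex_sync_left_out) are hypothetical, and computing left_out as the plain sum of the two counters is an assumption, not something the commit message spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three counters named in the message. */
struct ex_tcp_counters {
	uint32_t packets_out; /* segments currently in flight */
	uint32_t sacked_out;  /* segments counted as SACKED_ACKED */
	uint32_t lost_out;    /* segments counted as LOST */
};

/* True when sacked_out + lost_out does not exceed packets_out. */
bool ex_sl_invariant_holds(const struct ex_tcp_counters *c)
{
	/* Widen to 64 bits so the sum itself cannot wrap around. */
	return (uint64_t)c->sacked_out + c->lost_out <= c->packets_out;
}

/* Assumed shape of a "sync left_out" helper that traps unconditionally. */
uint32_t ex_sync_left_out(const struct ex_tcp_counters *c)
{
	assert(ex_sl_invariant_holds(c)); /* the S+L trap */
	return c->sacked_out + c->lost_out;
}
```

With packets_out = 10, sacked_out = 4 and lost_out = 6 the trap stays quiet; with sacked_out = 7 it would fire before any later code sees the inconsistent state.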

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-04-30 15:57:33 +0800

26 Apr, 2007

11 commits

164891aad [TCP]: Congestion control API update.

Do some simple changes to make congestion control API faster/cleaner.
* use ktime_t rather than timeval
* merge rtt sampling into existing ack callback
this means one indirect call versus two per ack.
* use flags bits to store options/settings

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2007-04-26 13:29:45 +0800
9e412ba76 [TCP]: Sed magic converts func(sk, tp, ...) -> func(sk, ...)

This is a (mostly) automated change using magic:

sed -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
-e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
-e 's|struct sock \*sk,[\n\t ]*struct tcp_sock \*tp$[^{]*\n{\n$|
struct sock \*sk\1\tstruct tcp_sock *tp = tcp_sk(sk);\n|g'
-e 's|struct sock \*sk, struct tcp_sock \*tp|
struct sock \*sk|g' -e 's|sk, tp$[^-]$|sk\1|g'

Fixed four unused variable (tp) warnings that were introduced.

In addition, manually added newlines after local variables and
tweaked function arguments positioning.

$ gcc --version
gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
...
$ codiff -fV built-in.o.old built-in.o.new
net/ipv4/route.c:
rt_cache_flush | +14
1 function changed, 14 bytes added

net/ipv4/tcp.c:
tcp_setsockopt | -5
tcp_sendpage | -25
tcp_sendmsg | -16
3 functions changed, 46 bytes removed

net/ipv4/tcp_input.c:
tcp_try_undo_recovery | +3
tcp_try_undo_dsack | +2
tcp_mark_head_lost | -12
tcp_ack | -15
tcp_event_data_recv | -32
tcp_rcv_state_process | -10
tcp_rcv_established | +1
7 functions changed, 6 bytes added, 69 bytes removed, diff: -63

net/ipv4/tcp_output.c:
update_send_head | -9
tcp_transmit_skb | +19
tcp_cwnd_validate | +1
tcp_write_wakeup | -17
__tcp_push_pending_frames | -25
tcp_push_one | -8
tcp_send_fin | -4
7 functions changed, 20 bytes added, 63 bytes removed, diff: -43

built-in.o.new:
18 functions changed, 40 bytes added, 178 bytes removed, diff: -138

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-04-26 13:29:34 +0800
4ac02bab7 [TCP]: Uninline tcp_done().

The function is quite big, has several call sites, and offers nothing
for the compiler to collapse when inlining.

Besides, it's nicer to read in a .c file.

Signed-off-by: Andi Kleen
Signed-off-by: David S. Miller

Andi Kleen
2007-04-26 13:29:25 +0800
604763722 [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY

When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should
treat it as such in the stack.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-04-26 13:28:43 +0800
aa8223c7b [SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th

Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller

Arnaldo Carvalho de Melo
2007-04-26 13:25:26 +0800
fe067e8ab [TCP]: Abstract out all write queue operations.

This allows the write queue implementation to be changed,
for example, to one which allows fast interval searching.

Signed-off-by: David S. Miller

David S. Miller
2007-04-26 13:24:02 +0800
9d729f72d [NET]: Convert xtime.tv_sec to get_seconds()

Where appropriate, convert references to xtime.tv_sec to the
get_seconds() helper function.

Signed-off-by: James Morris
Signed-off-by: David S. Miller

James Morris
2007-04-26 13:23:32 +0800
3cfe3baaf [TCP]: Add two new spurious RTO responses to FRTO

New sysctl tcp_frto_response is added to select amongst these
responses:
- Rate halving based; reuses CA_CWR state (default)
- Very conservative; used to be the only one available (=1)
- Undo cwr; undoes ssthresh and cwnd reductions (=2)

The response with rate halving requires a new parameter to
tcp_enter_cwr because FRTO has already reduced ssthresh and
a second reduction there has to be prevented. In addition,
to keep things within an 80-column screen, a local variable was
added.
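
A minimal sketch of how such a selector and the extra tcp_enter_cwr parameter might fit together. Everything prefixed ex_ or EX_ is hypothetical, the value 0 for the default response and the fallback for unknown values are assumptions, and the halving rule with a floor of 2 is only a placeholder for whatever reduction the real code applies.

```c
#include <assert.h>
#include <stdint.h>

/* The three responses listed in the message; 0 as the default is assumed. */
enum ex_frto_response {
	EX_FRTO_RATE_HALVING = 0,
	EX_FRTO_CONSERVATIVE = 1,
	EX_FRTO_UNDO_CWR = 2
};

struct ex_cc_state {
	uint32_t cwnd;
	uint32_t ssthresh;
};

/* Map the sysctl value to a response; unknown values fall back to default. */
enum ex_frto_response ex_select_response(int sysctl_value)
{
	switch (sysctl_value) {
	case 2:
		return EX_FRTO_UNDO_CWR;
	case 1:
		return EX_FRTO_CONSERVATIVE;
	default:
		return EX_FRTO_RATE_HALVING;
	}
}

/* set_ssthresh == 0 skips the reduction, since FRTO already did one. */
void ex_enter_cwr(struct ex_cc_state *s, int set_ssthresh)
{
	assert(s);
	if (set_ssthresh) {
		uint32_t half = s->cwnd / 2; /* placeholder reduction rule */
		s->ssthresh = half > 2 ? half : 2;
	}
	/* The gradual cwnd decrease of rate halving is not modelled here. */
}
```

The point of the flag is visible in the second function: a caller coming from FRTO passes 0 and keeps the ssthresh it already computed, while any other caller passes 1 and gets the usual reduction.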

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-04-26 13:23:23 +0800
886236c12 [TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh.

Signed-off-by: John Heffner
Signed-off-by: David S. Miller

John Heffner
2007-04-26 13:23:19 +0800
46d0de4ed [TCP] FRTO: Entry is allowed only during (New)Reno like recovery

This interpretation comes from RFC4138:
"If the sender implements some loss recovery algorithm other
than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD
NOT be entered when earlier fast recovery is underway."

I think the RFC means to say (especially in the light of
Appendix B) that ...recovery is underway (not just fast recovery)
or was underway when it was interrupted by an earlier (F-)RTO
that hasn't yet been resolved (snd_una has not advanced enough).
Thus, my interpretation is that whenever TCP has ever
retransmitted other than head, basic version cannot be used
because then the order assumptions which are used as FRTO basis
do not hold.

NewReno has only the head segment retransmitted at a time.
Therefore, walk up to the segment that has not been SACKed, if
that segment is not retransmitted nor anything before it, we know
for sure, that nothing after the non-SACKed segment should be
either. This assumption is valid because TCPCB_EVER_RETRANS does
not leave holes but each non-SACKed segment is rexmitted
in-order.

Check for retrans_out > 1 avoids more expensive walk through the
skb list, as we can know the result beforehand: F-RTO will not be
allowed.

A SACKed skb can turn into a non-SACKed one only in the extremely rare
case of SACK reneging; in this case we might fail to detect
retransmissions if any segment other than the head was retransmitted. To
get rid of that feature, whole rexmit queue would have to be
walked (always) or FRTO should be prevented when SACK reneging
happens. Of course RTO should still trigger after reneging which
makes this issue even less likely to show up. And as long as the
response is as conservative as it's now, nothing bad happens even
then.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-04-26 13:23:12 +0800
bdaae17da [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c

In addition, removed inline.

Signed-off-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Ilpo Järvinen
2007-04-26 13:23:02 +0800

09 Feb, 2007

1 commit

ba7808eac [TCP]: remove tcp header from tcp_v4_check (take #2)

The tcphdr struct passed to tcp_v4_check is not used, the following
patch removes it from the parameter list.

This adds the netfilter modifications missing in the patch I sent
for rc3-mm1.

Signed-off-by: Frederik Deweerdt
Signed-off-by: David S. Miller

Frederik Deweerdt
2007-02-09 04:38:44 +0800

05 Jan, 2007

1 commit

0d630cc0a [TCP]: Use old definition of before

This reverts the new (unambiguous) definition of the TCP `before'
relation. As pointed out in an example by Herbert Xu, there is
existing code which implicitly requires the old definition in order
to work correctly.

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller

Gerrit Renker
2007-01-05 04:25:16 +0800

23 Dec, 2006

1 commit

9a036b9c3 [TCP]: Fix ambiguity in the `before' relation.

While looking at DCCP sequence numbers, I stumbled over a problem with
the following definition of before in tcp.h:

static inline int before(__u32 seq1, __u32 seq2)
{
	return (__s32)(seq1-seq2) < 0;
}

Problem: This definition suffers from an ambiguity, i.e. always

before(a, (a + 2^31) % 2^32) = 1
before((a + 2^31) % 2^32, a) = 1

In text: when the difference between a and b amounts to 2^31,
a is always considered `before' b and b `before' a; the function
cannot decide. The reason is that implicitly 0 is `before'
1 ... 2^31-1 ... 2^31

Solution: There is a simple fix, by defining before in such a way that
0 is no longer `before' 2^31, i.e. 0 `before' 1 ... 2^31-1
By not using the middle between 0 and 2^32, before can be made
unambiguous.
This is achieved by testing whether seq2-seq1 > 0 (using signed
32-bit arithmetic).

I attach a patch to codify this. Also, since the `after' relation is
basically a redefinition of `before', it is now defined as a macro
following before.
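
The ambiguity is easy to reproduce in userspace. The sketch below uses uint32_t/int32_t in place of the kernel's __u32/__s32; before_old is the definition quoted in the message, while before_new is a reading of the described fix (testing seq2-seq1 > 0) and the names before_old, before_new and check_before_ambiguity are made up for this example. It relies on the usual two's-complement behaviour of the unsigned-to-signed cast, as the kernel code does.

```c
#include <assert.h>
#include <stdint.h>

/* Definition quoted in the commit message. */
static inline int before_old(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Described fix: test seq2 - seq1 > 0 in signed 32-bit arithmetic. */
static inline int before_new(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) > 0;
}

/* Checks both definitions for a pair exactly 2^31 apart. */
void check_before_ambiguity(uint32_t a)
{
	uint32_t b = a + 0x80000000u; /* wraps modulo 2^32 */

	/* Old: each value is `before' the other. */
	assert(before_old(a, b) && before_old(b, a));
	/* New: neither value is `before' the other. */
	assert(!before_new(a, b) && !before_new(b, a));
}
```

For any pair less than 2^31 apart the two definitions agree, including across wrap-around: 0xFFFFFFF0 is `before' 5 under both.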

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller

Gerrit Renker
2006-12-23 03:12:01 +0800

03 Dec, 2006

7 commits

8e5200f54 [NET]: Fix assorted misannotations (from md5 and udplite merges).

Signed-off-by: Al Viro
Signed-off-by: David S. Miller

Al Viro
2006-12-03 13:27:16 +0800
b51655b95 [NET]: Annotate __skb_checksum_complete() and friends.

Signed-off-by: Al Viro
Signed-off-by: David S. Miller

Al Viro
2006-12-03 13:23:38 +0800
6b11687ef [NET]: Annotate csum_tcpudp_magic() callers in net/*

Signed-off-by: Al Viro
Signed-off-by: David S. Miller

Al Viro
2006-12-03 13:23:29 +0800
cfb6eeb4c [TCP]: MD5 Signature Option (RFC2385) support.

Based on implementation by Rick Payne.

Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

YOSHIFUJI Hideaki
2006-12-03 13:22:39 +0800
ce7bc3bf1 [TCP]: Restrict congestion control choices.

Allow normal users to choose only among a restricted set of congestion
control choices. The default is reno and whatever has been configured
as default. But the policy can be changed by the administrator at any time.

For example, to allow any choice:
cp /proc/sys/net/ipv4/tcp_available_congestion_control \
/proc/sys/net/ipv4/tcp_allowed_congestion_control

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2006-12-03 13:21:49 +0800
3ff825b28 [TCP]: Add tcp_available_congestion_control sysctl.

Create /proc/sys/net/ipv4/tcp_available_congestion_control
that reflects currently available TCP choices.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2006-12-03 13:21:48 +0800
72a3effaf [NET]: Size listen hash tables using backlog hint

We currently allocate a fixed-size hash table (TCP_SYNQ_HSIZE=512 slots) for
each LISTEN socket, regardless of various parameters (listen backlog for
example).

On x86_64, this means order-1 allocations (might fail), even for 'small'
sockets, expecting few connections. On the contrary, a huge server wanting a
backlog of 50000 is slowed down a bit because of this fixed limit.

This patch makes the sizing of the listen hash table a dynamic parameter,
depending on:
- net.core.somaxconn tunable (default is 128)
- net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128)
- backlog value given by user application (2nd parameter of listen())

For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
kmalloc().

We still limit memory allocation with the two existing tunables (somaxconn &
tcp_max_syn_backlog). So for standard setups, this patch actually reduces RAM
usage.
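
The sizing rule can be sketched roughly as follows. This is a guess at the shape of the computation, not the patch itself: the function names are hypothetical, and both the floor of 8 slots and the rounding up to a power of two are assumptions made for the example. Only the three inputs come from the message.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t ex_min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Round n up to the next power of two, capped at 2^31. */
static uint32_t ex_roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	if (n > 0x80000000u)
		return 0x80000000u; /* avoid shifting past 32 bits */
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Number of hash slots for a LISTEN socket: the application's backlog,
 * limited by both tunables, with an assumed floor of 8 slots.
 */
uint32_t ex_listen_hash_slots(uint32_t backlog, uint32_t somaxconn,
			      uint32_t max_syn_backlog)
{
	uint32_t n = ex_min_u32(backlog, ex_min_u32(somaxconn, max_syn_backlog));

	if (n < 8)
		n = 8;
	assert(n >= 8);
	return ex_roundup_pow2(n);
}
```

With the default somaxconn of 128, a listen(fd, 50000) call ends up with 128 slots instead of 512, which is where the RAM saving on standard setups comes from; raising both tunables lets the large server get a table sized for its backlog.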

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2006-12-03 13:21:44 +0800

03 Aug, 2006

1 commit

3687b1dc6 [TCP]: SNMPv2 tcpAttemptFails counter error

According to RFC2012, tcpAttemptFails is defined as follows:
tcpAttemptFails OBJECT-TYPE
SYNTAX Counter32
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"The number of times TCP connections have made a direct
transition to the CLOSED state from either the SYN-SENT
state or the SYN-RCVD state, plus the number of times TCP
connections have made a direct transition to the LISTEN
state from the SYN-RCVD state."
::= { tcp 7 }

Looking into RFC793, I found that these state changes should occur
under the following conditions:
1. SYN-SENT -> CLOSED
a) Received ACK,RST segment in SYN-SENT state.

2. SYN-RCVD -> CLOSED
b) Received SYN segment in SYN-RCVD state (came from LISTEN).
c) Received RST segment in SYN-RCVD state (came from SYN-SENT).
d) Received SYN segment in SYN-RCVD state (came from SYN-SENT).

3. SYN-RCVD -> LISTEN
e) Received RST segment in SYN-RCVD state (came from LISTEN).

In my test, those direct state transitions are not counted in
tcpAttemptFails.
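
The RFC2012 definition quoted above reduces to a small predicate over state transitions. The sketch below encodes only that definition; the enum and function names are hypothetical and do not correspond to kernel identifiers.

```c
#include <assert.h>

/* Hypothetical subset of TCP states, enough for the MIB definition. */
enum ex_tcp_state {
	EX_CLOSED,
	EX_LISTEN,
	EX_SYN_SENT,
	EX_SYN_RCVD,
	EX_ESTABLISHED
};

/*
 * Returns 1 when a direct transition from -> to should increment
 * tcpAttemptFails according to the quoted RFC2012 description.
 */
int ex_counts_as_attempt_fail(enum ex_tcp_state from, enum ex_tcp_state to)
{
	assert(from != to); /* only real transitions are meaningful here */

	if (to == EX_CLOSED)
		return from == EX_SYN_SENT || from == EX_SYN_RCVD;
	if (to == EX_LISTEN)
		return from == EX_SYN_RCVD;
	return 0;
}
```

Cases a) through e) in the list all map to a return value of 1, while an ordinary close from the established state does not.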

Signed-off-by: Wei Yongjun
Signed-off-by: David S. Miller

Wei Yongjun
2006-08-03 04:38:19 +0800

09 Jul, 2006

1 commit

a430a43d0 [NET] gso: Fix up GSO packets with broken checksums

Certain subsystems in the stack (e.g., netfilter) can break the partial
checksum on GSO packets. Until they're fixed, this patch allows this to
work by recomputing the partial checksums through the GSO mechanism.

Once they've all been converted to update the partial checksum instead of
clearing it, this workaround can be removed.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2006-07-09 04:34:56 +0800

01 Jul, 2006

1 commit

bcd761111 [NET]: Generalise TSO-specific bits from skb_setup_caps

This patch generalises the TSO-specific bits from sk_setup_caps by adding
the sk_gso_type member to struct sock. This makes sk_setup_caps generic
so that it can be used by TCPv6 or UFO.

The only catch is that whoever uses this must provide a GSO implementation
for their protocol which I think is a fair deal :) For now UFO continues to
live without a GSO implementation which is OK since it doesn't use the sock
caps field at the moment.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2006-07-01 05:12:08 +0800

30 Jun, 2006

1 commit

576a30eb6 [NET]: Added GSO header verification

When GSO packets come from an untrusted source (e.g., a Xen guest domain),
we need to verify the header integrity before passing it to the hardware.

Since the first step in GSO is to verify the header, we can reuse that
code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this
bit set can only be fed directly to devices with the corresponding bit
NETIF_F_GSO_ROBUST. If the device doesn't have that bit, then the skb
is fed to the GSO engine which will allow the packet to be sent to the
hardware if it passes the header check.

This patch changes the sg flag to a full features flag. The same method
can be used to implement TSO ECN support. We simply have to mark packets
with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment
the packet, or segment the first MTU and pass the rest to the hardware for
further segmentation.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2006-06-30 07:57:53 +0800

23 Jun, 2006

2 commits

f4c50d990 [NET]: Add software TSOv4

This patch adds the GSO implementation for IPv4 TCP.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2006-06-23 17:07:33 +0800
7967168ce [NET]: Merge TSO/UFO fields in sk_buff

Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP). So
let's merge them.

They were used to tell the protocol of a packet. This function has been
subsumed by the new gso_type field. This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb. As such it's easy to tell whether a given device can process a GSO
skb: you just have to AND the gso_type field with the netdev's features
field.

I've made gso_type a conjunction. The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
to be emulated in software.
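
The feature test described above might look like the following sketch. The EX_ constants are hypothetical stand-ins chosen only so that each feature bit is the matching type bit shifted by 16, as the message describes; they are not the kernel's actual values, and ex_dev_can_segment is an invented name.

```c
#include <assert.h>
#include <stdint.h>

#define EX_GSO_SHIFT 16

/* Hypothetical gso_type bits. */
#define EX_SKB_GSO_TCPV4     (1u << 0)
#define EX_SKB_GSO_TCPV4_ECN (1u << 1)

/* Matching device feature bits: the type bits shifted by 16. */
#define EX_NETIF_F_TSO     (EX_SKB_GSO_TCPV4 << EX_GSO_SHIFT)
#define EX_NETIF_F_TSO_ECN (EX_SKB_GSO_TCPV4_ECN << EX_GSO_SHIFT)

/*
 * A device can take the skb directly only if every bit in gso_type
 * has its shifted counterpart set in the device features.
 */
int ex_dev_can_segment(uint32_t dev_features, uint32_t gso_type)
{
	uint32_t required = gso_type << EX_GSO_SHIFT;

	assert((gso_type >> EX_GSO_SHIFT) == 0); /* type bits fit below the shift */
	return (dev_features & required) == required;
}
```

A device advertising only the plain TSO bit accepts ordinary TCPv4 GSO packets but rejects those that also carry the ECN bit, so only the CWR packets fall back to software segmentation.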

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2006-06-23 17:07:29 +0800

21 Jun, 2006

1 commit

cee4cca74 Merge git://git.infradead.org/hdrcleanup-2.6

* git://git.infradead.org/hdrcleanup-2.6: (63 commits)
[S390] __FD_foo definitions.
Switch to __s32 types in joystick.h instead of C99 types for consistency.
Add to headers included for userspace in
Move inclusion of out of user scope in asm-x86_64/mtrr.h
Remove struct fddi_statistics from user view in
Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
Include and use __uXX types in
Use __uXX types in , include too
Remove private struct dx_hash_info from public view in
Include and use __uXX types in
Use __uXX types in for struct divert_blk et al.
Use __u32 for elf_addr_t in , not u32. It's user-visible.
Remove PPP_FCS from user view in , remove __P mess entirely
Use __uXX types in user-visible structures in
Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
Use __uXX types for S390 DASD volume label definitions which are user-visible
S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
Remove unneeded inclusion of from
Fix private integer types used in V4L2 ioctls.
...

Manually resolve conflict in include/linux/mtd/physmap.h

Linus Torvalds
2006-06-21 06:10:08 +0800

18 Jun, 2006

5 commits

35089bb20 [TCP]: Add tcp_slow_start_after_idle sysctl.

A lot of people have asked for a way to disable tcp_cwnd_restart(),
and it seems reasonable to add a sysctl to do that.

Signed-off-by: David S. Miller

David S. Miller
2006-06-18 12:30:53 +0800
72dc5b922 [TCP]: Minimum congestion window consolidation.

Many of the TCP congestion methods just use ssthresh
as the minimum congestion window on decrease. Rather than
duplicating the code, just have that be the default if that
handler in the ops structure is not set.

Minor behaviour change to TCP compound. It probably wants
to use this (ssthresh) as lower bound, rather than ssthresh/2
because the latter causes undershoot on loss.
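
The "default when the handler is unset" pattern can be sketched as below. The struct layout and all ex_ names are hypothetical; only the behaviour (fall back to ssthresh when no handler is provided) comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ex_cc_state {
	uint32_t cwnd;
	uint32_t ssthresh;
};

/* Hypothetical ops table; min_cwnd may be left NULL by a module. */
struct ex_cc_ops {
	uint32_t (*min_cwnd)(const struct ex_cc_state *s);
};

/* Example of a module-specific handler that overrides the default. */
uint32_t ex_half_ssthresh(const struct ex_cc_state *s)
{
	return s->ssthresh / 2;
}

/* Lower bound for cwnd on decrease: module handler if set, else ssthresh. */
uint32_t ex_min_cwnd(const struct ex_cc_ops *ops, const struct ex_cc_state *s)
{
	assert(ops && s);
	if (ops->min_cwnd != NULL)
		return ops->min_cwnd(s);
	return s->ssthresh; /* shared default */
}
```

Modules that were previously duplicating the ssthresh rule can simply leave the pointer unset, and a module that wants a different bound only supplies its own function.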

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2006-06-18 12:29:29 +0800
959378258 [I/OAT]: Add a sysctl for tuning the I/OAT offloaded I/O threshold

Any socket recv of less than this amount will not be offloaded.

Signed-off-by: Chris Leech
Signed-off-by: David S. Miller

Chris Leech
2006-06-18 12:25:54 +0800
0e4b4992b [I/OAT]: Rename cleanup_rbuf to tcp_cleanup_rbuf and make non-static

Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT

Signed-off-by: Chris Leech
Signed-off-by: David S. Miller

Chris Leech
2006-06-18 12:25:50 +0800
97fc2f084 [I/OAT]: Structure changes for TCP recv offload to I/OAT

Adds an async_wait_queue and some additional fields to tcp_sock, and a
dma_cookie_t to sk_buff.

Signed-off-by: Chris Leech
Signed-off-by: David S. Miller

Chris Leech
2006-06-18 12:25:48 +0800

26 Apr, 2006

1 commit

62c4f0a2d Don't include linux/config.h from anywhere else in include/

Signed-off-by: David Woodhouse

David Woodhouse
2006-04-26 19:56:16 +0800

31 Mar, 2006

1 commit

0803dbed7 [TCP]: Kill unused extern decl for tcp_v4_hash_connecting()

Noticed by Alan Menegotto.

Signed-off-by: David S. Miller

David S. Miller
2006-03-31 18:25:46 +0800