18 Mar, 2013
1 commit
-
TCPCT uses option-number 253, which is reserved for experimental use and
should not be used in production environments.
Further, TCPCT does not fully implement RFC 6013.

As a nice side-effect, removing TCPCT increases TCP's performance for
very short flows:

Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
for files of 1KB size.

before this patch:
average (among 7 runs) of 20845.5 Requests/Second
after:
average (among 7 runs) of 21403.6 Requests/Second
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller
07 Jan, 2013
1 commit
-
As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl
namespace aware. The reason behind this patch is to ease the testing
of ecn problems on the internet and to allow applications to tune their
own use of ecn.
Cc: Eric Dumazet
Cc: David Miller
Cc: Stephen Hemminger
Signed-off-by: Hannes Frederic Sowa
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
04 Nov, 2012
1 commit
-
For passive TCP connections using TCP_DEFER_ACCEPT facility,
we incorrectly increment req->retrans each time timeout triggers
while no SYNACK is sent.

SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for
which we received the ACK from client). Only the last SYNACK is sent
so that we can receive again an ACK from client, to move the req into
accept queue. We plan to change this later to avoid the useless
retransmit (and potential problem as this SYNACK could be lost)

TCP_INFO later gives wrong information to user, claiming imaginary
retransmits.

Decouple req->retrans field into two independent fields :
num_retrans : number of retransmit
num_timeout : number of timeouts

num_timeout is the counter that is incremented at each timeout,
regardless of actual SYNACK being sent or not, and used to
compute the exponential timeout.

Introduce inet_rtx_syn_ack() helper to increment num_retrans
only if ->rtx_syn_ack() succeeded.

Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
when we re-send a SYNACK in answer to a (retransmitted) SYN.
Prior to this patch, we were not counting these retransmits.

Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
only if a synack packet was successfully queued.
Reported-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Cc: Julian Anastasov
Cc: Vijay Subramanian
Cc: Elliott Hughes
Cc: Neal Cardwell
Signed-off-by: David S. Miller
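The counter split described in the commit above can be sketched roughly as follows. This is a simplified illustration, not the kernel code: `struct toy_req`, `toy_rtx_syn_ack()`, `toy_synack_timer()` and the two send callbacks are hypothetical stand-ins, and the assumption is that the transmit callback returns 0 on success.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's request_sock. */
struct toy_req {
	unsigned int num_retrans;	/* SYNACKs that were actually re-sent */
	unsigned int num_timeout;	/* timer expirations, sent or not */
};

/* Hypothetical transmit callback: returns 0 on success, nonzero on failure. */
typedef int (*toy_rtx_fn)(struct toy_req *req);

/* Count a retransmit only when the SYNACK was really sent. */
static int toy_rtx_syn_ack(struct toy_req *req, toy_rtx_fn rtx)
{
	int err = rtx(req);

	if (!err)
		req->num_retrans++;
	return err;
}

/*
 * Timer handler: every expiry bumps num_timeout (which would drive the
 * exponential backoff); the SYNACK is only re-sent when 'resend' is true,
 * e.g. not for a deferred request that was already ACKed.
 */
static void toy_synack_timer(struct toy_req *req, toy_rtx_fn rtx, bool resend)
{
	if (resend)
		toy_rtx_syn_ack(req, rtx);
	req->num_timeout++;
}

/* Two trivial callbacks used to exercise the sketch. */
static int toy_send_ok(struct toy_req *req)   { (void)req; return 0; }
static int toy_send_fail(struct toy_req *req) { (void)req; return -1; }
```

With this split, a timeout that sends nothing (or fails to send) advances only `num_timeout`, so a TCP_INFO-style report built on `num_retrans` no longer claims retransmits that never happened.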
01 Sep, 2012
1 commit
-
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled
2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes
3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket
4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes
5. supporting TCP_FASTOPEN socket option
6. modifying tcp_check_req() to check a TFO socket as well
as request_sock
7. supporting TCP's TFO cookie option
8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.
Signed-off-by: H.K. Jerry Chu
Cc: Yuchung Cheng
Cc: Neal Cardwell
Cc: Eric Dumazet
Cc: Tom Herbert
Signed-off-by: David S. Miller
20 Jul, 2012
1 commit
-
This patch implements the common code for both the client and server.
1. TCP Fast Open option processing. Since Fast Open does not have an
option number assigned by IANA yet, it shares the experiment option
code 254 by implementing draft-ietf-tcpm-experimental-options
with a 16-bit magic number 0xF989. This enables global experiments
without clashing over the scarce (2) experimental options available for TCP.

When the draft status becomes standard (maybe), the client should
switch to the new option number assigned while the server supports
both numbers for transition.
2. The new sysctl tcp_fastopen
3. A place holder init function
Signed-off-by: Yuchung Cheng
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
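As a rough illustration of the shared experimental option described in the commit above, the sketch below recognizes a Fast Open option by its magic number. The option kind 254 and the magic 0xF989 come from the commit message; the byte layout (kind, length, 16-bit magic in network byte order, then payload) is an assumption based on the experimental-options draft, and every `toy_`/`TOY_` name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_OPT_EXP		254	/* experimental option kind from the commit */
#define TOY_FASTOPEN_MAGIC	0xF989	/* 16-bit magic number from the commit */

/*
 * 'opt' points at the kind byte of one TCP option and 'avail' is the number
 * of bytes left in the option space. Returns a pointer to the payload that
 * follows the magic number (and stores its length), or NULL if this is not
 * a well-formed Fast Open experimental option.
 */
static const uint8_t *toy_fastopen_payload(const uint8_t *opt, size_t avail,
					   size_t *payload_len)
{
	uint16_t magic;
	uint8_t len;

	if (avail < 4 || opt[0] != TOY_OPT_EXP)
		return NULL;

	len = opt[1];			/* total option length, header included */
	if (len < 4 || len > avail)
		return NULL;

	magic = (uint16_t)((opt[2] << 8) | opt[3]);	/* network byte order */
	if (magic != TOY_FASTOPEN_MAGIC)
		return NULL;		/* some other experiment sharing kind 254 */

	*payload_len = (size_t)len - 4;
	return opt + 4;
}
```

Checking the magic before touching the payload is what lets several experiments share one option kind without misreading each other's data.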
06 Jul, 2012
1 commit
-
remove redundant declarations, they belong in include/net/tcp.h
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
23 Nov, 2011
1 commit
-
C assignment can handle struct in6_addr copying.
Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
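The point of the commit above is that plain C structure assignment copies all 16 address bytes, so a dedicated copy helper is not needed. A minimal sketch, using a hypothetical `struct toy_in6_addr` in place of the real `struct in6_addr`:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in6_addr: 16 raw address bytes. */
struct toy_in6_addr {
	uint8_t s6_addr[16];
};

/* Copy through memcpy(), the way an explicit copy helper would. */
static void toy_copy_memcpy(struct toy_in6_addr *dst,
			    const struct toy_in6_addr *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Copy through plain assignment; the compiler copies the whole struct. */
static void toy_copy_assign(struct toy_in6_addr *dst,
			    const struct toy_in6_addr *src)
{
	*dst = *src;
}
```

Both functions leave `*dst` byte-for-byte equal to `*src`; the assignment form also lets the compiler type-check both operands.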
21 Oct, 2011
1 commit
-
Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.

For example, is it legal to temporarily write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Aug, 2011
1 commit
-
Using gcc 4.4.3, warnings are emitted for a possibly uninitialized use
of ecn_ok.

This can happen if cookie_check_timestamp() returns due to not having
seen a timestamp. Defaulting to ecn off seems like a reasonable thing
to do in this case, so initialize ecn_ok to false.
Signed-off-by: Mike Waychison
Signed-off-by: David S. Miller
09 Jun, 2011
1 commit
-
This patch lowers the default initRTO from 3secs to 1sec per
RFC2988bis. It falls back to 3secs if the SYN or SYN-ACK packet
has been retransmitted, AND the TCP timestamp option is not on.

It also adds support to take RTT sample during 3WHS on the passive
open side, just like its active open counterpart, and uses it, if
valid, to seed the initRTO for the data transmission phase.

The patch also resets ssthresh to its initial default at the
beginning of the data transmission phase, and reduces cwnd to 1 if
there has been MORE THAN ONE retransmission during 3WHS per RFC5681.
Signed-off-by: H.K. Jerry Chu
Signed-off-by: David S. Miller
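The two decision rules stated in the commit above can be written down as a small sketch. It deliberately ignores the RTT-sample seeding the commit also adds, and all names and the millisecond units are hypothetical choices for illustration.

```c
#include <stdbool.h>

#define TOY_INIT_RTO_MS		1000	/* new default initRTO (1 sec) */
#define TOY_FALLBACK_RTO_MS	3000	/* old default, used as fallback (3 secs) */

/*
 * Fall back to 3 secs only if the SYN or SYN-ACK was retransmitted AND the
 * TCP timestamp option is not in use.
 */
static unsigned int toy_initial_rto_ms(unsigned int handshake_retrans,
				       bool timestamps_on)
{
	if (handshake_retrans > 0 && !timestamps_on)
		return TOY_FALLBACK_RTO_MS;
	return TOY_INIT_RTO_MS;
}

/* Reduce cwnd to 1 only after MORE THAN ONE retransmission during 3WHS. */
static unsigned int toy_initial_cwnd(unsigned int handshake_retrans,
				     unsigned int default_cwnd)
{
	return handshake_retrans > 1 ? 1 : default_cwnd;
}
```

For example, one retransmitted SYN without timestamps yields a 3000 ms RTO but leaves the initial cwnd untouched; a second retransmission is what drops cwnd to 1.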
23 Apr, 2011
1 commit
-
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
13 Mar, 2011
4 commits
-
Signed-off-by: David S. Miller
-
Signed-off-by: David S. Miller
-
Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*

This will let us create AF optimal flow instances.
It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.
Signed-off-by: David S. Miller
-
I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.
Signed-off-by: David S. Miller
02 Mar, 2011
1 commit
-
Route lookups follow a general pattern in the ipv6 code wherein
we first find the non-IPSEC route, potentially override the
flow destination address due to ipv6 options settings, and then
finally make an IPSEC search using either xfrm_lookup() or
__xfrm_lookup().

__xfrm_lookup() is used when we want to generate a blackhole route
if the key manager needs to resolve the IPSEC rules (in this case
-EREMOTE is returned and the original 'dst' is left unchanged).

Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
resolution is necessary, we simply fail the lookup completely.

All of these cases are encapsulated into two routines,
ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which
handles unconnected UDP datagram sockets.
Signed-off-by: David S. Miller
27 Jun, 2010
2 commits
-
Allows use of ECN when syncookies are in effect by encoding ecn_ok
into the syn-ack tcp timestamp.

While at it, remove an unneeded #ifdef CONFIG_SYN_COOKIES.
With CONFIG_SYN_COOKIES=n, want_cookie is ifdef'd to 0 and gcc
removes the "if (0)".
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
-
As pointed out by Fernando Gont there is no need to encode rcv_wscale
into the cookie.

We did not use the restored rcv_wscale anyway; it is recomputed
via tcp_select_initial_window().

Thus we can save 4 bits in the ts option space by removing rcv_wscale.
In case window scaling was not supported, we set the (invalid) wscale
value 0xf.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
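The two commits above both deal with packing option state into the low bits of the syn-ack timestamp. The sketch below shows one way such an encoding could look; the exact bit positions (4 bits of send window scale, then one SACK bit and one ECN bit) are an assumption made for illustration, and all `toy_`/`TOY_` names are hypothetical. Only the use of 0xf as the "no window scaling" marker and the presence of an ecn_ok flag are taken from the commit messages.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_TS_WSCALE_MASK	0xfu		/* low 4 bits: send window scale */
#define TOY_TS_SACK		(1u << 4)	/* assumed position of the SACK bit */
#define TOY_TS_ECN		(1u << 5)	/* assumed position of the ECN bit */
#define TOY_TS_OPT_MASK		0x3fu		/* all 6 option bits */
#define TOY_WSCALE_NONE		0xfu		/* invalid wscale: scaling unsupported */

struct toy_opts {
	bool wscale_ok;
	uint8_t snd_wscale;	/* valid values are 0..14 */
	bool sack_ok;
	bool ecn_ok;
};

/* Replace the low bits of the timestamp with the encoded options. */
static uint32_t toy_ts_encode(uint32_t ts, const struct toy_opts *o)
{
	uint32_t bits;

	bits = o->wscale_ok ? (o->snd_wscale & TOY_TS_WSCALE_MASK)
			    : TOY_WSCALE_NONE;
	if (o->sack_ok)
		bits |= TOY_TS_SACK;
	if (o->ecn_ok)
		bits |= TOY_TS_ECN;

	return (ts & ~TOY_TS_OPT_MASK) | bits;
}

/* Restore the options from the timestamp echoed back in the client's ACK. */
static struct toy_opts toy_ts_decode(uint32_t tsecr)
{
	struct toy_opts o;
	uint32_t wscale = tsecr & TOY_TS_WSCALE_MASK;

	o.wscale_ok = (wscale != TOY_WSCALE_NONE);
	o.snd_wscale = o.wscale_ok ? (uint8_t)wscale : 0;
	o.sack_ok = (tsecr & TOY_TS_SACK) != 0;
	o.ecn_ok = (tsecr & TOY_TS_ECN) != 0;
	return o;
}
```

Because 15 is not a legal window scale, 0xf can serve as the "not supported" marker without spending an extra bit, which is the saving the second commit describes.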
17 Jun, 2010
1 commit
-
Discard the ACK if we find options that do not match current sysctl
settings.

Previously it was possible to create a connection with sack, wscale,
etc. enabled even if the feature was disabled via sysctl.

Also remove an unneeded call to tcp_sack_reset() in
cookie_check_timestamp: Both call sites (cookie_v4_check,
cookie_v6_check) zero "struct tcp_options_received", hand it to
tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack)
and then call cookie_check_timestamp().

Even if num_sacks/dsacks were changed, the structure is allocated on
the stack and after cookie_check_timestamp returns only a few selected
members are copied to the inet_request_sock.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
05 Jun, 2010
2 commits
-
- ipv6 msstab: account for ipv6 header size
- ipv4 msstab: add mss for Jumbograms.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
-
caller: if (!th->rst && !th->syn && th->ack)
callee: if (!th->ack)

make the caller only check for !syn (common for 3whs), and move
the !rst / ack test to the callee.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
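The caller/callee split described in the commit above might look like the following after the change. This is a sketch with hypothetical names (`struct toy_tcphdr`, `toy_cookie_check()`, `toy_handle_segment()`); the real cookie validation is replaced by a placeholder.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three TCP header flags involved. */
struct toy_tcphdr {
	bool syn;
	bool ack;
	bool rst;
};

/* Callee: now owns the ACK and RST tests. */
static bool toy_cookie_check(const struct toy_tcphdr *th)
{
	if (!th->ack || th->rst)
		return false;
	return true;	/* placeholder: real code would validate the cookie */
}

/* Caller: only filters out SYNs, the common test for the 3WHS path. */
static bool toy_handle_segment(const struct toy_tcphdr *th)
{
	if (!th->syn)
		return toy_cookie_check(th);
	return false;
}
```

The set of segments that reach cookie validation is unchanged; the flag tests are simply evaluated in one place instead of being partly duplicated.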
02 Jun, 2010
1 commit
-
There are more than a dozen occurrences of following code in the
IPv6 stack:

    if (opt && opt->srcrt) {
            struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
            ipv6_addr_copy(&final, &fl.fl6_dst);
            ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
            final_p = &final;
    }

Replace those with a helper. Note that the helper overrides final_p
in all cases. This is ok as final_p was previously initialized to
NULL when declared.
Signed-off-by: Arnaud Ebalard
Signed-off-by: David S. Miller
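A helper of the kind the commit above introduces could be shaped like this. The types and the name `toy_update_dst()` are hypothetical simplifications, not the kernel's definitions; the sketch only mirrors the logic of the open-coded snippet quoted in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_addr { uint8_t b[16]; };
struct toy_rt0  { struct toy_addr addr[1]; };	/* first hop of the source route */
struct toy_opt  { struct toy_rt0 *srcrt; };
struct toy_flow { struct toy_addr dst; };

/*
 * If a source route is present, save the real destination in *final and
 * point the flow at the first hop instead. Returns 'final' in that case and
 * NULL otherwise, so the caller's final_p is overwritten in all cases.
 */
static struct toy_addr *toy_update_dst(struct toy_flow *fl,
				       const struct toy_opt *opt,
				       struct toy_addr *final)
{
	if (!opt || !opt->srcrt)
		return NULL;

	*final = fl->dst;
	fl->dst = opt->srcrt->addr[0];
	return final;
}
```

A call site then shrinks to `final_p = toy_update_dst(&fl, opt, &final);`, which is why it matters that the replaced code always started from a NULL `final_p`.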
24 Dec, 2009
1 commit
-
Add rtnetlink init_rcvwnd to set the TCP initial receive window size
advertised by passive and active TCP connections.
The current Linux TCP implementation limits the advertised TCP initial
receive window to the one prescribed by slow start. For short lived
TCP connections used for transaction type of traffic (i.e. http
requests), bounding the advertised TCP initial receive window results
in increased latency to complete the transaction.
Setting the initial congestion window is already supported
using rtnetlink init_cwnd, but the feature is useless without the
ability to set a larger TCP initial receive window.
The rtnetlink init_rcvwnd allows increasing the TCP initial receive
window, allowing TCP connection to advertise larger TCP receive window
than the ones bounded by slow start.
Signed-off-by: Laurent Chavey
Signed-off-by: David S. Miller
16 Dec, 2009
1 commit
-
It creates a regression, triggering badness for SYN_RECV
sockets, for example:

[19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
[19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
[19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32)
[19148.023496] MSR: 00029032 CR: 24002442 XER: 00000000
[19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000

This is likely caused by the change in the 'estab' parameter
passed to tcp_parse_options() when invoked by the functions
in net/ipv4/tcp_minisocks.c

But even if that is fixed, the ->conn_request() changes made in
this patch series are fundamentally wrong. They try to use the
listening socket's 'dst' to probe the route settings. The
listening socket doesn't even have a route, and you can't
get the right route (the child request one) until much later
after we setup all of the state, and it must be done by hand.

This stuff really isn't ready, so the best thing to do is a
full revert. This reverts the following commits:

f55017a93f1a74d50244b1254b9a2bd7ac9bbf7d
022c3f7d82f0f1c68018696f2f027b87b9bb45c2
1aba721eba1d84a2defce45b950272cee1e6c72a
cda42ebd67ee5fdf09d7057b5a4584d36fe8a335
345cda2fd695534be5a4494f1b59da9daed33663
dc343475ed062e13fc260acccaab91d7d80fd5b2
05eaade2782fb0c90d3034fd7a7d5a16266182bb
6a2a2d6bf8581216e08be15fcb563cfd6c430e1e
Signed-off-by: David S. Miller
03 Dec, 2009
1 commit
-
Parse incoming TCP_COOKIE option(s).
Calculate TCP_COOKIE option.
Send optional data.
This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
Requires:
TCPCT part 1a: add request_values parameter for sending SYNACK
TCPCT part 1b: generate Responder Cookie secret
TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1d: define TCP cookie option, extend existing struct's
TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1f: Initiator Cookie => Responder
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller
29 Oct, 2009
1 commit
-
We need tcp_parse_options to be aware of dst_entry to
take into account per dst_entry TCP options settings
Signed-off-by: Gilad Ben-Yossef
Signed-off-by: Ori Finkelman
Signed-off-by: Yony Amit
Signed-off-by: David S. Miller
19 Oct, 2009
1 commit
-
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfer fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
07 Oct, 2009
1 commit
-
Atis Elsts wrote:
> Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment.
Ok, I'll just drop that part, I'm not sure what should be done in this case.
> Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that?
I just sent that patch out too quickly, here's a better one with the updates.
Add support for IPv6 route lookups using sk_mark.
Signed-off-by: Brian Haley
Signed-off-by: David S. Miller
24 Jun, 2009
2 commits
-
Percpu variable definition is about to be updated such that all percpu
symbols including the static ones must be unique. Update percpu
variable definitions accordingly.

* as,cfq: rename ioc_count uniquely
* cpufreq: rename cpu_dbs_info uniquely
* xen: move nesting_count out of xen_evtchn_do_upcall() and rename it
* mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
rename it
* ipv4,6: rename cookie_scratch uniquely
* x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
pmc_irq_entry and nmi_entry to pmc_nmi_entry
* perf_counter: rename disable_count to perf_disable_count
* ftrace: rename test_event_disable to ftrace_test_event_disable
* kmemleak: rename test_pointer to kmemleak_test_pointer
* mce: rename next_interval to mce_next_interval
[ Impact: percpu usage cleanups, no duplicate static percpu var names ]
Signed-off-by: Tejun Heo
Reviewed-by: Christoph Lameter
Cc: Ivan Kokshaysky
Cc: Jens Axboe
Cc: Dave Jones
Cc: Jeremy Fitzhardinge
Cc: linux-mm
Cc: David S. Miller
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Li Zefan
Cc: Catalin Marinas
Cc: Andi Kleen
-
Currently, the following three different ways to define percpu arrays
are in use.

1. DEFINE_PER_CPU(elem_type[array_len], array_name);
2. DEFINE_PER_CPU(elem_type, array_name[array_len]);
3. DEFINE_PER_CPU(elem_type, array_name)[array_len];

Unify to #1 which correctly separates the roles of the two parameters
and thus allows more flexibility in the way percpu variables are
defined.

[ Impact: cleanup ]
Signed-off-by: Tejun Heo
Reviewed-by: Christoph Lameter
Cc: Ingo Molnar
Cc: Tony Luck
Cc: Benjamin Herrenschmidt
Cc: Thomas Gleixner
Cc: Jeremy Fitzhardinge
Cc: linux-mm@kvack.org
Cc: Christoph Lameter
Cc: David S. Miller
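To illustrate why form #1 keeps the two macro parameters in separate roles, here is a toy definition macro. It is not the kernel's DEFINE_PER_CPU; `TOY_DEFINE` and `toy_counters` are hypothetical, and the sketch relies on the GNU `__typeof__` extension (supported by gcc and clang) so that an array type can be passed as the type argument.

```c
/*
 * Toy macro with the same parameter roles as form #1: 'type' is a complete
 * type (possibly an array type) and 'name' is a bare identifier.
 */
#define TOY_DEFINE(type, name) __typeof__(type) name

/* Form #1: the array length is part of the type, not of the name. */
TOY_DEFINE(int[4], toy_counters);
```

Because `name` stays a plain identifier, a macro written this way remains free to decorate it (for example by adding a prefix or section attribute), which forms #2 and #3 make harder by attaching the array brackets to the name or placing them outside the macro.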
20 Apr, 2009
1 commit
-
last_synq_overflow eats 4 or 8 bytes in struct tcp_sock, even
though it is only used when a listening socket's syn queue
is full.

We can (ab)use rx_opt.ts_recent_stamp to store the same information;
it is not used otherwise as long as a socket is in listen state.

Move linger2 around to avoid splitting struct mtu_probe
across cacheline boundary on 32 bit arches.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
26 Nov, 2008
1 commit
-
Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.

Take it from socket or netdevice. Stub DECnet to init_net.
Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
20 Oct, 2008
1 commit
-
'tcp: Port redirection support for TCP' (a3116ac5c) added a new member
to inet_request_sock() which inet_csk_clone() makes use of but failed
to add proper initialization to the IPv6 syncookie code and missed a
couple of places where the new member should be used instead of
inet_sk(sk)->sport.
Signed-off-by: KOVACS Krisztian
Signed-off-by: David S. Miller
04 Aug, 2008
1 commit
-
cookie_v6_check() did not call reqsk_free() if xfrm_lookup() fails,
leaking the request sock.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
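The leak fixed by the commit above is the familiar "early return skips the free" pattern. A minimal sketch, with hypothetical names and a live-object counter standing in for the allocator's bookkeeping:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a request sock. */
struct toy_req {
	int dummy;
};

static int toy_live_reqs;	/* allocated but not yet freed */

static struct toy_req *toy_req_alloc(void)
{
	struct toy_req *req = malloc(sizeof(*req));

	if (req)
		toy_live_reqs++;
	return req;
}

static void toy_req_free(struct toy_req *req)
{
	free(req);
	toy_live_reqs--;
}

/*
 * 'lookup_err' simulates the result of the route/policy lookup. On failure
 * the request must be released before returning; omitting that call is the
 * kind of leak the commit fixes.
 */
static int toy_cookie_check(int lookup_err)
{
	struct toy_req *req = toy_req_alloc();

	if (!req)
		return -1;

	if (lookup_err) {
		toy_req_free(req);	/* the call missing on the error path */
		return lookup_err;
	}

	/* On success ownership would pass to the child socket; the toy frees. */
	toy_req_free(req);
	return 0;
}
```

After any number of failed lookups, `toy_live_reqs` stays at zero, which is the property the fix restores.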
26 Jul, 2008
1 commit
-
ecn_ok is not initialized when a connection is established by cookies.
The cookie syn-ack never sets ECN, so ecn_ok must be set to 0.

Spotted using ns-3/network simulation cradle simulator and valgrind.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
17 Jul, 2008
1 commit
-
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
11 Jun, 2008
1 commit
-
Wei Yongjun noticed that we may call reqsk_free on request sock objects where
the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
where we initialize ->opt to NULL and set ->pktopts to NULL in
inet6_reqsk_alloc.
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller
10 Apr, 2008
1 commit
-
Allow the use of SACK and window scaling when syncookies are used
and the client supports tcp timestamps. Options are encoded into
the timestamp sent in the syn-ack and restored from the timestamp
echo when the ack is received.

Based on earlier work by Glenn Griffin.
This patch avoids increasing the size of structs by encoding TCP
options into the least significant bits of the timestamp and
by not using any 'timestamp offset'.

The downside is that the timestamp sent in the packet after the synack
will increase by several seconds.

changes since v1:
don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
and have ipv6/syncookies.c use it.
Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()
Reviewed-by: Hagen Paul Pfeifer
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
24 Mar, 2008
1 commit
-
the first u32 copied from syncookie_secret is overwritten by the
minute-counter four lines below. After adjusting the destination
address, the size of syncookie_secret can be reduced accordingly.

AFAICS, the only other user of syncookie_secret[] is the ipv6
syncookie support. Because ipv6 syncookies only grab 44 bytes from
syncookie_secret[], this shouldn't affect them in any way.

With fixes from Glenn Griffin.
Signed-off-by: Florian Westphal
Acked-by: Glenn Griffin
Signed-off-by: David S. Miller
04 Mar, 2008
1 commit
-
Updated to incorporate Eric's suggestion of using a per cpu buffer
rather than allocating on the stack. Just a two line change, but will
resend in its entirety.
Signed-off-by: Glenn Griffin
Signed-off-by: YOSHIFUJI Hideaki