Commit b49960a05e32121d29316cfdf653894b88ac9190

Authored by Eric Dumazet
Committed by David S. Miller
1 parent 84768edbb2

tcp: change tcp_adv_win_scale and tcp_rmem[2]

The tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have an skb->len / skb->truesize ratio of at least 75% (3/4).

In 2.6 kernels we (mis)accounted for a typical MSS=1460 frame as:
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
Since 1460 >= 1392, these skbs were not considered bloated.

With recent truesize fixes, the truesize of a typical MSS=1460 frame is
now the more precise:
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skbs are no longer good citizens, because 1460 < 1728.
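
The arithmetic above can be checked with a minimal sketch. The helper
names min_good_len() and is_good_citizen() are hypothetical and are not
kernel functions; the sketch only handles positive scale values and
assumes the ratio is computed as truesize - truesize / 2^scale.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Smallest payload length an skb of the given truesize may carry while
 * still meeting the len/truesize ratio implied by adv_win_scale.
 * Hypothetical helper: scale 2 means 3/4, scale 1 means 1/2.
 */
static unsigned int min_good_len(unsigned int truesize, int adv_win_scale)
{
	/* This sketch covers positive scales only. */
	assert(adv_win_scale > 0 && adv_win_scale < 32);
	return truesize - (truesize >> adv_win_scale);
}

/* True if the skb is not considered bloated for this scale. */
static bool is_good_citizen(unsigned int len, unsigned int truesize,
			    int adv_win_scale)
{
	return len >= min_good_len(truesize, adv_win_scale);
}
```

With these helpers, a 1460-byte frame passes against the old 1856-byte
estimate at scale 2, fails against the 2304-byte truesize at scale 2,
and passes again at scale 1, where the threshold drops to 1152.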

(GRO can escape this problem because it builds skbs with a too-low
truesize.)

This also means TCP advertises too optimistic a window for a given
allocated rcvspace: when receiving frames, sk_rmem_alloc can hit the
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when the application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major source of
latency.

We should adjust the len/truesize ratio to 50% instead of 75%.

This patch:

1) changes the tcp_adv_win_scale default to 1 instead of 2

2) increases the tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning of the TCP receive
window to reach the same value as before. Note that the same amount of
kernel memory is consumed as with 2.6 kernels.
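
Why 6MB preserves the previous maximum window can be seen with a small
sketch modeled on the tcp_adv_win_scale semantics documented in
ip-sysctl.txt. The function name win_from_space() is an illustrative
stand-in, not a claim about the exact kernel code.

```c
#include <assert.h>

/*
 * Window that can be advertised for a given amount of buffer space.
 * For scale > 0 the overhead is space / 2^scale; for scale <= 0 the
 * window itself is space / 2^(-scale).
 */
static int win_from_space(int space, int adv_win_scale)
{
	/* Documented range for tcp_adv_win_scale is [-31, 31]. */
	assert(space >= 0);
	assert(adv_win_scale >= -31 && adv_win_scale <= 31);

	if (adv_win_scale <= 0)
		return space >> (-adv_win_scale);
	return space - (space >> adv_win_scale);
}
```

Under these assumptions, win_from_space(4MB, 2) and
win_from_space(6MB, 1) both evaluate to 3MB, so autotuning can reach the
same receive window as before.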

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Showing 3 changed files with 8 additions and 7 deletions

Documentation/networking/ip-sysctl.txt
@@ -147,7 +147,7 @@
 	(if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
 	if it is <= 0.
 	Possible values are [-31, 31], inclusive.
-	Default: 2
+	Default: 1
 
 tcp_allowed_congestion_control - STRING
 	Show/set the congestion control choices available to non-privileged
@@ -410,7 +410,7 @@
 	net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables
 	automatic tuning of that socket's receive buffer size, in which
 	case this value is ignored.
-	Default: between 87380B and 4MB, depending on RAM size.
+	Default: between 87380B and 6MB, depending on RAM size.
 
 tcp_sack - BOOLEAN
 	Enable select acknowledgments (SACKS).
net/ipv4/tcp.c
@@ -3243,7 +3243,7 @@
 {
 	struct sk_buff *skb = NULL;
 	unsigned long limit;
-	int max_share, cnt;
+	int max_rshare, max_wshare, cnt;
 	unsigned int i;
 	unsigned long jiffy = jiffies;
 
@@ -3303,15 +3303,16 @@
 	tcp_init_mem(&init_net);
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
-	max_share = min(4UL*1024*1024, limit);
+	max_wshare = min(4UL*1024*1024, limit);
+	max_rshare = min(6UL*1024*1024, limit);
 
 	sysctl_tcp_wmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_wmem[1] = 16*1024;
-	sysctl_tcp_wmem[2] = max(64*1024, max_share);
+	sysctl_tcp_wmem[2] = max(64*1024, max_wshare);
 
 	sysctl_tcp_rmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_rmem[1] = 87380;
-	sysctl_tcp_rmem[2] = max(87380, max_share);
+	sysctl_tcp_rmem[2] = max(87380, max_rshare);
 
 	pr_info("Hash tables configured (established %u bind %u)\n",
 		tcp_hashinfo.ehash_mask + 1, tcp_hashinfo.bhash_size);
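
How the new defaults depend on RAM size can be illustrated with a
user-space sketch of the hunk above. It assumes 4 KiB pages
(SKETCH_PAGE_SHIFT is a stand-in for PAGE_SHIFT), takes the page count
as a parameter in place of nr_free_buffer_pages(), and uses
hypothetical helper names throughout.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* 1/128 of the free buffer memory, in bytes. */
static unsigned long per_socket_limit(unsigned long free_buffer_pages)
{
	/* Guard against overflow of the left shift in this sketch. */
	assert(free_buffer_pages <= (ULONG_MAX >> (SKETCH_PAGE_SHIFT - 7)));
	return free_buffer_pages << (SKETCH_PAGE_SHIFT - 7);
}

/* Default for tcp_rmem[2] after this patch: capped at 6MB. */
static unsigned long rmem_max_default(unsigned long free_buffer_pages)
{
	unsigned long max_rshare =
		min_ul(6UL * 1024 * 1024, per_socket_limit(free_buffer_pages));

	return max_ul(87380UL, max_rshare);
}

/* Default for tcp_wmem[2]: unchanged, still capped at 4MB. */
static unsigned long wmem_max_default(unsigned long free_buffer_pages)
{
	unsigned long max_wshare =
		min_ul(4UL * 1024 * 1024, per_socket_limit(free_buffer_pages));

	return max_ul(64UL * 1024, max_wshare);
}
```

Under these assumptions, a machine with 262144 free buffer pages (1 GiB)
ends up with a 6MB receive limit and a 4MB send limit, while a very small
machine falls back to the 87380-byte and 64 KiB floors.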
net/ipv4/tcp_input.c
@@ -85,7 +85,7 @@
 EXPORT_SYMBOL(sysctl_tcp_ecn);
 int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
-int sysctl_tcp_adv_win_scale __read_mostly = 2;
+int sysctl_tcp_adv_win_scale __read_mostly = 1;
 EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);
 
 int sysctl_tcp_stdurg __read_mostly;