Commit 67469601406c12ced3db9956aeb0ef0854e2952f

Authored by Eric Dumazet
Committed by David S. Miller
1 parent a85c9bb895

ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing

Quoting Tore Anderson from:
https://bugzilla.kernel.org/show_bug.cgi?id=42572

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receiving an ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU; however, ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment packets originated from the IPv4 part of
the path will be translated to ICMPv6 PTB which may then indicate an
MTU of less than 1280.

The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes; instead, it sets it to exactly 1280 bytes and also sets
RTAX_FEATURE_ALLFRAG. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).
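The clamping behaviour described above can be sketched in user-space C. This is only an illustration, not kernel code: `struct route_state` and `apply_ptb()` are hypothetical names, and the branch that handles MTUs of 1280 or more is an assumption added to make the sketch complete.

```c
#include <stdbool.h>

#define IPV6_MIN_MTU 1280	/* minimum IPv6 MTU, as stated above */

/* Hypothetical stand-in for the per-route state discussed in the report. */
struct route_state {
	unsigned int pmtu;	/* effective path MTU used by the sender */
	bool allfrag;		/* models RTAX_FEATURE_ALLFRAG */
};

/* Apply the MTU reported by an ICMPv6 Packet Too Big message. */
static void apply_ptb(struct route_state *rt, unsigned int reported_mtu)
{
	if (reported_mtu < IPV6_MIN_MTU) {
		/* Never go below 1280; add a Fragment header to every packet instead. */
		rt->pmtu = IPV6_MIN_MTU;
		rt->allfrag = true;
	} else if (reported_mtu < rt->pmtu) {
		/* Assumption: an ordinary PTB only ever lowers the path MTU. */
		rt->pmtu = reported_mtu;
	}
}
```

A PTB reporting, say, 1260 bytes leaves `pmtu` at 1280 with `allfrag` set, which is the state in which the segment-size problem appears.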

This in turn results in rather inefficient transmission, as every
transmitted TCP segment now is split in two fragments containing
1232+8 bytes of payload.
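The 1232+8 split follows directly from the header sizes. A minimal sketch of the arithmetic, assuming no other extension headers are present; `struct frag_split` and `split_segment()` are hypothetical helpers written for this illustration:

```c
#define IPV6_HDR_LEN	40	/* fixed IPv6 header */
#define FRAG_HDR_LEN	 8	/* IPv6 Fragment extension header */

struct frag_split {
	unsigned int count;	/* number of IPv6 packets emitted */
	unsigned int last_len;	/* payload bytes carried by the last one */
};

/* Split seg_len bytes of TCP header plus data into fragments for a given MTU. */
static struct frag_split split_segment(unsigned int mtu, unsigned int seg_len)
{
	/* Every fragment except the last must carry a multiple of 8 bytes. */
	unsigned int per_frag = (mtu - IPV6_HDR_LEN - FRAG_HDR_LEN) & ~7u;
	struct frag_split s = { 0, 0 };

	if (seg_len == 0)
		return s;
	s.count = (seg_len + per_frag - 1) / per_frag;
	s.last_len = seg_len - (s.count - 1) * per_frag;
	return s;
}
```

With an MTU of 1280, a 1240-byte segment needs two packets and the second carries only 8 bytes, while a 1232-byte segment fits in a single packet.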

After this patch, all outgoing packets that include a Fragmentation
header are "atomic" or "non-fragmented" fragments, i.e., they have
both Offset=0 and More Fragments=0.
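A receiver-side check for such an atomic fragment might look like the sketch below. The struct mirrors the standard 8-byte Fragment header layout (13-bit offset, 2 reserved bits, 1 More Fragments bit); `struct frag_hdr_model`, the mask names, and `is_atomic_fragment()` are illustrative and not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* User-space model of the IPv6 Fragment header. */
struct frag_hdr_model {
	uint8_t  nexthdr;
	uint8_t  reserved;
	uint16_t frag_off;	/* network byte order: offset, reserved bits, M flag */
	uint32_t identification;
};

#define FRAG_OFFSET_MASK 0xFFF8	/* upper 13 bits: offset in 8-byte units */
#define FRAG_MORE_FLAG	 0x0001	/* lowest bit: More Fragments */

/* True when the header describes a complete packet: Offset=0 and M=0. */
static bool is_atomic_fragment(const struct frag_hdr_model *fh)
{
	uint16_t v = ntohs(fh->frag_off);

	return (v & (FRAG_OFFSET_MASK | FRAG_MORE_FLAG)) == 0;
}
```

A header with `frag_off` equal to zero is atomic; setting the More Fragments bit or any nonzero offset makes the check fail.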

With help from David S. Miller

Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

Showing 4 changed files with 21 additions and 4 deletions

include/net/inet_connection_sock.h
@@ -45,6 +45,7 @@
 				     struct dst_entry *dst);
 	struct inet_peer *(*get_peer)(struct sock *sk, bool *release_it);
 	u16 net_header_len;
+	u16 net_frag_header_len;
 	u16 sockaddr_len;
 	int (*setsockopt)(struct sock *sk, int level, int optname,
 			  char __user *optval, unsigned int optlen);
include/net/tcp.h
@@ -544,8 +544,8 @@
 
 extern void tcp_initialize_rcv_mss(struct sock *sk);
 
-extern int tcp_mtu_to_mss(const struct sock *sk, int pmtu);
-extern int tcp_mss_to_mtu(const struct sock *sk, int mss);
+extern int tcp_mtu_to_mss(struct sock *sk, int pmtu);
+extern int tcp_mss_to_mtu(struct sock *sk, int mss);
 extern void tcp_mtup_init(struct sock *sk);
 extern void tcp_valid_rtt_meas(struct sock *sk, u32 seq_rtt);
 
net/ipv4/tcp_output.c
@@ -1150,7 +1150,7 @@
 }
 
 /* Calculate MSS. Not accounting for SACKs here. */
-int tcp_mtu_to_mss(const struct sock *sk, int pmtu)
+int tcp_mtu_to_mss(struct sock *sk, int pmtu)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1161,6 +1161,14 @@
 	 */
 	mss_now = pmtu - icsk->icsk_af_ops->net_header_len - sizeof(struct tcphdr);
 
+	/* IPv6 adds a frag_hdr in case RTAX_FEATURE_ALLFRAG is set */
+	if (icsk->icsk_af_ops->net_frag_header_len) {
+		const struct dst_entry *dst = __sk_dst_get(sk);
+
+		if (dst && dst_allfrag(dst))
+			mss_now -= icsk->icsk_af_ops->net_frag_header_len;
+	}
+
 	/* Clamp it (mss_clamp does not include tcp options) */
 	if (mss_now > tp->rx_opt.mss_clamp)
 		mss_now = tp->rx_opt.mss_clamp;
@@ -1179,7 +1187,7 @@
 }
 
 /* Inverse of above */
-int tcp_mss_to_mtu(const struct sock *sk, int mss)
+int tcp_mss_to_mtu(struct sock *sk, int mss)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1190,6 +1198,13 @@
 	      icsk->icsk_ext_hdr_len +
 	      icsk->icsk_af_ops->net_header_len;
 
+	/* IPv6 adds a frag_hdr in case RTAX_FEATURE_ALLFRAG is set */
+	if (icsk->icsk_af_ops->net_frag_header_len) {
+		const struct dst_entry *dst = __sk_dst_get(sk);
+
+		if (dst && dst_allfrag(dst))
+			mtu += icsk->icsk_af_ops->net_frag_header_len;
+	}
 	return mtu;
 }
 
net/ipv6/tcp_ipv6.c
@@ -1778,6 +1778,7 @@
 	.syn_recv_sock = tcp_v6_syn_recv_sock,
 	.get_peer = tcp_v6_get_peer,
 	.net_header_len = sizeof(struct ipv6hdr),
+	.net_frag_header_len = sizeof(struct frag_hdr),
 	.setsockopt = ipv6_setsockopt,
 	.getsockopt = ipv6_getsockopt,
 	.addr2sockaddr = inet6_csk_addr2sockaddr,
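
The effect of the two patched helpers can be modelled outside the kernel. The sketch below is a deliberate simplification: it ignores TCP options, extension headers and the `mss_clamp` logic, and `struct conn_model`, `model_mtu_to_mss()` and `model_mss_to_mtu()` are hypothetical names, with `allfrag` standing in for the `dst_allfrag(dst)` check.

```c
#include <stdbool.h>

#define TCP_HDR_LEN 20	/* TCP header without options */

/* Simplified stand-in for the fields the patch consults. */
struct conn_model {
	int net_header_len;		/* 40 for IPv6 */
	int net_frag_header_len;	/* 8 for IPv6, 0 when the family has none */
	bool allfrag;			/* models dst_allfrag(dst) */
};

/* Path MTU to MSS, reserving room for the Fragment header when needed. */
static int model_mtu_to_mss(const struct conn_model *c, int pmtu)
{
	int mss = pmtu - c->net_header_len - TCP_HDR_LEN;

	if (c->net_frag_header_len && c->allfrag)
		mss -= c->net_frag_header_len;
	return mss;
}

/* Inverse of the above. */
static int model_mss_to_mtu(const struct conn_model *c, int mss)
{
	int mtu = mss + TCP_HDR_LEN + c->net_header_len;

	if (c->net_frag_header_len && c->allfrag)
		mtu += c->net_frag_header_len;
	return mtu;
}
```

For IPv6 with a path MTU of 1280 and the all-fragments flag set, the model yields an MSS of 1212, so the TCP header plus data total 1232 bytes, the figure the bug report expects. Without the flag the result is 1220, and in both cases the inverse function maps back to 1280.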