02 Sep, 2016
1 commit
-
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
24 Apr, 2016
1 commit
-
nla_data() is now aligned on a 64-bit area.
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
09 Mar, 2016
1 commit
-
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.
Signed-off-by: David S. Miller
24 Feb, 2016
1 commit
-
Currently it's converted into msecs, thus HZ=1000 intact.
Signed-off-by: Konstantin Khlebnikov
Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution")
Signed-off-by: David S. Miller
08 Feb, 2016
1 commit
-
Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller
29 Aug, 2015
3 commits
-
tcp_metrics and inetpeer both have functions to compare inetpeer
addresses. Consolidate into 1 version.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
Use inetpeer set,get helpers in tcp_metrics rather than peeking into
the inetpeer_addr struct.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
Refactors a common line into helper function.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
10 Jul, 2015
1 commit
-
Add a helper to test the slow start condition in various congestion
control modules and other places. This is to prepare a slight improvement
in policy as to exactly when to slow start.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: Nandita Dukkipati
Signed-off-by: David S. Miller
08 Apr, 2015
1 commit
-
Fast Open has been using an experimental option with a magic number
(RFC6994). This patch makes the client by default use the RFC7413
option (34) to get and send Fast Open cookies. This patch makes
the client solicit cookies from a given server first with the
RFC7413 option. If that fails to elicit a cookie, then it tries
the RFC6994 experimental option. If that also fails, it uses the
RFC7413 option on all subsequent connect attempts. If the server
returns a Fast Open cookie then the client caches the form of the
option that successfully elicited a cookie, and uses that form on
later connects when it presents that cookie.

The idea is to gradually obsolete the use of experimental options as
the servers and clients upgrade, while keeping the interoperability
meanwhile.
Signed-off-by: Daniel Lee
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: David S. Miller
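The option-selection order described above can be sketched as a small state machine. This is a minimal userspace illustration, not the kernel code: `struct tfo_cache`, `tfo_pick_option()` and `tfo_record_result()` are hypothetical names, and the counter-based bookkeeping is an assumption about one way to track the attempts.

```c
#include <stdbool.h>

/* Which form of the Fast Open option to place in the SYN. */
enum tfo_opt { TFO_OPT_RFC7413, TFO_OPT_EXP };

/* Hypothetical per-server cache entry (names are illustrative only). */
struct tfo_cache {
    bool have_cookie;          /* a cookie was obtained from this server */
    enum tfo_opt cookie_form;  /* option form that elicited the cookie */
    unsigned failed_solicits;  /* cookie requests that got no cookie (0..2) */
};

/* Pick the option form for the next connect attempt. */
enum tfo_opt tfo_pick_option(const struct tfo_cache *c)
{
    if (c->have_cookie)
        return c->cookie_form;      /* reuse the form that worked */
    if (c->failed_solicits == 1)
        return TFO_OPT_EXP;         /* RFC7413 failed once: try RFC6994 */
    return TFO_OPT_RFC7413;         /* first try, and all tries after both failed */
}

/* Record the outcome of a connect attempt that used option form `used`. */
void tfo_record_result(struct tfo_cache *c, enum tfo_opt used, bool got_cookie)
{
    if (got_cookie) {
        c->have_cookie = true;
        c->cookie_form = used;
        c->failed_solicits = 0;
    } else if (c->failed_solicits < 2) {
        c->failed_solicits++;
    }
}
```

Starting from a zeroed cache, the sequence of picks is RFC7413, then the experimental option, then RFC7413 for every later attempt until a cookie arrives.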
04 Apr, 2015
1 commit
-
The ipv4 code uses a mixture of coding styles. In some instances check
for NULL pointer is done as x == NULL and sometimes as !x. !x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.
Signed-off-by: Ian Morris
Signed-off-by: David S. Miller
01 Apr, 2015
3 commits
-
Those are counterparts to nla_put_in_addr and nla_put_in6_addr.
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller
-
IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller
-
In many places, the a6 field is typecasted to struct in6_addr. As the
fields are in union anyway, just add in6_addr type to the union and get rid
of the typecasting.
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller
17 Mar, 2015
1 commit
-
Changes in tcp_metric hash table are protected by tcp_metrics_lock
only, not by genl_mutex.

While we are at it use deref_locked() instead of rcu_dereference()
in tcp_new() to avoid unnecessary barrier, as we hold tcp_metrics_lock
as well.
Reported-by: Andrew Vagin
Signed-off-by: Eric Dumazet
Fixes: 098a697b497e ("tcp_metrics: Use a single hash table for all network namespaces.")
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
13 Mar, 2015
6 commits
-
Now that all of the operations are safe on a single hash table
across network namespaces, allocate a single global hash table
and update the code to use it.
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
-
Rewrite tcp_metrics_flush_all so that it can cope with entries from
different network namespaces on its hash chain.

This is based on the logic in tcp_metrics_nl_cmd_del for deleting
a selection of entries from a tcp metrics hash chain.
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
-
tcp_metrics_flush_all always returns 0. Remove the unnecessary return code.
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
-
In preparation for using one tcp metrics hash table for all network
namespaces add a field tcpm_net to struct tcp_metrics_block, and
verify that field on all hash table lookups.

Make the field tcpm_net of type possible_net_t so it takes no space
when network namespaces are disabled.

Further add a function tm_net to read that field so we can be
efficient when network namespaces are disabled and concise
the rest of the time.
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
-
In preparation for using one hash table for all network namespaces
mix the network namespace into the hash value.
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
-
There is not a practical way to cleanup during boot so
just panic if there is a problem initializing tcp_metrics.

That will at least give us a clear place to start debugging
if something does go wrong.
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
18 Jan, 2015
1 commit
-
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

    if (my_function(...))
        /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for
Signed-off-by: David S. Miller
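The pitfall described in this message can be reproduced in a few lines of plain C. This is only a userspace analogy: `struct msg`, `msg_end_old()`, `msg_end_new()` and the two `fill_*()` functions are hypothetical stand-ins and do not exist in the kernel.

```c
#include <stddef.h>

/* Hypothetical stand-in for a message buffer. */
struct msg {
    size_t len;
};

/* Old style: finalizing the message returns its (always positive) length. */
int msg_end_old(struct msg *m)
{
    return (int)m->len;
}

/* New style: finalizing the message returns nothing. */
void msg_end_new(struct msg *m)
{
    (void)m;
}

/* Old-style fill function: a caller writing `if (fill_old(...))`
 * would wrongly treat the positive length as an error. */
int fill_old(struct msg *m)
{
    m->len = 16;
    return msg_end_old(m);
}

/* New-style fill function: success is an explicit 0. */
int fill_new(struct msg *m)
{
    m->len = 16;
    msg_end_new(m);
    return 0;
}
```

With the void variant, the conventional "non-zero means error" check behaves as callers expect.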
15 Aug, 2014
1 commit
-
tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
ordering of incoming connections and teardowns without the need to
hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
the last measured RTO. To do so, we keep the last seen timestamp in a
per-host indexed data structure and verify if the incoming timestamp
in a connection request is strictly greater than the saved one during
last connection teardown. Thus we can verify later on that no old data
packets will be accepted by the new connection.

When moving a socket to time-wait state we already verify whether timestamps
were seen on a connection. Only if that was the case do we let the
time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
will be used. But we don't verify this on incoming SYN packets. If a
connection teardown was less than TCP_PAWS_MSL seconds in the past we
cannot guarantee to not accept data packets from an old connection if
no timestamps are present. We should drop this SYN packet. This patch
closes this loophole.

Please note, this patch does not make tcp_tw_recycle in any way more
usable but only adds another safety check:
Sporadic drops of SYN packets because of reordering in the network or
in the socket backlog queues can happen. Users behind NAT trying to
connect to a tcp_tw_recycle enabled server can get caught in blackholes
and their connection requests may regularly get dropped because hosts
behind an address translator don't have synchronized tcp timestamp clocks.
tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.

In general, use of tcp_tw_recycle is disadvised.
Cc: Eric Dumazet
Cc: Florian Westphal
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
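The acceptance rule described above can be sketched as a pure function. This is a simplified userspace model under stated assumptions: `struct peer_ts` and `syn_acceptable()` are hypothetical names, the 60-second value for `PAWS_MSL_SEC` is assumed rather than taken from the text, and the comparison follows the "strictly greater" wording of the message rather than the exact kernel check.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAWS_MSL_SEC 60u  /* assumed value standing in for TCP_PAWS_MSL */

/* Hypothetical per-host record saved at the last connection teardown. */
struct peer_ts {
    bool valid;      /* a teardown was recorded for this host */
    uint32_t stamp;  /* time of that teardown, in seconds */
    uint32_t ts;     /* last TCP timestamp value seen from the host */
};

/* Decide whether an incoming SYN may be accepted. */
bool syn_acceptable(const struct peer_ts *p, uint32_t now,
                    bool has_ts, uint32_t tsval)
{
    if (!p->valid)
        return true;                    /* nothing known about this host */
    if (now - p->stamp >= PAWS_MSL_SEC)
        return true;                    /* teardown is old enough */
    if (!has_ts)
        return false;                   /* the loophole closed by this patch */
    /* Wrap-safe "strictly greater than the saved timestamp". */
    return (int32_t)(tsval - p->ts) > 0;
}
```

The third branch is the new check: a SYN without timestamps that arrives shortly after a teardown is dropped instead of accepted.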
01 Aug, 2014
1 commit
-
commit d23ff7016 (tcp: add generic netlink support for tcp_metrics) introduced
netlink support for the new tcp_metrics, however it restricted getting of
tcp_metrics to root user only. This is a change from how these values could
have been fetched when in the old route cache. Unless there's a legitimate
reason to restrict the reading of these values it would be better if normal
users could fetch them.
Cc: Julian Anastasov
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Debabrata Banerjee
Signed-off-by: David S. Miller
05 Jun, 2014
1 commit
-
It is available since v3.15-rc5.
Cc: Pablo Neira Ayuso
Cc: "David S. Miller"
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
27 Feb, 2014
1 commit
-
Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.

FQ/pacing in DC environments also require this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'

TCP_CONG_RTT_STAMP is no longer needed.

As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.

In this example ss command displays a srtt of 32 usecs (10Gbit link)

    lpk51:~# ./ss -i dst lpk52
    Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
    tcp ESTAB 0 1 10.246.11.51:42959 10.246.11.52:64614
    cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448 cwnd:10 send 3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

Updated iproute2 ip command displays :

    lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
    10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source 10.246.11.51

Old binary displays :

    lpk51:~# ip tcp_metrics | grep 10.246.11.52
    10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source 10.246.11.51

With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng
Signed-off-by: Eric Dumazet
Acked-by: Neal Cardwell
Cc: Stephen Hemminger
Cc: Yuchung Cheng
Cc: Larry Brakmo
Cc: Julian Anastasov
Signed-off-by: David S. Miller
24 Jan, 2014
1 commit
-
A socket may be v6/v4-mapped. In that case sk->sk_family is AF_INET6,
but the IP being used is actually an IPv4-address.
Current tcp-metrics will thus represent it as an IPv6-address:

    root@server:~# ip tcp_metrics
    ::ffff:10.1.1.2 age 22.920sec rtt 18750us rttvar 15000us cwnd 10
    10.1.1.2 age 47.970sec rtt 16250us rttvar 10000us cwnd 10

This patch modifies the tcp-metrics so that they are able to handle the
v6/v4-mapped sockets correctly.
Signed-off-by: Christoph Paasch
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
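A userspace analogue of the idea: when an IPv6 address is v4-mapped, key the entry by the embedded IPv4 address so that `::ffff:10.1.1.2` and `10.1.1.2` share one record. This is a sketch only; `struct addr_key` and `key_from_in6()` are hypothetical, while `IN6_IS_ADDR_V4MAPPED` is the standard POSIX macro from `<netinet/in.h>`. The kernel uses its own helpers for this.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Hypothetical lookup key: address family plus up to 16 address bytes. */
struct addr_key {
    int family;
    unsigned char bytes[16];
};

/* Build a key from an IPv6 address, collapsing v4-mapped addresses
 * (::ffff:a.b.c.d) to plain IPv4 keys. */
void key_from_in6(const struct in6_addr *a6, struct addr_key *key)
{
    memset(key, 0, sizeof(*key));
    if (IN6_IS_ADDR_V4MAPPED(a6)) {
        key->family = AF_INET;
        /* The IPv4 address occupies the last 4 of the 16 bytes. */
        memcpy(key->bytes, &a6->s6_addr[12], 4);
    } else {
        key->family = AF_INET6;
        memcpy(key->bytes, a6->s6_addr, 16);
    }
}
```

Parsing `::ffff:10.1.1.2` with `inet_pton(AF_INET6, ...)` and passing the result through `key_from_in6()` yields an `AF_INET` key holding the bytes 10, 1, 1, 2.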
23 Jan, 2014
1 commit
-
In bbf852b96ebdc6d1 I introduced the tmlist, which allows deleting
multiple entries from the cache that match a specified destination if no
source-IP is specified.

However, as the cache is an RCU-list, we should not create this tmlist, as
it will change the tcpm_next pointer of the element that will be deleted
and so a thread iterating over the cache's entries while holding the
RCU-lock might get "redirected" to this tmlist.

This patch fixes this, by reverting back to the old behavior prior to
bbf852b96ebdc6d1, which means that we simply change the tcpm_next
pointer of the previous element (pp) to jump over the one we are
deleting.
The difference is that we call kfree_rcu() directly on the cache entry,
which allows us to delete multiple entries from the list.
Fixes: bbf852b96ebdc6d1 (tcp: metrics: Delete all entries matching a certain destination)
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller
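The unlinking scheme described above, where the predecessor's next pointer is redirected and the removed element's own next pointer is left untouched, can be sketched on an ordinary singly linked list. This is a single-threaded userspace illustration: `struct tm_entry`, `tm_push()` and `tm_delete_matching()` are hypothetical names, and `free()` stands in for the deferred `kfree_rcu()` that the kernel needs for concurrent readers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache entry on one hash chain. */
struct tm_entry {
    uint32_t daddr;
    struct tm_entry *next;
};

/* Helper for building a chain: push a new entry at the head. */
int tm_push(struct tm_entry **head, uint32_t daddr)
{
    struct tm_entry *tm = malloc(sizeof(*tm));

    if (!tm)
        return -1;
    tm->daddr = daddr;
    tm->next = *head;
    *head = tm;
    return 0;
}

/* Remove every entry whose daddr matches; returns how many were removed. */
int tm_delete_matching(struct tm_entry **head, uint32_t daddr)
{
    struct tm_entry **pp = head;  /* pointer to the link that leads to tm */
    struct tm_entry *tm;
    int removed = 0;

    while ((tm = *pp) != NULL) {
        if (tm->daddr == daddr) {
            *pp = tm->next;  /* predecessor now jumps over tm */
            free(tm);        /* kernel code would defer this via kfree_rcu() */
            removed++;
        } else {
            pp = &tm->next;
        }
    }
    return removed;
}
```

Because only the predecessor's link changes, a reader already standing on the removed entry could still follow its next pointer back into the chain, which is the property the RCU version relies on.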
18 Jan, 2014
2 commits
-
Conflicts:
    drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
    net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.
Signed-off-by: David S. Miller
-
Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.

This patch calls __tcp_get_metrics() twice: first without holding the
lock, then while holding the lock. The second call is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.
Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.)
Signed-off-by: Christoph Paasch
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller
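The check-then-lock-then-recheck pattern from this fix can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel code: `struct metrics_entry`, `cache_lookup()` and `cache_get_or_create()` are hypothetical, a C11 `atomic_flag` spin lock stands in for the kernel spin-lock, and an acquire/release head pointer stands in for RCU. Entries are only ever inserted at the head and never removed here, which is what keeps the lockless walk safe in this sketch.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tcp-metrics entry. */
struct metrics_entry {
    uint32_t daddr;
    struct metrics_entry *next;  /* set before publication, never changed */
};

static _Atomic(struct metrics_entry *) cache_head;
static atomic_flag cache_lock = ATOMIC_FLAG_INIT;

/* Lockless lookup: walk the list from the current head. */
struct metrics_entry *cache_lookup(uint32_t daddr)
{
    struct metrics_entry *e =
        atomic_load_explicit(&cache_head, memory_order_acquire);

    for (; e; e = e->next)
        if (e->daddr == daddr)
            return e;
    return NULL;
}

/* Return the entry for daddr, creating it at most once. */
struct metrics_entry *cache_get_or_create(uint32_t daddr)
{
    struct metrics_entry *e = cache_lookup(daddr);  /* first check, no lock */

    if (e)
        return e;

    while (atomic_flag_test_and_set_explicit(&cache_lock, memory_order_acquire))
        ;  /* spin until the lock is free */

    e = cache_lookup(daddr);  /* second check, now under the lock */
    if (!e) {
        e = calloc(1, sizeof(*e));
        if (e) {
            e->daddr = daddr;
            e->next = atomic_load_explicit(&cache_head, memory_order_relaxed);
            /* Publish only after the entry is fully initialized. */
            atomic_store_explicit(&cache_head, e, memory_order_release);
        }
    }

    atomic_flag_clear_explicit(&cache_lock, memory_order_release);
    return e;
}
```

Without the second lookup, two callers that both missed in the first lookup would each insert an entry for the same address, which is the duplicate this commit eliminates.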
11 Jan, 2014
5 commits
-
We want to be able to get/del tcp-metrics based on the src IP. This
patch adds the necessary parsing of the netlink attribute and if the
source address is set, it will match on this one too.
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller
-
As we now can have multiple entries per destination-IP, the "ip
tcp_metrics delete address ADDRESS" command deletes all of them.
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller
-
This patch adds a new netlink attribute for the source-IP and appends it
to the netlink reply. Now, iproute2 can have access to the source-IP.
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller
-
We add the source-address to the tcp-metrics, so that different metrics
will be used per source/destination-pair. We use the destination-hash to
store the metric inside the hash-table. That way, deleting and dumping
via "ip tcp_metrics" is easy.
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller
-
As we will add also the source-address, we rename all accesses to the
tcp-metrics address to use "daddr".
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller
20 Nov, 2013
1 commit
-
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assign ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
15 Nov, 2013
2 commits
-
Now that genl_ops are no longer modified in place when
registering, they can be made const. This patch was done
mostly with spatch:

    @@
    identifier ops;
    @@
    +const
    struct genl_ops ops[] = {
    ...
    };

(except the struct thing in net/openvswitch/datapath.c)
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
-
We had some reports of crashes using TCP fastopen, and Dave Jones
gave a nice stack trace pointing to the error.

Issue is that tcp_get_metrics() should not be called with a NULL dst
Fixes: 1fe4c481ba637 ("net-tcp: Fast Open client - cookie cache")
Signed-off-by: Eric Dumazet
Reported-by: Dave Jones
Cc: Yuchung Cheng
Acked-by: Yuchung Cheng
Tested-by: Dave Jones
Signed-off-by: David S. Miller
30 Oct, 2013
1 commit
-
Fast Open currently has a fall back feature to address SYN-data being
dropped but it requires the middle-box to pass on regular SYN retry
after SYN-data. This is implemented in commit aab487435 ("net-tcp:
Fast Open client - detecting SYN-data drops")

However some NAT boxes will drop all subsequent packets after first
SYN-data and blackhole the entire connection. An example is in
commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast
Open".

The sender should note such incidents and fall back to use the regular
TCP handshake on subsequent attempts temporarily as well: after the
second SYN times out the original Fast Open SYN is most likely lost.
When such an event recurs Fast Open is disabled based on the number of
recurrences exponentially.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
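The "disabled exponentially based on the number of recurrences" policy can be sketched as a back-off window that doubles with each recorded loss. This is an illustration under assumptions: `struct tfo_state`, `tfo_note_syn_loss()` and `tfo_allowed()` are hypothetical, and the 60-second base and the shift clamp are assumed values that the message does not state.

```c
#include <stdbool.h>
#include <stdint.h>

#define TFO_BASE_BACKOFF_SEC 60u  /* assumed base window, not from the text */
#define TFO_MAX_SHIFT 16u         /* assumed clamp to keep the window bounded */

/* Hypothetical per-destination Fast Open loss record. */
struct tfo_state {
    unsigned syn_loss;       /* number of recorded SYN-data blackhole events */
    uint64_t last_syn_loss;  /* time of the latest event, in seconds */
};

/* Record one more suspected SYN-data blackhole event. */
void tfo_note_syn_loss(struct tfo_state *s, uint64_t now)
{
    s->syn_loss++;
    s->last_syn_loss = now;
}

/* Fast Open is allowed again once the back-off window has elapsed. */
bool tfo_allowed(const struct tfo_state *s, uint64_t now)
{
    unsigned shift;
    uint64_t backoff;

    if (s->syn_loss == 0)
        return true;
    shift = s->syn_loss < TFO_MAX_SHIFT ? s->syn_loss : TFO_MAX_SHIFT;
    backoff = (uint64_t)TFO_BASE_BACKOFF_SEC << shift;  /* doubles per loss */
    return now >= s->last_syn_loss + backoff;
}
```

With these assumed constants, one recorded loss disables Fast Open for 120 seconds and a second loss for 240 seconds, after which the regular handshake is no longer forced.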
10 Oct, 2013
1 commit
-
TCP listener refactoring, part 5 :
We want to be able to insert request sockets (SYN_RECV) into main
ehash table instead of the per listener hash table to allow RCU
lookups and remove listener lock contention.

This patch includes the needed struct sock_common in front
of struct request_sock

This means there is no more inet6_request_sock IPv6 specific
structure.

Following inet_request_sock fields were renamed as they became
macros to reference fields from struct sock_common.
Prefix ir_ was chosen to avoid name collisions.

    loc_port -> ir_loc_port
    loc_addr -> ir_loc_addr
    rmt_addr -> ir_rmt_addr
    rmt_port -> ir_rmt_port
    iif -> ir_iif

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller