Eric Lee / smarc-fsl-linux-kernel

02 Sep, 2016

1 commit

4f70c96ff tcp: make nla_policy const

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

stephen hemminger
2016-09-02 05:09:01 +0800

24 Apr, 2016

1 commit

2175d87cc libnl: nla_put_msecs(): align on a 64-bit area

nla_data() is now aligned on a 64-bit area.

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2016-04-24 08:13:24 +0800

09 Mar, 2016

1 commit

810813c47 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.

Signed-off-by: David S. Miller

David S. Miller
2016-03-09 01:34:12 +0800

24 Feb, 2016

1 commit

9bdfb3b79 tcp: convert cached rtt from usec to jiffies when feeding initial rto

Currently it is converted into msecs, so only HZ=1000 builds are unaffected.

Signed-off-by: Konstantin Khlebnikov
Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution")
Signed-off-by: David S. Miller

Konstantin Khlebnikov
2016-02-24 07:28:46 +0800
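
The unit mismatch fixed by 9bdfb3b79 can be illustrated in userspace. This is only a sketch, not the kernel code: `usec_to_ticks()` is a hypothetical stand-in for a usec-to-jiffies conversion, `hz` stands for the kernel's HZ, and rounding up is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for a usec -> jiffies conversion.
 * Assumes hz divides 1000000 evenly; rounds up to whole ticks. */
static uint64_t usec_to_ticks(uint64_t usec, uint32_t hz)
{
    uint64_t usec_per_tick = 1000000u / hz;

    return (usec + usec_per_tick - 1) / usec_per_tick;
}

/* The conversion described as the bug: the result is in milliseconds,
 * which equals a tick count only when one tick lasts one millisecond. */
static uint64_t usec_to_msec(uint64_t usec)
{
    return (usec + 999) / 1000;
}
```

For a cached RTT of 40000 usec, both functions agree at hz = 1000, but at hz = 250 the tick count is 10 while the millisecond value is still 40, so a value in msecs treated as jiffies would be four times too large.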

08 Feb, 2016

1 commit

1043e25ff ipv4: Namespaceify tcp reordering sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
2016-02-08 03:35:10 +0800

29 Aug, 2015

3 commits

d39d14ffa net: Add helper function to compare inetpeer addresses

tcp_metrics and inetpeer both have functions to compare inetpeer
addresses. Consolidate into 1 version.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-08-29 04:32:36 +0800

3abef286c net: Add set,get helpers for inetpeer addresses

Use inetpeer set,get helpers in tcp_metrics rather than peeking into
the inetpeer_addr struct.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-08-29 04:32:36 +0800

72afa352d net: Introduce ipv4_addr_hash and use it for tcp metrics

Refactors a common line into a helper function.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-08-29 04:32:35 +0800

10 Jul, 2015

1 commit

071d5080e tcp: add tcp_in_slow_start helper

Add a helper to test the slow start condition in various congestion
control modules and other places. This is to prepare a slight improvement
in policy as to exactly when to slow start.

Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: Nandita Dukkipati
Signed-off-by: David S. Miller

Yuchung Cheng
2015-07-10 05:22:52 +0800
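
The idea behind 071d5080e, stating the slow start condition in one helper so that the policy can later change in one place, might look like the sketch below. `struct mini_tcp_sock` and `in_slow_start()` are hypothetical names, and the choice of `<=` at the boundary is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the kernel's struct tcp_sock. */
struct mini_tcp_sock {
    uint32_t snd_cwnd;      /* congestion window, in packets */
    uint32_t snd_ssthresh;  /* slow start threshold, in packets */
};

/* Single definition of "are we in slow start?". Whether the boundary
 * case (cwnd == ssthresh) counts is a policy choice; "<=" is assumed. */
static bool in_slow_start(const struct mini_tcp_sock *tp)
{
    return tp->snd_cwnd <= tp->snd_ssthresh;
}
```

Congestion control modules would then call the helper instead of open-coding the comparison, so a later policy tweak touches only this function.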

08 Apr, 2015

1 commit

2646c831c tcp: RFC7413 option support for Fast Open client

Fast Open has been using an experimental option with a magic number
(RFC6994). This patch makes the client by default use the RFC7413
option (34) to get and send Fast Open cookies. This patch makes
the client solicit cookies from a given server first with the
RFC7413 option. If that fails to elicit a cookie, then it tries
the RFC6994 experimental option. If that also fails, it uses the
RFC7413 option on all subsequent connect attempts. If the server
returns a Fast Open cookie then the client caches the form of the
option that successfully elicited a cookie, and uses that form on
later connects when it presents that cookie.

The idea is to gradually obsolete the use of experimental options as
the servers and clients upgrade, while keeping the interoperability
meanwhile.

Signed-off-by: Daniel Lee
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: David S. Miller

Daniel Lee
2015-04-08 06:36:39 +0800
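
The option selection order described in 2646c831c can be sketched as a small per-server state machine. All names here (`struct tfo_cache`, `tfo_pick_form()`, `tfo_record_result()`) are hypothetical and only follow the prose of the commit message, not the kernel's data structures.

```c
#include <stdbool.h>

enum tfo_form { TFO_RFC7413, TFO_RFC6994_EXP };

/* Hypothetical per-server cache entry; field names are illustrative. */
struct tfo_cache {
    bool has_cookie;            /* a cookie was obtained from this server */
    enum tfo_form cookie_form;  /* option form that elicited the cookie */
    unsigned failed_requests;   /* cookie requests that returned no cookie */
};

/* Choose the option form for the next connect attempt. */
static enum tfo_form tfo_pick_form(const struct tfo_cache *c)
{
    if (c->has_cookie)
        return c->cookie_form;  /* reuse the form that worked */
    /* Only the second request falls back to the experimental option;
     * the first and all later ones use the RFC7413 option. */
    return c->failed_requests == 1 ? TFO_RFC6994_EXP : TFO_RFC7413;
}

/* Record the outcome of an attempt made with the given form. */
static void tfo_record_result(struct tfo_cache *c, enum tfo_form used,
                              bool got_cookie)
{
    if (got_cookie) {
        c->has_cookie = true;
        c->cookie_form = used;
    } else if (!c->has_cookie) {
        c->failed_requests++;
    }
}
```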

04 Apr, 2015

1 commit

51456b291 ipv4: coding style: comparison for equality with NULL

The ipv4 code uses a mixture of coding styles. In some instances check
for NULL pointer is done as x == NULL and sometimes as !x. !x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2015-04-04 00:11:15 +0800

01 Apr, 2015

3 commits

67b61f6c1 netlink: implement nla_get_in_addr and nla_get_in6_addr

Those are counterparts to nla_put_in_addr and nla_put_in6_addr.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2015-04-01 01:58:35 +0800

930345ea6 netlink: implement nla_put_in_addr and nla_put_in6_addr

IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2015-04-01 01:58:35 +0800

8f55db486 tcp: simplify inetpeer_addr_base use

In many places, the a6 field is typecasted to struct in6_addr. As the
fields are in union anyway, just add in6_addr type to the union and get rid
of the typecasting.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2015-04-01 01:58:35 +0800

17 Mar, 2015

1 commit

9f1ab1867 tcp_metrics: fix wrong lockdep annotations

Changes in tcp_metric hash table are protected by tcp_metrics_lock
only, not by genl_mutex.

While we are at it use deref_locked() instead of rcu_dereference()
in tcp_new() to avoid unnecessary barrier, as we hold tcp_metrics_lock
as well.

Reported-by: Andrew Vagin
Signed-off-by: Eric Dumazet
Fixes: 098a697b497e ("tcp_metrics: Use a single hash table for all network namespaces.")
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric Dumazet
2015-03-17 04:32:23 +0800

13 Mar, 2015

6 commits

098a697b4 tcp_metrics: Use a single hash table for all network namespaces.

Now that all of the operations are safe on a single hash table
across network namespaces, allocate a single global hash table
and update the code to use it.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-13 13:57:07 +0800

04f721c67 tcp_metrics: Rewrite tcp_metrics_flush_all

Rewrite tcp_metrics_flush_all so that it can cope with entries from
different network namespaces on its hash chain.

This is based on the logic in tcp_metrics_nl_cmd_del for deleting
a selection of entries from a tcp metrics hash chain.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-13 13:57:07 +0800

8a4bff714 tcp_metrics: Remove the unused return code from tcp_metrics_flush_all

tcp_metrics_flush_all always returns 0. Remove the unnecessary return code.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-13 13:57:07 +0800

849e8a0ca tcp_metrics: Add a field tcpm_net and verify it matches on lookup

In preparation for using one tcp metrics hash table for all network
namespaces add a field tcpm_net to struct tcp_metrics_block, and
verify that field on all hash table lookups.

Make the field tcpm_net of type possible_net_t so it takes no space
when network namespaces are disabled.

Further add a function tm_net to read that field so we can be
efficient when network namespaces are disabled and concise
the rest of the time.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-13 13:57:07 +0800

3e5da62d0 tcp_metrics: Mix the network namespace into the hash function.

In preparation for using one hash table for all network namespaces
mix the network namespace into the hash value.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-13 13:57:07 +0800
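
The two preparatory steps above (849e8a0ca and 3e5da62d0) can be sketched together: the namespace is mixed into the bucket choice, and every lookup also compares the entry's owning namespace. This is a userspace illustration under assumptions: `struct net_ns`, `struct metrics_entry`, the table size, and the mixing constants are all hypothetical, and only IPv4 destinations are modeled.

```c
#include <stddef.h>
#include <stdint.h>

struct net_ns { uint32_t id; };        /* hypothetical stand-in for struct net */

struct metrics_entry {
    struct metrics_entry *next;        /* hash chain link */
    const struct net_ns *net;          /* owning namespace, like tcpm_net */
    uint32_t daddr;                    /* destination IPv4 address */
};

#define TABLE_BITS 4
#define TABLE_SIZE (1u << TABLE_BITS)

/* Mix the namespace into the hash so that equal addresses in different
 * namespaces tend to land in different buckets. The odd multipliers are
 * arbitrary choices for this sketch. */
static unsigned bucket_of(const struct net_ns *net, uint32_t daddr)
{
    uint32_t h = daddr ^ (net->id * 0x9E3779B1u);

    return (h * 0x61C88647u) >> (32 - TABLE_BITS);
}

static void insert_entry(struct metrics_entry **table, struct metrics_entry *e)
{
    unsigned b = bucket_of(e->net, e->daddr);

    e->next = table[b];
    table[b] = e;
}

/* A chain may hold entries from several namespaces, so the namespace is
 * verified on every lookup in addition to the address. */
static struct metrics_entry *lookup_entry(struct metrics_entry **table,
                                          const struct net_ns *net,
                                          uint32_t daddr)
{
    struct metrics_entry *e;

    for (e = table[bucket_of(net, daddr)]; e; e = e->next)
        if (e->daddr == daddr && e->net == net)
            return e;
    return NULL;
}
```

Mixing the namespace into the hash only spreads entries across buckets; correctness comes from the explicit `e->net == net` comparison, which is why the field check is needed even with the mixed hash.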

6493517ea tcp_metrics: panic when tcp_metrics_init fails.

There is not a practical way to clean up during boot so
just panic if there is a problem initializing tcp_metrics.

That will at least give us a clear place to start debugging
if something does go wrong.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-13 13:57:07 +0800

18 Jan, 2015

1 commit

053c095a8 netlink: make nlmsg_end() and genlmsg_end() void

Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

    if (my_function(...))
        /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for
Signed-off-by: David S. Miller

Johannes Berg
2015-01-18 14:03:45 +0800
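
The pitfall described in 053c095a8 can be reproduced with a small userspace model. Everything here is hypothetical (`struct msg_buf` stands in for an skb, and the `finalize*`/`fill*` functions for nlmsg_end() and its callers); the point is only that a fill function forwarding a positive length looks like a failure to a caller using the "non-zero means error" convention.

```c
#include <stddef.h>

struct msg_buf { size_t len; };   /* hypothetical stand-in for an skb */

/* Old style: finalizing returns the message length, always positive
 * because the header alone already occupies space. */
static int finalize_returning_len(struct msg_buf *m)
{
    m->len = 16 + 8;              /* header plus payload, arbitrary sizes */
    return (int)m->len;
}

/* A fill function that simply forwards that return value. */
static int fill_old(struct msg_buf *m)
{
    return finalize_returning_len(m);
}

/* New style: finalizing returns nothing, the caller returns 0. */
static void finalize(struct msg_buf *m)
{
    m->len = 16 + 8;
}

static int fill_new(struct msg_buf *m)
{
    finalize(m);
    return 0;
}

/* Caller using the common convention "non-zero return means error". */
static int caller_sees_error(int (*fill)(struct msg_buf *))
{
    struct msg_buf m = { 0 };

    return fill(&m) != 0;
}
```

With `fill_old` the caller wrongly reports an error on every successful message; with `fill_new` it does not.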

15 Aug, 2014

1 commit

a26552afe tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic

tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
ordering of incoming connections and teardowns without the need to
hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
the last measured RTO. To do so, we keep the last seen timestamp in a
per-host indexed data structure and verify if the incoming timestamp
in a connection request is strictly greater than the saved one during
last connection teardown. Thus we can verify later on that no old data
packets will be accepted by the new connection.

While moving a socket to time-wait state we already verify whether timestamps
were seen on a connection. Only if that was the case do we let the
time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
will be used. But we don't verify this on incoming SYN packets. If a
connection teardown was less than TCP_PAWS_MSL seconds in the past we
cannot guarantee to not accept data packets from an old connection if
no timestamps are present. We should drop this SYN packet. This patch
closes this loophole.

Please note, this patch does not make tcp_tw_recycle in any way more
usable but only adds another safety check:
Sporadic drops of SYN packets because of reordering in the network or
in the socket backlog queues can happen. Users behind NAT trying to
connect to a tcp_tw_recycle enabled server can get caught in blackholes
and their connection requests may regularly get dropped because hosts
behind an address translator don't have synchronized tcp timestamp clocks.
tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.

In general, use of tcp_tw_recycle is discouraged.

Cc: Eric Dumazet
Cc: Florian Westphal
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2014-08-15 05:38:54 +0800
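
The acceptance rule described in a26552afe can be sketched as a single predicate. This follows the prose of the commit message only; the real kernel check has further conditions. `struct peer_state`, `struct syn_info`, and `accept_syn()` are hypothetical names, time is modeled in whole seconds, and `PAWS_MSL_SEC` is set to 60 as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAWS_MSL_SEC 60   /* assumed window, in seconds */

/* Hypothetical per-host state saved at connection teardown. */
struct peer_state {
    uint32_t last_ts;        /* last TCP timestamp seen from this host */
    int64_t  teardown_time;  /* when the last connection was torn down */
};

struct syn_info {
    bool     has_ts;         /* SYN carried a timestamp option */
    uint32_t ts;             /* its value, valid only if has_ts */
};

static bool accept_syn(const struct peer_state *p,
                       const struct syn_info *syn, int64_t now)
{
    /* Teardown is old enough: the saved state no longer matters. */
    if (now - p->teardown_time >= PAWS_MSL_SEC)
        return true;

    /* The loophole closed by the patch: without a timestamp there is
     * nothing to order this SYN against the old connection. */
    if (!syn->has_ts)
        return false;

    /* Must be strictly newer than the saved value; the signed
     * difference keeps the comparison valid across wraparound. */
    return (int32_t)(syn->ts - p->last_ts) > 0;
}
```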

01 Aug, 2014

1 commit

388070faa tcp: don't require root to read tcp_metrics

commit d23ff7016 (tcp: add generic netlink support for tcp_metrics) introduced
netlink support for the new tcp_metrics, however it restricted getting of
tcp_metrics to root user only. This is a change from how these values could
have been fetched when in the old route cache. Unless there's a legitimate
reason to restrict the reading of these values it would be better if normal
users could fetch them.

Cc: Julian Anastasov
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Debabrata Banerjee
Signed-off-by: David S. Miller

Banerjee, Debabrata
2014-08-01 05:07:37 +0800

05 Jun, 2014

1 commit

4cb28970a net: use the new API kvfree()

It is available since v3.15-rc5.

Cc: Pablo Neira Ayuso
Cc: "David S. Miller"
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2014-06-05 15:49:51 +0800

27 Feb, 2014

1 commit

740b0f184 tcp: switch rtt estimations to usec resolution

Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.

FQ/pacing in DC environments also requires this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'.

TCP_CONG_RTT_STAMP is no longer needed.

As Julian Anastasov pointed out, we need to keep user compatibility:
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.

In this example, the ss command displays an srtt of 32 usecs (10Gbit link):

lpk51:~# ./ss -i dst lpk52
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
tcp ESTAB 0 1 10.246.11.51:42959 10.246.11.52:64614
 cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448 cwnd:10 send 3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

The updated iproute2 ip command displays:

lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source 10.246.11.51

The old binary displays:

lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source 10.246.11.51

With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng

Signed-off-by: Eric Dumazet
Acked-by: Neal Cardwell
Cc: Stephen Hemminger
Cc: Yuchung Cheng
Cc: Larry Brakmo
Cc: Julian Anastasov
Signed-off-by: David S. Miller

Eric Dumazet
2014-02-27 06:08:40 +0800

24 Jan, 2014

1 commit

3ad88cf70 tcp: metrics: Handle v6/v4-mapped sockets in tcp-metrics

A socket may be v6/v4-mapped. In that case sk->sk_family is AF_INET6,
but the IP being used is actually an IPv4-address.
Current tcp-metrics will thus represent it as an IPv6-address:

root@server:~# ip tcp_metrics
::ffff:10.1.1.2 age 22.920sec rtt 18750us rttvar 15000us cwnd 10
10.1.1.2 age 47.970sec rtt 16250us rttvar 10000us cwnd 10

This patch modifies the tcp-metrics so that they are able to handle the
v6/v4-mapped sockets correctly.

Signed-off-by: Christoph Paasch
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Christoph Paasch
2014-01-24 04:48:28 +0800
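
Detecting a v4-mapped address such as the `::ffff:10.1.1.2` entry shown in 3ad88cf70 can be sketched in userspace with the standard `IN6_IS_ADDR_V4MAPPED` macro. The helper name `extract_v4_mapped()` is hypothetical, and this is not how the kernel patch itself is written; it only illustrates the address form involved.

```c
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/* If addr has the form ::ffff:a.b.c.d, copy the embedded IPv4 address
 * (already in network byte order) into out and return true. */
static bool extract_v4_mapped(const struct in6_addr *addr,
                              struct in_addr *out)
{
    if (!IN6_IS_ADDR_V4MAPPED(addr))
        return false;

    /* The IPv4 address occupies the last four bytes. */
    memcpy(&out->s_addr, &addr->s6_addr[12], sizeof(out->s_addr));
    return true;
}
```

A metrics cache keyed this way would store the mapped peer under its IPv4 address, so a v6/v4-mapped socket and a plain IPv4 socket talking to the same host share one entry.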

23 Jan, 2014

1 commit

00ca9c5b2 tcp: metrics: Fix rcu-race when deleting multiple entries

In bbf852b96ebdc6d1 I introduced the tmlist, which allows deleting
multiple entries from the cache that match a specified destination if no
source-IP is specified.

However, as the cache is an RCU-list, we should not create this tmlist, as
it will change the tcpm_next pointer of the element that will be deleted
and so a thread iterating over the cache's entries while holding the
RCU-lock might get "redirected" to this tmlist.

This patch fixes this, by reverting back to the old behavior prior to
bbf852b96ebdc6d1, which means that we simply change the tcpm_next
pointer of the previous element (pp) to jump over the one we are
deleting.
The difference is that we call kfree_rcu() directly on the cache entry,
which allows us to delete multiple entries from the list.

Fixes: bbf852b96ebdc6d1 (tcp: metrics: Delete all entries matching a certain destination)
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2014-01-23 13:26:16 +0800
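
The unlink-in-place approach described in 00ca9c5b2 can be sketched on a plain singly linked list. This is a userspace illustration: `struct entry`, `push_entry()`, and `delete_matching()` are hypothetical names, and `free()` stands in for the deferred `kfree_rcu()` mentioned in the commit message, so the sketch is not safe against concurrent readers.

```c
#include <stdint.h>
#include <stdlib.h>

struct entry {
    struct entry *next;
    uint32_t daddr;
};

/* Prepend a new entry; returns NULL if allocation fails. */
static struct entry *push_entry(struct entry **head, uint32_t daddr)
{
    struct entry *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    e->daddr = daddr;
    e->next = *head;
    *head = e;
    return e;
}

/* Remove every entry matching daddr; returns how many were removed.
 * pp always points at the link that leads to the current entry, so
 * deleting means redirecting that one link past the entry. The deleted
 * entry's own next pointer is never modified, which is what keeps a
 * reader that is standing on it able to continue down the list. */
static int delete_matching(struct entry **head, uint32_t daddr)
{
    struct entry **pp = head;
    struct entry *e;
    int removed = 0;

    while ((e = *pp) != NULL) {
        if (e->daddr == daddr) {
            *pp = e->next;
            free(e);          /* the kernel would defer this via RCU */
            removed++;
        } else {
            pp = &e->next;
        }
    }
    return removed;
}
```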

18 Jan, 2014

2 commits

418044205 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller

David S. Miller
2014-01-18 16:55:41 +0800

77f99ad16 tcp: metrics: Avoid duplicate entries with the same destination-IP

Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.

This patch calls __tcp_get_metrics() twice: first without holding the
lock, then while holding the lock. The second call is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.

Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.)
Signed-off-by: Christoph Paasch
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller

Christoph Paasch
2014-01-18 10:05:34 +0800
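
The check-twice pattern from 77f99ad16 can be sketched as follows. This is a simplified userspace model, not the kernel code: `find_entry()` and `get_or_create()` are hypothetical names, a C11 `atomic_flag` spin lock stands in for the kernel spin-lock, and the unlocked first lookup uses plain loads, which is only acceptable here because the demo runs single-threaded (real lockless readers need RCU or atomic pointer loads).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct entry {
    struct entry *next;
    uint32_t daddr;
};

static struct entry *cache_head;
static atomic_flag cache_lock = ATOMIC_FLAG_INIT;

static struct entry *find_entry(uint32_t daddr)
{
    struct entry *e;

    for (e = cache_head; e; e = e->next)
        if (e->daddr == daddr)
            return e;
    return NULL;
}

static struct entry *get_or_create(uint32_t daddr)
{
    /* First check, without the lock: the common case is a hit. */
    struct entry *e = find_entry(daddr);

    if (e)
        return e;

    while (atomic_flag_test_and_set(&cache_lock))
        ;  /* spin until the lock is free */

    /* Second check, with the lock held: another context may have
     * added the entry while we were waiting. */
    e = find_entry(daddr);
    if (!e) {
        e = malloc(sizeof(*e));
        if (e) {
            e->daddr = daddr;
            e->next = cache_head;
            cache_head = e;
        }
    }

    atomic_flag_clear(&cache_lock);
    return e;
}
```

Without the second lookup, two contexts that both miss on the first check would each insert an entry for the same destination, which is the duplicate the commit describes.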

11 Jan, 2014

5 commits

3e7013ddf tcp: metrics: Allow selective get/del of tcp-metrics based on src IP

We want to be able to get/del tcp-metrics based on the src IP. This
patch adds the necessary parsing of the netlink attribute and if the
source address is set, it will match on this one too.

Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2014-01-11 06:38:18 +0800
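
The matching rule described in 3e7013ddf (always match on destination, and on source only when one was supplied) can be sketched as a predicate. The names `struct metrics_key` and `entry_matches()` are hypothetical, only IPv4 is modeled, and a NULL `saddr` is this sketch's way of saying that no source attribute was present in the request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct metrics_key {
    uint32_t saddr;   /* source IPv4 address */
    uint32_t daddr;   /* destination IPv4 address */
};

/* saddr == NULL means the request carried no source address, in which
 * case every entry for the destination matches. */
static bool entry_matches(const struct metrics_key *e, uint32_t daddr,
                          const uint32_t *saddr)
{
    if (e->daddr != daddr)
        return false;
    return saddr == NULL || e->saddr == *saddr;
}
```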

bbf852b96 tcp: metrics: Delete all entries matching a certain destination

As we now can have multiple entries per destination-IP, the "ip
tcp_metrics delete address ADDRESS" command deletes all of them.

Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2014-01-11 06:38:18 +0800

8a59359cb tcp: metrics: New netlink attribute for src IP and dumped in netlink reply

This patch adds a new netlink attribute for the source-IP and appends it
to the netlink reply. Now, iproute2 can have access to the source-IP.

Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2014-01-11 06:38:18 +0800

a54430282 tcp: metrics: Add source-address to tcp-metrics

We add the source-address to the tcp-metrics, so that different metrics
will be used per source/destination-pair. We use the destination-hash to
store the metric inside the hash-table. That way, deleting and dumping
via "ip tcp_metrics" is easy.

Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2014-01-11 06:38:18 +0800

324fd55a1 tcp: metrics: rename tcpm_addr to tcpm_daddr

As we will add also the source-address, we rename all accesses to the
tcp-metrics address to use "daddr".

Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2014-01-11 06:38:18 +0800

20 Nov, 2013

1 commit

c53ed7423 genetlink: only pass array to genl_register_family_with_ops()

As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro; this is a little safer.

The openvswitch has some indirection, assigning ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2013-11-20 05:39:05 +0800
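
The macro approach described in c53ed7423 can be sketched with hypothetical names (`struct op`, `struct family`, `register_family_with_ops_n()`); only the shape of the idea is taken from the commit message, not the genetlink API itself.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct op { int cmd; };

struct family {
    const struct op *ops;
    size_t n_ops;
};

/* Underlying function: still takes an explicit element count. */
static int register_family_with_ops_n(struct family *f,
                                      const struct op *ops, size_t n_ops)
{
    f->ops = ops;
    f->n_ops = n_ops;
    return 0;
}

/* Wrapper macro: callers pass only the array and the count is computed
 * at the call site, so it cannot drift out of sync with the array.
 * The argument must be a real array, not a pointer. */
#define register_family_with_ops(f, ops) \
    register_family_with_ops_n((f), (ops), ARRAY_SIZE(ops))

static const struct op demo_ops[] = { { 1 }, { 2 }, { 3 } };
```

A caller then writes `register_family_with_ops(&fam, demo_ops)` and never spells out the length, which removes one class of copy-and-paste mistakes.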

15 Nov, 2013

2 commits

4534de830 genetlink: make all genl_ops users const

Now that genl_ops are no longer modified in place when
registering, they can be made const. This patch was done
mostly with spatch:

@@
identifier ops;
@@
+const
struct genl_ops ops[] = {
...
};

(except the struct thing in net/openvswitch/datapath.c)

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2013-11-15 06:10:41 +0800

dccf76ca6 net-tcp: fix panic in tcp_fastopen_cache_set()

We had some reports of crashes using TCP fastopen, and Dave Jones
gave a nice stack trace pointing to the error.

The issue is that tcp_get_metrics() should not be called with a NULL dst.

Fixes: 1fe4c481ba637 ("net-tcp: Fast Open client - cookie cache")
Signed-off-by: Eric Dumazet
Reported-by: Dave Jones
Cc: Yuchung Cheng
Acked-by: Yuchung Cheng
Tested-by: Dave Jones
Signed-off-by: David S. Miller

Eric Dumazet
2013-11-15 05:33:18 +0800

30 Oct, 2013

1 commit

c968601d1 tcp: temporarily disable Fast Open on SYN timeout

Fast Open currently has a fall back feature to address SYN-data being
dropped but it requires the middle-box to pass on regular SYN retry
after SYN-data. This is implemented in commit aab487435 ("net-tcp:
Fast Open client - detecting SYN-data drops")

However, some NAT boxes will drop all subsequent packets after the first
SYN-data and blackhole the entire connection. An example is in
commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast
Open".

The sender should note such incidents and temporarily fall back to the
regular TCP handshake on subsequent attempts as well: after the
second SYN times out, the original Fast Open SYN is most likely lost.
When such an event recurs, Fast Open is disabled for a period that grows
exponentially with the number of recurrences.

Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Yuchung Cheng
2013-10-30 10:50:41 +0800
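
The exponential back-off described in c968601d1 might be modeled as below. This is a sketch under stated assumptions: `struct tfo_loss_state` and both functions are hypothetical, time is in whole seconds, the 60 second base interval and the shift cap are choices made for the example, and the exact thresholds used by the kernel are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define BASE_BACKOFF_SEC 60   /* assumed base interval */
#define MAX_BACKOFF_SHIFT 16  /* cap so the shift cannot overflow */

/* Hypothetical per-destination record of SYN-data loss events. */
struct tfo_loss_state {
    unsigned syn_loss;       /* number of recorded loss events */
    int64_t  last_syn_loss;  /* time of the most recent event */
};

static void note_syn_loss(struct tfo_loss_state *s, int64_t now)
{
    s->syn_loss++;
    s->last_syn_loss = now;
}

/* Fast Open stays disabled for a pause that doubles with every
 * recorded loss event. */
static bool tfo_allowed(const struct tfo_loss_state *s, int64_t now)
{
    unsigned shift;
    int64_t pause;

    if (s->syn_loss == 0)
        return true;

    shift = s->syn_loss < MAX_BACKOFF_SHIFT ? s->syn_loss
                                            : MAX_BACKOFF_SHIFT;
    pause = (int64_t)BASE_BACKOFF_SEC << shift;
    return now >= s->last_syn_loss + pause;
}
```

With these numbers the first recorded loss pauses Fast Open for 120 seconds and the second for 240 seconds, so a path that keeps dropping SYN-data is retried less and less often.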

10 Oct, 2013

1 commit

634fb979e inet: includes a sock_common in request_sock

TCP listener refactoring, part 5:

We want to be able to insert request sockets (SYN_RECV) into main
ehash table instead of the per listener hash table to allow RCU
lookups and remove listener lock contention.

This patch includes the needed struct sock_common in front
of struct request_sock.

This means there is no more inet6_request_sock IPv6 specific
structure.

The following inet_request_sock fields were renamed as they became
macros to reference fields from struct sock_common.
Prefix ir_ was chosen to avoid name collisions.

loc_port -> ir_loc_port
loc_addr -> ir_loc_addr
rmt_addr -> ir_rmt_addr
rmt_port -> ir_rmt_port
iif -> ir_iif

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2013-10-10 12:08:07 +0800