Eric Lee / smarc-fsl-linux-kernel

25 Jul, 2018

1 commit

0348dcd98 ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns

[ Upstream commit 70ba5b6db96ff7324b8cfc87e0d0383cf59c9677 ]

The low and high values of the net.ipv4.ping_group_range sysctl were
being silently forced to the default disabled state when a write to the
sysctl contained GIDs that didn't map to the associated user namespace.
Confusingly, the sysctl's write operation would return success and then
a subsequent read of the sysctl would indicate that the low and high
values are the overflowgid.

This patch changes the behavior by clearly returning an error when the
sysctl write operation receives a GID range that doesn't map to the
associated user namespace. In such a situation, the previous value of
the sysctl is preserved and that range will be returned in a subsequent
read of the sysctl.
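
The write semantics described above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `struct gid_range`, `gid_maps_to_ns()` and `set_ping_group_range()` are hypothetical names, and the predicate simply pretends the user namespace maps GIDs 0..65535. The candidate range is validated before anything is stored, so a failed write returns -EINVAL and the previous range stays visible to readers.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GID range stored per network namespace. */
struct gid_range {
    unsigned int low;
    unsigned int high;
};

/* Hypothetical predicate: pretend the user namespace maps GIDs 0..65535 only. */
static bool gid_maps_to_ns(unsigned int gid)
{
    return gid <= 65535;
}

/*
 * Validate first, commit only on success.
 * Returns 0 on success or -EINVAL, leaving *cur untouched on failure.
 */
static int set_ping_group_range(struct gid_range *cur,
                                unsigned int low, unsigned int high)
{
    if (!gid_maps_to_ns(low) || !gid_maps_to_ns(high))
        return -EINVAL;     /* previous value is preserved */
    cur->low = low;
    cur->high = high;
    return 0;
}
```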

Signed-off-by: Tyler Hicks
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Tyler Hicks
7 years ago

22 Jul, 2018

1 commit

3373d6d05 tcp: fix Fast Open key endianness

[ Upstream commit c860e997e9170a6d68f9d1e6e2cf61f572191aaf ]

The Fast Open key could be stored in a different endianness depending on
the CPU. Previously, hosts of different endianness in a server farm using
the same key config (sysctl value) would produce different cookies.
This patch fixes it by always storing the key as little endian, keeping
the same API for LE hosts.
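
A host-independent little-endian layout can be illustrated with explicit byte packing. This is a sketch under assumptions: `pack_key_le()` and `unpack_key_le()` are hypothetical helpers (the kernel uses its own byte-order conversion helpers), and the key is modelled as four 32-bit words serialised into 16 bytes.

```c
#include <stdint.h>

/*
 * Serialise the four 32-bit words parsed from the sysctl string into a
 * 16-byte key in little-endian order, regardless of host byte order.
 */
static void pack_key_le(const uint32_t words[4], uint8_t key[16])
{
    for (int i = 0; i < 4; i++) {
        key[4 * i + 0] = (uint8_t)(words[i]);
        key[4 * i + 1] = (uint8_t)(words[i] >> 8);
        key[4 * i + 2] = (uint8_t)(words[i] >> 16);
        key[4 * i + 3] = (uint8_t)(words[i] >> 24);
    }
}

/* Inverse, as used when the key is printed back through the sysctl. */
static void unpack_key_le(const uint8_t key[16], uint32_t words[4])
{
    for (int i = 0; i < 4; i++)
        words[i] = (uint32_t)key[4 * i]
                 | (uint32_t)key[4 * i + 1] << 8
                 | (uint32_t)key[4 * i + 2] << 16
                 | (uint32_t)key[4 * i + 3] << 24;
}
```

Because the byte layout is fixed, a BE and an LE host configured with the same sysctl string end up with identical key bytes and therefore identical cookies.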

Reported-by: Daniele Iamartino
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by: Neal Cardwell
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Yuchung Cheng
7 years ago

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file-by-file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
8 years ago

01 Aug, 2017

1 commit

b6690b143 tcp: remove low_latency sysctl

Was only checked by the removed prequeue code.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
8 years ago

16 Jun, 2017

1 commit

734942cc4 tcp: ULP infrastructure

Add the infrastructure for attaching Upper Layer Protocols (ULPs) over TCP
sockets. It is based on a similar infrastructure in tcp_cong. The idea is that any
ULP can add its own logic by changing the TCP proto_ops structure to its own
methods.

Example usage:

setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

modules will call:
tcp_register_ulp(&tcp_tls_ulp_ops);

to register/unregister their ULP, with an init function and name.

A list of registered ULPs will be returned by tcp_get_available_ulp, which is
hooked up to /proc. Example:

$ cat /proc/sys/net/ipv4/tcp_available_ulp
tls

There is currently no functionality to remove or chain ULPs, but
it should be possible to add these in the future if needed.
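
The registration pattern (named entries, lookup by name, a space-separated list for /proc) can be modelled in user space. This is a hypothetical sketch: `struct ulp_ops`, `register_ulp()`, `find_ulp()` and `available_ulps()` are illustrative names, not the kernel's tcp_register_ulp/tcp_get_available_ulp, and the sketch omits locking and unregistration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ULP descriptor: a name plus an init hook. */
struct ulp_ops {
    const char *name;
    int (*init)(void *sk);
    struct ulp_ops *next;
};

static struct ulp_ops *ulp_list;   /* singly linked list of registered ULPs */

/* Refuse duplicate names, otherwise link the entry in. */
static int register_ulp(struct ulp_ops *ops)
{
    for (struct ulp_ops *p = ulp_list; p; p = p->next)
        if (strcmp(p->name, ops->name) == 0)
            return -EEXIST;
    ops->next = ulp_list;
    ulp_list = ops;
    return 0;
}

/* What a setsockopt(TCP_ULP, "tls") handler would look up. */
static struct ulp_ops *find_ulp(const char *name)
{
    for (struct ulp_ops *p = ulp_list; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Build the space-separated list shown by tcp_available_ulp. */
static void available_ulps(char *buf, size_t len)
{
    size_t off = 0;

    buf[0] = '\0';
    for (struct ulp_ops *p = ulp_list; p && off < len; p = p->next) {
        int n = snprintf(buf + off, len - off, "%s%s", off ? " " : "", p->name);
        if (n < 0)
            break;
        off += (size_t)n;
    }
}
```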

Signed-off-by: Boris Pismenny
Signed-off-by: Dave Watson
Signed-off-by: David S. Miller

Dave Watson
8 years ago

08 Jun, 2017

3 commits

5d2ed0521 tcp: Namespaceify sysctl_tcp_timestamps

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
8 years ago
9bb37ef00 tcp: Namespaceify sysctl_tcp_window_scaling

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
8 years ago
f93010342 tcp: Namespaceify sysctl_tcp_sack

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
8 years ago

25 Apr, 2017

2 commits

cf1ef3f07 net/tcp_fastopen: Disable active side TFO in certain scenarios

Middlebox firewall issues can potentially cause the server's data to be
blackholed after a successful 3WHS using TFO. The following are the related
reports from Apple:
https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
Slide 31 identifies an issue where the client ACK to the server's data
sent during a TFO'd handshake is dropped.
C ---> syn-data ---> S
C X S
[retry and timeout]

https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
Slide 5 shows a similar situation in which the server's data gets dropped
after the 3WHS.
C ---- syn-data ---> S
C S
S (accept & write)
C? X
Acked-by: Yuchung Cheng
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Wei Wang
8 years ago
58c4c6a3f net: add rcu locking when changing early demux

systemd-sysctl is triggering a suspicious RCU usage message when
net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via
a sysctl config file:

[ 33.896184] ===============================
[ 33.899558] [ ERR: suspicious RCU usage. ]
[ 33.900624] 4.11.0-rc7+ #104 Not tainted
[ 33.901698] -------------------------------
[ 33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage!
[ 33.905724]
other info that might help us debug this:

[ 33.907656]
rcu_scheduler_active = 2, debug_locks = 0
[ 33.909288] 1 lock held by systemd-sysctl/143:
[ 33.910373] #0: (sb_writers#5){.+.+.+}, at: [] file_start_write+0x45/0x48
[ 33.912407]
stack backtrace:
[ 33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104
[ 33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 33.917870] Call Trace:
[ 33.918431] dump_stack+0x81/0xb6
[ 33.919241] lockdep_rcu_suspicious+0x10f/0x118
[ 33.920263] proc_configure_early_demux+0x65/0x10a
[ 33.921391] proc_udp_early_demux+0x3a/0x41

Add RCU locking to proc_configure_early_demux().

Fixes: dddb64bcb3461 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
8 years ago

25 Mar, 2017

1 commit

dddb64bcb net: Add sysctl to toggle early demux for tcp and udp

Certain systems process a significant unconnected UDP workload.
It would be preferable to disable UDP early demux for those systems
and enable it for TCP only.

By disabling UDP early demux, we see these slight gains on an ARM64 system:
782 -> 788Mbps unconnected single stream UDPv4
633 -> 654Mbps unconnected UDPv4 different sources

The performance impact can change based on CPU architecture and cache
sizes. There will not be much difference seen if the entire UDP hash table
is in cache.

Both sysctls are enabled by default to preserve existing behavior.

v1->v2: Change function pointer instead of adding conditional as
suggested by Stephen.

v2->v3: Read once in callers to avoid issues due to compiler
optimizations. Also update commit message with the tests.

v3->v4: Store and use read once result instead of querying pointer
again incorrectly.

v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n}
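
The "change the function pointer, read it once in callers" approach from the v1-v4 notes can be illustrated with a user-space analogue. This is a sketch under assumptions: C11 atomics stand in for the kernel's own primitives, and `early_demux_fn`, `udp_early_demux_impl()`, `set_udp_early_demux()` and `rcv_path()` are hypothetical names. The sysctl handler swaps the published pointer; the receive path loads it exactly once into a local and only ever calls that local, so it cannot observe a NULL between the check and the call.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-protocol early-demux handler type. */
typedef int (*early_demux_fn)(int pkt);

/* Stand-in for the real demux work. */
static int udp_early_demux_impl(int pkt)
{
    return pkt + 1;
}

/* Published handler; NULL means early demux is disabled. */
static _Atomic(early_demux_fn) udp_early_demux = udp_early_demux_impl;

/* sysctl write handler: swap the pointer instead of testing a flag per packet. */
static void set_udp_early_demux(int enabled)
{
    atomic_store_explicit(&udp_early_demux,
                          enabled ? udp_early_demux_impl : NULL,
                          memory_order_release);
}

/* Receive path: read the pointer once into a local, then use only the local. */
static int rcv_path(int pkt)
{
    early_demux_fn fn = atomic_load_explicit(&udp_early_demux,
                                             memory_order_acquire);
    if (fn)
        return fn(pkt);
    return -1;    /* fall back to the normal route lookup */
}
```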

Signed-off-by: Subash Abhinov Kasiviswanathan
Suggested-by: Eric Dumazet
Cc: Stephen Hemminger
Cc: Tom Herbert
Cc: David Miller
Signed-off-by: David S. Miller

subashab@codeaurora.org
8 years ago

22 Mar, 2017

1 commit

bf4e0a3db net: ipv4: add support for ECMP hash policy choice

This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
0 - layer 3 (default)
1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case; the flow dissector's field consistentification should handle the
address order, thus we can remove the address reversals.
If the skb is provided we always use it for the hash calculation,
otherwise we fall back to fl4; that is, if skb is NULL, fl4 has to be set.
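
The difference between the two policies comes down to which fields feed the hash. This is a hypothetical sketch: `struct flow_keys4` and `multipath_hash()` are illustrative names, and the FNV-1a-style mixing is a stand-in for the kernel's own flow hashing. With policy 0 only the addresses are hashed, so flows between the same two hosts share a path; with policy 1 the ports and protocol are mixed in as well, so such flows can spread across nexthops.

```c
#include <stdint.h>

/* Hypothetical flattened IPv4 flow key; field names are illustrative. */
struct flow_keys4 {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* FNV-1a-style mixing of one 32-bit value, byte by byte (stand-in hash). */
static uint32_t mix(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* fib_multipath_hash_policy: 0 = layer 3 (addresses), 1 = layer 4 (5-tuple). */
static uint32_t multipath_hash(const struct flow_keys4 *k, int policy)
{
    uint32_t h = 2166136261u;

    h = mix(h, k->saddr);
    h = mix(h, k->daddr);
    if (policy == 1) {
        h = mix(h, ((uint32_t)k->sport << 16) | k->dport);
        h = mix(h, k->proto);
    }
    return h;
}
```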

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
8 years ago

17 Mar, 2017

1 commit

4396e4618 tcp: remove tcp_tw_recycle

The tcp_tw_recycle was already broken for connections
behind NAT, since the per-destination timestamp is not
monotonically increasing for multiple machines behind
a single destination address.

After the randomization of TCP timestamp offsets
in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
for each connection), tcp_tw_recycle is broken for all
types of connections for the same reason: the timestamps
received from a single machine are no longer monotonically
increasing.

Remove tcp_tw_recycle, since it is not functional. Also, remove
the PAWSPassive SNMP counter since it is only used for
tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
since the strict argument is only set when tcp_tw_recycle is
enabled.
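
Why randomized offsets defeat a per-destination timestamp check can be shown with a small worked example. This is a simplified sketch, not the removed kernel logic (which also involved time windows and the tcp_metrics cache): `struct peer_ts`, `syn_accepted()` and `wire_tsval()` are hypothetical names. A SYN is rejected when its timestamp is older than the newest one remembered for that address; once each connection adds its own random offset, a later connection from the same host can easily carry a smaller value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer cache entry, as tcp_tw_recycle kept per destination. */
struct peer_ts {
    uint32_t last_tsval;    /* newest timestamp seen from this address */
};

/* Simplified recycle-era check: drop a SYN whose timestamp went backwards. */
static bool syn_accepted(struct peer_ts *peer, uint32_t tsval)
{
    if ((int32_t)(tsval - peer->last_tsval) < 0)
        return false;       /* treated as an old duplicate */
    peer->last_tsval = tsval;
    return true;
}

/* Timestamp a sender puts on the wire: its clock plus a per-connection offset. */
static uint32_t wire_tsval(uint32_t clock, uint32_t offset)
{
    return clock + offset;
}
```

For example, `syn_accepted(&p, wire_tsval(1000, 500000))` followed by `syn_accepted(&p, wire_tsval(1001, 7))` accepts the first SYN and drops the second, even though the second connection was opened later by the same machine.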

Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Eric Dumazet
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Cc: Lutz Vieweg
Cc: Florian Westphal
Signed-off-by: David S. Miller

Soheil Hassas Yeganeh
8 years ago

31 Jan, 2017

1 commit

63a6fff35 net: Avoid receiving packets with an l3mdev on unbound UDP sockets

Packets arriving in a VRF currently are delivered to UDP sockets that
aren't bound to any interface. TCP defaults to not delivering packets
arriving in a VRF to unbound sockets. IP route lookup and socket
transmit both assume that unbound means using the default table and
UDP applications that haven't been changed to be aware of VRFs may not
function correctly in this case since they may not be able to handle
overlapping IP address ranges, or be able to send packets back to the
original sender if required.

So add a sysctl, udp_l3mdev_accept, to control this behaviour, with it
being analogous to the existing tcp_l3mdev_accept, namely to allow a
process to have a VRF-global listen socket. Have this default to off,
as this is the behaviour that users will expect, given that there is
no explicit mechanism to place an unmodified, VRF-unaware application
into a default VRF.
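
The resulting device-match rule for UDP socket lookup can be sketched as a single predicate. The function and parameter names are hypothetical: `bound_ifindex == 0` means the socket is not bound to a device, and `pkt_l3mdev != 0` means the packet arrived on an interface enslaved to a VRF (l3mdev) device whose index is `pkt_l3mdev`.

```c
#include <stdbool.h>

/*
 * Does a UDP socket bound to bound_ifindex (0 = unbound) match a packet
 * that arrived on pkt_ifindex, whose VRF master is pkt_l3mdev (0 = none)?
 */
static bool udp_sk_dev_match(int bound_ifindex, int pkt_ifindex,
                             int pkt_l3mdev, bool l3mdev_accept)
{
    if (bound_ifindex == 0)
        /* Unbound: non-VRF traffic always, VRF traffic only if enabled. */
        return pkt_l3mdev == 0 || l3mdev_accept;
    /* Bound: match either the ingress device or its VRF master. */
    return bound_ifindex == pkt_ifindex || bound_ifindex == pkt_l3mdev;
}
```

With the default `l3mdev_accept == false`, a VRF-unaware daemon on an unbound socket keeps seeing only default-table traffic, while a socket bound to the VRF device still receives packets arriving on any of its enslaved interfaces.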

Signed-off-by: Robert Shearman
Acked-by: David Ahern
Tested-by: David Ahern
Signed-off-by: David S. Miller

Robert Shearman
9 years ago

25 Jan, 2017

1 commit

4548b683b Introduce a sysctl that modifies the value of PROT_SOCK.

Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
that denotes the first unprivileged inet port in the namespace. To
disable all privileged ports set this to zero. It also checks for
overlap with the local port range. The privileged and local range may
not overlap.

The use case for this change is to allow containerized processes to bind
to privileged ports, but prevent them from ever being allowed to modify
their container's network configuration. The latter is accomplished by
ensuring that the network namespace is not a child of the user
namespace. This modification was needed to allow the container manager
to disable a namespace's privileged port restrictions without exposing
control of the network namespace to processes in the user namespace.
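
The two rules described above, the overlap check on write and the bind-time privilege test, can be sketched together. This is a minimal model under assumptions: `struct netns_ports`, `set_unpriv_start()` and `bind_allowed()` are hypothetical names. Privileged ports are taken to be [0, start), so they overlap the local range [low, high] exactly when start > low.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-namespace state mirroring the two related sysctls. */
struct netns_ports {
    int unpriv_start;               /* net.ipv4.ip_unprivileged_port_start */
    int local_low, local_high;      /* net.ipv4.ip_local_port_range */
};

/* Write handler sketch: reject values that would overlap the local range. */
static int set_unpriv_start(struct netns_ports *ns, int val)
{
    if (val < 0 || val > 65535)
        return -EINVAL;
    if (val > ns->local_low)        /* privileged ports are [0, val) */
        return -EINVAL;
    ns->unpriv_start = val;
    return 0;
}

/* bind() sketch: ports below the threshold need CAP_NET_BIND_SERVICE. */
static bool bind_allowed(const struct netns_ports *ns, int port, bool has_cap)
{
    return port == 0 || port >= ns->unpriv_start || has_cap;
}
```

Setting the threshold to 0 lets an unprivileged process in that namespace bind port 80, which matches the "set this to zero" case above.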

Signed-off-by: Krister Johansen
Signed-off-by: David S. Miller

Krister Johansen
9 years ago

14 Jan, 2017

1 commit

4a7f60094 tcp: remove thin_dupack feature

Thin-stream DUPACK starts fast recovery after only one DUPACK,
provided the connection is a thin stream (i.e., low inflight). But
this older feature is now subsumed by RACK. If a connection
receives only a single DUPACK, RACK would arm a reordering timer
and soon start fast recovery instead of a timeout if no further
ACKs are received.

The socket option (THIN_DUPACK) is kept as a nop for compatibility.
Note that this patch does not change another thin-stream feature,
which enables linear RTO, although it might be good to generalize
that in the future (i.e., linear RTO for the first, say, 3 retries).

Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Yuchung Cheng
9 years ago

12 Jan, 2017

1 commit

02ac5d148 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Two AF_* families adding entries to the lockdep tables
at the same time.

Signed-off-by: David S. Miller

David S. Miller
9 years ago

10 Jan, 2017

1 commit

b007f0907 ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int

> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-2147483648

but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"
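
What "true unsigned int" parsing means for the transcript above can be shown with a user-space analogue. This is a hypothetical sketch, not the kernel's proc_douintvec: `parse_uint_sysctl()` accepts 0..4294967295 and rejects a leading minus sign (which strtoull would otherwise silently negate) and out-of-range values.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a decimal sysctl value as a 32-bit unsigned integer.
 * Returns 0 and stores the value, or -EINVAL on bad input.
 */
static int parse_uint_sysctl(const char *s, unsigned int *out)
{
    char *end;
    unsigned long long v;

    while (isspace((unsigned char)*s))
        s++;
    /* Require a digit: this also rejects "-1", which strtoull would negate. */
    if (!isdigit((unsigned char)*s))
        return -EINVAL;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno == ERANGE || v > UINT32_MAX)
        return -EINVAL;
    while (isspace((unsigned char)*end))    /* allow the trailing newline from echo */
        end++;
    if (*end != '\0')
        return -EINVAL;
    *out = (unsigned int)v;
    return 0;
}
```

Under this model, `echo 4294967295` succeeds and `echo -2147483648` is rejected, the reverse of the behaviour in the transcript.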

v2: simplify to just proc_douintvec
Signed-off-by: Pavel Tikhomirov
Signed-off-by: David S. Miller

Pavel Tikhomirov
9 years ago

30 Dec, 2016

2 commits

fee83d097 ipv4: Namespaceify tcp_max_syn_backlog knob

Applications in different namespaces might require a different maximum
number of remembered connection requests.

Signed-off-by: Haishuang Yan
Signed-off-by: David S. Miller

Haishuang Yan
9 years ago
1946e672c ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob

Applications in different namespaces might require fast recycling of
TIME-WAIT sockets independently of the host.

Signed-off-by: Haishuang Yan
Signed-off-by: David S. Miller

Haishuang Yan
9 years ago

28 Dec, 2016

1 commit

56ab6b930 ipv4: Namespaceify tcp_tw_reuse knob

Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.

Signed-off-by: Haishuang Yan
Signed-off-by: David S. Miller

Haishuang Yan
9 years ago

23 Oct, 2016

1 commit

396a30cce ipv4: use the right lock for ping_group_range

This reverts commit a681574c99be23e4d20b769bf0e543239c364af5
("ipv4: disable BH in set_ping_group_range()") because we never
read ping_group_range in BH context (unlike local_port_range).

Then, since we already have a lock for ping_group_range, those
using ip_local_ports.lock for ping_group_range are clearly typos.

We might consider sharing the same lock for both ping_group_range
and local_port_range to save space, but that should be for
net-next.

Fixes: a681574c99be ("ipv4: disable BH in set_ping_group_range()")
Fixes: ba6b918ab234 ("ping: move ping_group_range out of CONFIG_SYSCTL")
Cc: Eric Dumazet
Cc: Eric Salo
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
9 years ago

21 Oct, 2016

1 commit

a681574c9 ipv4: disable BH in set_ping_group_range()

In commit 4ee3bd4a8c746 ("ipv4: disable BH when changing ip local port
range") Cong added BH protection in set_local_port_range() but missed
that the same fix was needed in set_ping_group_range().

Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels")
Signed-off-by: Eric Dumazet
Reported-by: Eric Salo
Signed-off-by: David S. Miller

Eric Dumazet
9 years ago

24 May, 2016

1 commit

049bbf589 ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n

Commit fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
moves the default TTL assignment, and as side-effect IPv4 TTL now
has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y).

The sysctl_ip_default_ttl is fundamental for IP to work properly,
as it provides the TTL to be used as default. The default TTL may be
used in ip_select_ttl, through the following flow:

ip_select_ttl
ip4_dst_hoplimit
net->ipv4.sysctl_ip_default_ttl

This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl
in inet_init_net, called during ipv4's initialization.

Without this commit, a kernel built without sysctl support will send
all IP packets with zero TTL (unless a TTL is explicitly set, e.g.
with setsockopt).

Given that a similar issue might appear with the other knobs that were
namespaceified, this commit also moves them.
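
The failure mode and the fix can be modelled in a few lines. This is a simplified sketch: `struct netns_ipv4`, `ipv4_net_init()` and `select_ttl()` are hypothetical stand-ins for the per-namespace state, the namespace init function and the ip_select_ttl/ip4_dst_hoplimit flow, and 64 is the usual IPv4 default TTL (the value of IPDEFTTL). A zero-initialised namespace yields TTL 0 unless something overrides it; initialising the default in code that runs regardless of CONFIG_SYSCTL restores 64.

```c
/* Usual IPv4 default TTL (the value of IPDEFTTL). */
#define DEFAULT_TTL 64

/* Hypothetical per-namespace IPv4 state. */
struct netns_ipv4 {
    int sysctl_ip_default_ttl;
};

/* Namespace init: must run whether or not sysctl support is built in. */
static void ipv4_net_init(struct netns_ipv4 *net)
{
    net->sysctl_ip_default_ttl = DEFAULT_TTL;
}

/*
 * Simplified TTL choice: an explicit socket TTL wins, then a route
 * hop-limit metric, then the per-namespace default.
 */
static int select_ttl(const struct netns_ipv4 *net, int route_hoplimit, int sock_ttl)
{
    if (sock_ttl > 0)           /* e.g. set via setsockopt(IP_TTL) */
        return sock_ttl;
    if (route_hoplimit > 0)
        return route_hoplimit;
    return net->sysctl_ip_default_ttl;
}
```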

Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
Signed-off-by: Ezequiel Garcia
Signed-off-by: David S. Miller

Ezequiel Garcia
9 years ago

12 Apr, 2016

1 commit

a6db4494d net: ipv4: Consider failed nexthops in multipath routes

Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.

Example:

[Topology diagram: hosts h2 and h3 (15.0.0.5) attach via port 3 to spine
switches SP1 and SP2; ports 1 and 2 of each spine are cross-connected to
TOR1 (12.0.0.2) and TOR2 (12.0.0.3); TOR1 and TOR2 are linked port 3 to
port 3; port 4 of TOR1 and port 4 of TOR2 connect to ports 1 and 2 of
TOR3; host h1 (12.0.0.1) attaches to TOR3 port 3.]

host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:

root@h1:~# ip ro ls
...
12.0.0.0/24 dev swp1 proto kernel scope link src 12.0.0.1
15.0.0.0/16
nexthop via 12.0.0.2 dev swp1 weight 1
nexthop via 12.0.0.3 dev swp1 weight 1
...

If the link between tor3 and tor1 is down and the link between tor1
and tor2 is also down, then tor1 is effectively cut off from h1. Yet the
route lookups in h1 alternate between the 2 routes: ping 15.0.0.5 gets
one and ssh 15.0.0.5 gets the other. Connections that attempt to use the
12.0.0.2 nexthop fail since that neighbor is not reachable:

root@h1:~# ip neigh show
...
12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
12.0.0.2 dev swp1 FAILED
...

The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.

To maintain backward compatibility, use of the neighbor information is
controlled by a new sysctl, fib_multipath_use_neigh.
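
The selection rule can be sketched as follows. This is a simplified model, not the kernel's fib_select_multipath: all names are hypothetical, nexthops are assumed to have equal weights, and only a FAILED neighbor entry counts as bad (the kernel's notion of a sane state is broader). A missing neighbor entry gets the benefit of the doubt, as described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical neighbor states; NEIGH_NONE means no entry exists. */
enum neigh_state { NEIGH_NONE, NEIGH_REACHABLE, NEIGH_STALE, NEIGH_FAILED };

struct nexthop {
    const char *gw;
    enum neigh_state neigh;
};

/* No entry: give it a shot. Existing entry: must not be FAILED. */
static bool nh_usable(const struct nexthop *nh)
{
    return nh->neigh != NEIGH_FAILED;
}

/*
 * The flow hash picks a starting nexthop; with use_neigh set, walk forward
 * past nexthops whose neighbor is known to be failed.
 */
static const struct nexthop *select_nexthop(const struct nexthop *nhs, size_t n,
                                            unsigned int hash, bool use_neigh)
{
    size_t start = hash % n;

    if (!use_neigh)
        return &nhs[start];
    for (size_t i = 0; i < n; i++) {
        const struct nexthop *nh = &nhs[(start + i) % n];
        if (nh_usable(nh))
            return nh;
    }
    return &nhs[start];    /* nothing known-good: keep the hashed pick */
}
```

In the h1 example, flows hashed to 12.0.0.2 would be steered to 12.0.0.3 once fib_multipath_use_neigh is enabled, while the old behaviour is kept when it is off.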

Signed-off-by: David Ahern
Reviewed-by: Julian Anastasov
Signed-off-by: David S. Miller

David Ahern
9 years ago

17 Feb, 2016

3 commits

e21145a98 ipv4: namespacify ip_early_demux sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
287b7f38f ipv4: Namespacify ip_dynaddr sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
fa50d974d ipv4: Namespaceify ip_default_ttl sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago

11 Feb, 2016

4 commits

165094afc igmp: Namespacify igmp_qrv sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
87a8a2ae6 igmp: Namespaceify igmp_llm_reports sysctl knob

This was initially introduced in df2cf4a78e488d26 ("IGMP: Inhibit
reports for local multicast groups") by defining the sysctl in the
ipv4_net_table array, however it was never implemented to be
namespace aware. Fix this by changing the code accordingly.

Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
166b6b2d6 igmp: Namespaceify igmp_max_msf sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
815c52700 igmp: Namespaceify igmp_max_memberships sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago

08 Feb, 2016

8 commits

4979f2d9f ipv4: Namespaceify tcp_notsent_lowat sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
1e579caa1 ipv4: Namespaceify tcp_fin_timeout sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
c402d9bef ipv4: Namespaceify tcp_orphan_retries sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
c6214a97c ipv4: Namespaceify tcp_retries2 sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
ae5c3f406 ipv4: Namespaceify tcp_retries1 sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
1043e25ff ipv4: Namespaceify tcp reordering sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
12ed8244e ipv4: Namespaceify tcp syncookies sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago
7c083ecb3 ipv4: Namespaceify tcp synack retries sysctl knob

Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller

Nikolay Borisov
9 years ago