Eric Lee / smarc-fsl-linux-kernel

17 Oct, 2020

1 commit

b38e7819c icmp: randomize the global rate limiter ... Browse Code »

Keyu Man reported that the ICMP rate limiter could be used
by attackers to get useful signal. Details will be provided
in an upcoming academic publication.

Our solution is to add some noise, so that the attackers
no longer can get help from the predictable token bucket limiter.

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Signed-off-by: Eric Dumazet
Reported-by: Keyu Man
Signed-off-by: Jakub Kicinski

Eric Dumazet
2020-10-17 07:47:09 +0800

16 Oct, 2020

2 commits

9ff9b0d39 Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next ... Browse Code »

Pull networking updates from Jakub Kicinski:

- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.

Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).

- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.

- Allow more than 255 IPv4 multicast interfaces.

- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.

- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.

- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.

- Allow more calls to same peer in RxRPC.

- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.

- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.

- Add TC actions for implementing MPLS L2 VPNs.

- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.

- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.

- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.

- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.

- Support sleepable BPF programs, initially targeting LSM and tracing.

- Add bpf_d_path() helper for returning full path for given 'struct
path'.

- Make bpf_tail_call compatible with bpf-to-bpf calls.

- Allow BPF programs to call map_update_elem on sockmaps.

- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).

- Support listing and getting information about bpf_links via the bpf
syscall.

- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).

- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.

- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).

- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.

- Add XDP support for Intel's igb driver.

- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.

- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.

- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.

- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.

- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.

- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.

- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.

- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.

- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.

- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...

Linus Torvalds
2020-10-16 09:42:13 +0800
2295cddf9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net ... Browse Code »

Minor conflicts in net/mptcp/protocol.h and
tools/testing/selftests/net/Makefile.

In both cases code was added on both sides in the same place
so just keep both.

Signed-off-by: Jakub Kicinski

Jakub Kicinski
2020-10-16 03:43:21 +0800

15 Oct, 2020

1 commit

e1e84eb58 ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) ... Browse Code »

As per RFC792, ICMP errors should be sent to the source host.

However, in configurations with Virtual Routing and Forwarding tables,
looking up which routing table to use is currently done by using the
destination net_device.

commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to
determine L3 domain") changes the interface passed to
l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
to skb_dst(skb_in)->dev. This effectively uses the destination device
rather than the source device for choosing which routing table should be
used to lookup where to send the ICMP error.

Therefore, if the source and destination interfaces are within separate
VRFs, or one in the global routing table and the other in a VRF, looking
up the source host in the destination interface's routing table will
fail if the destination interface's routing table contains no route to
the source host.

One observable effect of this issue is that traceroute does not work in
the following cases:

- Route leaking between global routing table and VRF
- Route leaking between VRFs

Preferably use the source device routing table when sending ICMP error
messages. If no source device is set, fall-back on the destination
device routing table. Else, use the main routing table (index 0).

[ It has been pointed out that a similar issue may exist with ICMP
errors triggered when forwarding between network namespaces. It would
be worthwhile to investigate, but is outside of the scope of this
investigation. ]

[ It has also been pointed out that a similar issue exists with
unreachable / fragmentation needed messages, which can be triggered by
changing the MTU of eth1 in r1 to 1400 and running:

ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2

Some investigation points to raw_icmp_error() and raw_err() as being
involved in this last scenario. The focus of this patch is TTL expired
ICMP messages, which go through icmp_route_lookup.
Investigation of failure modes related to raw_icmp_error() is beyond
this investigation's scope. ]

Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain")
Link: https://tools.ietf.org/html/rfc792
Signed-off-by: Mathieu Desnoyers
Reviewed-by: David Ahern
Signed-off-by: Jakub Kicinski

Mathieu Desnoyers
2020-10-15 08:14:11 +0800

01 Sep, 2020

1 commit

5af68891d net: clean up codestyle ... Browse Code »

This is a pure codestyle cleanup patch. No functional change intended.

Signed-off-by: Miaohe Lin
Signed-off-by: David S. Miller

Miaohe Lin
2020-09-01 03:33:34 +0800

25 Aug, 2020

1 commit

373c15c2e net: Use helper macro RT_TOS() in __icmp_send() ... Browse Code »

Use helper macro RT_TOS() to get tos in __icmp_send().

Signed-off-by: Miaohe Lin
Signed-off-by: David S. Miller

Miaohe Lin
2020-08-25 09:12:36 +0800

21 Aug, 2020

3 commits

cc44c17ba csum_partial_copy_nocheck(): drop the last argument ... Browse Code »

It's always 0. Note that we theoretically could use ~0U as well -
result will be the same modulo 0xffff, _if_ the damn thing did the
right thing for any value of initial sum; later we'll make use of
that when convenient.

However, unlike csum_and_copy_..._user(), there are instances that
did not work for arbitrary initial sums; c6x is one such.

Signed-off-by: Al Viro

Al Viro
2020-08-21 03:45:14 +0800
3ea7ca80d icmp_push_reply(): reorder adding the checksum up ... Browse Code »

do csum_partial_copy_nocheck() on the first fragment, then
add the rest to it. Equivalent transformation.

That was the only caller of csum_partial_copy_nocheck() that
might pass it non-zero as the last argument.

Signed-off-by: Al Viro

Al Viro
2020-08-21 03:45:13 +0800
8d5930dfb skb_copy_and_csum_bits(): don't bother with the last argument ... Browse Code »

it's always 0

Signed-off-by: Al Viro

Al Viro
2020-08-21 03:45:13 +0800

25 Jul, 2020

3 commits

01370434d icmp6: support rfc 4884 ... Browse Code »

Extend the rfc 4884 read interface introduced for ipv4 in
commit eba75c587e81 ("icmp: support rfc 4884") to ipv6.

Add socket option SOL_IPV6/IPV6_RECVERR_RFC4884.

Changes v1->v2:
- make ipv6_icmp_error_rfc4884 static (file scope)

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2020-07-25 08:12:41 +0800
178c49d9f icmp: prepare rfc 4884 for ipv6 ... Browse Code »

The RFC 4884 spec is largely the same between IPv4 and IPv6.
Factor out the IPv4 specific parts in preparation for IPv6 support:

- icmp types supported

- icmp header size, and thus offset to original datagram start

- datagram length field offset in icmp(6)hdr.

- datagram length field word size: 4B for IPv4, 8B for IPv6.

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2020-07-25 08:12:41 +0800
c4e9e09f5 icmp: revise rfc4884 tests ... Browse Code »

1) Only accept packets with original datagram len field >= header len.

The extension header must start after the original datagram headers.
The embedded datagram len field is compared against the 128B minimum
stipulated by RFC 4884. It is unlikely that headers extend beyond
this. But as we know the exact header length, check explicitly.

2) Remove the check that datagram length must be len > 576)
+ if (-skb_network_offset(skb) + skb->len > 576)

Fixes: eba75c587e81 ("icmp: support rfc 4884")
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2020-07-25 08:12:41 +0800

20 Jul, 2020

1 commit

eba75c587 icmp: support rfc 4884 ... Browse Code »

Add setsockopt SOL_IP/IP_RECVERR_4884 to return the offset to an
extension struct if present.

ICMP messages may include an extension structure after the original
datagram. RFC 4884 standardized this behavior. It stores the offset
in words to the extension header in u8 icmphdr.un.reserved[1].

The field is valid only for ICMP types destination unreachable, time
exceeded and parameter problem, if length is at least 128 bytes and
entire packet does not exceed 576 bytes.

Return the offset to the start of the extension struct when reading an
ICMP error from the error queue, if it matches the above constraints.

Do not return the raw u8 field. Return the offset from the start of
the user buffer, in bytes. The kernel does not return the network and
transport headers, so subtract those.

Also validate the headers. Return the offset regardless of validation,
as an invalid extension must still not be misinterpreted as part of
the original datagram. Note that !invalid does not imply valid. If
the extension version does not match, no validation can take place,
for instance.

For backward compatibility, make this optional, set by setsockopt
SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see
github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c

For forward compatibility, reserve only setsockopt value 1, leaving
other bits for additional icmp extensions.

Changes
v1->v2:
- convert word offset to byte offset from start of user buffer
- return in ee_data as u8 may be insufficient
- define extension struct and object header structs
- return len only if constraints met
- if returning len, also validate

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2020-07-20 10:20:22 +0800

02 Jul, 2020

1 commit

0da7536fb ip: Fix SO_MARK in RST, ACK and ICMP packets ... Browse Code »

When no full socket is available, skbs are sent over a per-netns
control socket. Its sk_mark is temporarily adjusted to match that
of the real (request or timewait) socket or to reflect an incoming
skb, so that the outgoing skb inherits this in __ip_make_skb.

Introduction of the socket cookie mark field broke this. Now the
skb is set through the cookie and cork:

# init sockc.mark from sk_mark or cmsg
ip_append_data
ip_setup_cork # convert sockc.mark to cork mark
ip_push_pending_frames
ip_finish_skb
__ip_make_skb # set skb->mark to cork mark

But I missed these special control sockets. Update all callers of
__ip(6)_make_skb that were originally missed.

For IPv6, the same two icmp(v6) paths are affected. The third
case is not, as commit 92e55f412cff ("tcp: don't annotate
mark on control socket from tcp_v6_send_response()") replaced
the ctl_sk->sk_mark with passing the mark field directly as a
function argument. That commit predates the commit that
introduced the bug.

Fixes: c6af0c227a22 ("ip: support SO_MARK cmsg")
Signed-off-by: Willem de Bruijn
Reported-by: Martin KaFai Lau
Reviewed-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Willem de Bruijn
2020-07-02 08:38:30 +0800

29 Apr, 2020

1 commit

1cec2caca docs: networking: convert ip-sysctl.txt to ReST ... Browse Code »

- add SPDX header;
- adjust titles and chapters, adding proper markups;
- mark code blocks and literals as such;
- mark lists as such;
- mark tables as such;
- use footnote markup;
- adjust identation, whitespaces and blank lines;
- add to networking/index.rst.

Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: David S. Miller

Mauro Carvalho Chehab
2020-04-29 05:40:18 +0800

13 Mar, 2020

1 commit

a8eceea84 inet: Use fallthrough; ... Browse Code »

Convert the various uses of fallthrough comments to fallthrough;

Done via script
Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

And by hand:

net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block
that causes gcc to emit a warning if converted in-place.

So move the new fallthrough; inside the containing #ifdef/#endif too.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
2020-03-13 06:55:00 +0800

14 Feb, 2020

1 commit

0b41713b6 icmp: introduce helper for nat'd source address in network device context ... Browse Code »

This introduces a helper function to be called only by network drivers
that wraps calls to icmp[v6]_send in a conntrack transformation, in case
NAT has been used. We don't want to pollute the non-driver path, though,
so we introduce this as a helper to be called by places that actually
make use of this, as suggested by Florian.

Signed-off-by: Jason A. Donenfeld
Cc: Florian Westphal
Signed-off-by: David S. Miller

Jason A. Donenfeld
2020-02-14 06:19:00 +0800

09 Nov, 2019

1 commit

bbab7ef23 net: icmp: fix data-race in cmp_global_allow() ... Browse Code »

This code reads two global variables without protection
of a lock. We need READ_ONCE()/WRITE_ONCE() pairs to
avoid load/store-tearing and better document the intent.

KCSAN reported :
BUG: KCSAN: data-race in icmp_global_allow / icmp_global_allow

read to 0xffffffff861a8014 of 4 bytes by task 11201 on cpu 0:
icmp_global_allow+0x36/0x1b0 net/ipv4/icmp.c:254
icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
dst_link_failure include/net/dst.h:419 [inline]
vti_xmit net/ipv4/ip_vti.c:243 [inline]
vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
__netdev_start_xmit include/linux/netdevice.h:4420 [inline]
netdev_start_xmit include/linux/netdevice.h:4434 [inline]
xmit_one net/core/dev.c:3280 [inline]
dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
__dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179

write to 0xffffffff861a8014 of 4 bytes by task 11183 on cpu 1:
icmp_global_allow+0x174/0x1b0 net/ipv4/icmp.c:272
icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
dst_link_failure include/net/dst.h:419 [inline]
vti_xmit net/ipv4/ip_vti.c:243 [inline]
vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
__netdev_start_xmit include/linux/netdevice.h:4420 [inline]
netdev_start_xmit include/linux/netdevice.h:4434 [inline]
xmit_one net/core/dev.c:3280 [inline]
dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
__dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 11183 Comm: syz-executor.2 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Signed-off-by: David S. Miller

Eric Dumazet
2019-11-09 04:32:43 +0800

04 Nov, 2019

1 commit

2adf81c0f net: icmp: use input address in traceroute ... Browse Code »

Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the
primary address of the interface the packet was received on, even if
the path goes through a secondary address. In the example:

1.0.3.1/24
---- 1.0.1.3/24 1.0.1.1/24 ---- 1.0.2.1/24 1.0.2.4/24 ----
|H1|--------------------------|R1|--------------------------|H2|
---- N1 ---- N2 ----

where 1.0.3.1/24 is R1's primary address on N1, traceroute from
H1 to H2 returns:

traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
1 1.0.3.1 (1.0.3.1) 0.018 ms 0.006 ms 0.006 ms
2 1.0.2.4 (1.0.2.4) 0.021 ms 0.007 ms 0.007 ms

After applying this patch, it returns:

traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
1 1.0.1.1 (1.0.1.1) 0.033 ms 0.007 ms 0.006 ms
2 1.0.2.4 (1.0.2.4) 0.011 ms 0.007 ms 0.007 ms

Original-patch-by: Bill Fenner
Signed-off-by: Francesco Ruggeri
Reviewed-by: David Ahern
Signed-off-by: David S. Miller

Francesco Ruggeri
2019-11-04 09:25:18 +0800

25 Aug, 2019

1 commit

e2c693934 ipv4/icmp: fix rt dst dev null pointer dereference ... Browse Code »

In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
e,g, with tunnel collect_md mode, which will cause kernel crash.
Here is what the code path looks like, for GRE:

- ip6gre_tunnel_xmit
- ip6gre_xmit_ipv4
- __gre6_xmit
- ip6_tnl_xmit
- if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
- icmp_send
- net = dev_net(rt->dst.dev); dev to NULL by default.
We could not fix it in __metadata_dst_init() as there is no dev supplied.
On the other hand, the reason we need rt->dst.dev is to get the net.
So we can just try get it from skb->dev when rt->dst.dev is NULL.

v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
still use dst.dev and do a check to avoid crash.

v3: No changes.

v2: fix the issue in __icmp_send() instead of updating shared dst dev
in {ip_md, ip6}_tunnel_xmit.

Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
Signed-off-by: Hangbin Liu
Reviewed-by: Julian Anastasov
Acked-by: Jonathan Lemon
Signed-off-by: David S. Miller

Hangbin Liu
2019-08-25 05:49:35 +0800

22 Aug, 2019

1 commit

0f404bbda net: fix icmp_socket_deliver argument 2 input ... Browse Code »

it expects a unsigned int, but got a __be32

Signed-off-by: Li RongQing
Signed-off-by: Zhang Yu
Signed-off-by: David S. Miller

Li RongQing
2019-08-22 11:43:22 +0800

08 Jun, 2019

1 commit

a6cdeeb16 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.

Signed-off-by: David S. Miller

David S. Miller
2019-06-08 02:00:14 +0800

04 Jun, 2019

1 commit

046386ca0 ipv4: icmp: use this_cpu_read() in icmp_sk() ... Browse Code »

this_cpu_read(*X) is faster than *this_cpu_ptr(X)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2019-06-04 06:08:53 +0800

31 May, 2019

1 commit

2874c5fd2 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 ... Browse Code »

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-31 02:26:32 +0800

03 Mar, 2019

1 commit

9eb359140 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2019-03-03 04:54:35 +0800

26 Feb, 2019

1 commit

9ef6b42ad net: Add __icmp_send helper. ... Browse Code »

Add __icmp_send function having ip_options struct parameter

Signed-off-by: Sergey Nazarov
Reviewed-by: Paul Moore
Signed-off-by: David S. Miller

Nazarov Sergey
2019-02-26 06:32:35 +0800

25 Feb, 2019

1 commit

e9128c14b ipv4: icmp: use icmp_sk_exit() ... Browse Code »

Simply use icmp_sk_exit() when inet_ctl_sock_create() fail in icmp_sk_init().

Signed-off-by: Kefeng Wang
Signed-off-by: David S. Miller

Kefeng Wang
2019-02-25 13:57:26 +0800

09 Nov, 2018

1 commit

32bbd8793 net: Convert protocol error handlers from void to int ... Browse Code »

We'll need this to handle ICMP errors for tunnels without a sending socket
(i.e. FoU and GUE). There, we might have to look up different types of IP
tunnels, registered as network protocols, before we get a match, so we
want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
inet_protos and inet6_protos. These error codes will be used in the next
patch.

For consistency, return sensible error codes in protocol error handlers
whenever handlers can't handle errors because, even if valid, they don't
match a protocol or any of its states.

This has no effect on existing error handling paths.

Signed-off-by: Stefano Brivio
Reviewed-by: Sabrina Dubroca
Signed-off-by: David S. Miller

Stefano Brivio
2018-11-09 09:13:08 +0800

27 Sep, 2018

2 commits

1042caa79 net-ipv4: remove 2 always zero parameters from ipv4_redirect() ... Browse Code »

(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern
Signed-off-by: Maciej Żenczykowski
Signed-off-by: David S. Miller

Maciej Żenczykowski
2018-09-27 11:30:55 +0800
d888f3966 net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu() ... Browse Code »

(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern
Signed-off-by: Maciej Żenczykowski
Signed-off-by: David S. Miller

Maciej Żenczykowski
2018-09-27 11:30:55 +0800

07 Jul, 2018

1 commit

351782067 ipv4: ipcm_cookie initializers ... Browse Code »

Initialize the cookie in one location to reduce code duplication and
avoid bugs from inconsistent initialization, such as that fixed in
commit 9887cba19978 ("ip: limit use of gso_size to udp").

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2018-07-07 09:58:49 +0800

04 Jul, 2018

1 commit

bc969a977 net: ipv4: Hook into time based transmission ... Browse Code »

Add a transmit_time field to struct inet_cork, then copy the
timestamp from the CMSG cookie at ip_setup_cork() so we can
safely copy it into the skb later during __ip_make_skb().

For the raw fast path, just perform the copy at raw_send_hdrinc().

Signed-off-by: Richard Cochran
Signed-off-by: Jesus Sanchez-Palencia
Signed-off-by: David S. Miller

Jesus Sanchez-Palencia
2018-07-04 21:30:27 +0800

24 Oct, 2017

1 commit

152854025 ipv4: icmp: use BUG_ON instead of if condition followed by BUG ... Browse Code »

Use BUG_ON instead of if condition followed by BUG in icmp_timestamp.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva
Signed-off-by: David S. Miller

Gustavo A. R. Silva
2017-10-24 17:44:42 +0800

15 Oct, 2017

1 commit

258bbb1b0 icmp: don't fail on fragment reassembly time exceeded ... Browse Code »

The ICMP implementation currently replies to an ICMP time exceeded message
(type 11) with an ICMP host unreachable message (type 3, code 1).

However, time exceeded messages can either represent "time to live exceeded
in transit" (code 0) or "fragment reassembly time exceeded" (code 1).

Unconditionally replying to "fragment reassembly time exceeded" with
host unreachable messages might cause unjustified connection resets
which are now easily triggered as UFO has been removed, because, in turn,
sending large buffers triggers IP fragmentation.

The issue can be easily reproduced by running a lot of UDP streams
which is likely to trigger IP fragmentation:

# start netserver in the test namespace
ip netns add test
ip netns exec test netserver

# create a VETH pair
ip link add name veth0 type veth peer name veth0 netns test
ip link set veth0 up
ip -n test link set veth0 up

for i in $(seq 20 29); do
# assign addresses to both ends
ip addr add dev veth0 192.168.$i.1/24
ip -n test addr add dev veth0 192.168.$i.2/24

# start the traffic
netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
done

# wait
send_data: data send error: No route to host (errno 113)
netperf: send_omni: send_data failed: No route to host

We need to differentiate instead: if fragment reassembly time exceeded
is reported, we need to silently drop the packet,
if time to live exceeded is reported, maintain the current behaviour.
In both cases increment the related error count "icmpInTimeExcds".

While at it, fix a typo in a comment, and convert the if statement
into a switch to mate it more readable.

Signed-off-by: Matteo Croce
Signed-off-by: David S. Miller

Matteo Croce
2017-10-15 02:05:26 +0800

07 Aug, 2017

1 commit

91ed1e666 ip/options: explicitly provide net ns to __ip_options_echo() ... Browse Code »

__ip_options_echo() uses the current network namespace, and
currently retrives it via skb->dst->dev.

This commit adds an explicit 'net' argument to __ip_options_echo()
and update all the call sites to provide it, usually via a simpler
sock_net().

After this change, __ip_options_echo() no more needs to access
skb->dst and we can drop a couple of hack to preserve such
info in the rx path.

Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller

Paolo Abeni
2017-08-07 11:51:12 +0800

15 Jun, 2017

2 commits

0ddead90b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller

David S. Miller
2017-06-15 23:59:32 +0800
849a44de9 net: don't global ICMP rate limit packets originating from loopback ... Browse Code »

Florian Weimer seems to have a glibc test-case which requires that
loopback interfaces does not get ICMP ratelimited. This was broken by
commit c0303efeab73 ("net: reduce cycles spend on ICMP replies that
gets rate limited").

An ICMP response will usually be routed back-out the same incoming
interface. Thus, take advantage of this and skip global ICMP
ratelimit when the incoming device is loopback. In the unlikely event
that the outgoing it not loopback, due to strange routing policy
rules, ICMP rate limiting still works via peer ratelimiting via
icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812
(section 4.3.2.8 "Rate Limiting").

This seems to fix the reproducer given by Florian. While still
avoiding to perform expensive and unneeded outgoing route lookup for
rate limited packets (in the non-loopback case).

Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited")
Reported-by: Florian Weimer
Reported-by: "H.J. Lu"
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2017-06-15 03:33:58 +0800

27 May, 2017

1 commit

3abd1ade6 net: ipv4: refactor __ip_route_output_key_hash ... Browse Code »

A later patch wants access to the fib result on an output route lookup
with the rcu lock held. Refactor __ip_route_output_key_hash, pushing
the logic between rcu_read_lock ... rcu_read_unlock into a new helper
with the fib_result as an input arg.

To keep the name length under control remove the leading underscores
from the name and add _rcu to the name of the new helper indicating it
is called with the rcu read lock held.

Signed-off-by: David Ahern
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller

David Ahern
2017-05-27 02:12:49 +0800

22 Mar, 2017

1 commit

bf4e0a3db net: ipv4: add support for ECMP hash policy choice ... Browse Code »

This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
0 - layer 3 (default)
1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case, the flow dissector's field consistentification should handle the
address order thus we can remove the address reversals.
If the skb is provided we always use it for the hash calculation,
otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2017-03-22 06:27:19 +0800

10 Jan, 2017

1 commit

7ba91ecb1 net: for rate-limited ICMP replies save one atomic operation ... Browse Code »

It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock,
by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
as pointed out by Eric Dumazet, but the BH disabled state must be correct.

The icmp_global_allow() call states it must be called with BH
disabled. This protection was given by the calls icmp_xmit_lock and
icmpv6_xmit_lock. Thus, split out local_bh_disable/enable from these
functions and maintain it explicitly at callers.

Suggested-by: Eric Dumazet
Signed-off-by: Jesper Dangaard Brouer
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2017-01-10 04:49:12 +0800