Eric Lee / smarc-fsl-linux-kernel

13 Jan, 2021

1 commit

27bc60d96 netfilter: x_tables: Update remaining dereference to RCU ... Browse Code »

commit 443d6e86f821a165fae3fc3fc13086d27ac140b1 upstream.

This fixes the dereference to fetch the RCU pointer when holding
the appropriate xtables lock.

Reported-by: kernel test robot
Fixes: cc00bcaa5899 ("netfilter: x_tables: Switch synchronization to RCU")
Signed-off-by: Subash Abhinov Kasiviswanathan
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman

Subash Abhinov Kasiviswanathan
2021-01-13 03:18:25 +0800

08 Dec, 2020

1 commit

cc00bcaa5 netfilter: x_tables: Switch synchronization to RCU ... Browse Code »

When running concurrent iptables rules replacement with data, the per CPU
sequence count is checked after the assignment of the new information.
The sequence count is used to synchronize with the packet path without the
use of any explicit locking. If there are any packets in the packet path using
the table information, the sequence count is incremented to an odd value and
is incremented to an even after the packet process completion.

The new table value assignment is followed by a write memory barrier so every
CPU should see the latest value. If the packet path has started with the old
table information, the sequence counter will be odd and the iptables
replacement will wait till the sequence count is even prior to freeing the
old table info.

However, this assumes that the new table information assignment and the memory
barrier is actually executed prior to the counter check in the replacement
thread. If CPU decides to execute the assignment later as there is no user of
the table information prior to the sequence check, the packet path in another
CPU may use the old table information. The replacement thread would then free
the table information under it leading to a use after free in the packet
processing context-

Unable to handle kernel NULL pointer dereference at virtual
address 000000000000008e
pc : ip6t_do_table+0x5d0/0x89c
lr : ip6t_do_table+0x5b8/0x89c
ip6t_do_table+0x5d0/0x89c
ip6table_filter_hook+0x24/0x30
nf_hook_slow+0x84/0x120
ip6_input+0x74/0xe0
ip6_rcv_finish+0x7c/0x128
ipv6_rcv+0xac/0xe4
__netif_receive_skb+0x84/0x17c
process_backlog+0x15c/0x1b8
napi_poll+0x88/0x284
net_rx_action+0xbc/0x23c
__do_softirq+0x20c/0x48c

This could be fixed by forcing instruction order after the new table
information assignment or by switching to RCU for the synchronization.

Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
Reported-by: Sean Tranchetti
Reported-by: kernel test robot
Suggested-by: Florian Westphal
Signed-off-by: Subash Abhinov Kasiviswanathan
Signed-off-by: Pablo Neira Ayuso

Subash Abhinov Kasiviswanathan
2020-12-08 19:57:39 +0800

20 Nov, 2020

1 commit

2d8f6481c ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module ... Browse Code »

IPV6=m
NF_DEFRAG_IPV6=y

ld: net/ipv6/netfilter/nf_conntrack_reasm.o: in function
`nf_ct_frag6_gather':
net/ipv6/netfilter/nf_conntrack_reasm.c:462: undefined reference to
`ipv6_frag_thdr_truncated'

Netfilter is depending on ipv6 symbol ipv6_frag_thdr_truncated. This
dependency is forcing IPV6=y.

Remove this dependency by moving ipv6_frag_thdr_truncated out of ipv6. This
is the same solution as used with a similar issues: Referring to
commit 70b095c843266 ("ipv6: remove dependency of nf_defrag_ipv6 on ipv6
module")

Fixes: 9d9e937b1c8b ("ipv6/netfilter: Discard first fragment not including all headers")
Reported-by: Randy Dunlap
Reported-by: kernel test robot
Signed-off-by: Georg Kohmann
Acked-by: Pablo Neira Ayuso
Acked-by: Randy Dunlap # build-tested
Link: https://lore.kernel.org/r/20201119095833.8409-1-geokohma@cisco.com
Signed-off-by: Jakub Kicinski

Georg Kohmann
2020-11-20 02:49:50 +0800

17 Nov, 2020

1 commit

9d9e937b1 ipv6/netfilter: Discard first fragment not including all headers ... Browse Code »

Packets are processed even though the first fragment don't include all
headers through the upper layer header. This breaks TAHI IPv6 Core
Conformance Test v6LC.1.3.6.

Referring to RFC8200 SECTION 4.5: "If the first fragment does not include
all headers through an Upper-Layer header, then that fragment should be
discarded and an ICMP Parameter Problem, Code 3, message should be sent to
the source of the fragment, with the Pointer field set to zero."

The fragment needs to be validated the same way it is done in
commit 2efdaaaf883a ("IPv6: reply ICMP error if the first fragment don't
include all headers") for ipv6. Wrap the validation into a common function,
ipv6_frag_thdr_truncated() to check for truncation in the upper layer
header. This validation does not fullfill all aspects of RFC 8200,
section 4.5, but is at the moment sufficient to pass mentioned TAHI test.

In netfilter, utilize the fragment offset returned by find_prev_fhdr() to
let ipv6_frag_thdr_truncated() start it's traverse from the fragment
header.

Return 0 to drop the fragment in the netfilter. This is the same behaviour
as used on other protocol errors in this function, e.g. when
nf_ct_frag6_queue() returns -EPROTO. The Fragment will later be picked up
by ipv6_frag_rcv() in reassembly.c. ipv6_frag_rcv() will then send an
appropriate ICMP Parameter Problem message back to the source.

References commit 2efdaaaf883a ("IPv6: reply ICMP error if the first
fragment don't include all headers")

Signed-off-by: Georg Kohmann
Acked-by: Pablo Neira Ayuso
Link: https://lore.kernel.org/r/20201111115025.28879-1-geokohma@cisco.com
Signed-off-by: Jakub Kicinski

Georg Kohmann
2020-11-17 02:15:11 +0800

30 Oct, 2020

1 commit

46d6c5ae9 netfilter: use actual socket sk rather than skb sk when routing harder ... Browse Code »

If netfilter changes the packet mark when mangling, the packet is
rerouted using the route_me_harder set of functions. Prior to this
commit, there's one big difference between route_me_harder and the
ordinary initial routing functions, described in the comment above
__ip_queue_xmit():

/* Note: skb->sk can be different from sk, in case of tunnels */
int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,

That function goes on to correctly make use of sk->sk_bound_dev_if,
rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
It will make some transformations to that packet, and then it will send
the encapsulated packet out of a *new* socket. That new socket will
basically always have a different sk_bound_dev_if (otherwise there'd be
a routing loop). So for the purposes of routing the encapsulated packet,
the routing information as it pertains to the socket should come from
that socket's sk, rather than the packet's original skb->sk. For that
reason __ip_queue_xmit() and related functions all do the right thing.

One might argue that all tunnels should just call skb_orphan(skb) before
transmitting the encapsulated packet into the new socket. But tunnels do
*not* do this -- and this is wisely avoided in skb_scrub_packet() too --
because features like TSQ rely on skb->destructor() being called when
that buffer space is truely available again. Calling skb_orphan(skb) too
early would result in buffers filling up unnecessarily and accounting
info being all wrong. Instead, additional routing must take into account
the new sk, just as __ip_queue_xmit() notes.

So, this commit addresses the problem by fishing the correct sk out of
state->sk -- it's already set properly in the call to nf_hook() in
__ip_local_out(), which receives the sk as part of its normal
functionality. So we make sure to plumb state->sk through the various
route_me_harder functions, and then make correct use of it following the
example of __ip_queue_xmit().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason A. Donenfeld
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Jason A. Donenfeld
2020-10-30 19:57:39 +0800

20 Oct, 2020

1 commit

68f9f9c2c netfilter: Drop fragmented ndisc packets assembled in netfilter ... Browse Code »

Fragmented ndisc packets assembled in netfilter not dropped as specified
in RFC 6980, section 5. This behaviour breaks TAHI IPv6 Core Conformance
Tests v6LC.2.1.22/23, V6LC.2.2.26/27 and V6LC.2.3.18.

Setting IP6SKB_FRAGMENTED flag during reassembly.

References: commit b800c3b966bc ("ipv6: drop fragmented ndisc packets by default (RFC 6980)")
Signed-off-by: Georg Kohmann
Signed-off-by: Pablo Neira Ayuso

Georg Kohmann
2020-10-20 19:54:53 +0800

16 Oct, 2020

1 commit

2295cddf9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net ... Browse Code »

Minor conflicts in net/mptcp/protocol.h and
tools/testing/selftests/net/Makefile.

In both cases code was added on both sides in the same place
so just keep both.

Signed-off-by: Jakub Kicinski

Jakub Kicinski
2020-10-16 03:43:21 +0800

14 Oct, 2020

1 commit

0d9826bc1 netfilter: nf_log: missing vlan offload tag and proto ... Browse Code »

Dump vlan tag and proto for the usual vlan offload case if the
NF_LOG_MACDECODE flag is set on. Without this information the logging is
misleading as there is no reference to the VLAN header.

[12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0
[12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1

Fixes: 83e96d443b37 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files")
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-10-14 07:25:14 +0800

29 Aug, 2020

1 commit

d5608a057 netfilter: ip6t_NPT: rewrite addresses in ICMPv6 original packet ... Browse Code »

Detect and rewrite a prefix embedded in an ICMPv6 original packet that was
rewritten by a corresponding DNPT/SNPT rule so it will be recognised by
the host that sent the original packet.

Example

Rules in effect on the 1:2:3:4::/64 + 5:6:7:8::/64 side router:
* SNPT src-pfx 1:2:3:4::/64 dst-pfx 5:6:7:8::/64
* DNPT src-pfx 5:6:7:8::/64 dst-pfx 1:2:3:4::/64

No rules on the 9:a:b:c::/64 side.

1. 1:2:3:4::1 sends UDP packet to 9:a:b:c::1
2. Router applies SNPT changing src to 5:6:7:8::ffef::1
3. 9:a:b:c::1 receives packet with (src 5:6:7:8::ffef::1 dst 9:a:b:c::1)
and replies with ICMPv6 port unreachable to 5:6:7:8::ffef::1,
including original packet (src 5:6:7:8::ffef::1 dst 9:a:b:c::1)
4. Router forwards ICMPv6 packet with (src 9:a:b:c::1 dst 5:6:7:8::ffef::1)
including original packet (src 5:6:7:8::ffef::1 dst 9:a:b:c::1)
and applies DNPT changing dst to 1:2:3:4::1
5. 1:2:3:4::1 receives ICMPv6 packet with (src 9:a:b:c::1 dst 1:2:3:4::1)
including original packet (src 5:6:7:8::ffef::1 dst 9:a:b:c::1).
It doesn't recognise the original packet as the src doesn't
match anything it originally sent

With this change, at step 4, DNPT will also rewrite the original packet
src to 1:2:3:4::1, so at step 5, 1:2:3:4::1 will recognise the ICMPv6
error and provide feedback to the application properly.

Conversely, SNPT will help when ICMPv6 errors are sent from the
translated network.

1. 9:a:b:c::1 sends UDP packet to 5:6:7:8::ffef::1
2. Router applies DNPT changing dst to 1:2:3:4::1
3. 1:2:3:4::1 receives packet with (src 9:a:b:c::1 dst 1:2:3:4::1)
and replies with ICMPv6 port unreachable to 9:a:b:c::1
including original packet (src 9:a:b:c::1 dst 1:2:3:4::1)
4. Router forwards ICMPv6 packet with (src 1:2:3:4::1 dst 9:a:b:c::1)
including original packet (src 9:a:b:c::1 dst 1:2:3:4::1)
and applies SNPT changing src to 5:6:7:8::ffef::1
5. 9:a:b:c::1 receives ICMPv6 packet with
(src 5:6:7:8::ffef::1 dst 9:a:b:c::1) including
original packet (src 9:a:b:c::1 dst 1:2:3:4::1).
It doesn't recognise the original packet as the dst doesn't
match anything it already sent

The change to SNPT means the ICMPv6 original packet dst will be
rewritten to 5:6:7:8::ffef::1 in step 4, allowing the error to be
properly recognised in step 5.

Signed-off-by: Michael Zhou
Signed-off-by: Pablo Neira Ayuso

Michael Zhou
2020-08-29 01:18:48 +0800

06 Aug, 2020

1 commit

47ec5303d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next ... Browse Code »

Pull networking updates from David Miller:

1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

2) Support UDP segmentation in code TSO code, from Eric Dumazet.

3) Allow flashing different flash images in cxgb4 driver, from Vishal
Kulkarni.

4) Add drop frames counter and flow status to tc flower offloading,
from Po Liu.

5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

6) Various new indirect call avoidance, from Eric Dumazet and Brian
Vazquez.

7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
Yonghong Song.

8) Support querying and setting hardware address of a port function via
devlink, use this in mlx5, from Parav Pandit.

9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

10) Switch qca8k driver over to phylink, from Jonathan McDowell.

11) In bpftool, show list of processes holding BPF FD references to
maps, programs, links, and btf objects. From Andrii Nakryiko.

12) Several conversions over to generic power management, from Vaibhav
Gupta.

13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
Yakunin.

14) Various https url conversions, from Alexander A. Klimov.

15) Timestamping and PHC support for mscc PHY driver, from Antoine
Tenart.

16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
drivers. From Luc Van Oostenryck.

20) XDP support for xen-netfront, from Denis Kirjanov.

21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

22) Support EF100 chip in sfc driver, from Edward Cree.

23) Add XDP support to mvpp2 driver, from Matteo Croce.

24) Support MPTCP in sock_diag, from Paolo Abeni.

25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
infrastructure, from Jakub Kicinski.

26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

29) Refactor a lot of networking socket option handling code in order to
avoid set_fs() calls, from Christoph Hellwig.

30) Add rfc4884 support to icmp code, from Willem de Bruijn.

31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

34) Support TCP syncookies in MPTCP, from Flowian Westphal.

35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
Brivio.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
net: thunderx: initialize VF's mailbox mutex before first usage
usb: hso: remove bogus check for EINPROGRESS
usb: hso: no complaint about kmalloc failure
hso: fix bailout in error case of probe
ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
selftests/net: relax cpu affinity requirement in msg_zerocopy test
mptcp: be careful on subflow creation
selftests: rtnetlink: make kci_test_encap() return sub-test result
selftests: rtnetlink: correct the final return value for the test
net: dsa: sja1105: use detected device id instead of DT one on mismatch
tipc: set ub->ifindex for local ipv6 address
ipv6: add ipv6_dev_find()
net: openvswitch: silence suspicious RCU usage warning
Revert "vxlan: fix tos value before xmit"
ptp: only allow phase values lower than 1 period
farsync: switch from 'pci_' to 'dma_' API
wan: wanxl: switch from 'pci_' to 'dma_' API
hv_netvsc: do not use VF device if link is down
dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
net: macb: Properly handle phylink on at91sam9x
...

Linus Torvalds
2020-08-06 11:13:21 +0800

04 Aug, 2020

1 commit

f2e0b29a9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) UAF in chain binding support from previous batch, from Dan Carpenter.

2) Queue up delayed work to expire connections with no destination,
from Andrew Sy Kim.

3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva.

4) Replace HTTP links with HTTPS, from Alexander A. Klimov.

5) Remove superfluous null header checks in ip6tables, from
Gaurav Singh.

6) Add extended netlink error reporting for expression.

7) Report EEXIST on overlapping chain, set elements and flowtable
devices.
====================

Signed-off-by: David S. Miller

David S. Miller
2020-08-04 07:03:18 +0800

30 Jul, 2020

1 commit

42f36eba7 netfilter: ip6tables: Remove redundant null checks ... Browse Code »

Remove superfluous check for NULL pointer to header.

Signed-off-by: Gaurav Singh
Signed-off-by: Pablo Neira Ayuso

Gaurav Singh
2020-07-30 02:39:43 +0800

29 Jul, 2020

1 commit

d3c481515 net: remove sockptr_advance ... Browse Code »

sockptr_advance never properly worked. Replace it with _offset variants
of copy_from_sockptr and copy_to_sockptr.

Fixes: ba423fdaa589 ("net: add a new sockptr_t type")
Reported-by: Jason A. Donenfeld
Reported-by: Ido Schimmel
Signed-off-by: Christoph Hellwig
Acked-by: Jason A. Donenfeld
Tested-by: Ido Schimmel
Signed-off-by: David S. Miller

Christoph Hellwig
2020-07-29 04:43:40 +0800

25 Jul, 2020

2 commits

c2f12630c netfilter: switch nf_setsockopt to sockptr_t ... Browse Code »

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller

Christoph Hellwig
2020-07-25 06:41:54 +0800
ab214d1bf netfilter: switch xt_copy_counters to sockptr_t ... Browse Code »

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller

Christoph Hellwig
2020-07-25 06:41:53 +0800

20 Jul, 2020

2 commits

c34bc10d2 netfilter: remove the compat argument to xt_copy_counters_from_user ... Browse Code »

Lift the in_compat_syscall() from the callers instead.

Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller

Christoph Hellwig
2020-07-20 09:16:40 +0800
f415e76fd netfilter/ip6_tables: clean up compat {get, set}sockopt handling ... Browse Code »

Merge the native and compat {get,set}sockopt handlers using
in_compat_syscall().

Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller

Christoph Hellwig
2020-07-20 09:16:40 +0800

17 Jul, 2020

1 commit

3f649ab72 treewide: Remove uninitialized_var() usage ... Browse Code »

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var$([^$]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe # IB
Acked-by: Kalle Valo # wireless drivers
Reviewed-by: Chao Yu # erofs
Signed-off-by: Kees Cook

Kees Cook
2020-07-17 03:35:15 +0800

01 Jul, 2020

1 commit

f53b9b0bd netfilter: introduce support for reject at prerouting stage ... Browse Code »

REJECT statement can be only used in INPUT, FORWARD and OUTPUT
chains. This patch adds support of REJECT, both icmp and tcp
reset, at PREROUTING stage.

The need for this patch comes from the requirement of some
forwarding devices to reject traffic before the natting and
routing decisions.

The main use case is to be able to send a graceful termination
to legitimate clients that, under any circumstances, the NATed
endpoints are not available. This option allows clients to
decide either to perform a reconnection or manage the error in
their side, instead of just dropping the connection and let
them die due to timeout.

It is supported ipv4, ipv6 and inet families for nft
infrastructure.

Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso

Laura Garcia Liebana
2020-07-01 00:21:02 +0800

25 Jun, 2020

3 commits

5f027bc74 netfilter: ip6tables: Add a .pre_exit hook in all ip6table_foo.c. ... Browse Code »

Using new helpers ip6t_unregister_table_pre_exit() and
ip6t_unregister_table_exit().

Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder
Signed-off-by: Pablo Neira Ayuso

David Wilder
2020-06-25 06:50:31 +0800
57ea5f188 netfilter: ip6tables: Split ip6t_unregister_table() into pre_exit and exit helpers. ... Browse Code »

The pre_exit will un-register the underlying hook and .exit will do
the table freeing. The netns core does an unconditional synchronize_rcu
after the pre_exit hooks insuring no packets are in flight that have
picked up the pointer before completing the un-register.

Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder
Signed-off-by: Pablo Neira Ayuso

David Wilder
2020-06-25 06:50:31 +0800
4cacc3951 netfilter: Add MODULE_DESCRIPTION entries to kernel modules ... Browse Code »

The user tool modinfo is used to get information on kernel modules, including a
description where it is available.

This patch adds a brief MODULE_DESCRIPTION to netfilter kernel modules
(descriptions taken from Kconfig file or code comments)

Signed-off-by: Rob Gill
Signed-off-by: Pablo Neira Ayuso

Rob Gill
2020-06-25 06:50:31 +0800

14 Jun, 2020

1 commit

a7f7f6248 treewide: replace '---help---' in Kconfig files with 'help' ... Browse Code »

Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada

Masahiro Yamada
2020-06-14 00:57:21 +0800

15 Mar, 2020

1 commit

6daf14140 netfilter: Replace zero-length array with flexible-array member ... Browse Code »

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

Lastly, fix checkpatch.pl warning
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
in net/bridge/netfilter/ebtables.c

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
Signed-off-by: Pablo Neira Ayuso

Gustavo A. R. Silva
2020-03-15 22:20:16 +0800

13 Mar, 2020

1 commit

a8eceea84 inet: Use fallthrough; ... Browse Code »

Convert the various uses of fallthrough comments to fallthrough;

Done via script
Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

And by hand:

net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block
that causes gcc to emit a warning if converted in-place.

So move the new fallthrough; inside the containing #ifdef/#endif too.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
2020-03-13 06:55:00 +0800

22 Nov, 2019

1 commit

43da14110 net: Fix Kconfig indentation, continued ... Browse Code »

Adjust indentation from spaces to tab (+optional two spaces) as in
coding style. This fixes various indentation mixups (seven spaces,
tab+one space, etc).

Signed-off-by: Krzysztof Kozlowski
Signed-off-by: David S. Miller

Krzysztof Kozlowski
2019-11-22 04:00:21 +0800

16 Nov, 2019

1 commit

5c27d8d76 netfilter: nf_flow_table_offload: add IPv6 support ... Browse Code »

Add nf_flow_rule_route_ipv6() and use it from the IPv6 and the inet
flowtable type definitions. Rename the nf_flow_rule_route() function to
nf_flow_rule_route_ipv4().

Adjust maximum number of actions, which now becomes 16 to leave
sufficient room for the IPv6 address mangling for NAT.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2019-11-16 06:44:47 +0800

13 Nov, 2019

2 commits

c29f74e0d netfilter: nf_flow_table: hardware offload support ... Browse Code »

This patch adds the dataplane hardware offload to the flowtable
infrastructure. Three new flags represent the hardware state of this
flow:

* FLOW_OFFLOAD_HW: This flow entry resides in the hardware.
* FLOW_OFFLOAD_HW_DYING: This flow entry has been scheduled to be remove
from hardware. This might be triggered by either packet path (via TCP
RST/FIN packet) or via aging.
* FLOW_OFFLOAD_HW_DEAD: This flow entry has been already removed from
the hardware, the software garbage collector can remove it from the
software flowtable.

This patch supports for:

* IPv4 only.
* Aging via FLOW_CLS_STATS, no packet and byte counter synchronization
at this stage.

This patch also adds the action callback that specifies how to convert
the flow entry into the flow_rule object that is passed to the driver.

Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Pablo Neira Ayuso
2019-11-13 11:42:26 +0800
8bb69f3b2 netfilter: nf_tables: add flowtable offload control plane ... Browse Code »

This patch adds the NFTA_FLOWTABLE_FLAGS attribute that allows users to
specify the NF_FLOWTABLE_HW_OFFLOAD flag. This patch also adds a new
setup interface for the flowtable type to perform the flowtable offload
block callback configuration.

Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Pablo Neira Ayuso
2019-11-13 11:42:26 +0800

17 Oct, 2019

1 commit

0a9b33850 netfilter: nft_tproxy: Fix typo in IPv6 module description. ... Browse Code »

Signed-off-by: Norman Rasmussen
Signed-off-by: Pablo Neira Ayuso

Norman Rasmussen
2019-10-17 18:21:11 +0800

02 Oct, 2019

1 commit

895b5c9f2 netfilter: drop bridge nf reset from nf_reset ... Browse Code »

commit 174e23810cd31
("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
recycle always drop skb extensions. The additional skb_ext_del() that is
performed via nf_reset on napi skb recycle is not needed anymore.

Most nf_reset() calls in the stack are there so queued skb won't block
'rmmod nf_conntrack' indefinitely.

This removes the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct().

In a few selected places, add a call to skb_ext_reset to make sure that
no active extensions remain.

I am submitting this for "net", because we're still early in the release
cycle. The patch applies to net-next too, but I think the rename causes
needless divergence between those trees.

Suggested-by: Eric Dumazet
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2019-10-02 00:42:15 +0800

26 Sep, 2019

1 commit

bf69abad2 net: Fix Kconfig indentation ... Browse Code »

Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig

Signed-off-by: Krzysztof Kozlowski
Acked-by: Sven Eckelmann
Signed-off-by: David S. Miller

Krzysztof Kozlowski
2019-09-26 14:56:17 +0800

13 Sep, 2019

2 commits

44dde2369 netfilter: move inline nf_ip6_ext_hdr() function to a more appropriate header. ... Browse Code »

There is an inline function in ip6_tables.h which is not specific to
ip6tables and is used elswhere in netfilter. Move it into
netfilter_ipv6.h and update the callers.

Signed-off-by: Jeremy Sowden
Signed-off-by: Pablo Neira Ayuso

Jeremy Sowden
2019-09-13 18:34:09 +0800
40d102cde netfilter: update include directives. ... Browse Code »

Include some headers in files which require them, and remove others
which are not required.

Signed-off-by: Jeremy Sowden
Signed-off-by: Pablo Neira Ayuso

Jeremy Sowden
2019-09-13 18:33:06 +0800

20 Aug, 2019

1 commit

446bf64b6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net ... Browse Code »

Merge conflict of mlx5 resolved using instructions in merge
commit 9566e650bf7fdf58384bb06df634f7531ca3a97e.

Signed-off-by: David S. Miller

David S. Miller
2019-08-20 02:54:03 +0800

09 Aug, 2019

1 commit

891584f48 inet: frags: re-introduce skb coalescing for local delivery ... Browse Code »

Before commit d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6
defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
generating many fragments) and running over an IPsec tunnel, reported
more than 6Gbps throughput. After that patch, the same test gets only
9Mbps when receiving on a be2net nic (driver can make a big difference
here, for example, ixgbe doesn't seem to be affected).

By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
(IPv4 fragment coalescing was dropped by commit 14fe22e33462 ("Revert
"ipv4: use skb coalescing in defragmentation"")).

Without fragment coalescing, be2net runs out of Rx ring entries and
starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
the netperf traffic is only composed of UDP fragments, any lost packet
prevents reassembly of the full datagram. Therefore, fragments which
have no possibility to ever get reassembled pile up in the reassembly
queue, until the memory accounting exeeds the threshold. At that point
no fragment is accepted anymore, which effectively discards all
netperf traffic.

When reassembly timeout expires, some stale fragments are removed from
the reassembly queue, so a few packets can be received, reassembled
and delivered to the netperf receiver. But the nic still drops frames
and soon the reassembly queue gets filled again with stale fragments.
These long time frames where no datagram can be received explain why
the performance drop is so significant.

Re-introducing fragment coalescing is enough to get the initial
performances again (6.6Gbps with be2net): driver doesn't drop frames
anymore (no more rx_drops_no_frags errors) and the reassembly engine
works at full speed.

This patch is quite conservative and only coalesces skbs for local
IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
forwarding). Coalescing could be extended in the future if need be, as
more scenarios would probably benefit from it.

[0]: Test configuration
Sender:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
netserver -D -L fc00:2::1

Receiver:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6

Signed-off-by: Guillaume Nault
Acked-by: Florian Westphal
Signed-off-by: David S. Miller

Guillaume Nault
2019-08-09 06:55:10 +0800

04 Aug, 2019

1 commit

8c0bb7873 netfilter: synproxy: rename mss synproxy_options field ... Browse Code »

After introduce "mss_encode" field in the synproxy_options struct the field
"mss" is a little confusing. It has been renamed to "mss_option".

Signed-off-by: Fernando Fernandez Mancera
Signed-off-by: Pablo Neira Ayuso

Fernando Fernandez Mancera
2019-08-04 00:39:08 +0800

16 Jul, 2019

2 commits

b83329fb4 netfilter: synproxy: fix erroneous tcp mss option ... Browse Code »

Now synproxy sends the mss value set by the user on client syn-ack packet
instead of the mss value that client announced.

Fixes: 48b1de4c110a ("netfilter: add SYNPROXY core/target")
Signed-off-by: Fernando Fernandez Mancera
Signed-off-by: Pablo Neira Ayuso

Fernando Fernandez Mancera
2019-07-16 19:17:01 +0800
b575b24b8 netfilter: Fix rpfilter dropping vrf packets by mistake ... Browse Code »

When firewalld is enabled with ipv4/ipv6 rpfilter, vrf
ipv4/ipv6 packets will be dropped. Vrf device will pass
through netfilter hook twice. One with enslaved device
and another one with l3 master device. So in device may
dismatch witch out device because out device is always
enslaved device.So failed with the check of the rpfilter
and drop the packets by mistake.

Signed-off-by: Miaohe Lin
Signed-off-by: Pablo Neira Ayuso

Miaohe Lin
2019-07-16 19:16:47 +0800

12 Jul, 2019

1 commit

416e8126a ipv6: Use ipv6_authlen for len ... Browse Code »

The length of AH header is computed manually as (hp->hdrlen+2)<
Signed-off-by: David S. Miller

yangxingwu
2019-07-12 05:43:25 +0800