Eric Lee / smarc-fsl-linux-kernel

09 Jan, 2018

40 commits

90964016e netfilter: nf_conntrack: add IPS_OFFLOAD status bit

This new bit tells us that the conntrack entry is owned by the flow
table offload infrastructure.

# cat /proc/net/nf_conntrack
ipv4 2 tcp 6 src=10.141.10.2 dst=147.75.205.195 sport=36392 dport=443 src=147.75.205.195 dst=192.168.2.195 sport=443 dport=36392 [OFFLOAD] mark=0 zone=0 use=2

Note the [OFFLOAD] tag in the listing.

From userspace, the timer of such conntrack entries appears to be stopped.
In practice, to make sure the conntrack entry does not go away, the
conntrack timer is periodically set to an arbitrarily large value that
gets refreshed on every iteration of the garbage collector, so it
never expires. These entries also display no internal state in the case
of TCP flows. This allows us to save a bit check in the packet path via
nf_ct_is_expired().

Conntrack entries that have been offloaded to the flow table
infrastructure cannot be deleted/flushed via ctnetlink. The flow table
infrastructure is also responsible for releasing this conntrack entry.
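
As a rough illustration of how a status bit of this kind can drive both the [OFFLOAD] tag and the refusal to delete an entry, here is a small userspace sketch. The bit position and the helpers `ct_status_tag()` and `ct_may_delete()` are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <string.h>

/* Assumed bit position, for illustration only. */
enum { SKETCH_IPS_OFFLOAD_BIT = 14 };
#define SKETCH_IPS_OFFLOAD (1UL << SKETCH_IPS_OFFLOAD_BIT)

/* Returns the tag a listing could print for this status word. */
static const char *ct_status_tag(unsigned long status)
{
    return (status & SKETCH_IPS_OFFLOAD) ? "[OFFLOAD]" : "";
}

/* Offloaded entries are owned by the flow table, so a userspace
 * delete/flush request must leave them alone. */
static int ct_may_delete(unsigned long status)
{
    return !(status & SKETCH_IPS_OFFLOAD);
}
```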

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:11:05 +0800
0befd061a netfilter: nf_tables: remove nft_dereference()

This macro is unnecessary, it just hides details for one single caller.
nfnl_dereference() is just enough.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:11:05 +0800
a7f87b47e netfilter: remove defensive check on malformed packets from raw sockets

Users cannot forge malformed IPv4/IPv6 headers via raw sockets that they
can inject into the stack. Specifically, not for IPv4 since 55888dfb6ba7
("AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl
(v2)"). IPv6 raw sockets also ensure that packets have a well-formed
IPv6 header available in the skbuff.

At a quick glance, br_netfilter also validates layer 3 headers and
drops malformed IPv4 and IPv6 packets.

Therefore, let's remove this defensive check all over the place.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:11:04 +0800
f6931f5f5 netfilter: meta: secpath support

Replacement for iptables "-m policy --dir in --policy {ipsec,none}".

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:11:03 +0800
b3a61254d netfilter: remove struct nf_afinfo and its helper functions

This abstraction has no clients anymore, remove it.

This is what remains from previous authors, so correct copyright
statement after recent modifications and code removal.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:11:02 +0800
464356234 netfilter: remove route_key_size field in struct nf_afinfo

This is only needed by nf_queue, place this code where it belongs.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:11:01 +0800
ce388f452 netfilter: move reroute indirection to struct nf_ipv6_ops

We cannot make a direct call to nf_ip6_reroute() because that would result
in autoloading the 'ipv6' module because of symbol dependencies.
Therefore, define the reroute indirection in nf_ipv6_ops, where it really
belongs.

For IPv4, we can indeed make a direct function call, which is faster,
given that IPv4 is built into the networking code by default. Still,
CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define an empty inline
stub for IPv4 in that case.
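
The trade-off described here (a direct call for built-in code, a function-pointer table for code that may live in a module) can be sketched in userspace C as follows. Every name in this block (`v6_ops_sketch`, `v6_module_init()`, `reroute()`) is hypothetical and only stands in for the kernel's nf_ipv6_ops mechanism; the arithmetic in the two implementations is a placeholder so the result can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table standing in for an nf_ipv6_ops-style struct. */
struct v6_ops_sketch {
    int (*reroute)(int pkt_id);
};

/* Stays NULL until the optional IPv6 code registers itself. */
static const struct v6_ops_sketch *v6_ops;

static int v6_reroute_impl(int pkt_id) { return pkt_id + 6; }
static const struct v6_ops_sketch v6_ops_impl = { .reroute = v6_reroute_impl };

/* Models the module load step that publishes the ops table. */
static void v6_module_init(void) { v6_ops = &v6_ops_impl; }

/* IPv4 is assumed to be built in, so it can be called directly. */
static int v4_reroute(int pkt_id) { return pkt_id + 4; }

/* Returns -1 when the code for the requested family is not available. */
static int reroute(int family, int pkt_id)
{
    if (family == 4)
        return v4_reroute(pkt_id);          /* direct call, no pointer chase */
    if (family == 6 && v6_ops && v6_ops->reroute)
        return v6_ops->reroute(pkt_id);     /* indirect call through ops */
    return -1;
}
```

The caller never references an IPv6 symbol directly, which is the property that avoids pulling the module in through symbol dependencies.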

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:10:53 +0800
3f87c08c6 netfilter: move route indirection to struct nf_ipv6_ops

We cannot make a direct call to nf_ip6_route() because that would result
in autoloading the 'ipv6' module because of symbol dependencies.
Therefore, define the route indirection in nf_ipv6_ops, where it really
belongs.

For IPv4, we can indeed make a direct function call, which is faster,
given that IPv4 is built into the networking code by default. Still,
CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define an empty inline
stub for IPv4 in that case.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:26 +0800
7db9a51e0 netfilter: remove saveroute indirection in struct nf_afinfo

This is only used by nf_queue.c and this function comes with no symbol
dependencies with IPv6, it just refers to structure layouts. Therefore,
we can replace it by a direct function call from where it belongs.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:25 +0800
f7dcbe2f3 netfilter: move checksum_partial indirection to struct nf_ipv6_ops

We cannot make a direct call to nf_ip6_checksum_partial() because that
would result in autoloading the 'ipv6' module because of symbol
dependencies. Therefore, define the checksum_partial indirection in
nf_ipv6_ops, where it really belongs.

For IPv4, we can indeed make a direct function call, which is faster,
given that IPv4 is built into the networking code by default. Still,
CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define an empty inline
stub for IPv4 in that case.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:24 +0800
ef71fe27e netfilter: move checksum indirection to struct nf_ipv6_ops

We cannot make a direct call to nf_ip6_checksum() because that would
result in autoloading the 'ipv6' module because of symbol dependencies.
Therefore, define the checksum indirection in nf_ipv6_ops, where it really
belongs.

For IPv4, we can indeed make a direct function call, which is faster,
given that IPv4 is built into the networking code by default. Still,
CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define an empty inline
stub for IPv4 in that case.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:23 +0800
625c55611 netfilter: connlimit: split xt_connlimit into front and backend

This allows reusing the xt_connlimit infrastructure from nf_tables.
The upcoming nf_tables frontend can just pass in an nftables register
as the input key, which allows limiting by any nft-supported key, including
concatenations.

For xt_connlimit, pass in the zone and the ip/ipv6 address.

With help from Yi-Hung Wei.

Signed-off-by: Florian Westphal
Acked-by: Yi-Hung Wei
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:22 +0800
c2f9eafee netfilter: nf_tables: remove hooks from family definition

They don't belong to the family definition, move them to the filter
chain type definition instead.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:22 +0800
c974a3a36 netfilter: nf_tables: remove multihook chains and families

Since NFPROTO_INET is handled from the core, we don't need to maintain
extra infrastructure in nf_tables to handle the double hook
registration, one for IPv4 and another for IPv6.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:21 +0800
12355d367 netfilter: nf_tables_inet: don't use multihook infrastructure anymore

Use new native NFPROTO_INET support in netfilter core, this gets rid of
ad-hoc code in the nf_tables API codebase.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:20 +0800
cb7ccd835 netfilter: core: support for NFPROTO_INET hook registration

Expand NFPROTO_INET in two hook registrations, one for NFPROTO_IPV4 and
another for NFPROTO_IPV6. Hence, we handle NFPROTO_INET from the core.
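
The expansion described above can be sketched in userspace C as below. This is only an illustration under stated assumptions: the `FAM_*` values, the counters, and the functions `register_one()`, `unregister_one()` and `register_hook()` are hypothetical names, not the kernel's API. The one design point worth showing is the rollback: if the second registration fails, the first is undone so the caller sees an all-or-nothing result.

```c
#include <assert.h>

/* Assumed family numbers for this sketch; check the uapi header for
 * the real NFPROTO_* values. */
enum { FAM_INET = 1, FAM_IPV4 = 2, FAM_IPV6 = 10 };

/* Counters standing in for per-family hook lists. */
static int registered_v4, registered_v6;

static int register_one(int family)
{
    switch (family) {
    case FAM_IPV4: registered_v4++; return 0;
    case FAM_IPV6: registered_v6++; return 0;
    default:       return -1;       /* unsupported family */
    }
}

static void unregister_one(int family)
{
    if (family == FAM_IPV4)
        registered_v4--;
    else if (family == FAM_IPV6)
        registered_v6--;
}

/* The "inet" pseudo-family expands into one IPv4 and one IPv6 registration. */
static int register_hook(int family)
{
    int err;

    if (family != FAM_INET)
        return register_one(family);

    err = register_one(FAM_IPV4);
    if (err)
        return err;

    err = register_one(FAM_IPV6);
    if (err)
        unregister_one(FAM_IPV4);   /* roll back the first half */
    return err;
}
```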

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:19 +0800
302594081 netfilter: core: pass family as parameter to nf_remove_net_hook()

So static_key_slow_dec applies to the family behind NFPROTO_INET.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:19 +0800
62a0fe46e netfilter: core: pass hook number, family and device to nf_find_hook_list()

Pass these instead of struct nf_hook_ops. This is needed by follow-up
patches to handle NFPROTO_INET from the core.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:18 +0800
3d3cdc38e netfilter: core: add nf_remove_net_hook

Just a cleanup, __nf_unregister_net_hook() is used by a follow up patch
when handling NFPROTO_INET as a real family from the core.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:17 +0800
408070d6e netfilter: nf_tables: add nft_set_is_anonymous() helper

Add helper function to test for the NFT_SET_ANONYMOUS flag.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:16 +0800
7a4473a31 netfilter: nf_tables: explicit nft_set_pktinfo() call from hook path

Call this function explicitly from the hook path instead of from the
family-specific variant. This reduces the code size in the fast path for
the netdev, bridge and inet families. After this change, we must call
nft_set_pktinfo() upfront from the chain hook indirection.

Before:

   text    data     bss     dec     hex filename
   2145     208       0    2353     931 net/netfilter/nf_tables_netdev.o

After:

   text    data     bss     dec     hex filename
   2125     208       0    2333     91d net/netfilter/nf_tables_netdev.o

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:15 +0800
fa45a7602 netfilter: nf_tables_arp: don't set forward chain

46928a0b49f3 ("netfilter: nf_tables: remove multihook chains and
families") already removed this, this is a leftover.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2018-01-09 01:01:15 +0800
84ba7dd71 netfilter: nf_tables: reject nat hook registration if prio is before conntrack

This is no problem for iptables, as priorities are fixed values defined in
the nat modules, but in nftables the priority comes from userspace.

Reject in case we see that such a hook would not work.
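
A check of this kind might look like the following userspace sketch. The threshold value, the choice of error code, and the function name `nat_hook_prio_check()` are all assumptions made for illustration; the commit text does not state them.

```c
#include <assert.h>
#include <errno.h>

/* Assumed conntrack hook priority for this sketch. Lower values run earlier. */
#define SKETCH_PRI_CONNTRACK (-200)

/* A NAT hook that runs at or before conntrack cannot see the connection
 * it is supposed to translate, so such a registration is refused.
 * The error code used here is an assumption. */
static int nat_hook_prio_check(int priority)
{
    if (priority <= SKETCH_PRI_CONNTRACK)
        return -EOPNOTSUPP;
    return 0;
}
```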

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:14 +0800
f92b40a8b netfilter: core: only allow one nat hook per hook point

The netfilter NAT core cannot deal with more than one NAT hook per hook
location (prerouting, input ...), because the NAT hooks install a NAT null
binding in case the iptables nat table (iptable_nat hooks) or the
corresponding nftables chain (nft nat hooks) doesn't specify a nat
transformation.

Null bindings are needed to detect port collisions between NAT-ed and
non-NAT-ed connections.

This causes nftables NAT rules to not work when iptable_nat module is
loaded, and vice versa because nat binding has already been attached
when the second nat hook is consulted.

The netfilter core is not really the correct location to handle this
(hooks are just hooks, the core has no notion of what kinds of side
effects a hook implements), but it's the only place where we can check
for conflicts between both iptables hooks and nftables hooks without
adding dependencies.

So add a nat annotation to hook_ops to describe those hooks that will
add NAT bindings and then make the core reject if such a hook already exists.
The annotation fills a padding hole; in case further restrictions appear
we might change this to a 'u8 type' instead of bool.

iptables error if nft nat hook active:
iptables -t nat -A POSTROUTING -j MASQUERADE
iptables v1.4.21: can't initialize iptables table `nat': File exists
Perhaps iptables or your kernel needs to be upgraded.

nftables error if iptables nat table present:
nft -f /etc/nftables/ipv4-nat
/usr/etc/nftables/ipv4-nat:3:1-2: Error: Could not process rule: File exists
table nat {
^^
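
The "File exists" errors above correspond to an EEXIST-style rejection. A minimal userspace sketch of the idea, annotating each hook with a boolean and refusing a second annotated hook at the same hook point, could look like this. The struct names, the fixed-size array, and `hook_point_add()` are hypothetical and chosen only to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_HOOKS_PER_POINT 8   /* arbitrary limit for this sketch */

struct hook_sketch {
    int  priority;
    bool nat_hook;              /* annotation: this hook adds NAT bindings */
};

struct hook_point_sketch {
    struct hook_sketch hooks[MAX_HOOKS_PER_POINT];
    int count;
};

/* Adds a hook; at most one NAT-annotated hook is allowed per hook point. */
static int hook_point_add(struct hook_point_sketch *hp, struct hook_sketch h)
{
    int i;

    if (hp->count >= MAX_HOOKS_PER_POINT)
        return -ENOSPC;

    if (h.nat_hook) {
        for (i = 0; i < hp->count; i++)
            if (hp->hooks[i].nat_hook)
                return -EEXIST;     /* surfaces as "File exists" */
    }

    hp->hooks[hp->count++] = h;
    return 0;
}
```

Hooks without the annotation are unaffected, so filter hooks from both frontends can still coexist at the same hook point.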

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:13 +0800
03d13b686 netfilter: xtables: add and use xt_request_find_table_lock

Currently we always return -ENOENT to userspace if we can't find
a particular table, or if the table initialization fails.

A followup patch will make nat table init fail in case nftables already
registered a nat hook, so this change makes xt_find_table_lock return
an ERR_PTR to return the errno value reported from the table init
function.

Add xt_request_find_table_lock as try_then_request_module replacement
and use it where needed.
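
For readers unfamiliar with the ERR_PTR convention, the following userspace sketch shows the general idea of encoding a negative errno value in a pointer so that a lookup can report why it failed. The `sketch_*` helpers imitate the kernel macros but are written here from scratch, and `find_table()` with its `init_err` parameter is a hypothetical stand-in for a table lookup whose init step may fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the top 4095 addresses. */
static void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }

/* Decode the errno value again. */
static long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }

/* True if the pointer falls in the range reserved for error codes. */
static int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static int the_table = 42;      /* dummy table object */

/* Hypothetical lookup: init_err models an error from the table init step. */
static void *find_table(const char *name, int init_err)
{
    if (name == NULL)
        return sketch_err_ptr(-ENOENT);     /* table not found */
    if (init_err)
        return sketch_err_ptr(init_err);    /* pass the init error through */
    return &the_table;
}
```

With this shape the caller can distinguish "no such table" from "init failed with EEXIST" instead of collapsing both into one error.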

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:12 +0800
256d94ba3 netfilter: reduce NF_MAX_HOOKS define

This can be the same as NF_INET_NUMHOOKS if we don't support DECNET.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:12 +0800
2a95183a5 netfilter: don't allocate space for arp/bridge hooks unless needed

There is no need to define hook points if the family isn't supported.
Because we need these hooks for either nftables, arp/ebtables
or the 'call-iptables' hack we have in the bridge layer, add two
new dependencies, NETFILTER_FAMILY_{ARP,BRIDGE}, and have the
users select them.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:11 +0800
bb4badf3a netfilter: don't allocate space for decnet hooks unless needed

There is no need to define hook points if the family isn't supported.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:10 +0800
ef57170bb netfilter: reduce hook array sizes to what is needed

Not all families share the same hook count, adjust sizes to what is
needed.

struct net before:
/* size: 6592, cachelines: 103, members: 46 */
after:
/* size: 5952, cachelines: 93, members: 46 */

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:09 +0800
e58f33cc8 netfilter: add defines for arp/decnet max hooks

The kernel already has defines for this, but they are in uapi exposed
headers.

Including these from netns.h causes build errors and also adds unneeded
dependencies on headers that we don't need.

So move these defines to netfilter_defs.h and place the uapi ones
in ifndef __KERNEL__ to keep them for userspace.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:08 +0800
b0f38338a netfilter: reduce size of hook entry point locations

struct net contains:

struct nf_hook_entries __rcu *hooks[NFPROTO_NUMPROTO][NF_MAX_HOOKS];

which stores the hook entry point locations for the various protocol
families and the hooks.

Using an array results in compact C code when doing accesses, i.e.
x = rcu_dereference(net->nf.hooks[pf][hook]);

but it's also wasting a lot of memory, as most families are
not used.

So split the array into those families that are used, which
are only 5 (instead of 13). In most cases, the 'pf' argument is
constant, i.e. gcc removes switch statement.

struct net before:
/* size: 5184, cachelines: 81, members: 46 */
after:
/* size: 4672, cachelines: 73, members: 46 */
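
To make the array split concrete, here is a userspace sketch of replacing one two-dimensional array with per-family arrays selected by a switch. The family numbers, the per-family hook counts, and the names `nf_sketch` and `hook_slot()` are assumptions for illustration; the commit only says that five families remain. When `pf` is a compile-time constant at the call site, a compiler can typically fold the switch away, which is the effect the message refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed family numbers and hook counts, for illustration only. */
enum { P_IPV4 = 2, P_ARP = 3, P_BRIDGE = 7, P_IPV6 = 10, P_DECNET = 12 };

struct hook_entries;    /* opaque in this sketch */

/* One small array per supported family instead of hooks[13][8]. */
struct nf_sketch {
    struct hook_entries *hooks_ipv4[5];
    struct hook_entries *hooks_ipv6[5];
    struct hook_entries *hooks_arp[3];
    struct hook_entries *hooks_bridge[6];
    struct hook_entries *hooks_decnet[7];
};

/* Returns the slot for (pf, hook), or NULL if the pair is not supported. */
static struct hook_entries **hook_slot(struct nf_sketch *nf, int pf,
                                       unsigned int hook)
{
    switch (pf) {
    case P_IPV4:   return hook < 5 ? &nf->hooks_ipv4[hook]   : NULL;
    case P_IPV6:   return hook < 5 ? &nf->hooks_ipv6[hook]   : NULL;
    case P_ARP:    return hook < 3 ? &nf->hooks_arp[hook]    : NULL;
    case P_BRIDGE: return hook < 6 ? &nf->hooks_bridge[hook] : NULL;
    case P_DECNET: return hook < 7 ? &nf->hooks_decnet[hook] : NULL;
    default:       return NULL;
    }
}
```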

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:08 +0800
8c873e219 netfilter: core: free hooks with call_rcu

Giuseppe Scrivano says:
"SELinux, if enabled, registers for each new network namespace 6
netfilter hooks."

Cost for this is high. With synchronize_net() removed:
"The net benefit on an SMP machine with two cores is that creating a
new network namespace takes -40% of the original time."

This patch replaces synchronize_net+kvfree with call_rcu().
We store rcu_head at the tail of a structure that has no fixed layout,
i.e. we cannot use offsetof() to compute the start of the original
allocation. Thus store this information right after the rcu head.

We could simplify this by just placing the rcu_head at the start
of struct nf_hook_entries. However, this structure is used in
packet processing hotpath, so only place what is needed for that
at the beginning of the struct.
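
The layout trick described above can be sketched in userspace C. Here `cb_head` stands in for an rcu_head, and the callback is invoked immediately rather than after a grace period, so this shows only the memory layout, not RCU semantics. All type and function names in the block are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct rcu_head: just a callback pointer. */
struct cb_head { void (*func)(struct cb_head *); };

/* Hot-path data first; the hook array length varies per allocation. */
struct entries_sketch {
    unsigned short num;
    void *hooks[];          /* flexible array member */
};

/* Placed after hooks[num]. Because the distance back to the start of the
 * blob depends on num, offsetof() cannot recover it, so it is stored. */
struct tail_sketch {
    struct cb_head head;
    void *allocation;
};

static int freed_count;

static struct tail_sketch *entries_tail(struct entries_sketch *e)
{
    return (struct tail_sketch *)&e->hooks[e->num];
}

static struct entries_sketch *entries_alloc(unsigned short num)
{
    size_t sz = sizeof(struct entries_sketch) + num * sizeof(void *) +
                sizeof(struct tail_sketch);
    struct entries_sketch *e = calloc(1, sz);

    if (!e)
        return NULL;
    e->num = num;
    entries_tail(e)->allocation = e;    /* remember where the blob starts */
    return e;
}

/* Callback: only the head pointer is known here. */
static void entries_free_cb(struct cb_head *h)
{
    struct tail_sketch *t = (struct tail_sketch *)h;    /* head is first member */

    free(t->allocation);
    freed_count++;
}

/* Stand-in for call_rcu(); a real implementation would defer the callback. */
static void entries_free_deferred(struct entries_sketch *e)
{
    struct tail_sketch *t = entries_tail(e);

    t->head.func = entries_free_cb;
    t->head.func(&t->head);
}
```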

Reported-by: Giuseppe Scrivano
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:07 +0800
26888dfd7 netfilter: core: remove synchronize_net call if nfqueue is used

Since commit 960632ece6949b ("netfilter: convert hook list to an array")
nfqueue no longer stores a pointer to the hook that caused the packet
to be queued. Therefore no extra synchronize_net() call is needed after
dropping the packets enqueued by the old rule blob.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:06 +0800
4e645b47c netfilter: core: make nf_unregister_net_hooks simple wrapper again

This reverts commit d3ad2c17b4047
("netfilter: core: batch nf_unregister_net_hooks synchronize_net calls").

Nothing is wrong with it. However, a followup patch will delay freeing of
hooks with call_rcu, so all synchronize_net() calls become obsolete and
there is no longer any need for this batching.

This revert causes a temporary performance degradation when destroying
a network namespace, but it's resolved with the upcoming call_rcu conversion.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:05 +0800
ca9b01473 netfilter: nf_conntrack_h323: Remove unwanted comments.

Change old multi-line comment style to kernel comment style and
remove unwanted comments.

Signed-off-by: Varsha Rao
Signed-off-by: Pablo Neira Ayuso

Varsha Rao
2018-01-09 01:01:05 +0800
a778a15fa netfilter: ipset: add resched points during set listing

When sets are extremely large we can get a softlockup during ipset -L.
We could fix this by adding cond_resched_rcu() at the right location
during iteration, but this only works if RCU nesting depth is 1.

At this time the entire variant->list() is called under rcu_read_lock_bh.
This used to be a read_lock_bh() but as rcu doesn't really lock anything,
it does not appear to be needed, so remove it (ipset increments set
reference count before this, so a set deletion should not be possible).

Reported-by: Li Shuang
Signed-off-by: Florian Westphal
Acked-by: Jozsef Kadlecsik
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:04 +0800
49971b885 netfilter: ipset: use nfnl_mutex_is_locked

Check that we really hold nfnl mutex here instead of relying on correct
usage alone.

Signed-off-by: Florian Westphal
Acked-by: Jozsef Kadlecsik
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:03 +0800
6b3d93300 netfilter: ipvs: Remove useless ipvsh param of frag_safe_skb_hp

The ipvsh param of frag_safe_skb_hp isn't used anymore. So remove it and
update the callers too.

Signed-off-by: Gao Feng
Acked-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Gao Feng
2018-01-09 01:01:02 +0800
2c9e8637e netfilter: conntrack: timeouts can be const

Nowadays this is just the default template that is used when setting up
the net namespace, so nothing writes to these locations.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2018-01-09 01:01:02 +0800
e8542dcec netfilter: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
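
A minimal sketch of what such a marking looks like: each intentional fall-through carries a comment that the compiler warning recognizes, while an unmarked one would be reported. The function `level_to_flags()` and its flag values are made up for this example.

```c
#include <assert.h>

/* Higher levels include every flag of the lower levels. */
static int level_to_flags(int level)
{
    int flags = 0;

    switch (level) {
    case 2:
        flags |= 0x4;
        /* fall through */
    case 1:
        flags |= 0x2;
        /* fall through */
    case 0:
        flags |= 0x1;
        break;
    default:
        return -1;      /* unknown level */
    }
    return flags;
}
```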

Signed-off-by: Gustavo A. R. Silva
Signed-off-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Gustavo A. R. Silva
2018-01-09 01:01:01 +0800