Eric Lee / smarc-fsl-linux-kernel

31 Mar, 2020

3 commits

d9679cd98 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Add support for specifying a stateful expression in set definitions;
this allows users to enable, e.g., per-element counters.

2) Flowtable software counter support.

3) Flowtable hardware offload counter support, from wenxu.

4) Parallelize flowtable hardware offload requests, from Paul Blakey.
This includes a patch to add one work entry per offload command.

5) Several patches to rework nf_queue refcount handling, from Florian
Westphal.

6) A few fixes for the flowtable tunnel offload: fix a crash if tunneling
information is missing and set up the indirect flow block as TC_SETUP_FT,
patches from wenxu.

7) Stricter netlink attribute sanity check on filters, from Romain Bellan
and Florent Fourcot.

8) Annotations to make sparse happy, from Jules Irenge.

9) Improve debugging information for ICMP errors, from Haishuang Yan.

10) Fix warning in IPVS ICMP error debugging, from Haishuang Yan.

11) Fix endianness issue in TCP extension header, from Sergey Marinkevich.
====================

Signed-off-by: David S. Miller

David S. Miller
2020-03-31 10:40:46 +0800
e19680f83 ipvs: fix uninitialized variable warning

If outer_proto is not set, GCC emits the following warning:

In file included from net/netfilter/ipvs/ip_vs_core.c:52:
net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_in_icmp':
include/net/ip_vs.h:233:4: warning: 'outer_proto' may be used uninitialized in this function [-Wmaybe-uninitialized]
233 | printk(KERN_DEBUG pr_fmt(msg), ##__VA_ARGS__); \
| ^~~~~~
net/netfilter/ipvs/ip_vs_core.c:1666:8: note: 'outer_proto' was declared here
1666 | char *outer_proto;
| ^~~~~~~~~~~

Fixes: 73348fed35d0 ("ipvs: optimize tunnel dumps for icmp errors")
Signed-off-by: Haishuang Yan
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

Haishuang Yan
2020-03-31 03:17:53 +0800
2e34328b3 netfilter: nft_exthdr: fix endianness of tcp option cast

I hit a problem on MIPS with big-endian enabled: every time netfilter
tried to change the TCP MSS, it returned early because new.v16 was
greater than old.v16. But the real MSS was 1460 and my rule was:

add rule table chain tcp option maxseg size set 1400

And 1400 is less than 1460, not greater.

Later I found that the root cause is the cast from u32 to __be16.

Debugging:

For example, take MSS = 1400 (hex 0x578). Below is the representation
of each byte as it is laid out in memory, addresses increasing from left
to right (e.g. [0x0 0x1 0x2 0x3]). LE: little-endian system, BE:
big-endian system; the left column is the type.

                    LE              BE
u32:                [78 05 00 00]   [00 00 05 78]

As you can see, when the u32 representation is cast to u16, the two bytes
come from different halves of the 4-byte range depending on endianness.
But nf_tables actually uses registers that store data of various sizes.
The TCP MSS is stored in 2 bytes, yet the registers are still defined
as u32:

struct nft_regs {
	union {
		u32			data[20];
		struct nft_verdict	verdict;
	};
};

So an access like regs->data[priv->sreg] is exactly a u32. According to
the table above, the per-byte representation of the TCP MSS stored in a
register will be:

                    LE              BE
(u32)regs->data[]:  [78 05 00 00]   [05 78 00 00]
                                           ^^ ^^

We see that the register uses only half of the u32, and the other 2 bytes
may be used for other data. But nft_exthdr_tcp_set_eval() simply casts
u32 -> __be16:

new.v16 = src

But a u32 does not fit in a __be16, so the cast keeps the 2 low-order
bytes. For clarity, here is one more table (bytes inside < > are the
ones used by the cast):

                    LE                BE
u32:                [<78 05> 00 00]   [00 00 <05 78>]
(u32)regs->data[]:  [<78 05> 00 00]   [05 78 <00 00>]

As you can see, nothing changes on little-endian, but on big-endian we
take the wrong half. In my case that half held other data instead of
zeros, so the new MSS was wrongly greater.

To fix this bug I used the same approach as for port ranges. This patch
does not affect little-endian systems.

Signed-off-by: Sergey Marinkevich
Acked-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Sergey Marinkevich
2020-03-31 03:17:53 +0800

30 Mar, 2020

6 commits

ef803b3cf netfilter: flowtable: add counter support in HW offload

Store the conntrack counters to the conntrack entry in the
HW flowtable offload.

Signed-off-by: wenxu
Signed-off-by: Pablo Neira Ayuso

wenxu
2020-03-30 08:05:40 +0800
9312eabab netfilter: conntrack: add nf_ct_acct_add()

Add the nf_ct_acct_add() function to update the conntrack counters with
a given number of packets and bytes.

Signed-off-by: wenxu
Signed-off-by: Pablo Neira Ayuso

wenxu
2020-03-30 08:05:39 +0800
d56aab262 netfilter: nf_tables: skip set types that do not support for expressions

The bitmap set does not support expressions, so skip it in the
estimation step.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-30 08:05:39 +0800
8548bde98 netfilter: nft_dynset: validate set expression definition

If the expression in the set definition does not match the dynset
expression, bail out.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-30 08:05:38 +0800
24791b9aa netfilter: nft_set_bitmap: initialize set element extension in lookups

Otherwise, nft_lookup might dereference an uninitialized pointer to the
element extension.

Fixes: 665153ff5752 ("netfilter: nf_tables: add bitmap set type")
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-30 08:05:37 +0800
7c6b41216 netfilter: ctnetlink: be more strict when NF_CONNTRACK_MARK is not set

When CONFIG_NF_CONNTRACK_MARK is not set, CTA_MARK and CTA_MARK_MASK in
netlink messages are not supported. We should return an error when
either of them is set, not only when both are.

Fixes: 9306425b70bf ("netfilter: ctnetlink: must check mark attributes vs NULL")
Signed-off-by: Romain Bellan
Signed-off-by: Florent Fourcot
Signed-off-by: Pablo Neira Ayuso

Romain Bellan
2020-03-30 08:05:36 +0800

29 Mar, 2020

4 commits

28f715b9e netfilter: nf_queue: prefer nf_queue_entry_free

Instead of dropping the refs and calling kfree(), use the helper added in
the previous patch.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2020-03-29 22:28:29 +0800
af370ab36 netfilter: nf_queue: do not release refcouts until nf_reinject is done

nf_queue is problematic when another NF_QUEUE invocation happens
from nf_reinject().

1. nf_queue is invoked and increments the state->sk refcount.
2. The skb is queued, waiting for a verdict.
3. sk is closed/released.
4. The verdict comes back and nf_reinject is called.
5. nf_reinject drops the reference: the refcount can now drop to 0.

Instead of get_ref/release_ref pattern, we need to nest the get_ref calls:
get_ref
	get_ref
	release_ref
release_ref

So that when we invoke the next processing stage (another netfilter
or the okfn()), we hold at least one reference count on the
devices/socket.

After the previous patch, it is now safe to put the entry even after
okfn() has potentially freed the skb.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2020-03-29 22:28:29 +0800
119e52e66 netfilter: nf_queue: place bridge physports into queue_entry struct

The refcount is done via entry->skb, which does work fine.
Major problem: When putting the refcount of the bridge ports, we
must always put the references while the skb is still around.

However, we will need to put the references after okfn() to avoid
a possible 1 -> 0 -> 1 refcount transition, so we cannot use the
skb pointer anymore.

Place the physports in the queue entry structure instead to allow
for refcounting changes in the next patch.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2020-03-29 22:28:29 +0800
dd3cc111f netfilter: nf_queue: make nf_queue_entry_release_refs static

This is a preparation patch, no logical changes.
Move free_entry into core and rename it to something more sensible.

This will ease follow-up patches, which will complicate the refcount
handling.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2020-03-29 22:28:29 +0800

28 Mar, 2020

10 commits

7da182a99 netfilter: flowtable: Use work entry per offload command

To allow offload commands to execute in parallel, create a workqueue for
flowtable offload and use one work entry per offload command.

Signed-off-by: Paul Blakey
Reviewed-by: Oz Shlomo
Signed-off-by: Pablo Neira Ayuso

Paul Blakey
2020-03-28 01:42:21 +0800
422c032af netfilter: flowtable: Use rw sem as flow block lock

Currently, flow offload threads are synchronized by the flow block mutex.
Use an rw semaphore instead to increase flow insertion (read) concurrency.

Signed-off-by: Paul Blakey
Reviewed-by: Oz Shlomo
Signed-off-by: Pablo Neira Ayuso

Paul Blakey
2020-03-28 01:42:20 +0800
0a6a9515f netfilter: nf_tables: silence a RCU-list warning in nft_table_lookup()

It is safe to traverse &net->nft.tables using list_for_each_entry_rcu()
while holding &net->nft.commit_mutex. Silence the PROVE_RCU_LIST false
positive:

WARNING: suspicious RCU usage
net/netfilter/nf_tables_api.c:523 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by iptables/1384:
#0: ffffffff9745c4a8 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x25/0x60 [nf_tables]

Call Trace:
dump_stack+0xa1/0xea
lockdep_rcu_suspicious+0x103/0x10d
nft_table_lookup.part.0+0x116/0x120 [nf_tables]
nf_tables_newtable+0x12c/0x7d0 [nf_tables]
nfnetlink_rcv_batch+0x559/0x1190 [nfnetlink]
nfnetlink_rcv+0x1da/0x210 [nfnetlink]
netlink_unicast+0x306/0x460
netlink_sendmsg+0x44b/0x770
____sys_sendmsg+0x46b/0x4a0
___sys_sendmsg+0x138/0x1a0
__sys_sendmsg+0xb6/0x130
__x64_sys_sendmsg+0x48/0x50
do_syscall_64+0x69/0xf4
entry_SYSCALL_64_after_hwframe+0x49/0xb3

Signed-off-by: Qian Cai
Acked-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Qian Cai
2020-03-28 01:42:20 +0800
133a2fe59 netfilter: flowtable: Fix incorrect tc_setup_type type

The indirect block setup should use TC_SETUP_FT as the type instead of
TC_SETUP_BLOCK. Adjust existing users of the indirect flow block
infrastructure.

Fixes: b5140a36da78 ("netfilter: flowtable: add indr block setup support")
Signed-off-by: wenxu
Signed-off-by: Pablo Neira Ayuso

wenxu
2020-03-28 01:41:52 +0800
53c2b2899 netfilter: flowtable: add counter support

Add a new flag to turn on flowtable counters which are stored in the
conntrack entry.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-28 01:32:37 +0800
cfbd1125f netfilter: nf_tables: add enum nft_flowtable_flags to uapi

Expose the NFT_FLOWTABLE_HW_OFFLOAD flag through uapi.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-28 01:32:36 +0800
8ac2bd357 netfilter: conntrack: export nf_ct_acct_update()

This function allows you to update the conntrack counters.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-28 01:32:36 +0800
73348fed3 ipvs: optimize tunnel dumps for icmp errors

After stripping the GRE/UDP tunnel header for ICMP errors, it is better
to show "GRE/UDP" instead of "IPIP" in the debug message.

Signed-off-by: Haishuang Yan
Acked-by: Julian Anastasov
Signed-off-by: Pablo Neira Ayuso

Haishuang Yan
2020-03-28 01:31:01 +0800
6b36d4829 netfilter: conntrack: Add missing annotations for nf_conntrack_all_lock() and nf_conntrack_all_unlock()

Sparse reports warnings at nf_conntrack_all_lock()
and nf_conntrack_all_unlock()

warning: context imbalance in nf_conntrack_all_lock()
- wrong count at exit
warning: context imbalance in nf_conntrack_all_unlock()
- unexpected unlock

Add the missing __acquires(&nf_conntrack_locks_all_lock)
Add missing __releases(&nf_conntrack_locks_all_lock)

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Jules Irenge
2020-03-28 01:22:06 +0800
19f8f717f netfilter: ctnetlink: Add missing annotation for ctnetlink_parse_nat_setup()

Sparse reports a warning at ctnetlink_parse_nat_setup()

warning: context imbalance in ctnetlink_parse_nat_setup()
- unexpected unlock

The root cause is the missing annotation at ctnetlink_parse_nat_setup()
Add the missing __must_hold(RCU) annotation

Signed-off-by: Jules Irenge
Signed-off-by: Pablo Neira Ayuso

Jules Irenge
2020-03-28 01:21:09 +0800

26 Mar, 2020

2 commits

9fb16955f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Overlapping header include additions in macsec.c

A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c

Overlapping test additions in selftests Makefile

Overlapping PCI ID table adjustments in iwlwifi driver.

Signed-off-by: David S. Miller

David S. Miller
2020-03-26 09:58:11 +0800
2c64605b5 net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build

net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
pkt->skb->tc_redirected = 1;
^~
net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
pkt->skb->tc_from_ingress = 1;
^~

To avoid a direct dependency on tc actions from netfilter, wrap the
redirect bits in CONFIG_NET_REDIRECT and move the helpers to
include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
only existing client of these bits in the tree.

This patch adds skb_set_redirected(), which sets the redirected bit on
the skbuff, records whether the packet was redirected from ingress, and
resets the timestamp (the timestamp reset was originally missing from
the netfilter bugfix).

Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
Reported-by: noreply@ellerman.id.au
Reported-by: Geert Uytterhoeven
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Pablo Neira Ayuso
2020-03-26 03:24:33 +0800

25 Mar, 2020

6 commits

bcfabee1a netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress

Set skb->tc_redirected to 1, otherwise the ifb driver drops the packet.
Set skb->tc_from_ingress to 1 to reinject the packet back to the ingress
path after leaving the ifb egress path.

This patch unconditionally sets these two skb fields, which are
meaningful to the ifb driver. The existing forward action is guaranteed
to run from the ingress path.

Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-25 02:59:39 +0800
76a109fac netfilter: nft_fwd_netdev: validate family and chain type

Make sure the forward action is only used from ingress.

Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-25 02:59:38 +0800
7c84d4141 netfilter: nft_set_rbtree: Detect partial overlaps on insertion

...and return -ENOTEMPTY to the front-end in this case, instead of
proceeding. Currently, nft takes care of checking for these cases
and not sending them to the kernel, but if we drop the set_overlap()
call in nft we can end up in situations like:

# nft add table t
# nft add set t s '{ type inet_service ; flags interval ; }'
# nft add element t s '{ 1 - 5 }'
# nft add element t s '{ 6 - 10 }'
# nft add element t s '{ 4 - 7 }'
# nft list set t s
table ip t {
	set s {
		type inet_service
		flags interval
		elements = { 1-3, 4-5, 6-7 }
	}
}

This change has the primary purpose of making the behaviour
consistent with nft_set_pipapo, but is also functional to avoid
inconsistent behaviour if userspace sends overlapping elements for
any reason.

v2: When we meet the same key data in the tree, as start element while
inserting an end element, or as end element while inserting a start
element, actually check that the existing element is active, before
resetting the overlap flag (Pablo Neira Ayuso)

Signed-off-by: Stefano Brivio
Signed-off-by: Pablo Neira Ayuso

Stefano Brivio
2020-03-25 02:59:37 +0800
6f7c9caf0 netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()

Replace negations of nft_rbtree_interval_end() with a new helper,
nft_rbtree_interval_start(), wherever this helps to visualise the
problem at hand, that is, for all the occurrences except for the
comparison against given flags in __nft_rbtree_get().

This gets especially useful in the next patch.

Signed-off-by: Stefano Brivio
Signed-off-by: Pablo Neira Ayuso

Stefano Brivio
2020-03-25 02:59:30 +0800
0eb4b5ee3 netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion

...and return -ENOTEMPTY to the front-end on collision, -EEXIST if
an identical element already exists. Together with the previous patch,
element collision will now be returned to the user as -EEXIST.

Reported-by: Phil Sutter
Signed-off-by: Stefano Brivio
Signed-off-by: Pablo Neira Ayuso

Stefano Brivio
2020-03-25 02:59:08 +0800
8c2d45b2b netfilter: nf_tables: Allow set back-ends to report partial overlaps on insertion

Currently, the -EEXIST return code of ->insert() callbacks is ambiguous: it
might indicate that a given element (including intervals) already exists as
such, or that the new element would clash with existing ones.

If identical elements already exist, the front-end ignores this without
returning an error, as long as NLM_F_EXCL is not set. However, if the new
element can't be inserted due to an overlap, we should report this to
the user.

To this end, allow set back-ends to return -ENOTEMPTY on collision with
existing elements, translate that to -EEXIST, and return it to userspace
regardless of whether NLM_F_EXCL was set.

Reported-by: Phil Sutter
Signed-off-by: Stefano Brivio
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-25 02:58:57 +0800

20 Mar, 2020

5 commits

15ff19723 netfilter: flowtable: populate addr_type mask

nf_flow_rule_match() sets control.addr_type in the key, so it also needs
to set the corresponding mask. An exact match is wanted, so the mask is
all ones.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Edward Cree
Signed-off-by: Pablo Neira Ayuso

Edward Cree
2020-03-20 04:20:04 +0800
dc264f1f7 netfilter: flowtable: fix NULL pointer dereference in tunnel offload support

The tc ct action does not cache the route in the flowtable entry.

Fixes: 88bf6e4114d5 ("netfilter: flowtable: add tunnel encap/decap action offload support")
Fixes: cfab6dbd0ecf ("netfilter: flowtable: add tunnel match offload support")
Signed-off-by: wenxu
Signed-off-by: Pablo Neira Ayuso

wenxu
2020-03-20 04:06:17 +0800
c921ffe85 netfilter: flowtable: Fix flushing of offloaded flows on free

When freeing a flowtable with offloaded flows, the flows are deleted from
hardware but not from the flow table, leaking them and leaving their
offload bit set.

Add a second pass of the disabled gc to delete these flows from the flow
table before freeing it.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey
Signed-off-by: Pablo Neira Ayuso

Paul Blakey
2020-03-20 04:05:30 +0800
41e9ec5a5 netfilter: flowtable: reload ip{v6}h in nf_flow_tuple_ip{v6}

Since pskb_may_pull() may change skb->data, we need to reload ip{v6}h at
the right place.

Fixes: a908fdec3dda ("netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_table")
Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table")
Signed-off-by: Haishuang Yan
Signed-off-by: Pablo Neira Ayuso

Haishuang Yan
2020-03-20 04:05:05 +0800
61abaf02d netfilter: flowtable: reload ip{v6}h in nf_flow_nat_ip{v6}

Since nf_flow_snat_port() and nf_flow_snat_ip{v6}() call pskb_may_pull(),
which may change skb->data, we need to reload ip{v6}h at the right
place.

Fixes: a908fdec3dda ("netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_table")
Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table")
Signed-off-by: Haishuang Yan
Signed-off-by: Pablo Neira Ayuso

Haishuang Yan
2020-03-20 04:04:50 +0800

19 Mar, 2020

4 commits

475beb9c8 netfilter: nf_tables: add nft_set_elem_expr_destroy() and use it

This patch adds nft_set_elem_expr_destroy() to destroy stateful
expressions in set elements.

This patch also updates the commit path to call this function to invoke
expr->ops->destroy_clone when required.

This is implicitly fixing up a module reference counter leak and
a memory leak in expressions that allocated internal state, e.g.
nft_counter.

Fixes: 409444522976 ("netfilter: nf_tables: add elements with stateful expressions")
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-19 18:37:32 +0800
772f4e82b netfilter: nf_tables: fix double-free on set expression from the error path

After copying the expression to the set element extension, release the
expression and reset the pointer to avoid a double-free from the error
path.

Fixes: 409444522976 ("netfilter: nf_tables: add elements with stateful expressions")
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-19 18:37:31 +0800
65038428b netfilter: nf_tables: allow to specify stateful expression in set definition

This patch allows users to specify the stateful expression for the
elements in this set via NFTA_SET_EXPR. This new feature allows you to
turn on counters for all of the elements in this set.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-19 18:37:31 +0800
0c2a85edd netfilter: nf_tables: pass context to nft_set_destroy()

The patch that adds support for stateful expressions in set definitions
requires this.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2020-03-19 18:37:31 +0800