29 Aug, 2020
4 commits
-
We can delay refcount increment until we reassign the existing entry to
the current skb.

A 0 refcount can't happen while the nf_conn object is still in the
hash table and parallel mutations are impossible because we hold the
bucket lock.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
There is a misconception about what "insert_failed" means.
We increment this even when a clash got resolved, so it might not indicate
a problem.

Add a dedicated counter for clash resolution and only increment
insert_failed if a clash cannot be resolved.

For the old /proc interface, export this in place of an older stat
that got removed a while back.
For ctnetlink, export this with a new attribute.

Also correct an outdated comment that implies we add a duplicate tuple --
we only add the (unique) reply direction.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
This counter increments when nf_conntrack_in sees a packet that already
has a conntrack attached or when the packet is marked as UNTRACKED.
Neither is an error.

The former is normal for loopback traffic. The second happens for
certain ICMPv6 packets or when nftables/ip(6)tables rules are in place.

In case someone needs to count UNTRACKED packets, or packets
that are marked as untracked before conntrack_in, this can be done with
both nftables and ip(6)tables rules.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
The /proc interface for nf_conntrack displays the "error" counter as
"icmp_error".

It makes sense to not increment "invalid" when failing to handle an icmp
packet since those are special.

For example, it's possible for conntrack to see partial and/or fragmented
packets inside icmp errors. This should be a separate event and not get
mixed with the "invalid" counter.

Likewise, remove the "error" increment for errors from get_l4proto().
After this, the error counter will only increment for errors coming from
icmp(v6) packet handling.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
11 Aug, 2020
1 commit
-
Pull locking updates from Thomas Gleixner:
"A set of locking fixes and updates:

 - Untangle the header spaghetti which causes build failures in
   various situations caused by the lockdep additions to seqcount to
   validate that the write side critical sections are non-preemptible.

 - The seqcount associated lock debug addons which were blocked by the
   above fallout.

   seqcount writers contrary to seqlock writers must be externally
   serialized, which usually happens via locking - except for strict
   per CPU seqcounts. As the lock is not part of the seqcount, lockdep
   cannot validate that the lock is held.

   This new debug mechanism adds the concept of associated locks.
   The sequence count now has lock type variants and corresponding
   initializers which take a pointer to the associated lock used for
   writer serialization. If lockdep is enabled the pointer is stored
   and write_seqcount_begin() has a lockdep assertion to validate that
   the lock is held.

   Aside of the type and the initializer no other code changes are
   required at the seqcount usage sites. The rest of the seqcount API
   is unchanged and determines the type at compile time with the help
   of _Generic which is possible now that the minimal GCC version has
   been moved up.

   Adding this lockdep coverage unearthed a handful of seqcount bugs
   which have been addressed already independent of this.

   While generally useful this comes with a Trojan Horse twist: On RT
   kernels the write side critical section can become preemptible if
   the writers are serialized by an associated lock, which leads to
   the well known reader preempts writer livelock. RT prevents this by
   storing the associated lock pointer independent of lockdep in the
   seqcount and changing the reader side to block on the lock when a
   reader detects that a writer is in the write side critical section.

 - Conversion of seqcount usage sites to associated types and
   initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
locking/seqlock, headers: Untangle the spaghetti monster
locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
x86/headers: Remove APIC headers from
seqcount: More consistent seqprop names
seqcount: Compress SEQCNT_LOCKNAME_ZERO()
seqlock: Fold seqcount_LOCKNAME_init() definition
seqlock: Fold seqcount_LOCKNAME_t definition
seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
hrtimer: Use sequence counter with associated raw spinlock
kvm/eventfd: Use sequence counter with associated spinlock
userfaultfd: Use sequence counter with associated spinlock
NFSv4: Use sequence counter with associated spinlock
iocost: Use sequence counter with associated spinlock
raid5: Use sequence counter with associated spinlock
vfs: Use sequence counter with associated spinlock
timekeeping: Use sequence counter with associated raw spinlock
xfrm: policy: Use sequence counters with associated lock
netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
netfilter: conntrack: Use sequence counter with associated spinlock
sched: tasks: Use sequence counter with associated spinlock
...
05 Aug, 2020
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Flush the cleanup xtables worker to make sure destructors
have completed, from Florian Westphal.
2) iifgroup is matching erroneously, also from Florian.
3) Add selftest for meta interface matching, from Florian Westphal.
4) Move nf_ct_offload_timeout() to header, from Roi Dayan.
5) Call nf_ct_offload_timeout() from flow_offload_add() to
make sure garbage collection does not evict offloaded flow,
from Roi Dayan.
====================

Signed-off-by: David S. Miller
03 Aug, 2020
1 commit
-
To be used by callers from other modules.
[ Rename DAY to NF_CT_DAY to avoid possible symbol name pollution
issue --Pablo ]

Signed-off-by: Roi Dayan
Reviewed-by: Oz Shlomo
Signed-off-by: Pablo Neira Ayuso
29 Jul, 2020
1 commit
-
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20200720155530.1173732-15-a.darwish@linutronix.de
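
The lock association described in this commit can be modelled in a few lines of userspace C. This is only an illustrative sketch, not the kernel implementation: `toy_lock`, `toy_seqcount_locked` and all `toy_*` functions are hypothetical names, the "lock" is a plain flag rather than a spinlock, a boolean return value stands in for the lockdep assertion, and there are no memory barriers because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock: it only tracks whether it is held. */
struct toy_lock {
    bool held;
};

/* Sequence counter that remembers which lock serializes its writers. */
struct toy_seqcount_locked {
    unsigned int sequence;
    const struct toy_lock *lock;   /* the association made at init time */
};

void toy_seqcount_init(struct toy_seqcount_locked *s,
                       const struct toy_lock *lock)
{
    s->sequence = 0;
    s->lock = lock;
}

/*
 * Enter the write side. Returns false (instead of raising a lockdep
 * splat) when the associated lock is not held by the caller.
 */
bool toy_write_begin(struct toy_seqcount_locked *s)
{
    if (s->lock == NULL || !s->lock->held)
        return false;
    s->sequence++;               /* odd value: write in progress */
    return true;
}

void toy_write_end(struct toy_seqcount_locked *s)
{
    assert(s->sequence & 1u);    /* must pair with toy_write_begin() */
    s->sequence++;               /* even value: write finished */
}

/* Reader side: take a snapshot, then ask whether the read must be retried. */
unsigned int toy_read_begin(const struct toy_seqcount_locked *s)
{
    return s->sequence;
}

bool toy_read_retry(const struct toy_seqcount_locked *s, unsigned int start)
{
    /* Retry if a write was in progress or has completed since 'start'. */
    return (start & 1u) || s->sequence != start;
}
```

A writer that forgets to take the lock is rejected by `toy_write_begin()`, which is the class of bug the lockdep assertion is meant to catch.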
14 Jul, 2020
1 commit
-
Simple fixes which require no deep knowledge of the code.
Cc: Pablo Neira Ayuso
Cc: Jozsef Kadlecsik
Cc: Florian Westphal
Signed-off-by: Andrew Lunn
Signed-off-by: David S. Miller
03 Jul, 2020
1 commit
-
__nf_conntrack_update() might refresh the conntrack object that is
attached to the skbuff. Otherwise, this triggers UAF.

[ 633.200434] ==================================================================
[ 633.200472] BUG: KASAN: use-after-free in nf_conntrack_update+0x34e/0x770 [nf_conntrack]
[ 633.200478] Read of size 1 at addr ffff888370804c00 by task nfqnl_test/6769

[ 633.200487] CPU: 1 PID: 6769 Comm: nfqnl_test Not tainted 5.8.0-rc2+ #388
[ 633.200490] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
[ 633.200491] Call Trace:
[ 633.200499] dump_stack+0x7c/0xb0
[ 633.200526] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
[ 633.200532] print_address_description.constprop.6+0x1a/0x200
[ 633.200539] ? _raw_write_lock_irqsave+0xc0/0xc0
[ 633.200568] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
[ 633.200594] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
[ 633.200598] kasan_report.cold.9+0x1f/0x42
[ 633.200604] ? call_rcu+0x2c0/0x390
[ 633.200633] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
[ 633.200659] nf_conntrack_update+0x34e/0x770 [nf_conntrack]
[ 633.200687] ? nf_conntrack_find_get+0x30/0x30 [nf_conntrack]

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1436
Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again")
Signed-off-by: Pablo Neira Ayuso
02 Jun, 2020
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next
to extend ctnetlink and the flowtable infrastructure:

1) Extend ctnetlink kernel side netlink dump filtering capabilities,
from Romain Bellan.
2) Generalise the flowtable hook parser to take a hook list.
3) Pass a hook list to the flowtable hook registration/unregistration.
4) Add a helper function to release the flowtable hook list.
5) Update the flowtable event notifier to pass a flowtable hook list.
6) Allow users to add new devices to an existing flowtable.
7) Allow users to remove devices from an existing flowtable.
8) Allow for registering a flowtable with no initial devices.
====================

Signed-off-by: David S. Miller
28 May, 2020
1 commit
-
Conntrack dump does not support kernel side filtering (only get exists,
but it returns only one entry, and the user has to give a full valid tuple).

It means that userspace has to implement filtering after receiving many
irrelevant entries, consuming resources (the conntrack table is sometimes
very large, much larger than a routing table for example).

This patch adds filtering on the kernel side. To achieve this goal, we:
* Add a new CTA_FILTER netlink attribute, actually a flag list to
  parameterize filtering
* Convert some *nlattr_to_tuple() functions, to allow a partial parsing
  of CTA_TUPLE_ORIG and CTA_TUPLE_REPLY (so nf_conntrack_tuple is not
  fully set)

Filtering is now possible on:
* IP SRC/DST values
* Ports for TCP and UDP flows
* ICMP(v6) codes, types and IDs

Filtering is done as an "AND" operator. For example, when flags
PROTO_SRC_PORT, PROTO_NUM and IP_SRC are set, only entries matching all
values are dumped.

Changes since v1:
Set NLM_F_DUMP_FILTERED in nlm flags if entries are filtered

Changes since v2:
Move several constants to nf_internals.h
Move a fix on netlink values check in a separate patch
Add a check on not-supported flags
Return EOPNOTSUPP if CTA_FILTER is set in ctnetlink_flush_conntrack
(not yet implemented)
Code style issues

Changes since v3:
Fix compilation warning reported by kbuild test robot

Changes since v4:
Fix a regression introduced in v3 (returned EINVAL for valid netlink
messages without CTA_MARK)

Changes since v5:
Change definition of CTA_FILTER_F_ALL
Fix a regression when CTA_TUPLE_ZONE is not set

Signed-off-by: Romain Bellan
Signed-off-by: Florent Fourcot
Signed-off-by: Pablo Neira Ayuso
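
The "AND" semantics of the flag list can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the `TOY_F_*` flags, `struct toy_tuple` and the `toy_*` functions are hypothetical and do not reflect the kernel's CTA_FILTER bit layout or the ctnetlink code. Only fields selected by a flag are compared, and unsupported flags are rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter flags; not the kernel's CTA_FILTER definitions. */
#define TOY_F_IP_SRC    (1u << 0)
#define TOY_F_IP_DST    (1u << 1)
#define TOY_F_PROTO_NUM (1u << 2)
#define TOY_F_SRC_PORT  (1u << 3)
#define TOY_F_DST_PORT  (1u << 4)
#define TOY_F_ALL       ((1u << 5) - 1)

struct toy_tuple {
    uint32_t ip_src, ip_dst;
    uint8_t  proto;
    uint16_t src_port, dst_port;
};

/* An entry matches only if every field selected in 'flags' is equal. */
bool toy_filter_match(const struct toy_tuple *entry,
                      const struct toy_tuple *filter, unsigned int flags)
{
    if ((flags & TOY_F_IP_SRC) && entry->ip_src != filter->ip_src)
        return false;
    if ((flags & TOY_F_IP_DST) && entry->ip_dst != filter->ip_dst)
        return false;
    if ((flags & TOY_F_PROTO_NUM) && entry->proto != filter->proto)
        return false;
    if ((flags & TOY_F_SRC_PORT) && entry->src_port != filter->src_port)
        return false;
    if ((flags & TOY_F_DST_PORT) && entry->dst_port != filter->dst_port)
        return false;
    return true;
}

/*
 * Copy the matching entries to 'out' and return how many were kept,
 * or -1 if 'flags' contains bits this sketch does not support.
 */
long toy_dump(const struct toy_tuple *table, size_t n,
              const struct toy_tuple *filter, unsigned int flags,
              struct toy_tuple *out)
{
    long kept = 0;

    assert(table != NULL && filter != NULL && out != NULL);
    if (flags & ~TOY_F_ALL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (toy_filter_match(&table[i], filter, flags))
            out[kept++] = table[i];
    }
    return kept;
}
```

With no flags set every entry is dumped, which corresponds to the unfiltered dump that userspace previously had to post-process.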
27 May, 2020
2 commits
-
net/netfilter/nf_conntrack_core.c: In function nf_confirm_cthelper:
net/netfilter/nf_conntrack_core.c:2117:15: warning: comparison of unsigned expression in < 0 is always false [-Wtype-limits]
2117 | if (protoff < 0 || (frag_off & htons(~0x7)) != 0)
| ^

ipv6_skip_exthdr() returns a signed integer.

Reported-by: Colin Ian King
Fixes: 703acd70f249 ("netfilter: nfnetlink_cthelper: unbreak userspace helper support")
Signed-off-by: Pablo Neira Ayuso
-
Clang warns:
net/netfilter/nf_conntrack_core.c:2068:21: warning: variable 'ctinfo' is
uninitialized when used here [-Wuninitialized]
nf_ct_set(skb, ct, ctinfo);
^~~~~~
net/netfilter/nf_conntrack_core.c:2024:2: note: variable 'ctinfo' is
declared here
enum ip_conntrack_info ctinfo;
^
1 warning generated.

nf_conntrack_update was split up into nf_conntrack_update and
__nf_conntrack_update, where the assignment of ctinfo is in
nf_conntrack_update but it is used in __nf_conntrack_update.

Pass the value of ctinfo from nf_conntrack_update to
__nf_conntrack_update so that uninitialized memory is not used
and everything works properly.

Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again")
Link: https://github.com/ClangBuiltLinux/linux/issues/1039
Signed-off-by: Nathan Chancellor
Signed-off-by: Pablo Neira Ayuso
26 May, 2020
1 commit
-
Florian Westphal says:
"Problem is that after the helper hook was merged back into the confirm
one, the queueing itself occurs from the confirm hook, i.e. we queue
from the last netfilter callback in the hook-list.

Therefore, on return, the packet bypasses the confirm action and the
connection is never committed to the main conntrack table.

To fix this there are several ways:

1. revert the 'Fixes' commit and have an extra helper hook again.
Works, but has the drawback of adding another indirect call for
everyone.

2. Special case this: split the hooks only when userspace helper
gets added, so queueing occurs at a lower priority again,
and normal enqueue reinject would eventually call the last hook.

3. Extend the existing nf_queue ct update hook to allow a forced
confirmation (plus run the seqadj code).

This goes for 3)."
Fixes: 827318feb69cb ("netfilter: conntrack: remove helper hook again")
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
11 May, 2020
2 commits
-
'rmmod nf_conntrack' can hang forever, because the netns exit
gets stuck in nf_conntrack_cleanup_net_list():

i_see_dead_people:
    busy = 0;
    list_for_each_entry(net, net_exit_list, exit_list) {
        nf_ct_iterate_cleanup(kill_all, net, 0, 0);
        if (atomic_read(&net->ct.count) != 0)
            busy = 1;
    }
    if (busy) {
        schedule();
        goto i_see_dead_people;
    }

When nf_ct_iterate_cleanup iterates the conntrack table, all nf_conn
structures can be found twice:
once for the original tuple and once for the conntrack's reply tuple.

get_next_corpse() only calls the iterator when the entry is
in original direction -- the idea was to avoid unneeded invocations
of the iterator callback.

When support for clashing entries was added, the assumption that
all nf_conn objects are added twice, once in original, once for reply
tuple no longer holds -- NF_CLASH_BIT entries are only added in
the non-clashing reply direction.

Thus, if at least one NF_CLASH entry is in the list then
nf_conntrack_cleanup_net_list() always skips it completely.

During normal netns destruction, this causes a hang of several
seconds, until the gc worker removes the entry (NF_CLASH entries
always have a 1 second timeout).

But in the rmmod case, the gc worker has already been stopped, so
ct.count never becomes 0.

We can fix this in two ways:
1. Add a second test for CLASH_BIT and call iterator for those
entries as well, or:
2. Skip the original tuple direction and use the reply tuple.

2) is simpler, so do that.

Fixes: 6a757c07e51f80ac ("netfilter: conntrack: allow insertion of clashing entries")
Reported-by: Chen Yi
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
gcc-10 warns around a suspicious access to an empty struct member:
net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc':
net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds]
1522 | memset(&ct->__nfct_init_offset[0], 0,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from net/netfilter/nf_conntrack_core.c:37:
include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset'
90 | u8 __nfct_init_offset[0];
| ^~~~~~~~~~~~~~~~~~

The code is correct but a bit unusual. Rework it slightly in a way that
does not trigger the warning, using an empty struct instead of an empty
array. There are probably more elegant ways to do this, but this is the
smallest change.

Fixes: c41884ce0562 ("netfilter: conntrack: avoid zeroing timer")
Signed-off-by: Arnd Bergmann
Signed-off-by: Pablo Neira Ayuso
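
The pattern behind this warning is "zero only the tail of an object, starting at a marker member". A portable userspace sketch of that idea is shown below. It is a stand-in rather than the kernel change: instead of a zero-sized marker (an empty struct is a GNU C extension), it takes `offsetof()` of the first member that should be cleared. `struct toy_conn` and `toy_conn_init_tail()` are hypothetical names with a made-up layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object; the layout is invented for illustration. */
struct toy_conn {
    unsigned int refcount;          /* must survive re-initialization */
    unsigned int timeout;           /* must survive re-initialization */
    /* Everything from 'status' onward is cleared on allocation. */
    unsigned long status;
    unsigned int mark;
    unsigned char proto_state[16];
};

/* Zero the tail of the object, leaving the leading members untouched. */
void toy_conn_init_tail(struct toy_conn *ct)
{
    const size_t start = offsetof(struct toy_conn, status);

    assert(ct != NULL);
    memset((unsigned char *)ct + start, 0, sizeof(*ct) - start);
}
```

Computing the start as a byte offset from the object itself avoids indexing into a zero-length member, which is the access the compiler complained about.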
30 Mar, 2020
1 commit
-
Add nf_ct_acct_add function to update the conntrack counter
with packets and bytes.

Signed-off-by: wenxu
Signed-off-by: Pablo Neira Ayuso
28 Mar, 2020
2 commits
-
This function allows you to update the conntrack counters.
Signed-off-by: Pablo Neira Ayuso
-
…_conntrack_all_unlock()
Sparse reports warnings at nf_conntrack_all_lock()
and nf_conntrack_all_unlock()

warning: context imbalance in nf_conntrack_all_lock()
- wrong count at exit
warning: context imbalance in nf_conntrack_all_unlock()
- unexpected unlock

Add the missing __acquires(&nf_conntrack_locks_all_lock)
Add missing __releases(&nf_conntrack_locks_all_lock)

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
15 Mar, 2020
1 commit
-
TEMPLATE_NULLS_VAL is not used after commit 0838aa7fcfcd
("netfilter: fix netns dependencies with conntrack templates")

PFX is not used after commit 8bee4bad03c5b ("netfilter: xt
extensions: use pr_")

Signed-off-by: Li RongQing
Signed-off-by: Pablo Neira Ayuso
17 Feb, 2020
1 commit
-
This patch further relaxes the need to drop an skb due to a clash with
an existing conntrack entry.

Current clash resolution handles the case where the clash occurs between
two identical entries (distinct nf_conn objects with same tuples), i.e.:

                    Original                        Reply
existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.8.8.8:53
clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.8.8.8:53

[...] _nfct point to the existing one. The skb can then be
processed normally just as if the clash would not have existed in the
first place.

For other clashes, the skb needs to be dropped.
This frequently happens with DNS resolvers that send A and AAAA queries
back-to-back when NAT rules are present that cause packets to get
different DNAT transformations applied, for example:

-m statistics --mode random ... -j DNAT --dnat-to 10.0.0.6:5353
-m statistics --mode random ... -j DNAT --dnat-to 10.0.0.7:5353

In this case the A or AAAA query is dropped which incurs a costly
delay during name resolution.

This patch also allows this collision type:
                    Original                        Reply
existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.7:5353

[...]

1. 10.2.3.4:42 -> 10.8.8.8:53 (A)
2. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
3. Apply DNAT, reply changed to 10.0.0.6
4. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
5. Apply DNAT, reply changed to 10.0.0.7
6. confirm/commit to conntrack table, no collisions
7. commit clashing entry

Reply comes in:

10.2.3.4:42 <- 10.0.0.6:5353 (A)
 -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
10.2.3.4:42 <- 10.0.0.7:5353 (AAAA)
 -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
    The conntrack entry is deleted from table, as it has the NAT_CLASH
    bit set.

In case of a retransmit from ORIGINAL dir, all further packets will get
the DNAT transformation to 10.0.0.6.

I tried to come up with other solutions but they all have worse
problems.

Alternatives considered were:
1. Confirm ct entries at allocation time, not in postrouting.
a. will cause unnecessary work when the skb that creates the
conntrack is dropped by ruleset.
b. in case nat is applied, ct entry would need to be moved in
the table, which requires another spinlock pair to be taken.
c. breaks the 'unconfirmed entry is private to cpu' assumption:
we would need to guard all nfct->ext allocation requests with
ct->lock spinlock.

2. Make the unconfirmed list a hash table instead of a pcpu list.
Shares drawback c) of the first alternative.

3. Document this is expected and force users to rearrange their
ruleset (e.g. by using "-m cluster" instead of "-m statistics").
nft has the 'jhash' expression which can be used instead of 'numgen'.

Major drawback: doesn't fix what I consider a bug, not very realistic
and I believe it's reasonable to have the existing rulesets to 'just
work'.

4. Document this is expected and force users to steer problematic
packets to the same CPU -- this would serialize the "allocate new
conntrack entry/nat table evaluation/perform nat/confirm entry", so
no race can occur. Similar drawback to 3.

Another advantage of this patch compared to 1) and 2) is that there are
no changes to the hot path; things are handled in the udp tracker and
the clash resolution path.

Cc: rcu@vger.kernel.org
Cc: "Paul E. McKenney"
Cc: Josh Triplett
Cc: Jozsef Kadlecsik
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
11 Feb, 2020
3 commits
-
Followup patch will need a helper function with the 'clashing entries
refer to the identical tuple in both directions' resolution logic.

This patch will add another resolve_clash helper where loser_ct must
not be added to the dying list because it will be inserted into the
table.

Therefore this also moves the stat counters and dying-list insertion
of the losing ct.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
... so it can be re-used from clash resolution in followup patch.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
ctinfo is what's taken from the skb, i.e.
ct = nf_ct_get(skb, &ctinfo).

We do not pass 'ct' and instead re-fetch it from the skb.
Just do the same for both netns and ctinfo.

Also add a comment on what clash resolution is supposed to do.
While at it, one indent level can be removed.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
01 Feb, 2020
1 commit
-
Convert the uses of kvmalloc_array with __GFP_ZERO to
the equivalent kvcalloc.

Signed-off-by: Joe Perches
Signed-off-by: Pablo Neira Ayuso
31 Dec, 2019
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Remove #ifdef pollution around nf_ingress(), from Lukas Wunner.
2) Document ingress hook in netdevice, also from Lukas.
3) Remove htons() in tunnel metadata port netlink attributes,
from Xin Long.
4) Missing erspan netlink attribute validation also from Xin Long.
5) Missing erspan version in tunnel, from Xin Long.
6) Missing attribute nest in NFTA_TUNNEL_KEY_OPTS_{VXLAN,ERSPAN}
Patch from Xin Long.
7) Missing nla_nest_cancel() in tunnel netlink dump path,
from Xin Long.
8) Remove two exported conntrack symbols with no clients,
from Florian Westphal.
9) Add nft_meta_get_eval_time() helper to nft_meta, from Florian.
10) Add nft_meta_pkttype helper for loopback, also from Florian.
11) Add nft_meta_socket uid helper, from Florian Westphal.
12) Add nft_meta_cgroup helper, from Florian.
13) Add nft_meta_ifkind helper, from Florian.
14) Group all interface related meta selector, from Florian.
15) Add nft_prandom_u32() helper, from Florian.
16) Add nft_meta_rtclassid helper, from Florian.
17) Add support for matching on the slave device index,
from Florian.

This batch, among other things, contains updates for the netfilter
tunnel netlink interface: This extension is still incomplete and lacking
proper userspace support which is actually my fault, I did not find the
time to go back and finish this. This update is breaking tunnel UAPI in
some aspects to fix it but do it better sooner than never.
====================

Signed-off-by: David S. Miller
18 Dec, 2019
1 commit
-
Not used anywhere, remove them.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
01 Dec, 2019
1 commit
-
At this time compiler inlines it, but this code will not be executed
under normal conditions.

Also, no inlining allows to use "nf_ct_resolve_clash%return" perf probe.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
27 Oct, 2019
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
more specifically:

* Updates for ipset:
1) Coding style fix for ipset comment extension, from Jeremy Sowden.
2) De-inline many functions in ipset, from Jeremy Sowden.
3) Move ipset function definition from header to source file.
4) Move ip_set_put_flags() to source, export it as a symbol, remove
inline.
5) Move range_to_mask() to the source file where this is used.
6) Move ip_set_get_ip_port() to the source file where this is used.

* IPVS selftests and netns improvements:
7) Two patches to speedup ipvs netns dismantle, from Haishuang Yan.
8) Three patches to add selftest script for ipvs, also from
Haishuang Yan.

* Conntrack updates and new nf_hook_slow_list() function:
9) Document ct ecache extension, from Florian Westphal.
10) Skip ct extensions from ctnetlink dump, from Florian.
11) Free ct extension immediately, from Florian.
12) Skip access to ecache extension from nf_ct_deliver_cached_events()
this is not correct as reported by Syzbot.
13) Add and use nf_hook_slow_list(), from Florian.

* Flowtable infrastructure updates:
14) Move priority to nf_flowtable definition.
15) Dynamic allocation of per-device hooks in flowtables.
16) Allow to include netdevice only once in flowtable definitions.
17) Raise maximum number of devices per flowtable.

* Netfilter hardware offload infrastructure updates:
18) Add nft_flow_block_chain() helper function.
19) Pass callback list to nft_setup_cb_call().
20) Add nft_flow_cls_offload_setup() helper function.
21) Remove rules for the unregistered device via netdevice event.
22) Support for multiple devices in a basechain definition at the
ingress hook.
22) Add nft_chain_offload_cmd() helper function.
23) Add nft_flow_block_offload_init() helper function.
24) Rewind in case of failing to bind multiple devices to hook.
25) Typo in IPv6 tproxy module description, from Norman Rasmussen.
====================

Signed-off-by: David S. Miller
17 Oct, 2019
1 commit
-
Instead of waiting for rcu grace period just free it directly.
This is safe because conntrack lookup doesn't consider extensions.
Other accesses happen while ct->ext can't be free'd, either because
a ct refcount was taken or because the conntrack hash bucket lock or
the dying list spinlock have been taken.

This allows to remove __krealloc in a followup patch, netfilter was the
only user.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
10 Oct, 2019
1 commit
-
As hinted by KCSAN, we need at least one READ_ONCE()
to prevent a compiler optimization.

More details on:
https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance

syzbot report:
BUG: KCSAN: data-race in __nf_ct_refresh_acct / __nf_ct_refresh_acct

read to 0xffff888123eb4f08 of 4 bytes by interrupt on cpu 0:
__nf_ct_refresh_acct+0xd4/0x1b0 net/netfilter/nf_conntrack_core.c:1796
nf_ct_refresh_acct include/net/netfilter/nf_conntrack.h:201 [inline]
nf_conntrack_tcp_packet+0xd40/0x3390 net/netfilter/nf_conntrack_proto_tcp.c:1161
nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1633 [inline]
nf_conntrack_in+0x410/0xaa0 net/netfilter/nf_conntrack_core.c:1727
ipv4_conntrack_in+0x27/0x40 net/netfilter/nf_conntrack_proto.c:178
nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
nf_hook_slow+0x83/0x160 net/netfilter/core.c:512
nf_hook include/linux/netfilter.h:260 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_rcv+0x12f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
napi_skb_finish net/core/dev.c:5671 [inline]
napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
virtnet_receive drivers/net/virtio_net.c:1323 [inline]
virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
napi_poll net/core/dev.c:6352 [inline]
net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
__do_softirq+0x115/0x33f kernel/softirq.c:292

write to 0xffff888123eb4f08 of 4 bytes by task 7191 on cpu 1:
__nf_ct_refresh_acct+0xfb/0x1b0 net/netfilter/nf_conntrack_core.c:1797
nf_ct_refresh_acct include/net/netfilter/nf_conntrack.h:201 [inline]
nf_conntrack_tcp_packet+0xd40/0x3390 net/netfilter/nf_conntrack_proto_tcp.c:1161
nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1633 [inline]
nf_conntrack_in+0x410/0xaa0 net/netfilter/nf_conntrack_core.c:1727
ipv4_conntrack_local+0xbe/0x130 net/netfilter/nf_conntrack_proto.c:200
nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
nf_hook_slow+0x83/0x160 net/netfilter/core.c:512
nf_hook include/linux/netfilter.h:260 [inline]
__ip_local_out+0x1f7/0x2b0 net/ipv4/ip_output.c:114
ip_local_out+0x31/0x90 net/ipv4/ip_output.c:123
__ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
ip_queue_xmit+0x45/0x60 include/net/ip.h:236
__tcp_transmit_skb+0xdeb/0x1cd0 net/ipv4/tcp_output.c:1158
__tcp_send_ack+0x246/0x300 net/ipv4/tcp_output.c:3685
tcp_send_ack+0x34/0x40 net/ipv4/tcp_output.c:3691
tcp_cleanup_rbuf+0x130/0x360 net/ipv4/tcp.c:1575

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7191 Comm: syz-fuzzer Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: cc16921351d8 ("netfilter: conntrack: avoid same-timeout update")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Cc: Jozsef Kadlecsik
Cc: Florian Westphal
Acked-by: Pablo Neira Ayuso
Signed-off-by: Jakub Kicinski
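
The "compare before store" pattern that this fix annotates can be sketched in userspace as follows. The macros are approximations of the kernel's READ_ONCE()/WRITE_ONCE() built on a volatile cast and the GNU C `__typeof__` extension; `struct toy_conn`, its `writes` counter and `toy_refresh()` are hypothetical and exist only to make the behaviour observable. The sketch is single-threaded, so it shows the shape of the code rather than the race itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace approximations of the kernel macros (GNU C __typeof__). */
#define TOY_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical connection object. */
struct toy_conn {
    uint32_t timeout;      /* absolute expiry time, in ticks */
    unsigned int writes;   /* counts stores, for illustration only */
};

/* Refresh the timeout, skipping the store when the value is unchanged. */
void toy_refresh(struct toy_conn *ct, uint32_t now, uint32_t extra)
{
    uint32_t new_timeout = now + extra;

    assert(ct != NULL);
    /*
     * The volatile access forces exactly one load here, so the compiler
     * cannot drop the comparison and turn this into an unconditional store.
     */
    if (TOY_READ_ONCE(ct->timeout) != new_timeout) {
        TOY_WRITE_ONCE(ct->timeout, new_timeout);
        ct->writes++;
    }
}
```

Skipping the redundant store keeps the cache line clean when many packets refresh the same timeout value.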
28 Aug, 2019
1 commit
-
When a spinlock is locked/unlocked, its elements will be changed,
so marking it as __read_mostly is not suitable.

Also remove a duplicate definition of nf_conntrack_locks_all_lock;
strange that the compiler does not complain.

Signed-off-by: Li RongQing
Signed-off-by: Pablo Neira Ayuso
14 Aug, 2019
1 commit
-
Change ct id hash calculation to only use invariants.
Currently the ct id hash calculation is based on some fields that can
change in the lifetime of a conntrack entry in some corner cases. The
current hash uses the whole tuple which contains an hlist pointer which
will change when the conntrack is placed on the dying list resulting in
a ct id change.

This patch also removes the reply-side tuple and extension pointer from
the hash calculation so that the ct id will not change from
initialization until confirmation.

Fixes: 3c79107631db1f7 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id")
Signed-off-by: Dirk Morris
Acked-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
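
The idea of hashing only invariants can be shown with a small userspace sketch. It is not the kernel code: `struct toy_conn` is a hypothetical object whose fields were chosen for illustration (they are not the fields the kernel hashes), and FNV-1a is used as a simple unkeyed stand-in for the kernel's hash function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection object with invariant and mutable parts. */
struct toy_conn {
    /* Invariant for the lifetime of the entry. */
    uint32_t orig_src, orig_dst;
    uint16_t orig_sport, orig_dport;
    uint8_t  proto;
    /* Mutable: must not feed into the id. */
    void    *list_next;    /* changes when the entry moves between lists */
    uint32_t reply_src;    /* may be rewritten before confirmation */
};

/* FNV-1a over a byte range; a stand-in, not the kernel's hash. */
uint32_t toy_fnv1a(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Derive the id from the invariant members only, one field at a time. */
uint32_t toy_ct_id(const struct toy_conn *ct)
{
    uint32_t h = 2166136261u;

    assert(ct != NULL);
    h = toy_fnv1a(h, &ct->orig_src, sizeof(ct->orig_src));
    h = toy_fnv1a(h, &ct->orig_dst, sizeof(ct->orig_dst));
    h = toy_fnv1a(h, &ct->orig_sport, sizeof(ct->orig_sport));
    h = toy_fnv1a(h, &ct->orig_dport, sizeof(ct->orig_dport));
    h = toy_fnv1a(h, &ct->proto, sizeof(ct->proto));
    return h;
}
```

Hashing field by field, rather than over the whole structure, also keeps padding bytes and embedded list pointers out of the result.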
16 Jul, 2019
1 commit
-
In 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") the new
generic nf_conntrack was introduced, and it came to supersede the old
ip_conntrack.

This change updates (some) of the obsolete comments referring to old
file/function names of the ip_conntrack mechanism, as well as removes a
few self-referencing comments that we shouldn't maintain anymore.

I did not update any comments referring to historical actions (e.g,
comments like "this file was derived from ..." were left untouched, even
if the referenced file is no longer here).

Signed-off-by: Yonatan Goldschmidt
Signed-off-by: Pablo Neira Ayuso
25 Jun, 2019
1 commit
-
Resolve conflict between d2912cb15bdd ("treewide: Replace GPLv2
boilerplate/reference with SPDX - rule 500") removing the GPL disclaimer
and fe03d4745675 ("Update my email address") which updates Jozsef
Kadlecsik's email.

Signed-off-by: Pablo Neira Ayuso
19 Jun, 2019
1 commit
-
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Enrico Weigelt
Reviewed-by: Kate Stewart
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman
17 Jun, 2019
1 commit
-
____nf_conntrack_find() performs checks on the conntrack objects in
this order:

1. if (nf_ct_is_expired(ct))
This fetches ct->timeout, in third cache line.
The hnnode that is used to store the list pointers resides in the first
(origin) or second (reply tuple) cache lines.

This test rarely passes, but it's necessary to reap obsolete entries.

2. if (nf_ct_is_dying(ct))
This fetches ct->status, also in third cache line.
The test is useless, and can be removed:
Consider:
cpu0                                    cpu1
ct = ____nf_conntrack_find()
atomic_inc_not_zero(ct) -> ok
nf_ct_key_equal -> ok
is_dying -> DYING bit not set, ok
                                        set_bit(ct, DYING);
                                        ... unhash ... etc.
return ct
-> returning a ct with dying bit set, despite
having a test for it.

This (unlikely) case is fine - refcount prevents ct from getting free'd.

3. if (nf_ct_key_equal(h, tuple, zone, net))
nf_ct_key_equal checks in following order:
1. Tuple equal (first or second cacheline)
2. Zone equal (third cacheline)
3. confirmed bit set (->status, third cacheline)
4. net namespace match (third cacheline).

Swapping "timeout" and "cpu" places timeout in the first cacheline.
This has two advantages:

1. For a conntrack that won't even match the original tuple,
we will now only fetch the first and maybe the second cacheline
instead of always accessing the 3rd one as well.

2. in case of TCP ct->timeout changes frequently because we
reduce/increase it when there are packets outstanding in the network.

The first cacheline contains both the reference count and the ct spinlock,
i.e. moving timeout there avoids writes to 3rd cacheline.

The restart sequence in __nf_conntrack_find() is removed, if we found a
candidate, but then fail to increment the refcount or discover the tuple
has changed (object recycling), just pretend we did not find an entry.

A second lookup won't find anything until another CPU adds a new conntrack
with identical tuple into the hash table, which is very unlikely.

We have the confirmation-time checks (when we hold hash lock) that deal
with identical entries and even perform clash resolution in some cases.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
22 Apr, 2019
1 commit
-
Setting net.netfilter.nf_conntrack_timestamp=1 breaks xmit with fq
scheduler. skb->tstamp might be "refreshed" using ktime_get_real(),
but fq expects CLOCK_MONOTONIC.

This patch removes all places in netfilter that check/set skb->tstamp:
1. To fix the bogus "start" time seen with conntrack timestamping for
outgoing packets, never use skb->tstamp and always use current time.
2. In nfqueue and nflog, only use skb->tstamp for incoming packets,
as determined by current hook (prerouting, input, forward).
3. xt_time has to use system clock as well rather than skb->tstamp.
We could still use skb->tstamp for prerouting/input/forward, but
I see no advantage to make this conditional.

Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Cc: Eric Dumazet
Reported-by: Michal Soltys
Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso
15 Apr, 2019
1 commit
-
else, we leak the addresses to userspace via ctnetlink events
and dumps.

Compute an ID on demand based on the immutable parts of nf_conn struct.
Another advantage compared to using an address is that there is no
immediate re-use of the same ID in case the conntrack entry is freed and
reallocated again immediately.

Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID")
Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID")
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso