18 Mar, 2020
1 commit
-
commit dc15af8e9dbd039ebb06336597d2c491ef46ab74 upstream.
If .next function does not change position index,
following .show function will repeat output related
to current position index.
Cc: stable@vger.kernel.org
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman
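The rule behind this fix can be illustrated with a small user-space sketch. This is not the kernel's seq_file code: `demo_start`, `demo_next`, `demo_walk` and the `items` table are hypothetical names made up for the example. The point shown is only that the "next" step bumps the position index unconditionally, even when it runs off the end, so a later restart cannot show the same element twice.

```c
#include <stddef.h>

#define NITEMS 3

/* Hypothetical table standing in for the data a seq_file would walk. */
static const int items[NITEMS] = { 10, 20, 30 };

/* Return the element at *pos, or NULL once the position is past the end. */
static const int *demo_start(long *pos)
{
	return (*pos >= 0 && *pos < NITEMS) ? &items[*pos] : NULL;
}

/*
 * Advance to the next element. The index is incremented unconditionally,
 * even when the end is reached, so a later demo_start() call cannot hand
 * back the element that was just shown.
 */
static const int *demo_next(long *pos)
{
	++*pos;
	return demo_start(pos);
}

/* Simplified read loop: counts how many times "show" would run. */
static int demo_walk(long *pos)
{
	int shown = 0;
	const int *p;

	for (p = demo_start(pos); p != NULL; p = demo_next(pos))
		shown++;	/* a real ->show() would print *p here */
	return shown;
}
```

Walking from position 0 visits each element once and leaves the index past the end, so a second walk from that index shows nothing.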
13 Sep, 2019
1 commit
-
Move some `struct nf_conntrack` code from linux/skbuff.h to
linux/nf_conntrack_common.h. Together with a couple of helpers for
getting and setting skb->_nfct, it allows us to remove
CONFIG_NF_CONNTRACK checks from net/netfilter/nf_conntrack.h.
Signed-off-by: Jeremy Sowden
Signed-off-by: Pablo Neira Ayuso
03 Sep, 2019
1 commit
-
r8152 conflicts are the NAPI fixes in 'net' overlapping with
some tasklet stuff in net-next
Signed-off-by: David S. Miller
27 Aug, 2019
1 commit
-
When I merged the extension sysctl tables with the main one I forgot to
reset them on netns creation. They currently read/write init_net settings.
Fixes: d912dec12428 ("netfilter: conntrack: merge acct and helper sysctl table with main one")
Fixes: cb2833ed0044 ("netfilter: conntrack: merge ecache and timestamp sysctl tables with main one")
Reported-by: Shmulik Ladkani
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
04 Aug, 2019
1 commit
-
Use shared sysctl variables for zero and one constants, as in commit
eec4844fae7c ("proc/sysctl: add shared variables for range check")
Fixes: 8f14c99c7eda ("netfilter: conntrack: limit sysctl setting for boolean options")
Signed-off-by: Matteo Croce
Signed-off-by: Pablo Neira Ayuso
30 Apr, 2019
1 commit
-
We use the zero and one constants to limit the values of boolean options.
After this patch, only 0 or 1 can be set for the boolean nf
conntrack sysctl options.
Signed-off-by: Tonghao Zhang
Signed-off-by: Pablo Neira Ayuso
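The effect of bounding a boolean option with zero and one can be sketched in user space as follows. This is only loosely modelled on a min/max sysctl handler; `struct demo_ctl`, `demo_write_minmax`, `demo_zero` and `demo_one` are hypothetical names for illustration, and the sketch assumes an out-of-range write is rejected with -EINVAL and leaves the stored value untouched.

```c
#include <errno.h>

/* Shared bounds, standing in for the zero and one constants. */
static const int demo_zero = 0;
static const int demo_one = 1;

/* Hypothetical, heavily reduced stand-in for a sysctl table entry. */
struct demo_ctl {
	int *data;		/* the option being set */
	const int *min;		/* lower bound, inclusive */
	const int *max;		/* upper bound, inclusive */
};

/* Store val only if it lies within [min, max]; otherwise reject it. */
static int demo_write_minmax(const struct demo_ctl *ctl, int val)
{
	if (val < *ctl->min || val > *ctl->max)
		return -EINVAL;
	*ctl->data = val;
	return 0;
}
```

With both bounds pointing at the shared constants, a write of 1 succeeds while a write of 2 fails and the previous value is kept.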
28 Jan, 2019
1 commit
-
When nf_ct_netns_get() fails, it should clean up itself,
its caller doesn't need to call nf_conntrack_fini_net().
nf_conntrack_init_net() is called after registering sysctl
and proc, so its cleanup function should be called before
unregistering sysctl and proc.
Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks")
Fixes: b884fa461776 ("netfilter: conntrack: unify sysctl handling")
Reported-and-tested-by: syzbot+fcee88b2d87f0539dfe9@syzkaller.appspotmail.com
Signed-off-by: Cong Wang
Signed-off-by: Pablo Neira Ayuso
18 Jan, 2019
3 commits
-
The connection tracking hooks can be optionally registered per netns
when conntrack is specifically invoked from the ruleset since
0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed
by ruleset"). Then, since 4d3a57f23dec ("netfilter: conntrack: do not
enable connection tracking unless needed"), the default behaviour is
changed to always register them on demand.
This patch provides a toggle that allows users to always register them.
Without this toggle, in order to use conntrack for statistics
collection, you need a dummy rule that refers to conntrack, e.g.
iptables -I INPUT -m state --state NEW
This patch allows users to restore the original behaviour via modparam,
i.e. always register connection tracking, e.g.
modprobe nf_conntrack enable_hooks=1
Hence, no dummy rule is required.
Reported-by: Laura Garcia
Signed-off-by: Pablo Neira Ayuso
-
It's now the same as __nf_ct_l4proto_find(), so rename that to
nf_ct_l4proto_find and use it everywhere.
It never returns NULL and doesn't need locks or reference counts.
Before this series:
302824 net/netfilter/nf_conntrack.ko
 21504 net/netfilter/nf_conntrack_proto_gre.ko
   text    data     bss     dec     hex filename
   6281    1732       4    8017    1f51 nf_conntrack_proto_gre.ko
 108356   20613     236  129205   1f8b5 nf_conntrack.ko
After:
294864 net/netfilter/nf_conntrack.ko
   text    data     bss     dec     hex filename
 106979   19557     240  126776   1ef38 nf_conntrack.ko
so, even with builtin gre, total size got reduced.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
Due to historical reasons, all l4 trackers register their own
sysctls.
This leads to copy&pasted boilerplate code, that does exactly same
thing, just with different data structure.
Place all of this in a single file.
This allows to remove the various ctl_table pointers from the ct_netns
structure and reduces overall code size.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
21 Dec, 2018
4 commits
-
Similar to previous change, this time for ecache and timestamp.
Unlike helper and acct, these can be disabled at build time, so they
need ifdef guards.
Next patch will remove a few (now obsolete) functions.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
Needless copy&paste, just handle all in one. Next patch will handle
acct and timestamp, which have similar functions.
Intentionally leaves cruft behind, will be cleaned up in a followup
patch.
The obsolete sysctl pointers in netns_ct struct are left in place and
removed in a single change, as changes to netns trigger rebuild of
almost all files.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
It's a bit hard to see what table[3] really lines up with, so add
human-readable mnemonics and use them for initialisation.
This makes it easier to see e.g. which sysctls are not exported to
unprivileged userns.
objdiff shows no changes.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
Only one caller, just place it where it's needed.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
21 Sep, 2018
1 commit
-
l4 protocols are demuxed by l3num, l4num pair.
However, almost all l4 trackers are l3 agnostic.
Only exceptions are:
- gre, icmp (ipv4 only)
- icmpv6 (ipv6 only)
This commit gets rid of the l3 mapping, l4 trackers can now be looked up
by their IPPROTO_XXX value alone, which gets rid of the additional l3
indirection.
For icmp, icmpv6 and gre, add a check on state->pf and
return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4,
this seems more fitting than using the generic tracker.
Additionally we can kill the 2nd l4proto definitions that were needed
for v4/v6 split -- they are now the same so we can use single l4proto
struct for each protocol, rather than two.
The EXPORT_SYMBOLs can be removed as all these object files are
part of nf_conntrack with no external references.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
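A lookup keyed on the protocol number alone might look like the user-space sketch below. The structure and function names (`struct demo_l4proto`, `demo_l4proto_find`) are hypothetical and are not the kernel's definitions; only the protocol numbers (1 for ICMP, 6 for TCP, 17 for UDP) are the standard IANA values. The sketch assumes unknown protocols fall back to a generic tracker, so the lookup never yields NULL.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for a layer-4 tracker description. */
struct demo_l4proto {
	unsigned char protonum;
	const char *name;
};

static const struct demo_l4proto demo_generic = { 0, "generic" };
static const struct demo_l4proto demo_icmp = { 1, "icmp" };
static const struct demo_l4proto demo_tcp = { 6, "tcp" };
static const struct demo_l4proto demo_udp = { 17, "udp" };

/*
 * Look up a tracker by protocol number only; no layer-3 family is
 * involved. Unknown protocols get the generic tracker, so callers
 * never have to handle a NULL result.
 */
static const struct demo_l4proto *demo_l4proto_find(unsigned char protonum)
{
	switch (protonum) {
	case 1:
		return &demo_icmp;
	case 6:
		return &demo_tcp;
	case 17:
		return &demo_udp;
	default:
		return &demo_generic;
	}
}
```

A family check such as the one on state->pf described above would then live inside the icmp, icmpv6 and gre trackers themselves rather than in the lookup.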
17 Sep, 2018
1 commit
-
as of a0ae2562c6c4b27 ("netfilter: conntrack: remove l3proto
abstraction") there are no users anymore.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
17 Jul, 2018
1 commit
-
This unifies ipv4 and ipv6 protocol trackers and removes the l3proto
abstraction.
This gets rid of all l3proto indirect calls and the need to do
a lookup on the function to call for l3 demux.
It increases the size of nf_conntrack.ko by only a small amount (12 kbyte),
so overall this reduces size, because nf_conntrack.ko is useless without
either the nf_conntrack_ipv4 or the nf_conntrack_ipv6 module.
before:
   text    data     bss     dec     hex filename
   7357    1088       0    8445    20fd nf_conntrack_ipv4.ko
   7405    1084       4    8493    212d nf_conntrack_ipv6.ko
  72614   13689     236   86539   1520b nf_conntrack.ko
 19K nf_conntrack_ipv4.ko
 19K nf_conntrack_ipv6.ko
179K nf_conntrack.ko
after:
   text    data     bss     dec     hex filename
  79277   13937     236   93450   16d0a nf_conntrack.ko
191K nf_conntrack.ko
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
16 Jul, 2018
1 commit
-
handle everything from ctnetlink directly.
After all these years we still only support ipv4 and ipv6, so it
seems reasonable to remove l3 protocol tracker support and instead
handle ipv4/ipv6 from a common, always builtin inet tracker.
Step 1: Get rid of all the l3proto->func() calls.
Start with ctnetlink, then move on to packet-path ones.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
16 May, 2018
1 commit
-
Variants of proc_create{,_data} that directly take a struct seq_operations
and deal with network namespaces in ->open and ->release. All callers of
proc_create + seq_open_net converted over, and seq_{open,release}_net are
removed entirely.
Signed-off-by: Christoph Hellwig
28 Mar, 2018
1 commit
-
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: Kirill Tkhai
Signed-off-by: David S. Miller
27 Mar, 2018
1 commit
-
Prefer the direct use of octal for permissions.
Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.
Miscellanea:
o Whitespace neatening around these conversions.
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
05 Mar, 2018
1 commit
-
These pernet_operations register and unregister sysctl and /proc
entries. Exit batch method also waits till all per-net conntracks
are dead. Thus, they are safe to be marked as async.
Signed-off-by: Kirill Tkhai
Signed-off-by: David S. Miller
19 Jan, 2018
1 commit
-
/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:
-	if (de->proc_fops)
-		inode->i_fop = de->proc_fops;
+	if (de->proc_fops) {
+		if (S_ISREG(inode->i_mode))
+			inode->i_fop = &proc_reg_file_ops;
+		else
+			inode->i_fop = de->proc_fops;
+	}
VFS stopped pinning module at this point.
# ipvs
Acked-by: Julian Anastasov
Signed-off-by: Alexey Dobriyan
Acked-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso
09 Jan, 2018
1 commit
-
This new bit tells us that the conntrack entry is owned by the flow
table offload infrastructure.
# cat /proc/net/nf_conntrack
ipv4 2 tcp 6 src=10.141.10.2 dst=147.75.205.195 sport=36392 dport=443 src=147.75.205.195 dst=192.168.2.195 sport=443 dport=36392 [OFFLOAD] mark=0 zone=0 use=2
Note the [OFFLOAD] tag in the listing.
The timer of such conntrack entries looks stopped from userspace.
In practice, to make sure the conntrack entry does not go away, the
conntrack timer is periodically set to an arbitrarily large value that
gets refreshed on every iteration from the garbage collector, so it
never expires, and such entries display no internal state in the case of
TCP flows. This allows us to save a bitcheck from the packet path via
nf_ct_is_expired().
Conntrack entries that have been offloaded to the flow table
infrastructure cannot be deleted/flushed via ctnetlink. The flow table
infrastructure is also responsible for releasing this conntrack entry.
Signed-off-by: Pablo Neira Ayuso
04 Sep, 2017
1 commit
-
This patch removes NF_CT_ASSERT() and instead uses WARN_ON().
Signed-off-by: Varsha Rao
25 Aug, 2017
3 commits
-
CONFIG_NF_CONNTRACK_PROCFS is deprecated, no need to use a function
pointer in the trackers for this. Place the printf formatting in
the one place that uses it.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
no need to waste storage for something that is only needed
in one place and can be deduced from protocol number.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
no need to waste storage for something that is only needed
in one place and can be deduced from protocol number.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
01 Aug, 2017
1 commit
-
Discussion during NFWS 2017 in Faro has shown that the current
conntrack behaviour is unreasonable.
Even if conntrack module is loaded on behalf of a single net namespace,
it's turned on for all namespaces, which is expensive. Commit
481fa373476 ("netfilter: conntrack: add nf_conntrack_default_on sysctl")
attempted to provide an alternative to the 'default on' behaviour by
adding a sysctl to change it.
However, as Eric points out, the sysctl only becomes available
once the module is loaded, and then it's too late.
So we either have to move the sysctl to the core, or, alternatively,
change conntrack to become active only once the rule set requires this.
This does the latter, conntrack is only enabled when a rule needs it.
Reported-by: Eric Dumazet
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
07 Apr, 2017
1 commit
-
For string without format specifiers, use seq_puts(). For
seq_printf("\n"), use seq_putc('\n').
Signed-off-by: simran singhal
Acked-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso
02 Feb, 2017
1 commit
-
After this change conntrack operations (lookup, creation, matching from
ruleset) only access one instead of two sk_buff cache lines.
This works for normal conntracks because those are allocated from a slab
that guarantees hw cacheline or 8byte alignment (whatever is larger)
so the 3 bits needed for ctinfo won't overlap with nf_conn addresses.
Template allocation now does manual address alignment (see previous change)
on arches that don't have sufficient kmalloc min alignment.
Some spots intentionally use skb->_nfct instead of skb_nfct() helpers,
this is to avoid undoing the skb_nfct() use when we remove untracked
conntrack object in the future.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
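The packing trick described above, storing the 3 ctinfo bits in the low bits of an 8-byte-aligned pointer, can be sketched in user space like this. All names here (`struct demo_conn`, `DEMO_INFOMASK`, `demo_pack` and friends) are hypothetical illustrations, not the kernel's definitions, and the sketch assumes the object really is aligned to at least 8 bytes.

```c
#include <stdint.h>

/* The low 3 bits carry the info value; the remaining bits the pointer. */
#define DEMO_INFOMASK	((uintptr_t)7)
#define DEMO_PTRMASK	(~DEMO_INFOMASK)

/* Hypothetical object; forced to 8-byte alignment so bits 0..2 are free. */
struct demo_conn {
	_Alignas(8) int id;
};

/* Combine pointer and info into one word. */
static uintptr_t demo_pack(const struct demo_conn *ct, unsigned int info)
{
	return (uintptr_t)ct | ((uintptr_t)info & DEMO_INFOMASK);
}

/* Recover the pointer by masking off the info bits. */
static struct demo_conn *demo_ptr(uintptr_t packed)
{
	return (struct demo_conn *)(packed & DEMO_PTRMASK);
}

/* Recover the info value from the low bits. */
static unsigned int demo_info(uintptr_t packed)
{
	return (unsigned int)(packed & DEMO_INFOMASK);
}
```

Because both values share one word, reading either of them touches a single field, which is the property the change relies on to stay within one cache line.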
05 Dec, 2016
1 commit
-
This switch (default on) can be used to disable automatic registration
of connection tracking functionality in newly created network
namespaces.
This means that when net namespace goes down (or the tracker protocol
module is unloaded) we *might* have to unregister the hooks.
We can either add another per-netns variable that tells if
the hooks got registered by default, or, alternatively, just call
the protocol _put() function and have the callee deal with a possible
'extra' put() operation that doesn't pair with a get() one.
This uses the latter approach, i.e. a put() without a get has no effect.
Conntrack is still enabled automatically regardless of the new sysctl
setting if the new net namespace requires connection tracking, e.g. when
NAT rules are created.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
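The "put() without a get has no effect" approach can be sketched with a simple per-namespace user count. This is a user-space illustration only; `struct demo_netns`, `demo_get` and `demo_put` are hypothetical names, and the `hooks_registered` flag merely stands in for the real hook registration.

```c
/* Hypothetical per-namespace state. */
struct demo_netns {
	unsigned int users;	/* number of outstanding get() calls */
	int hooks_registered;	/* stands in for the registered hooks */
};

/* First user registers the hooks. */
static void demo_get(struct demo_netns *net)
{
	if (net->users++ == 0)
		net->hooks_registered = 1;
}

/*
 * Last user unregisters the hooks. An 'extra' put that does not pair
 * with a get is tolerated and simply does nothing.
 */
static void demo_put(struct demo_netns *net)
{
	if (net->users == 0)
		return;
	if (--net->users == 0)
		net->hooks_registered = 0;
}
```

The callee absorbing the unpaired put is what avoids the extra per-netns variable mentioned above.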
25 Sep, 2016
1 commit
-
Fabian reports a possible conntrack memory leak (could not reproduce so
far), however, one minor issue can be easily resolved:
> cat /proc/net/nf_conntrack | wc -l = 5
> 4 minutes required to clean up the table.
We should not report those timed-out entries to the user in first place.
And instead of just skipping those timed-out entries while iterating over
the table we can also zap them (we already do this during ctnetlink
walks, but I forgot about the /proc interface).
Fixes: f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer")
Reported-by: Fabian Frederick
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
13 Sep, 2016
1 commit
-
These counters sit in hot path and do show up in perf, this is especially
true for 'found' and 'searched' which get incremented for every packet
processed.
Information like
searched=212030105
new=623431
found=333613
delete=623327
does not seem too helpful nowadays:
- on busy systems found and searched will overflow every few hours
(these are 32bit integers), other more busy ones every few days.
- for debugging there are better methods, such as iptables' trace target,
the conntrack log sysctls. Nowadays we also have perf tool.
This removes packet path stat counters except those that
are expected to be 0 (or close to 0) on a normal system, e.g.
'insert_failed' (race happened) or 'invalid' (proto tracker rejects).
The insert stat is retained for the ctnetlink case.
The found stat is retained for the tuple-is-taken check when NAT has to
determine if it needs to pick a different source address.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
07 Sep, 2016
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree. Most relevant updates are the removal of per-conntrack timers to
use a workqueue/garbage collection approach instead from Florian
Westphal, the hash and numgen expression for nf_tables from Laura
Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
removal of ip_conntrack sysctl and many other incremental updates on our
Netfilter codebase.
More specifically, they are:
1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
transport area in dccp, sctp, tcp, udp and udplite protocol
conntrackers, from Gao Feng.
2) Missing whitespace on error message in physdev match, from Hangbin Liu.
3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.
4) Add nf_ct_expires() helper function and use it, from Florian Westphal.
5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
from Florian.
6) Rename nf_tables set implementation to nft_set_{name}.c
7) Introduce the hash expression to allow arbitrary hashing of selector
concatenations, from Laura Garcia Liebana.
8) Remove ip_conntrack sysctl backward compatibility code, this code has
been around for long time already, and we have two interfaces to do
this already: nf_conntrack sysctl and ctnetlink.
9) Use nf_conntrack_get_ht() helper function whenever possible, instead
of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.
10) Add quota expression for nf_tables.
11) Add number generator expression for nf_tables, this supports
incremental and random generators that can be combined with maps,
very useful for load balancing purpose, again from Laura Garcia Liebana.
12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.
13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
configuration, this is used by a follow up patch to perform better chain
update validation.
14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
nft_set_hash implementation to honor the NLM_F_EXCL flag.
15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
patch from Florian Westphal.
16) Don't use the DYING bit to know if the conntrack event has been already
delivered, instead use a state variable to track event re-delivery
states, also from Florian.
17) Remove the per-conntrack timer, use the workqueue approach that was
discussed during the NFWS, from Florian Westphal.
18) Use the netlink conntrack table dump path to kill stale entries,
again from Florian.
19) Add a garbage collector to get rid of stale conntracks, from
Florian.
20) Reschedule garbage collector if eviction rate is high.
21) Get rid of the __nf_ct_kill_acct() helper.
22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.
23) Make nf_log_set() interface assertive on unsupported families.
====================
Signed-off-by: David S. Miller
17 Aug, 2016
1 commit
-
We should skip the conntracks that belong to a different namespace,
otherwise other unrelated netns's conntrack entries will be dumped via
/proc/net/nf_conntrack.
Fixes: 56d52d4892d0 ("netfilter: conntrack: use a single hashtable for all namespaces")
Signed-off-by: Liping Zhang
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
12 Aug, 2016
1 commit
-
... so we don't need to touch all of these places when we get rid of the
timer in nf_conn.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
11 Jul, 2016
1 commit
-
When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack
hash table via /sys/module/nf_conntrack/parameters/hashsize, race will
happen, because reader can observe a newly allocated hash but the old size
(or vice versa). So oops will happen like follows:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
IP: [] seq_print_acct+0x11/0x50 [nf_conntrack]
Call Trace:
[] ? ct_seq_show+0x14e/0x340 [nf_conntrack]
[] seq_read+0x2cc/0x390
[] proc_reg_read+0x42/0x70
[] __vfs_read+0x37/0x130
[] ? security_file_permission+0xa0/0xc0
[] vfs_read+0x95/0x140
[] SyS_read+0x55/0xc0
[] entry_SYSCALL_64_fastpath+0x1a/0xa4
It is very easy to reproduce this kernel crash.
1. open one shell and input the following cmds:
while : ; do
echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize
done
2. open more shells and input the following cmds:
while : ; do
cat /proc/net/nf_conntrack
done
3. just wait a moment, oops will happen soon.
The solution in this patch is based on Florian's Commit 5e3c61f98175
("netfilter: conntrack: fix lookup race during hash resize"). And
add a wrapper function nf_conntrack_get_ht to get hash and hsize
suggested by Florian Westphal.
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
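The idea of fetching the hash pointer and its size as one consistent pair can be sketched with a sequence counter and a retry loop. This is a user-space approximation using C11 atomics, not the kernel's nf_conntrack_get_ht() or its seqcount primitives; `struct demo_table`, `demo_init`, `demo_resize` and `demo_get_ht` are hypothetical names, and the sketch assumes a single writer at a time.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical table: seq is even when stable, odd while resizing. */
struct demo_table {
	atomic_uint seq;
	_Atomic(void **) hash;
	atomic_uint hsize;
};

static void demo_init(struct demo_table *t, void **hash, unsigned int hsize)
{
	atomic_init(&t->seq, 0);
	atomic_init(&t->hash, hash);
	atomic_init(&t->hsize, hsize);
}

/* Writer: mark the update in progress, swap both fields, mark it done. */
static void demo_resize(struct demo_table *t, void **hash, unsigned int hsize)
{
	atomic_fetch_add(&t->seq, 1);	/* becomes odd */
	atomic_store(&t->hash, hash);
	atomic_store(&t->hsize, hsize);
	atomic_fetch_add(&t->seq, 1);	/* becomes even again */
}

/*
 * Reader: retry until pointer and size were both read while the
 * sequence counter was even and unchanged, so they belong together.
 */
static void demo_get_ht(struct demo_table *t, void ***hash,
			unsigned int *hsize)
{
	unsigned int seq;

	do {
		seq = atomic_load(&t->seq);
		*hash = atomic_load(&t->hash);
		*hsize = atomic_load(&t->hsize);
	} while ((seq & 1) || seq != atomic_load(&t->seq));
}
```

A reader that used the two fields separately could pair a new table with the old size, which is the mismatch behind the oops shown above.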
07 Jul, 2016
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next,
they are:
1) Don't use userspace datatypes in bridge netfilter code, from
Tobin Harding.
2) Iterate only once over the expectation table when removing the
helper module, instead of once per-netns, from Florian Westphal.
3) Extra sanitization in xt_hook_ops_alloc() to return error in case
we ever pass zero hooks, xt_hook_ops_alloc():
4) Handle NFPROTO_INET from the logging core infrastructure, from
Liping Zhang.
5) Autoload loggers when TRACE target is used from rules, this doesn't
change the behaviour in case the user already selected nfnetlink_log
as preferred way to print tracing logs, also from Liping Zhang.
6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields
by cache lines, increases the size of entries by 11% per entry.
From Florian Westphal.
7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.
8) Remove useless defensive check in nf_logger_find_get() from Shivani
Bhardwaj.
9) Remove the zone extension and place it in the conntrack object, this is
always included in the hashing and we expect more intensive use of
zones since containers are in place. Also from Florian Westphal.
10) Owner match now works from any namespace, from Eric Biederman.
11) Make sure we only reply with TCP reset to TCP traffic from
nf_reject_ipv4, patch from Liping Zhang.
12) Introduce --nflog-size to indicate amount of network packet bytes
that are copied to userspace via log message, from Vishwanath Pai.
This obsoletes --nflog-range that has never worked, it was designed
to achieve this but it has never worked.
13) Introduce generic macros for nf_tables object generation masks.
14) Use generation mask in table, chain and set objects in nf_tables.
This fixes interference between the ongoing preparation phase of
the commit protocol and object listings going on at the same time.
This update is introduced in three patches, one per object.
15) Check if the object is active in the next generation for element
deactivation in the rbtree implementation, given that deactivation
happens from the commit phase path we have to observe the future
status of the object.
16) Support for deletion of just added elements in the hash set type.
17) Allow to resize hashtable from /proc entry, not only from the
obscure /sys entry that maps to the module parameter, from Florian
Westphal.
18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
anymore since we tear down the ruleset whenever the netdevice
goes away.
19) Support for matching inverted set lookups, from Arturo Borrero.
20) Simplify the iptables_mangle_hook() by removing a superfluous
extra branch.
21) Introduce ether_addr_equal_masked() and use it from the netfilter
codebase, from Joe Perches.
22) Remove references to "Use netfilter MARK value as routing key"
from the Netfilter Kconfig description given that this toggle
has not existed for 10 years, from Moritz Sichert.
23) Introduce generic NF_INVF() and use it from the xtables codebase,
from Joe Perches.
24) Setting logger to NONE via /proc was not working unless explicit
nul-termination was included in the string. This fix seems to
leave the former behaviour in place, so we don't break backward
compatibility.
====================
Signed-off-by: David S. Miller
24 Jun, 2016
1 commit
-
No need to restrict this to module parameter.
We export a copy of the real hash size -- when user alters the value we
allocate the new table, copy entries etc before we update the real size
to the requested one.
This is also needed because the real size is used by concurrent readers
and cannot be changed without synchronizing the conntrack generation
seqcnt.
We only allow changing this value from the initial net namespace.
Tested using http-client-benchmark vs. httpterm with concurrent
while true;do
echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
done
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso