18 Jan, 2019
1 commit
-
The l4 protocol trackers are invoked via indirect call: l4proto->packet().
With one exception (gre), all l4trackers are builtin, so we can make
.packet optional and use a direct call for most protocols.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
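
The dispatch idea in the commit above can be sketched roughly as follows. This is an illustration, not the kernel code: `struct demo_l4proto`, `demo_dispatch_packet()` and the per-protocol handlers are hypothetical stand-ins, and the return values are arbitrary markers. Built-in trackers are reached through a switch with direct calls; only a tracker that fills in the optional `.packet` pointer (gre in the commit text) takes the indirect call.

```c
#include <assert.h>
#include <stddef.h>

/* Protocol numbers as assigned by IANA (same values as IPPROTO_*). */
#define DEMO_PROTO_TCP 6
#define DEMO_PROTO_UDP 17
#define DEMO_PROTO_GRE 47

/* Hypothetical tracker descriptor; .packet is optional and may be NULL. */
struct demo_l4proto {
    int l4num;
    int (*packet)(const void *pkt);
};

/* Stand-in handlers; each returns a distinct marker value. */
static int demo_tcp_packet(const void *pkt)     { (void)pkt; return 1; }
static int demo_udp_packet(const void *pkt)     { (void)pkt; return 2; }
static int demo_gre_packet(const void *pkt)     { (void)pkt; return 3; }
static int demo_generic_packet(const void *pkt) { (void)pkt; return 0; }

/* Built-in trackers leave .packet unset; the modular one sets it. */
static const struct demo_l4proto demo_tcp = { DEMO_PROTO_TCP, NULL };
static const struct demo_l4proto demo_udp = { DEMO_PROTO_UDP, NULL };
static const struct demo_l4proto demo_gre = { DEMO_PROTO_GRE, demo_gre_packet };

static int demo_dispatch_packet(const struct demo_l4proto *p, const void *pkt)
{
    assert(p != NULL);

    /* Indirect call only when the tracker provides a handler. */
    if (p->packet)
        return p->packet(pkt);

    /* Direct calls for the built-in protocols. */
    switch (p->l4num) {
    case DEMO_PROTO_TCP:
        return demo_tcp_packet(pkt);
    case DEMO_PROTO_UDP:
        return demo_udp_packet(pkt);
    default:
        return demo_generic_packet(pkt);
    }
}
```

The design choice illustrated here is that the common protocols avoid the function-pointer call on the per-packet path, while the descriptor still leaves room for a tracker that lives outside the core.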
03 Nov, 2018
1 commit
-
Expose these functions to access conntrack protocol tracker netns area,
nfnetlink_cttimeout needs this.

Signed-off-by: Pablo Neira Ayuso
21 Sep, 2018
4 commits
-
l4 protocols are demuxed by l3num, l4num pair.
However, almost all l4 trackers are l3 agnostic.
Only exceptions are:
- gre, icmp (ipv4 only)
- icmpv6 (ipv6 only)

This commit gets rid of the l3 mapping; l4 trackers can now be looked up
by their IPPROTO_XXX value alone, which gets rid of the additional l3
indirection.

For icmp, icmpv6 and gre, add a check on state->pf and
return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4,
this seems more fitting than using the generic tracker.

Additionally we can kill the 2nd l4proto definitions that were needed
for v4/v6 split -- they are now the same so we can use a single l4proto
struct for each protocol, rather than two.

The EXPORT_SYMBOLs can be removed as all these object files are
part of nf_conntrack with no external references.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
It's unused; the next patch will remove the l4proto->l3proto number to simplify
the l4 protocol demuxer lookup.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
The error() handler gets called before allocating or looking up a
connection tracking entry.

We can instead use direct calls from the ->packet() handlers which get
invoked for every packet anyway.

Only exceptions are icmp and icmpv6, these two special cases will be
handled in the next patch.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
Only two protocols need the ->error() function: icmp and icmpv6.
This is because icmp error messages might be RELATED to an existing
connection (e.g. PMTUD, port unreachable and the like), and their
->error() handlers do this.

The error callback is already optional, so remove it for
udp and call them from ->packet() instead.

As the error() callback can call checksum functions that write to
skb->csum*, the const qualifier has to be removed as well.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
20 Sep, 2018
2 commits
-
->new() gets invoked after ->error() and before ->packet() if
a conntrack lookup has found no result for the tuple.

We can fold it into ->packet() -- the packet() implementations
can check if the conntrack is confirmed (already in hash) or not
(new).

If it's unconfirmed, the conntrack isn't in the hash yet, so the current
skb created a new conntrack entry.

Only relevant side effect -- if packet() doesn't return NF_ACCEPT
but -NF_ACCEPT (or drop), while the conntrack was just created,
then the newly allocated conntrack is freed right away, rather than not
created in the first place.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
nf_hook_state contains all the hook meta-information: netns, protocol family,
hook location, and so on.

Instead of only passing selected information, pass a pointer to the entire
structure.

This will allow merging the error and the packet handlers and removing
the ->new() function in followup patches.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
11 Sep, 2018
1 commit
-
Now that cttimeout support for nft_ct is in place, these should depend
on CONFIG_NF_CONNTRACK_TIMEOUT otherwise we can crash when dumping the
policy if this option is not enabled.

[ 71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[...]
[ 71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246
[...]
[ 71.600188] Call Trace:
[ 71.600201] ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct]

Signed-off-by: Pablo Neira Ayuso
29 Aug, 2018
1 commit
-
tcp, sctp and dccp trackers re-use the userspace ctnetlink states
to index their timeout arrays, which means timeout[0] is never
used. Copy the 'new' state (syn-sent, dccp-request, ..) to 0 as well
so external users can simply read it off timeouts[0] without need to
differentiate dccp/sctp/tcp and udp/icmp/gre/generic.

The alternative is to map all array accesses to 'i - 1', but that
is a much more intrusive change.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
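
A small sketch of the slot-0 trick described above, under stated assumptions: the state enum, the function names and the timeout values (in seconds) are made up for illustration and are not the kernel's definitions. State 0 means "no state", so slot 0 of the array would otherwise stay unused; copying the initial state's timeout there lets a caller read `timeouts[0]` without knowing which protocol it is dealing with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state numbering: 0 is "none", real states start at 1. */
enum demo_state {
    DEMO_NONE,
    DEMO_SYN_SENT,
    DEMO_ESTABLISHED,
    DEMO_STATE_MAX
};

static void demo_init_timeouts(unsigned int timeouts[DEMO_STATE_MAX])
{
    assert(timeouts != NULL);

    /* Illustrative values only. */
    timeouts[DEMO_SYN_SENT] = 120;
    timeouts[DEMO_ESTABLISHED] = 432000;

    /* Mirror the 'new' state into the otherwise unused slot 0. */
    timeouts[DEMO_NONE] = timeouts[DEMO_SYN_SENT];
}

/* External users can read the initial timeout the same way for any protocol. */
static unsigned int demo_initial_timeout(const unsigned int *timeouts)
{
    return timeouts[0];
}
```

The alternative mentioned in the commit, remapping every access to `i - 1`, would touch each place that indexes the array, which is why the copy is the less intrusive option.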
16 Jul, 2018
3 commits
-
Not needed, we can have the l4trackers fetch it themselves.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
Handle common protocols (udp, tcp, ..), in the core and only
do the call if needed by the l4proto tracker.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
Handle the common cases (tcp, udp, etc.) in the core and only
do the indirect call for the protocols that need it (GRE for instance).

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
09 Jan, 2018
2 commits
-
Nowadays this is just the default template that is used when setting up
the net namespace, so nothing writes to these locations.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
Previous patches removed all writes to these structs so we can
now mark them as const.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
08 Jan, 2018
1 commit
-
Similar to the previous commit, but instead compute this at compile time
and turn nlattr_size into a u16.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
25 Oct, 2017
2 commits
-
not needed/used anymore.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
We currently pass down the l4 protocol to the conntrack ->packet()
function, but the only user of this is the debug info decision.

Same information can be derived from struct nf_conn.
As a first step, add and use a new log function for this, similar to
nf_ct_helper_log().

Add __cold annotation -- invalid packets should be infrequent so
gcc can consider all call paths that lead to such a function as
unlikely.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
04 Sep, 2017
1 commit
-
tested with allmodconfig build.
Signed-off-by: Florian Westphal
25 Aug, 2017
3 commits
-
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
CONFIG_NF_CONNTRACK_PROCFS is deprecated, no need to use a function
pointer in the trackers for this. Place the printf formatting in
the one place that uses it.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
no need to waste storage for something that is only needed
in one place and can be deduced from protocol number.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
06 Jul, 2017
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter fixes for net

The following patchset contains two Netfilter fixes for your net tree,
they are:

1) Fix memleak from netns release path of conntrack protocol trackers,
   patch from Liping Zhang.
2) Uninitialized flags field in ebt_log, that results in unpredictable
   logging format in ebtables, also from Liping.
====================

Signed-off-by: David S. Miller
02 Jul, 2017
3 commits
-
This patch is to remove the typedef sctp_inithdr_t, and replace
with struct sctp_inithdr in the places where it's using this
typedef.

Signed-off-by: Xin Long
Signed-off-by: David S. Miller
-
This patch is to remove the typedef sctp_chunkhdr_t, and replace
with struct sctp_chunkhdr in the places where it's using this
typedef.

It is also to fix some indents and use sizeof(variable) instead
of sizeof(type), especially in sctp_new.

Signed-off-by: Xin Long
Signed-off-by: David S. Miller
-
This patch is to remove the typedef sctp_sctphdr_t, and replace
with struct sctphdr in the places where it's using this typedef.

It is also to fix some indents and use sizeof(variable) instead
of sizeof(type).

Signed-off-by: Xin Long
Signed-off-by: David S. Miller
30 Jun, 2017
1 commit
-
After running the following commands for a while, kmemleak reported that
"1879 new suspected memory leaks" happened:
  # while : ; do
        ip netns add test
        ip netns delete test
    done

unreferenced object 0xffff88006342fa38 (size 1024):
comm "ip", pid 15477, jiffies 4295982857 (age 957.836s)
hex dump (first 32 bytes):
b8 b0 4d a0 ff ff ff ff c0 34 c3 59 00 88 ff ff ..M......4.Y....
04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x4a/0xa0
[] __kmalloc_track_caller+0x150/0x300
[] kmemdup+0x20/0x50
[] dccp_init_net+0x8a/0x160 [nf_conntrack]
[] nf_ct_l4proto_pernet_register_one+0x25/0x90
...
unreferenced object 0xffff88006342da58 (size 1024):
comm "ip", pid 15477, jiffies 4295982857 (age 957.836s)
hex dump (first 32 bytes):
10 b3 4d a0 ff ff ff ff 04 35 c3 59 00 88 ff ff ..M......5.Y....
04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x4a/0xa0
[] __kmalloc_track_caller+0x150/0x300
[] kmemdup+0x20/0x50
[] sctp_init_net+0x5d/0x130 [nf_conntrack]
[] nf_ct_l4proto_pernet_register_one+0x25/0x90
...

This is because we forgot to implement the get_net_proto for sctp and
dccp, so we won't invoke the nf_ct_unregister_sysctl to free the
ctl_table when doing netns cleanup. Also note that we will fail to register
the sysctl for dccp/sctp as well, due to the lack of get_net_proto.

Fixes: c51d39010a1b ("netfilter: conntrack: built-in support for DCCP")
Fixes: a85406afeb3e ("netfilter: conntrack: built-in support for SCTP")
Cc: Davide Caratti
Signed-off-by: Liping Zhang
Acked-by: Davide Caratti
Acked-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
24 May, 2017
1 commit
-
sctp_compute_cksum() implementation assumes that at least the SCTP header
is in the linear part of skb: modify conntrack error callback to avoid
false CRC32c mismatch, if the transport header is partially/entirely paged.

Fixes: cf6e007eef83 ("netfilter: conntrack: validate SCTP crc32c in PREROUTING")
Signed-off-by: Davide Caratti
Signed-off-by: Pablo Neira Ayuso
01 May, 2017
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. A large bunch of code cleanups, simplify the conntrack extension
codebase, get rid of the fake conntrack object, speed up netns by
selective synchronize_net() calls. More specifically, they are:

1) Check for ct->status bit instead of using nfct_nat() from IPVS and
   Netfilter codebase, patch from Florian Westphal.
2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.
3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.
4) Introduce nft_is_base_chain() helper function.
5) Enforce expectation limit from userspace conntrack helper,
   from Gao Feng.
6) Add nf_ct_remove_expect() helper function, from Gao Feng.
7) NAT mangle helper function return boolean, from Gao Feng.
8) ctnetlink_alloc_expect() should only work for conntrack with
   helpers, from Gao Feng.
9) Add nfnl_msg_type() helper function to nfnetlink to build the
   netlink message type.
10) Get rid of unnecessary cast on void, from simran singhal.
11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
    also from simran singhal.
12) Use list_prev_entry() from nf_tables, from simran singhal.
13) Remove unnecessary & on pointer function in the Netfilter and IPVS
    code.
14) Remove obsolete comment on set of rules per CPU in ip6_tables,
    no longer true. From Arushi Singhal.
15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.
16) Remove unnecessary nested rcu_read_lock() in
    __nf_nat_decode_session(). Code running from hooks is already
    guaranteed to run under RCU read side.
17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.
18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
    also from Aaron.
19) Get rid of unused __ip_set_get_netlink(), from Aaron Conole.
20) Don't propagate NF_DROP error to userspace via ctnetlink in
    __nf_nat_alloc_null_binding() function, from Gao Feng.
21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
    from Gao Feng.
22) Kill the fake untracked conntrack objects, use ctinfo instead to
    annotate a conntrack object is untracked, from Florian Westphal.
23) Remove nf_ct_is_untracked(), now obsolete since we have no
    conntrack template anymore, from Florian.
24) Add event mask support to nft_ct, also from Florian.
25) Move nf_conn_help structure to
    include/net/netfilter/nf_conntrack_helper.h.
26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
    Thus, we don't deal with variable conntrack extensions anymore.
    Make sure userspace conntrack helper doesn't go over that size.
    Remove variable size ct extension infrastructure now this code
    got no more clients. From Florian Westphal.
27) Restore offset and length of nf_ct_ext structure to 8 bytes now
    that wraparound is not possible any longer, also from Florian.
28) Allow to get rid of unassured flows under stress in conntrack,
    this applies to DCCP, SCTP and TCP protocols, from Florian.
29) Shrink size of nf_conntrack_ecache structure, from Florian.
30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
    from Gao Feng.
31) Register SYNPROXY hooks on demand, from Florian Westphal.
32) Use pernet hook whenever possible, instead of global hook
    registration, from Florian Westphal.
33) Pass hook structure to ebt_register_table() to consolidate some
    infrastructure code, from Florian Westphal.
34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
    SYNPROXY code, to make sure device stats are not fooled, patch
    from Gao Feng.
35) Remove NF_CT_EXT_F_PREALLOC, this kills quite some code that we
    don't need anymore if we just select a fixed size instead of
    expensive runtime calculation of this. From Florian.
36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
    from Florian.
37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
    Florian.
38) Attach NAT extension on-demand from masquerade and pptp helper
    path, from Florian.
39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.
40) Speed up netns by selective calls of synchronize_net(), from
    Florian Westphal.
41) Silence stack size warning gcc in 32-bit arch in snmp helper,
    from Florian.
42) Unconditionally call nf_ct_ext_destroy(), even if we have no
    extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
    Liping Zhang.
====================

Signed-off-by: David S. Miller
19 Apr, 2017
1 commit
-
If insertion of a new conntrack fails because the table is full, the kernel
searches the next buckets of the hash slot where the new connection
was supposed to be inserted at for an entry that hasn't seen traffic
in reply direction (non-assured). If it finds one, that entry is
dropped and the new connection entry is allocated.

Allow the conntrack gc worker to also remove *assured* conntracks if
resources are low.

Do this by querying the l4 tracker, e.g. tcp connections are now dropped
if they are no longer established (e.g. in finwait).

This could be refined further, e.g. by adding 'soft' established timeout
(i.e., a timeout that is only used once we get close to resource
exhaustion).

Cc: Jozsef Kadlecsik
Signed-off-by: Florian Westphal
Acked-by: Jozsef Kadlecsik
Signed-off-by: Pablo Neira Ayuso
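
The eviction decision described above could look roughly like this. It is a sketch under assumptions, not the kernel's implementation: `struct demo_conn`, the state enum and both function names are hypothetical, and "resources are low" is reduced to a boolean the caller supplies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified TCP tracker states. */
enum demo_tcp_state {
    DEMO_TCP_SYN_SENT,
    DEMO_TCP_ESTABLISHED,
    DEMO_TCP_FIN_WAIT,
    DEMO_TCP_CLOSE
};

/* Hypothetical connection entry. */
struct demo_conn {
    bool assured;               /* has seen traffic in reply direction */
    enum demo_tcp_state state;
};

/* Per-protocol query: may an assured entry be evicted early? */
static bool demo_tcp_can_early_drop(const struct demo_conn *ct)
{
    /* Anything that is no longer established is a candidate. */
    return ct->state != DEMO_TCP_ESTABLISHED;
}

/* Decision the garbage collector would take for a single entry. */
static bool demo_gc_can_drop(const struct demo_conn *ct, bool resources_low)
{
    assert(ct != NULL);

    if (!resources_low)
        return false;           /* no pressure: leave entries alone */
    if (!ct->assured)
        return true;            /* non-assured entries go first */
    return demo_tcp_can_early_drop(ct);
}
```

Keeping the protocol-specific test in its own function mirrors the "query the l4 tracker" wording: the collector does not need to know what "no longer established" means for each protocol.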
14 Apr, 2017
1 commit
-
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
02 Feb, 2017
1 commit
-
It is never accessed for reading and the only places that write to it
are the icmp(6) handlers, which also set skb->nfct (and skb->nfctinfo).

The conntrack core specifically checks for attached skb->nfct after
->error() invocation and returns early in this case.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
05 Jan, 2017
1 commit
-
Implement sctp_error to let nf_conntrack_in validate crc32c on the packet
transport header. Assign skb->ip_summed to CHECKSUM_UNNECESSARY and return
NF_ACCEPT in case of successful validation; otherwise, return -NF_ACCEPT to
let netfilter skip connection tracking, like other protocols do.

Besides preventing corrupted packets from matching conntrack entries, this
fixes functionality of REJECT target: it was not generating any ICMP upon
reception of SCTP packets, because it was computing RFC 1624 checksum on
the packet and systematically mismatching crc32c in the SCTP header.

Signed-off-by: Davide Caratti
Signed-off-by: Pablo Neira Ayuso
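
To illustrate why the two checks can never agree, here is a sketch of both algorithms side by side. It is not the kernel's sctp_compute_cksum(): the function names are hypothetical, the CRC is a plain bitwise CRC32c (Castagnoli, reflected polynomial 0x82F63B78), and the second function is the 16-bit one's-complement Internet checksum as described in RFC 1071. Handling of the SCTP checksum field itself (zeroing it before computing, byte order on the wire) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c: init 0xFFFFFFFF, reflected, final inversion. */
static uint32_t demo_crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Mask is all ones when the low bit is set, else zero. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return ~crc;
}

/* 16-bit one's-complement sum over big-endian 16-bit words. */
static uint16_t demo_inet_csum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                        /* odd trailing byte, zero padded */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The two functions differ in width, in arithmetic and in result, so code that validates an SCTP packet with the one's-complement sum will reject packets whose CRC32c is perfectly valid.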
05 Dec, 2016
1 commit
-
CONFIG_NF_CT_PROTO_SCTP is no longer a tristate. When set to y, connection
tracking support for SCTP protocol is built into nf_conntrack.ko.

footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_sctp,}.ko \
        net/ipv4/netfilter/nf_conntrack_ipv4.ko \
        net/ipv6/netfilter/nf_conntrack_ipv6.ko

(builtin)|| sctp   | ipv4   | ipv6   | nf_conntrack
---------++--------+--------+--------+--------------
none     || 498243 | 828755 | 828676 | 6141434
SCTP     ||  -     | 829254 | 829175 | 6547872

Signed-off-by: Davide Caratti
Signed-off-by: Pablo Neira Ayuso
18 Nov, 2016
1 commit
-
Make struct pernet_operations::id unsigned.
There are 2 reasons to do so:
1)
This field is really an index into a zero-based array and
thus is an unsigned entity. Using a negative value is an out-of-bounds
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preferred to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

    void f(long *p, int i)
    {
        g(p[i]);
    }

roughly translates to

    movsx   rsi, esi
    mov     rdi, [rsi+...]
    call    g

MOVSX is a 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
        ...
        ptr = ng->ptr[id - 1];
        ...
    }

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a seemingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
10 Nov, 2016
1 commit
-
Modify registration and deregistration of layer-4 protocol trackers to
facilitate inclusion of new elements into the current list of builtin
protocols. Both builtin (TCP, UDP, ICMP) and non-builtin (DCCP, GRE, SCTP,
UDPlite) layer-4 protocol trackers usually register/deregister themselves
using consecutive calls to nf_ct_l4proto_{,pernet}_{,un}register(...).
This sequence is interrupted and rolled back in case of error; in order to
simplify addition of builtin protocols, the input of the above functions
has been modified to allow registering/unregistering multiple protocols.

Signed-off-by: Davide Caratti
Signed-off-by: Pablo Neira Ayuso
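
The register-many-with-rollback pattern described above can be sketched as follows. Everything here is hypothetical (`struct demo_l4proto`, the global table, the `demo_*` functions and sample descriptors); the real nf_ct_l4proto_register() interface and its locking are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PROTO 256

/* Hypothetical tracker descriptor keyed by its protocol number. */
struct demo_l4proto {
    unsigned char l4num;
};

static const struct demo_l4proto *demo_table[DEMO_MAX_PROTO];

static int demo_register_one(const struct demo_l4proto *p)
{
    if (demo_table[p->l4num])
        return -1;                      /* slot already taken */
    demo_table[p->l4num] = p;
    return 0;
}

static void demo_unregister_one(const struct demo_l4proto *p)
{
    /* Only clear the slot if this descriptor actually owns it. */
    if (demo_table[p->l4num] == p)
        demo_table[p->l4num] = NULL;
}

/* Unregister n descriptors, last registered first. */
static void demo_unregister(const struct demo_l4proto *const protos[], size_t n)
{
    while (n-- > 0)
        demo_unregister_one(protos[n]);
}

/* Register n descriptors; on failure undo the ones already registered. */
static int demo_register(const struct demo_l4proto *const protos[], size_t n)
{
    assert(protos != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int ret = demo_register_one(protos[i]);
        if (ret < 0) {
            demo_unregister(protos, i); /* roll back entries 0..i-1 */
            return ret;
        }
    }
    return 0;
}

/* Sample descriptors: the third one collides with the first. */
static const struct demo_l4proto demo_tcp = { 6 };
static const struct demo_l4proto demo_udp = { 17 };
static const struct demo_l4proto demo_tcp_dup = { 6 };
static const struct demo_l4proto *const demo_good[] = { &demo_tcp, &demo_udp };
static const struct demo_l4proto *const demo_bad[] = {
    &demo_tcp, &demo_udp, &demo_tcp_dup
};
```

With an array-based interface, adding another built-in protocol means adding one element to the array rather than another register call plus another error-unwinding label.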
13 Aug, 2016
1 commit
-
This backward compatibility has been around for more than ten years,
since Yasuyuki Kozakai introduced IPv6 in conntrack. These days, we have
alternate /proc/net/nf_conntrack* entries, the ctnetlink interface and
the conntrack utility got adopted by many people in the user community
according to what I observed on the netfilter user mailing list.

So let's get rid of this.

Note that nf_conntrack_htable_size and unsigned int nf_conntrack_max do
not need to be exported as symbol anymore.

Signed-off-by: Pablo Neira Ayuso
12 Aug, 2016
1 commit
-
We only need first 4 bytes instead of 8 bytes to get the ports of
tcp/udp/dccp/sctp/udplite in their pkt_to_tuple function.

Signed-off-by: Gao Feng
Signed-off-by: Pablo Neira Ayuso
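
For the protocols listed above, the source and destination ports occupy the first four bytes of the transport header, which is why four bytes are enough. A minimal sketch, assuming a flat byte buffer instead of an skb; `struct demo_ports` and `demo_ports_from_header()` are hypothetical names, and the ports are converted to host order here purely for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ports in host byte order (a simplification for this example). */
struct demo_ports {
    uint16_t src;
    uint16_t dst;
};

/*
 * Read source and destination port from the start of a transport header.
 * Only the first 4 bytes are touched: bytes 0-1 hold the source port and
 * bytes 2-3 the destination port, both big-endian on the wire.
 */
static bool demo_ports_from_header(const uint8_t *hdr, size_t len,
                                   struct demo_ports *out)
{
    assert(out != NULL);

    if (hdr == NULL || len < 4)
        return false;                   /* header too short */

    out->src = (uint16_t)((hdr[0] << 8) | hdr[1]);
    out->dst = (uint16_t)((hdr[2] << 8) | hdr[3]);
    return true;
}

/* Sample header start: source port 80, destination port 8080. */
static const uint8_t demo_hdr[4] = { 0x00, 0x50, 0x1f, 0x90 };
```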
20 Apr, 2016
1 commit
-
read access doesn't need any lock here.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
19 Sep, 2015
1 commit
-
As gre does not have the srckey in the packet, gre_pkt_to_tuple
needs to perform a lookup in its per network namespace tables.

Pass in the proper network namespace to all pkt_to_tuple
implementations to ensure gre (and any similar protocols) can get this
right.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: Pablo Neira Ayuso