13 Aug, 2016

1 commit

  • This backward compatibility has been around for more than ten years,
    since Yasuyuki Kozakai introduced IPv6 support in conntrack. These days
    we have the alternate /proc/net/nf_conntrack* entries, and judging from
    the netfilter users mailing list, the ctnetlink interface and the
    conntrack utility have been widely adopted by the user community.

    So let's get rid of this.

    Note that nf_conntrack_htable_size and nf_conntrack_max no longer need
    to be exported as symbols.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

05 May, 2016

1 commit

  • We already include the netns address in the hash and compare the netns
    pointers during lookup, so even if namespaces have overlapping addresses,
    entries will be spread across the table.

    Assuming a 64k bucket size, this change saves 0.5 MB per namespace on a
    64-bit system.

    The NAT bysrc and expectation hashes are still per namespace; those will
    be changed soon as well.

    A future patch will also make the conntrack object slab cache global
    again.
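
    As a rough user-space sketch of the idea (the mixing function and the
    names are illustrative, not the kernel's exact code), folding the
    namespace pointer into the hash keeps identical tuples from different
    namespaces spread across the shared table:

        #include <stdint.h>
        #include <stdio.h>

        struct tuple { uint32_t src, dst; uint16_t sport, dport; };

        static uint32_t mix(uint32_t h, uint32_t v)
        {
                h ^= v;
                h *= 0x9e3779b1u;      /* multiplicative mixing step */
                return h ^ (h >> 16);
        }

        /* fold the netns pointer in so that two namespaces carrying the
           same tuple land in different buckets of the shared table */
        static uint32_t hash_conntrack(const void *net, const struct tuple *t,
                                       uint32_t seed, uint32_t nbuckets)
        {
                uint32_t h = mix(seed, (uint32_t)(uintptr_t)net);

                h = mix(h, t->src);
                h = mix(h, t->dst);
                h = mix(h, ((uint32_t)t->sport << 16) | t->dport);
                return h % nbuckets;
        }

        int main(void)
        {
                struct tuple t = { 0x01020304, 0x05060708, 1234, 80 };
                int ns1, ns2;  /* two distinct "namespace" addresses */

                printf("ns1: %u\n", hash_conntrack(&ns1, &t, 42, 65536));
                printf("ns2: %u\n", hash_conntrack(&ns2, &t, 42, 65536));
                return 0;
        }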

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

20 Jul, 2015

1 commit

  • Quoting Daniel Borkmann:

    "When adding connection tracking template rules to a netns, f.e. to
    configure netfilter zones, the kernel will endlessly busy-loop as soon
    as we try to delete the given netns in case there's at least one
    template present, which is problematic i.e. if there is such bravery that
    the priviledged user inside the netns is assumed untrusted.

    Minimal example:

    ip netns add foo
    ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
    ip netns del foo

    What happens is that when nf_ct_iterate_cleanup() is being called from
    nf_conntrack_cleanup_net_list() for a provided netns, we always end up
    with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
    don't get a soft-lockup as we still have a schedule() point, but the
    serving CPU spins on 100% from that point onwards.

    Since templates are normally allocated with nf_conntrack_alloc(), we
    also bump net->ct.count. The reason they are not yet nf_ct_put() is
    that the per-netns .exit() handler from x_tables (which would
    eventually invoke xt_CT's xt_ct_tg_destroy(), which drops the reference
    on info->ct) is called in the dependency chain at a *later* point in
    time than the per-netns .exit() handler for the connection tracker.

    This is clearly a chicken'n'egg problem: after the connection tracker
    .exit() handler, we've torn down all the connection tracking
    infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
    invoked at a later point in time during the netns cleanup, as that would
    lead to a use-after-free. At the same time, we cannot make x_tables depend
    on the connection tracker module, so that the xt_ct_tg_destroy() would
    be invoked earlier in the cleanup chain."

    Daniel confirms this has to do with the order in which modules are
    loaded, or with having nf_conntrack compiled as a module while x_tables
    is built in. So we have no guarantees regarding the order in which netns
    callbacks are executed.

    Fix this by allocating the templates through kmalloc() from the
    respective SYNPROXY and CT targets, so they don't depend on the
    conntrack kmem cache. Then, release them via nf_ct_tmpl_free() from
    destroy_conntrack(). This branch is marked as unlikely since conntrack
    templates are rarely allocated, and only from the configuration plane
    path.

    Note that templates are no longer kept in any list, to avoid further
    dependencies on nf_conntrack; thus, the tmpl larval list is removed.
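
    A sketch of the new allocation helper (hedged: this follows the shape
    described above, exact fields and flags may differ):

        struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone,
                                         gfp_t flags)
        {
                struct nf_conn *tmpl;

                /* plain kzalloc: no dependency on the per-netns conntrack
                   kmem cache, so netns teardown never waits on templates */
                tmpl = kzalloc(sizeof(*tmpl), flags);
                if (tmpl == NULL)
                        return NULL;

                tmpl->status = IPS_TEMPLATE;
                write_pnet(&tmpl->ct_net, net);
                atomic_set(&tmpl->ct_general.use, 0);
                return tmpl;
        }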

    Reported-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso
    Tested-by: Daniel Borkmann

    Pablo Neira Ayuso
     

26 Jun, 2014

1 commit

  • This brings the (per-conntrack) ecache extension back to 24 bytes in
    size (it was 152 bytes on x86_64 with lockdep on).

    When event delivery fails, re-delivery is attempted via work queue.

    Redelivery is attempted at least every 0.1 seconds, but can happen
    more frequently if userspace is not congested.

    The nf_ct_release_dying_list() function is removed. With this patch,
    ownership of the to-be-redelivered conntracks (on the dying list with
    the DYING bit not yet set) lies with the work queue, which releases the
    references once the event is out.
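
    As a rough user-space analogue (illustrative only, not kernel code),
    the pattern is one background worker that rescans undelivered items on
    a fixed period, instead of every object carrying its own retry state:

        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>
        #include <time.h>

        #define NEVENTS 4

        static bool pending[NEVENTS] = { true, true, true, true };
        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

        /* stand-in for a delivery that fails while the listener is
           congested; here two of every three attempts fail */
        static bool try_deliver(int i)
        {
                static int attempts;

                (void)i;
                return ++attempts % 3 == 0;
        }

        static void *redelivery_worker(void *arg)
        {
                struct timespec period = { 0, 100000000L };  /* 0.1 s */
                bool work_left = true;

                (void)arg;
                while (work_left) {
                        work_left = false;
                        pthread_mutex_lock(&lock);
                        for (int i = 0; i < NEVENTS; i++) {
                                if (pending[i] && try_deliver(i))
                                        pending[i] = false;
                                work_left |= pending[i];
                        }
                        pthread_mutex_unlock(&lock);
                        nanosleep(&period, NULL);  /* retry every 0.1 s */
                }
                return NULL;
        }

        int main(void)
        {
                pthread_t t;

                pthread_create(&t, NULL, redelivery_worker, NULL);
                pthread_join(t, NULL);
                puts("all events delivered");
                return 0;
        }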

    Joint work with Pablo Neira Ayuso.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

07 Mar, 2014

2 commits

  • nf_conntrack_lock is a monolithic lock and suffers from huge contention
    on current-generation servers (8 or more cores/threads).

    perf shows the locking contention clearly on a base kernel:

    - 72.56% ksoftirqd/6 [kernel.kallsyms] [k] _raw_spin_lock_bh
    - _raw_spin_lock_bh
    + 25.33% init_conntrack
    + 24.86% nf_ct_delete_from_lists
    + 24.62% __nf_conntrack_confirm
    + 24.38% destroy_conntrack
    + 0.70% tcp_packet
    + 2.21% ksoftirqd/6 [kernel.kallsyms] [k] fib_table_lookup
    + 1.15% ksoftirqd/6 [kernel.kallsyms] [k] __slab_free
    + 0.77% ksoftirqd/6 [kernel.kallsyms] [k] inet_getpeer
    + 0.70% ksoftirqd/6 [nf_conntrack] [k] nf_ct_delete
    + 0.55% ksoftirqd/6 [ip_tables] [k] ipt_do_table

    This patch changes the conntrack locking and provides a huge
    performance improvement. SYN-flood attack tested on a 24-core
    E5-2695v2(ES) with 10Gbit/s ixgbe (with the tool trafgen):

    Base kernel: 810,405 new conntrack/sec
    After patch: 2,233,876 new conntrack/sec

    Note that other flood attacks (SYN+ACK or ACK) can easily be deflected
    using:
    # iptables -A INPUT -m state --state INVALID -j DROP
    # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

    Use an array of hashed spinlocks to protect insertions/deletions of
    conntracks into the hash table. 1024 spinlocks seem to give good
    results, at minimal cost (4KB of memory). Due to the lockdep maximum
    lock depth, 1024 becomes 8 if CONFIG_LOCKDEP=y.

    The hash resize is a bit tricky, because we need to take all locks in
    the array. A seqcount_t is used to synchronize the hash table users
    with the resizing process.
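
    A user-space sketch of the scheme (the lock count and the hash are
    illustrative): an array of locks indexed by a hash of the key, so that
    operations on unrelated buckets rarely contend:

        #include <pthread.h>
        #include <stdint.h>

        #define LOCK_COUNT 1024   /* 8 under lockdep, as noted above */

        static pthread_spinlock_t locks[LOCK_COUNT];

        static unsigned int lock_index(uint32_t key_hash)
        {
                return key_hash % LOCK_COUNT;
        }

        static void insert_entry(uint32_t key_hash)
        {
                pthread_spinlock_t *l = &locks[lock_index(key_hash)];

                pthread_spin_lock(l);
                /* ... link the entry into its hash chain ... */
                pthread_spin_unlock(l);
        }

        int main(void)
        {
                for (int i = 0; i < LOCK_COUNT; i++)
                        pthread_spin_init(&locks[i],
                                          PTHREAD_PROCESS_PRIVATE);
                insert_entry(0xdeadbeef);
                return 0;
        }

    Resizing must take all LOCK_COUNT locks (in order), which is why the
    seqcount mentioned above is needed to keep lockless readers consistent
    during a resize.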

    Signed-off-by: Eric Dumazet
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Jesper Dangaard Brouer
     
  • One spinlock per cpu to protect the dying/unconfirmed/template special
    lists. (These lists are now per cpu, a bit like the untracked ct.)
    A @cpu field is added to nf_conn to make sure we hold the appropriate
    spinlock at removal time.
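
    In rough user-space terms (an illustrative sketch, not the kernel
    code), the trick is to record the owning list at insert time so that
    removal can take the same lock even when it runs on a different cpu:

        #define _GNU_SOURCE
        #include <pthread.h>
        #include <sched.h>    /* sched_getcpu(), glibc */
        #include <stddef.h>

        #define NCPUS 64      /* illustrative upper bound */

        struct entry {
                int cpu;      /* which per-cpu list/lock owns this entry */
                struct entry *next;
        };

        static pthread_mutex_t cpu_lock[NCPUS];
        static struct entry *cpu_list[NCPUS];

        static void insert_entry(struct entry *e)
        {
                int c = sched_getcpu();            /* -1 on error */

                e->cpu = (c < 0 ? 0 : c) % NCPUS;  /* remember the owner */
                pthread_mutex_lock(&cpu_lock[e->cpu]);
                e->next = cpu_list[e->cpu];
                cpu_list[e->cpu] = e;
                pthread_mutex_unlock(&cpu_lock[e->cpu]);
        }

        static void remove_entry(struct entry *e)
        {
                /* the lock recorded at insert time, not the current cpu's */
                pthread_mutex_lock(&cpu_lock[e->cpu]);
                for (struct entry **p = &cpu_list[e->cpu]; *p;
                     p = &(*p)->next) {
                        if (*p == e) {
                                *p = e->next;
                                break;
                        }
                }
                pthread_mutex_unlock(&cpu_lock[e->cpu]);
        }

        int main(void)
        {
                struct entry e = { 0, NULL };

                for (int i = 0; i < NCPUS; i++)
                        pthread_mutex_init(&cpu_lock[i], NULL);
                insert_entry(&e);
                remove_entry(&e);
                return 0;
        }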

    Signed-off-by: Eric Dumazet
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Jesper Dangaard Brouer
     

13 Dec, 2013

1 commit

  • Reorder struct netns_ct so that changes to the atomic_t "count" don't
    slow down users of the read-mostly fields.

    This is based on Eric Dumazet's proposed patch:
    "netfilter: conntrack: remove the central spinlock"
    http://thread.gmane.org/gmane.linux.network/268758/focus=47306

    The tricky part of cache-aligning this structure is that it is embedded
    in struct net (include/net/net_namespace.h), so changes to other
    netns_xxx structures affect our alignment.

    Eric's original patch contained an ambiguity on 32-bit regarding the
    alignment in struct net. This patch also takes 32-bit into account,
    and in case the (struct net) alignment changes, the sysctl_xxx entries
    have been ordered according to how often they are accessed.
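
    The idea in miniature (illustrative field names, not the real layout):
    keep the hot, frequently written counter on its own cache line so that
    writes to it don't invalidate the line holding read-mostly fields in
    other cpus' caches:

        #include <stdalign.h>
        #include <stdatomic.h>
        #include <stddef.h>
        #include <stdio.h>

        #define CACHELINE 64

        struct netns_ct_like {
                /* read-mostly configuration, stays clean in all caches */
                unsigned int htable_size;
                unsigned int sysctl_checksum;

                /* the hot counter gets its own cache line */
                alignas(CACHELINE) atomic_uint count;
        };

        int main(void)
        {
                printf("count starts at offset %zu\n",
                       offsetof(struct netns_ct_like, count));
                return 0;
        }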

    Signed-off-by: Jesper Dangaard Brouer
    Reviewed-by: Jiri Benc
    Signed-off-by: Pablo Neira Ayuso

    Jesper Dangaard Brouer
     

18 Jan, 2013

1 commit

  • Similar to connmarks, except labels are bit-based, i.e. all labels may
    be attached to a flow at the same time.

    Up to 128 labels are supported. Supporting more labels
    is possible, but requires increasing the ct offset delta
    from u8 to u16 type due to increased extension sizes.

    Mapping of bit-identifier to label name is done in userspace.

    The extension is enabled at run-time once "-m connlabel" netfilter
    rules are added.
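
    Conceptually (an illustrative sketch, not the kernel's extension code),
    a label set is a fixed-size bitmap, so attaching one label never
    disturbs another:

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define MAX_LABELS 128
        #define WORD_BITS  32

        struct conn_labels {
                uint32_t bits[MAX_LABELS / WORD_BITS];
        };

        static void label_set(struct conn_labels *l, unsigned int bit)
        {
                l->bits[bit / WORD_BITS] |= 1u << (bit % WORD_BITS);
        }

        static bool label_test(const struct conn_labels *l, unsigned int bit)
        {
                return l->bits[bit / WORD_BITS] & (1u << (bit % WORD_BITS));
        }

        int main(void)
        {
                struct conn_labels l = { { 0 } };

                label_set(&l, 3);    /* bit 3's name lives in userspace */
                label_set(&l, 127);
                printf("bit 3: %d, bit 4: %d\n",
                       label_test(&l, 3), label_test(&l, 4));
                return 0;
        }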

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

17 Dec, 2012

1 commit

  • In commit d871bef ("netfilter: ctnetlink: dump entries from the dying
    and unconfirmed lists"), we assumed that all conntrack objects are
    inserted in one of the existing lists. However, template conntrack
    objects were not. This results in hitting a BUG_ON in the
    destroy_conntrack path while removing a rule that uses the CT target.

    This patch fixes the situation by adding a template list, which is
    where template conntrack objects now reside.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

07 Jun, 2012

7 commits

  • This patch adds namespace support for ICMPv6 protocol tracker.

    Acked-by: Eric W. Biederman
    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
     
  • This patch adds namespace support for ICMP protocol tracker.

    Acked-by: Eric W. Biederman
    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
     
  • This patch adds namespace support for UDP protocol tracker.

    Acked-by: Eric W. Biederman
    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
     
  • This patch adds namespace support for TCP protocol tracker.

    Acked-by: Eric W. Biederman
    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
     
  • This patch adds namespace support for the generic layer 4 protocol
    tracker.

    Acked-by: Eric W. Biederman
    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
     
  • This patch prepares the namespace support for layer 3 protocol trackers.
    Basically, this modifies the following interfaces:

    * nf_ct_l3proto_[un]register_sysctl.
    * nf_conntrack_l3proto_[un]register.

    We add a new helper, nf_ct_l3proto_net, that is used to get the pernet
    data of an l3proto.

    This also adds the new struct nf_ip_net, which stores the sysctl header
    and the l3proto_ipv4, l4proto_tcp(6), l4proto_udp(6) and
    l4proto_icmp(v6) data. Since protocols such as tcp and tcp6 share the
    same data, making nf_ip_net a field of netns_ct is the easiest way to
    manage it.

    This patch also adds an init_net hook to struct nf_conntrack_l3proto to
    initialize the layer 3 protocol pernet data.

    Acked-by: Eric W. Biederman
    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
     
  • This patch prepares the namespace support for layer 4 protocol trackers.
    Basically, this modifies the following interfaces:

    * nf_ct_[un]register_sysctl
    * nf_conntrack_l4proto_[un]register

    to include the namespace parameter. We still use init_net in this patch
    to prepare the ground for follow-up patches for each layer 4 protocol
    tracker.

    We add a new net_id field to struct nf_conntrack_l4proto that is used
    to store the pernet_operations id for each layer 4 protocol tracker.

    Note that AF_INET6's protocols do not need to do sysctl compat. Thus,
    we only register compat sysctl when l4proto.l3proto != AF_INET6.
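
    As a hedged sketch of the pernet pattern involved (the foo_* names are
    hypothetical; register_pernet_subsys() and net_generic() are the real
    kernel interfaces): a tracker stores the id handed out at registration
    and uses it to reach its per-namespace data later.

        /* hypothetical layer 4 tracker, kernel-style sketch */
        static unsigned int foo_net_id;

        struct foo_net {
                unsigned int timeout;       /* per-namespace tunable */
        };

        static struct pernet_operations foo_net_ops = {
                .init = foo_net_init,       /* set up per-netns state */
                .exit = foo_net_exit,
                .id   = &foo_net_id,        /* filled in on registration */
                .size = sizeof(struct foo_net),
        };

        /* module init */
        register_pernet_subsys(&foo_net_ops);

        /* packet path: per-namespace data is one lookup away */
        struct foo_net *fn = net_generic(net, foo_net_id);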

    Acked-by: Eric W. Biederman
    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
     

09 May, 2012

1 commit

  • This patch allows you to disable automatic conntrack helper
    lookup based on TCP/UDP ports, eg.

    echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

    [ Note: flows that already got a helper will keep using it even
    if automatic helper assignment has been disabled ]

    Once this behaviour has been disabled, you have to explicitly
    use the iptables CT target to attach helper to flows.

    There are good reasons to stop supporting automatic helper
    assignment, for further information, please read:

    http://www.netfilter.org/news.html#2012-04-03

    This patch also adds a message to inform users that automatic helper
    assignment is deprecated and will be removed soon (it is printed only
    once, for the first flow that gets a helper attached, to keep it as
    unobtrusive as possible).

    Signed-off-by: Eric Leblond
    Signed-off-by: Pablo Neira Ayuso

    Eric Leblond
     

22 Nov, 2011

1 commit

  • This patch fixes an oops that can be triggered following this recipe:

    0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
    1) container is started.
    2) connect to it via lxc-console.
    3) generate some traffic with the container to create some conntrack
    entries in its table.
    4) stop the container: you hit an oops because the conntrack table
    cleanup tries to report the destroy event to user-space, but the
    per-netns nfnetlink socket has already gone away (the nfnetlink
    socket is per-netns, but event callback registration is global).

    To fix this situation, we make the ctnl_notifier per-netns so the
    callback is registered/unregistered if the container is
    created/destroyed.

    Alex Bligh and Alexey Dobriyan originally proposed a small patch to
    check whether the nfnetlink socket is gone in nfnetlink_has_listeners,
    but this is a heavily used path for events; thus, it may reduce
    performance, and it looks a bit hackish to check for the nfnetlink
    socket only to work around this situation. As a result, I decided
    to take the bigger change, which looks cleaner to me.

    Cc: Alexey Dobriyan
    Reported-by: Alex Bligh
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

19 Jan, 2011

1 commit

  • This patch adds flow-based timestamping for conntracks. This conntrack
    extension is disabled by default. Basically, we use two 64-bit
    variables: one stores the creation timestamp once the conntrack has
    been confirmed, and the other stores the deletion time. To enable the
    extension, you have to:

    echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp

    This patch saves memory for user-space flow-based loggers such as
    ulogd2. In short, ulogd2 no longer needs to keep a hashtable with the
    conntracks in user-space to know when they were created and destroyed;
    instead, we use the kernel timestamps. These nanosecond-resolution
    timestamps are also useful if we want a sane IPFIX implementation in
    user-space. Other custom user-space applications can benefit from this
    via libnetfilter_conntrack.

    This patch modifies the /proc output to display the delta time
    in seconds since the flow start. You can also obtain the
    flow-start date by means of the conntrack-tools.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to net.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    The macro and type tricks around snmp stats make things a bit
    interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
    as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
    snmp_mib_*() users which used to cast the argument to (void **) are
    updated to cast it to (void __percpu **).

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Patrick McHardy
    Cc: Arnaldo Carvalho de Melo
    Cc: Vlad Yasevich
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Tejun Heo
     

09 Feb, 2010

2 commits

  • As noticed by Jon Masters, the conntrack hash
    size is global and not per namespace, but modifiable at runtime through
    /sys/module/nf_conntrack/hashsize. Changing the hash size will only
    resize the hash in the current namespace however, so other namespaces
    will use an invalid hash size. This can cause crashes when enlarging
    the hashsize, or false negative lookups when shrinking it.

    Move the hash size into the per-namespace data and only use the global
    hash size to initialize the per-namespace value when instantiating a
    new namespace. Additionally, restrict hash resizing to init_net for
    now, as other namespaces are not handled currently.

    Cc: stable@kernel.org
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • nf_conntrack_cachep is currently shared by all netns instances, but
    because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.

    If we use a shared slab cache, one object can instantly fly from one
    hash table (netns ONE) to another (netns TWO), and a concurrent reader
    (doing a lookup in netns ONE, 'finding' an object of netns TWO) can be
    fooled without notice, because no RCU grace period has to be observed
    between freeing an object and reusing it.

    We don't have this problem with the UDP/TCP slab caches because the
    TCP/UDP hashtables are global to the machine (and each object has a
    pointer to its netns).

    If we use per-netns conntrack hash tables, we also *must* use per-netns
    conntrack slab caches, to guarantee that an object cannot escape from
    one namespace to another.
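
    The SLAB_DESTROY_BY_RCU contract behind this, sketched with the shape
    of the conntrack lookup (hedged; the details vary across kernel
    versions): after a lockless lookup, the reader must take a reference
    and then re-check the key and the netns, because the object may have
    been freed and reused in the meantime.

        rcu_read_lock();
        h = __nf_conntrack_find(net, tuple);   /* lockless hash lookup */
        if (h) {
                ct = nf_ct_tuplehash_to_ctrack(h);
                if (!atomic_inc_not_zero(&ct->ct_general.use)) {
                        ct = NULL;             /* object is being freed */
                } else if (!nf_ct_tuple_equal(tuple, &h->tuple) ||
                           !net_eq(net, nf_ct_net(ct))) {
                        /* the slab reused the object for another entry */
                        nf_ct_put(ct);
                        ct = NULL;             /* caller should retry */
                }
        }
        rcu_read_unlock();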

    Signed-off-by: Eric Dumazet
    [Patrick: added unique slab name allocation]
    Cc: stable@kernel.org
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     

13 Jun, 2009

2 commits

  • This patch improves ctnetlink event reliability if one broadcast
    listener has set the NETLINK_BROADCAST_ERROR socket option.

    The logic is the following: if an event delivery fails, we keep the
    undelivered events in the missed-event cache. Once the next packet
    arrives, we add any new events to the missed events in the cache and
    try a new delivery, and so on. Thus, if ctnetlink fails to deliver an
    event, we retry once we see a new packet. We may lose some state
    transitions, but the userspace process gets back in sync at some
    point.

    In the worst case, if no events were delivered to userspace, we make
    sure that destroy events are successfully delivered. Basically, if
    ctnetlink fails to deliver the destroy event, we remove the conntrack
    entry from the hashes and insert it into the dying list, which contains
    inactive entries. Then, the conntrack timer is added with an extra
    grace timeout of random32() % 15 seconds to trigger the event again
    (this grace timeout is tunable via /proc). The use of a limited random
    timeout value distributes the "destroy" resends, thus avoiding the
    accumulation of lots of "destroy" events at the same time. Events may
    be delivered out of order, but we can identify them by means of the
    tuple plus the conntrack ID.

    The maximum number of conntrack entries (active or inactive) is
    still handled by nf_conntrack_max. Thus, we may start dropping
    packets at some point if we accumulate a lot of inactive conntrack
    entries that did not successfully report the destroy event to
    userspace.

    During my stress tests, consisting of setting a very small buffer of
    2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket flag,
    and generating lots of very small connections, I noticed very few
    destroy entries in flight waiting to be resent.

    A simple way to test this patch consists of creating a lot of entries,
    setting a very small Netlink buffer in conntrackd (+ a patch which is
    not in the git tree to set the BROADCAST_ERROR flag) and invoking
    `conntrack -F'.

    For expectations, no changes are introduced in this patch. Currently,
    event delivery is only done for new expectations (no events from
    expectation expiration, removal or confirmation). To support those,
    they would need a per-expectation event cache implementing the same
    idea that is exposed in this patch.

    This patch can be useful to provide reliable flow-accounting. We still
    have to add a new conntrack extension to store the creation and destroy
    times.
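
    Schematically (a user-space sketch with illustrative names, not the
    ctnetlink code), undelivered event bits are kept in a per-connection
    "missed" mask and folded into the next delivery attempt, so the
    listener eventually converges on the right state:

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        struct event_cache {
                uint32_t cache;   /* events gathered since last attempt */
                uint32_t missed;  /* events whose delivery failed */
        };

        /* stand-in for a netlink broadcast that can fail (ENOBUFS) */
        static bool deliver(uint32_t events)
        {
                (void)events;
                return false;     /* simulate a congested listener */
        }

        /* called when the next packet of this flow is seen */
        static void event_flush(struct event_cache *e)
        {
                uint32_t events = e->cache | e->missed;  /* merge */

                e->cache = 0;
                e->missed = deliver(events) ? 0 : events;
        }

        int main(void)
        {
                struct event_cache e = { .cache = 0x3, .missed = 0 };

                event_flush(&e);  /* fails: bits are retained */
                printf("missed after failed delivery: %#x\n", e.missed);
                return 0;
        }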

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     
  • This patch reworks the per-cpu event caching to use the conntrack
    extension infrastructure.

    The main drawback is that we consume more memory per conntrack if
    event delivery is enabled. This patch is required by the reliable event
    delivery patch that follows.

    BTW, this patch allows you to enable/disable event delivery via
    /proc/sys/net/netfilter/nf_conntrack_events at runtime, although you
    can still disable event caching as a compile-time option.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

26 Mar, 2009

1 commit

  • Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP & TCP.

    This permits an easy conversion from call_rcu() based hash lists to a
    SLAB_DESTROY_BY_RCU one.

    Avoiding the call_rcu() delay at nf_conn freeing time has numerous
    gains:

    - It doesn't fill the RCU queues (up to 10000 elements per cpu). This
      reduces the OOM possibility if queued elements are not taken into
      account, and reduces latency problems when the RCU queue size hits
      its high limit and triggers emergency mode.

    - It allows fast reuse of just-freed elements, permitting better use of
      the CPU cache.

    - We delete rcu_head from struct nf_conn, shrinking the size of this
      structure by 8 or 16 bytes.

    This patch only takes care of struct nf_conn. call_rcu() is still used
    for less critical conntrack parts, which may be converted later if
    necessary.
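
    The lookup side of the scheme, sketched with the real hlist_nulls
    helpers (hedged; the surrounding code is simplified): if a traversal
    ends on a nulls value that doesn't match the starting bucket, the chain
    was crossed during a concurrent free/reuse and the lookup restarts.

        begin:
        hlist_nulls_for_each_entry_rcu(h, n, &hash[bucket], hnnode) {
                if (nf_ct_tuple_equal(tuple, &h->tuple))
                        return h;   /* candidate; the caller still
                                       re-checks after taking a ref */
        }
        /* the nulls marker encodes which bucket the walk ended in, so a
           mismatch means we were moved to another chain: try again */
        if (get_nulls_value(n) != bucket)
                goto begin;
        return NULL;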

    Signed-off-by: Eric Dumazet
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     

08 Oct, 2008

8 commits