11 Jul, 2016

1 commit

  • When we do "cat /proc/net/nf_conntrack" while the conntrack hash table is
    being resized via /sys/module/nf_conntrack/parameters/hashsize, a race can
    happen: a reader may observe the newly allocated hash but the old size
    (or vice versa). An oops like the following will then occur:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
    IP: [] seq_print_acct+0x11/0x50 [nf_conntrack]
    Call Trace:
    [] ? ct_seq_show+0x14e/0x340 [nf_conntrack]
    [] seq_read+0x2cc/0x390
    [] proc_reg_read+0x42/0x70
    [] __vfs_read+0x37/0x130
    [] ? security_file_permission+0xa0/0xc0
    [] vfs_read+0x95/0x140
    [] SyS_read+0x55/0xc0
    [] entry_SYSCALL_64_fastpath+0x1a/0xa4

    It is very easy to reproduce this kernel crash.
    1. open one shell and input the following cmds:
    while : ; do
    echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize
    done
    2. open more shells and input the following cmds:
    while : ; do
    cat /proc/net/nf_conntrack
    done
    3. just wait a moment; the oops will happen soon.

    The solution in this patch is based on Florian's commit 5e3c61f98175
    ("netfilter: conntrack: fix lookup race during hash resize"). It also
    adds a wrapper function, nf_conntrack_get_ht, to get the hash and hsize,
    as suggested by Florian Westphal.
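
    A rough sketch of the seqcount read loop that such a wrapper performs
    (simplified; the symbols nf_conntrack_generation, nf_conntrack_hash and
    nf_conntrack_htable_size are assumed from the kernel of that era, not
    quoted from the patch):

    /* snapshot the hash pointer and size consistently w.r.t. resizes */
    static void get_ht_snapshot(struct hlist_nulls_head **hash,
                                unsigned int *hsize)
    {
        unsigned int seq;

        do {
            seq = read_seqcount_begin(&nf_conntrack_generation);
            *hsize = nf_conntrack_htable_size;
            *hash  = nf_conntrack_hash;
        } while (read_seqcount_retry(&nf_conntrack_generation, seq));
    }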

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

07 Jul, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next,
    they are:

    1) Don't use userspace datatypes in bridge netfilter code, from
    Tobin Harding.

    2) Iterate only once over the expectation table when removing the
    helper module, instead of once per-netns, from Florian Westphal.

    3) Extra sanitization in xt_hook_ops_alloc() to return an error in case
    we ever pass zero hooks.

    4) Handle NFPROTO_INET from the logging core infrastructure, from
    Liping Zhang.

    5) Autoload loggers when TRACE target is used from rules, this doesn't
    change the behaviour in case the user already selected nfnetlink_log
    as preferred way to print tracing logs, also from Liping Zhang.

    6) Create conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging
    fields by cache lines; this increases the size of each entry by 11%.
    From Florian Westphal.

    7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.

    8) Remove useless defensive check in nf_logger_find_get() from Shivani
    Bhardwaj.

    9) Remove the zone extension and place it in the conntrack object; it is
    always included in the hashing and we expect more intensive use of
    zones now that containers are in place. Also from Florian Westphal.

    10) Owner match now works from any namespace, from Eric W. Biederman.

    11) Make sure we only reply with TCP reset to TCP traffic from
    nf_reject_ipv4, patch from Liping Zhang.

    12) Introduce --nflog-size to indicate the amount of network packet bytes
    that are copied to userspace via the log message, from Vishwanath Pai.
    This obsoletes --nflog-range, which was designed to achieve this but
    has never worked.

    13) Introduce generic macros for nf_tables object generation masks.

    14) Use the generation mask in table, chain and set objects in nf_tables.
    This fixes interference between the ongoing preparation phase of
    the commit protocol and object listings happening at the same time.
    This update is introduced in three patches, one per object.

    15) Check if the object is active in the next generation for element
    deactivation in the rbtree implementation; given that deactivation
    happens from the commit phase path, we have to observe the future
    status of the object.

    16) Support for deletion of just added elements in the hash set type.

    17) Allow to resize hashtable from /proc entry, not only from the
    obscure /sys entry that maps to the module parameter, from Florian
    Westphal.

    18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
    anymore since we tear down the ruleset whenever the netdevice
    goes away.

    19) Support for matching inverted set lookups, from Arturo Borrero.

    20) Simplify the iptables_mangle_hook() by removing a superfluous
    extra branch.

    21) Introduce ether_addr_equal_masked() and use it from the netfilter
    codebase, from Joe Perches.

    22) Remove references to "Use netfilter MARK value as routing key"
    from the Netfilter Kconfig description given that this toggle
    hasn't existed for 10 years, from Moritz Sichert.

    23) Introduce generic NF_INVF() and use it from the xtables codebase,
    from Joe Perches (see the sketch after this list).

    24) Setting the logger to NONE via /proc was not working unless explicit
    nul-termination was included in the string. This fix keeps the former
    behaviour in place, so we don't break backward compatibility.
    ====================
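
    As a side note on item 23, the generic invert-flag helper boils down to a
    one-line macro; roughly (the exact definition may differ):

    #define NF_INVF(ptr, flag, boolean) \
            ((boolean) ^ !!((ptr)->invflags & (flag)))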

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2016

1 commit

  • No need to restrict this to the module parameter.

    We export a copy of the real hash size -- when user alters the value we
    allocate the new table, copy entries etc before we update the real size
    to the requested one.

    This is also needed because the real size is used by concurrent readers
    and cannot be changed without synchronizing the conntrack generation
    seqcnt.

    We only allow changing this value from the initial net namespace.

    Tested using http-client-benchmark vs. httpterm with the following
    running concurrently:

    while true;do
    echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
    done
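
    A rough sketch of the flow described above (simplified; the names
    nf_conntrack_hash_resize() and nf_conntrack_htable_size_user are
    assumptions, not quoted from the patch):

    static int nf_conntrack_hash_sysctl(struct ctl_table *table, int write,
                                        void __user *buffer, size_t *lenp,
                                        loff_t *ppos)
    {
        int ret;

        /* read or update the exported copy of the hash size */
        ret = proc_dointvec(table, write, buffer, lenp, ppos);
        if (ret < 0 || !write)
            return ret;

        /* allocate the new table and move entries, then publish the size */
        ret = nf_conntrack_hash_resize(nf_conntrack_htable_size_user);

        /* reflect the value actually in use (the resize may be refused) */
        nf_conntrack_htable_size_user = nf_conntrack_htable_size;
        return ret;
    }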

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

23 Jun, 2016

2 commits

  • Currently we store zone information as a conntrack extension.
    This has one drawback: for every lookup we need to fetch the zone data
    from the extension area.

    This change places the zone data directly in the main conntrack object
    structure and then removes the zone conntrack extension.

    The zone data is just 4 bytes; it fits into a padding hole before
    the tuplehash info, so we do not even increase the nf_conn structure size.
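
    For reference, the 4-byte zone object looks roughly like this (field
    names assumed from the zone infrastructure of that time):

    struct nf_conntrack_zone {
        u16 id;     /* zone identifier */
        u8  flags;  /* e.g. "derive id from skb->mark" */
        u8  dir;    /* direction(s) the zone applies to */
    };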

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This increases the struct size by 32 bytes (288 -> 320), but it is the
    right thing to do; otherwise any attempt to (re-)arrange nf_conn members
    by cacheline won't work.
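
    The change boils down to passing SLAB_HWCACHE_ALIGN when creating the
    conntrack slab cache; a sketch (any flags besides SLAB_HWCACHE_ALIGN are
    assumptions):

    nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
                                            sizeof(struct nf_conn), 0,
                                            SLAB_DESTROY_BY_RCU |
                                            SLAB_HWCACHE_ALIGN, NULL);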

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

15 Jun, 2016

1 commit


16 May, 2016

1 commit

  • The nf_conntrack_core.c fix in 'net' is not relevant in 'net-next'
    because we no longer have a per-netns conntrack hash.

    The ip_gre.c conflict as well as the iwlwifi ones were cases of
    overlapping changes.

    Conflicts:
    drivers/net/wireless/intel/iwlwifi/mvm/tx.c
    net/ipv4/ip_gre.c
    net/netfilter/nf_conntrack_core.c

    Signed-off-by: David S. Miller

    David S. Miller
     

15 May, 2016

1 commit

  • The slab name ends up being visible in the directory structure under
    /sys, and even if you don't have access rights to the file you can see
    the filenames.

    Just use a 64-bit counter instead of the pointer to the 'net' structure
    to generate a unique name.

    This code will go away in 4.7 when the conntrack code moves to a single
    kmemcache, but this is the backportable simple solution to avoiding
    leaking kernel pointers to user space.
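
    A sketch of the counter-based naming (simplified; the slabname field and
    the surrounding init path are assumptions):

    static atomic64_t unique_id;

    /* called from the per-netns conntrack init path */
    net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%llu",
                                 (u64)atomic64_inc_return(&unique_id));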

    Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")
    Signed-off-by: Linus Torvalds
    Acked-by: Eric Dumazet
    Cc: stable@vger.kernel.org
    Signed-off-by: David S. Miller

    Linus Torvalds
     

10 May, 2016

1 commit

  • A recent commit introduced an unconditional use of an uninitialized
    variable, as reported in this gcc warning:

    net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_confirm':
    net/netfilter/nf_conntrack_core.c:632:33: error: 'ctinfo' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    bytes = atomic64_read(&counter[CTINFO2DIR(ctinfo)].bytes);
    ^
    net/netfilter/nf_conntrack_core.c:628:26: note: 'ctinfo' was declared here
    enum ip_conntrack_info ctinfo;

    The problem is that a local variable shadows the function parameter.
    This removes the local variable, which looks like what Pablo originally
    intended.

    Signed-off-by: Arnd Bergmann
    Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race")
    Acked-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

09 May, 2016

1 commit


05 May, 2016

11 commits

  • This patch introduces nf_ct_resolve_clash() to resolve a race condition on
    conntrack insertions.

    This is particularly a problem for connectionless protocols such as
    UDP, which have no initial handshake. Two or more packets may race to
    insert the entry, resulting in packet drops.

    Another problematic scenario is packets enqueued to userspace via
    NFQUEUE after the raw table, which makes it easier to trigger this
    race.

    To resolve this, the idea is to reset the conntrack entry to the one
    that won the race. Packet and byte counters are also merged.

    The 'insert_failed' stat still accounts for this situation; after
    this patch, the drop counter is bumped whenever we drop packets, so we
    can watch for unresolved clashes.
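
    A rough sketch of the resolution path (simplified; details such as the
    allow_clash protocol flag and the nf_ct_acct_merge() helper are
    assumptions about the implementation, not quoted from the patch):

    static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
                                   enum ip_conntrack_info ctinfo,
                                   struct nf_conntrack_tuple_hash *h)
    {
        /* the entry already in the table is the one that won the race */
        struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
        struct nf_conntrack_l4proto *l4proto;

        l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
        if (l4proto->allow_clash && !nf_ct_is_dying(ct) &&
            atomic_inc_not_zero(&ct->ct_general.use)) {
            /* merge the loser's packet/byte counters into the winner
             * and point the skb at the winning entry */
            nf_ct_acct_merge(ct, ctinfo, (struct nf_conn *)skb->nfct);
            nf_conntrack_put(skb->nfct);
            skb->nfct = &ct->ct_general;
            return NF_ACCEPT;
        }

        NF_CT_STAT_INC(net, drop);
        return NF_DROP;
    }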

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Introduce a helper function to update conntrack counters.
    __nf_ct_kill_acct() was unnecessarily subtracting skb_network_offset(),
    which is expected to be zero when called from the ipv4/ipv6 hooks.
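
    A sketch of such a helper (the name nf_ct_acct_update and the exact
    signature are assumptions):

    static void nf_ct_acct_update(struct nf_conn *ct,
                                  enum ip_conntrack_info ctinfo,
                                  unsigned int len)
    {
        struct nf_conn_acct *acct = nf_conn_acct_find(ct);

        if (acct) {
            struct nf_conn_counter *counter = acct->counter;

            /* skb->len is used directly; no skb_network_offset()
             * subtraction */
            atomic64_inc(&counter[CTINFO2DIR(ctinfo)].packets);
            atomic64_add(len, &counter[CTINFO2DIR(ctinfo)].bytes);
        }
    }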

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Remove the unnecessary check for a non-null pointer in destroy_conntrack(),
    given that __nf_ct_l4proto_find() returns the generic protocol tracker
    if the protocol is not supported.
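
    The resulting pattern in destroy_conntrack(), roughly:

    l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
    /* no NULL check needed: the generic tracker is returned instead */
    if (l4proto->destroy)
        l4proto->destroy(ct);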

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • When iterating, skip conntrack entries living in a different netns.

    We could ignore the netns and kill some other non-assured entry, but
    that has two problems:

    - a netns could kill non-assured conntracks in another namespace
    - we would start to 'over-subscribe' the affected/overlimit netns.
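
    In code terms, the walk simply gains a netns check, roughly:

    /* inside the early-drop style walk over the shared table */
    if (!net_eq(nf_ct_net(ct), net))
        continue;   /* entry belongs to another namespace */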

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • We already include the netns address in the hash and compare the netns
    pointers during lookup, so even if namespaces have overlapping addresses,
    entries will be spread across the table.

    Assuming a 64k bucket size, this change saves 0.5 MB per namespace on a
    64-bit system (65536 buckets * 8 bytes per bucket head = 512 KiB).

    The NAT bysrc and expectation hashes are still per namespace; those will
    be changed too, soon.

    A future patch will also make the conntrack object slab cache global again.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Once we place all conntracks into a global hash table, we want them to be
    spread across the entire hash table, even if namespaces have overlapping
    IP addresses.
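
    In practice this means mixing a per-netns value into the hash seed; a
    sketch (net_hash_mix() as the mixing helper is an assumption, and the
    hashed length is simplified):

    static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
                                  const struct net *net)
    {
        unsigned int n = sizeof(*tuple) / sizeof(u32);  /* simplified */
        u32 seed = nf_conntrack_hash_rnd ^ net_hash_mix(net);

        return jhash2((const u32 *)tuple, n, seed);
    }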

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Once we place all conntracks in the same hash table, we must also compare
    the netns pointer to skip conntracks that belong to a different namespace.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This prepares for an upcoming change that places all conntracks into a
    single, global table. For this to work we will also need to compare the
    net pointer during lookup. To avoid open-coding such a check, use the
    nf_ct_key_equal helper and later extend it to also consider net_eq.
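
    A sketch of the helper once extended with the netns comparison
    (simplified; the exact argument list is an assumption):

    static bool nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
                                const struct nf_conntrack_tuple *tuple,
                                const struct nf_conntrack_zone *zone,
                                const struct net *net)
    {
        struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);

        return nf_ct_tuple_equal(tuple, &h->tuple) &&
               nf_ct_zone_equal(ct, zone, NF_CT_DIRECTION(h)) &&
               net_eq(net, nf_ct_net(ct));
    }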

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Once we place all conntracks into the same table, iteration becomes more
    costly because the table contains conntracks that we are not interested
    in (those belonging to other netns).

    So don't bother scanning if the current namespace has no entries.
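
    The short-circuit itself is essentially a one-liner at the start of the
    walk, roughly:

    if (atomic_read(&net->ct.count) == 0)
        return; /* nothing from this netns lives in the shared table */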

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • When resizing the conntrack hash table at runtime via
    echo 42 > /sys/module/nf_conntrack/parameters/hashsize, we are racing with
    the conntrack lookup path -- reads can happen in parallel and nothing
    prevents readers from observing the newly allocated hash but the old
    size (or vice versa).

    So access to hash[bucket] can trigger an OOB read in case the table got
    expanded and we saw the new size but the old hash pointer (or it got
    shrunk and we got the new hash pointer but the size of the old, larger
    table):

    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.6.0-rc2+ #107
    [..]
    Call Trace:
    [] ? nf_conntrack_tuple_taken+0x12a/0xe90
    [] ? nf_ct_invert_tuplepr+0x221/0x3a0
    [] get_unique_tuple+0xfb3/0x2760

    Use a generation counter to obtain the address/length of the same table.

    Also add a synchronize_net before freeing the old hash.
    AFAICS, without it we might access ct_hash[bucket] after ct_hash has been
    freed, provided that a lockless reader got delayed by another event:

    CPU1                        CPU2
    seq_begin
    seq_retry
                                resize occurs
                                free oldhash
    for_each(oldhash[size])

    Note that resize is only supported in init_netns; it took over 2 minutes
    of constant resizing+flooding to produce the warning, so this isn't a
    big problem in practice.
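
    A rough sketch of the resize (writer) side under these fixes (the
    seqcount name nf_conntrack_generation and the nf_ct_free_hashtable()
    helper are assumptions; simplified):

    /* publish the new table and size as one generation ... */
    write_seqcount_begin(&nf_conntrack_generation);
    /* ... install new_hash/new_size as the live table here ... */
    write_seqcount_end(&nf_conntrack_generation);

    /* ... then let delayed lockless readers finish before freeing */
    synchronize_net();
    nf_ct_free_hashtable(old_hash, old_size);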

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • No need to disable BH here anymore:

    stats are switched to the _ATOMIC variant (== this_cpu_inc()), which
    nowadays generates the same code as the non-_ATOMIC NF_STAT, at least
    on x86.
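
    For reference, the _ATOMIC variant is essentially (definition assumed
    from the conntrack headers of that era):

    #define NF_CT_STAT_INC_ATOMIC(net, count) \
            this_cpu_inc((net)->ct.stat->count)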

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

29 Apr, 2016

1 commit


25 Apr, 2016

2 commits


28 Mar, 2016

1 commit


15 Mar, 2016

1 commit


01 Feb, 2016

1 commit

  • Ulrich reports a soft lockup with the following (shortened) call chain:

    NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
    __netif_receive_skb_core+0x6e4/0x774
    process_backlog+0x94/0x160
    net_rx_action+0x88/0x178
    call_do_softirq+0x24/0x3c
    do_softirq+0x54/0x6c
    __local_bh_enable_ip+0x7c/0xbc
    nf_ct_iterate_cleanup+0x11c/0x22c [nf_conntrack]
    masq_inet_event+0x20/0x30 [nf_nat_masquerade_ipv6]
    atomic_notifier_call_chain+0x1c/0x2c
    ipv6_del_addr+0x1bc/0x220 [ipv6]

    The problem is that nf_ct_iterate_cleanup can run for a very long time
    since it can be interrupted by softirq processing.
    Moreover, atomic_notifier_call_chain runs with the RCU read lock held.

    So let's call cond_resched() in nf_ct_iterate_cleanup and defer
    the call to a work queue for the atomic_notifier_call_chain case.

    We also need another cond_resched in get_next_corpse, since we
    have to deal with iter() always returning false; in that case
    get_next_corpse will walk the entire conntrack table.
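
    Schematically, the cleanup loop gains a reschedule point, something like
    (the get_next_corpse() signature shown is an assumption):

    while ((ct = get_next_corpse(net, iter, data, &bucket)) != NULL) {
        /* ... kill and put the entry ... */
        cond_resched();   /* don't monopolize the CPU on huge tables */
    }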

    Reported-by: Ulrich Weber
    Tested-by: Ulrich Weber
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

20 Jan, 2016

1 commit

  • When we need to lock all buckets in the connection hashtable, we'd attempt
    to lock 1024 spinlocks, which is way more preemption levels than the
    kernel supports. Furthermore, this behavior was hidden by checking whether
    lockdep is enabled, and if it was, using only 8 buckets(!).

    Fix this by using a global lock and synchronizing all buckets on it when
    we need to lock them all. This is pretty heavyweight, but it is only done
    when we need to resize the hashtable, and that doesn't happen often
    (if at all).
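
    A sketch of the global-lock scheme described above (simplified; the real
    patch may differ in how it waits for per-bucket lock holders and in its
    memory barriers):

    static DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
    static bool nf_conntrack_locks_all;

    static void nf_conntrack_all_lock(void)
    {
        int i;

        spin_lock(&nf_conntrack_locks_all_lock);
        nf_conntrack_locks_all = true;

        /* wait for every current per-bucket lock holder to drop out */
        for (i = 0; i < CONNTRACK_LOCKS; i++) {
            spin_lock(&nf_conntrack_locks[i]);
            spin_unlock(&nf_conntrack_locks[i]);
        }
    }

    static void nf_conntrack_all_unlock(void)
    {
        nf_conntrack_locks_all = false;
        spin_unlock(&nf_conntrack_locks_all_lock);
    }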

    Signed-off-by: Sasha Levin
    Acked-by: Jesper Dangaard Brouer
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Sasha Levin
     

12 Oct, 2015

1 commit

  • The object and module refcounts are updated for each conntrack template;
    however, if we delete the iptables rules and flush the timeout
    database, we may end up with invalid references to a timeout object that
    is just gone.

    Resolve this problem by setting the timeout reference to NULL when the
    custom timeout entry is removed from our base. This patch requires some
    RCU trickery to ensure safe pointer handling.

    This handling is similar to what we already do with conntrack helpers:
    the idea is to avoid bumping the timeout object reference counter from
    the packet path, to avoid the cost of atomic ops.
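
    Schematically, when the custom timeout entry is removed, the template's
    reference is cleared under RCU, roughly (the nf_ct_timeout_find() helper
    and the timeout field name are assumptions):

    struct nf_conn_timeout *timeout_ext = nf_ct_timeout_find(ct);

    if (timeout_ext)
        RCU_INIT_POINTER(timeout_ext->timeout, NULL);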

    Reported-by: Stephen Hemminger
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

19 Sep, 2015

1 commit


06 Sep, 2015

1 commit

  • Conflicts:
    include/net/netfilter/nf_conntrack.h

    The conflict was an overlap between changing the type of the zone
    argument to nf_ct_tmpl_alloc() whilst exporting nf_ct_tmpl_free.

    Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net, they are:

    1) Oneliner to restore maps in nf_tables since we support addressing registers
    at 32 bits level.

    2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n,
    oneliner from Bernhard Thaler.

    3) Out of bound access in ipset hash:net* set types, reported by Dave Jones'
    KASan utility, patch from Jozsef Kadlecsik.

    4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of
    unnamed unions, patch from Elad Raz.

    5) Add a workaround to address inconsistent endianness in the res_id field of
    nfnetlink batch messages, reported by Florian Westphal.

    6) Fix error paths of CT/synproxy since the conntrack template was moved to use
    kmalloc, patch from Daniel Borkmann.

    All of them look good to me to reach 4.2, I can route this to -stable myself
    too, just let me know what you prefer.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Sep, 2015

1 commit

  • Fengguang reported that a randconfig build generated the following linker
    issue with the nf_ct_zone_dflt object involved:

    [...]
    CC init/version.o
    LD init/built-in.o
    net/built-in.o: In function `ipv4_conntrack_defrag':
    nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
    net/built-in.o: In function `ipv6_defrag':
    nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
    make: *** [vmlinux] Error 1

    Given that configurations exist where we have a built-in part that
    accesses nf_ct_zone_dflt, such as the two handlers nf_ct_defrag_user()
    and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a
    module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in
    area when netfilter is configured in general.

    Therefore, split the more generic parts into a common header under
    include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in
    section that already holds parts related to CONFIG_NF_CONNTRACK in the
    netfilter core. This fixes the issue on my side.

    Fixes: 308ac9143ee2 ("netfilter: nf_conntrack: push zone object into functions")
    Reported-by: Fengguang Wu
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

01 Sep, 2015

1 commit

  • Commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack
    templates") migrated templates to the new allocator api, but forgot to
    update error paths for them in CT and synproxy to use nf_ct_tmpl_free()
    instead of nf_conntrack_free().

    Due to that, memory is being freed into the wrong kmemcache, but we also
    drop the per-net reference count of ct objects, causing an imbalance.

    In Brad's case, this leads to a wrap-around of net->ct.count and thus
    lets __nf_conntrack_alloc() refuse to create a new ct object:

    [ 10.340913] xt_addrtype: ipv6 does not support BROADCAST matching
    [ 10.810168] nf_conntrack: table full, dropping packet
    [ 11.917416] r8169 0000:07:00.0 eth0: link up
    [ 11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    [ 12.815902] nf_conntrack: table full, dropping packet
    [ 15.688561] nf_conntrack: table full, dropping packet
    [ 15.689365] nf_conntrack: table full, dropping packet
    [ 15.690169] nf_conntrack: table full, dropping packet
    [ 15.690967] nf_conntrack: table full, dropping packet
    [...]

    With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs.
    nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus,
    to fix the problem, export and use nf_ct_tmpl_free() instead.
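
    Schematically, the corrected error paths now look like this (the failure
    condition is illustrative):

    ct = nf_ct_tmpl_alloc(net, zone, GFP_KERNEL);
    if (!ct)
        return -ENOMEM;

    if (setup_fails) {              /* illustrative condition */
        nf_ct_tmpl_free(ct);        /* not nf_conntrack_free() */
        return -ENOMEM;
    }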

    Fixes: 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates")
    Reported-by: Brad Jackson
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
     

21 Aug, 2015

1 commit


18 Aug, 2015

2 commits

  • This work adds the possibility of deriving the zone id from the skb->mark
    field in a scalable manner. This allows for having only a single template
    serving hundreds/thousands of different zones, for example, instead of
    needing one match for each zone as an extra CT jump target.

    Note that we need to have this information attached to the template
    because, at the time we try to look up a possible ct object, we already
    need to know the zone information for a possible match when going into
    __nf_conntrack_find_get(). This work provides a minimal implementation
    for a possible mapping.

    In order to not add/expose an extra ct->status bit, the zone structure has
    been extended to carry a flag for deriving the mark.
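
    Schematically, when the template's zone carries that flag, the zone id
    used for the lookup is taken from the packet mark (the flag name is an
    assumption):

    const struct nf_conntrack_zone *zone = nf_ct_zone(tmpl);
    u16 zone_id;

    if (zone->flags & NF_CT_FLAG_MARK)
        zone_id = (u16)skb->mark;   /* derive the zone from skb->mark */
    else
        zone_id = zone->id;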

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
     
  • This work adds a direction parameter to netfilter zones, so identity
    separation can be performed only in original/reply or both directions
    (default). This basically opens up the possibility of doing NAT with
    conflicting IP address/port tuples from multiple, isolated tenants
    on a host (e.g. from a netns) without requiring each tenant to NAT
    twice or to use its own dedicated IP address to SNAT to: overlapping
    tuples can be made unique with the zone identifier in the original
    direction, and the NAT engine will then allocate a unique tuple in the
    commonly shared default zone for the reply direction. In some
    restricted, local DNAT cases, port redirection could also be used to
    make the reply traffic unique without requiring SNAT.

    The consensus we've reached and discussed at NFWS and since the initial
    implementation [1] was to directly integrate the direction meta data
    into the existing zones infrastructure, as opposed to the ct->mark
    approach we proposed initially.

    As we pass the nf_conntrack_zone object directly around, we don't have
    to touch all call-sites, only those that contain equality checks of
    zones. Thus, based on the current direction (original or reply),
    we either return the actual id or the default NF_CT_DEFAULT_ZONE_ID.
    CT expectations are direction-agnostic entities when expectations are
    being compared among themselves, so we can only use the identifier
    in this case.
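
    A sketch of that per-direction id selection (the helper name
    nf_ct_zone_id() is an assumption; NF_CT_DEFAULT_ZONE_ID is from the
    text above):

    static u16 nf_ct_zone_id(const struct nf_conntrack_zone *zone,
                             enum ip_conntrack_dir dir)
    {
        /* return the configured id only for the direction(s) the zone
         * applies to, otherwise fall back to the shared default zone */
        return (zone->dir & (1 << dir)) ? zone->id : NF_CT_DEFAULT_ZONE_ID;
    }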

    Note that zone identifiers cannot be included in the hash mix
    anymore, as they don't contain a "stable" value that would be equal
    for both directions at all times; e.g. if only zone->id were
    unconditionally xor'ed into the table slot hash, replies would no
    longer find the corresponding conntrack entry.

    If no particular direction is specified when configuring zones, the
    behaviour is exactly as we expect currently (both directions).

    Support has been added for the CT netlink interface as well as the
    x_tables raw CT target, which both already offer existing interfaces
    to user space for the configuration of zones.

    Below is a minimal, simplified collision example (script in [2]) with
    netperf sessions:

    [topology diagram: tenant-1 and tenant-2 netns each run netperf with
    overlapping [ip,sport] := X and marks 1 and 2; mark routing feeds a
    gateway netns doing SNAT with CT zone := mark [ORIGINAL]; traffic
    reaches a netserver at [ip,port] := Y]
    On the gateway netns, example:

    iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
    iptables -t nat -A POSTROUTING -o -j SNAT --to-source --random-fully

    iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
    iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark

    With netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 running from
    each tenant netns, the conntrack dump from the gateway netns shows:

    tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
    src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
    [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

    tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
    src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
    [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1

    tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
    src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
    [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1

    tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
    src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
    [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2

    Taking this further, the test script in [2] creates 200 tenants and runs
    an original-tuple-colliding netperf session in each. A conntrack -L dump
    in the gateway netns also confirms 200 overlapping entries, all in
    ESTABLISHED state, as expected.

    I also ran various other tests with some permutations of the script, to
    mention some: SNAT in random/random-fully/persistent mode, no zones (no
    overlaps), static zones (original, reply, both directions), etc.

    [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
    [2] https://paste.fedoraproject.org/242835/65657871/

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
     

11 Aug, 2015

1 commit

  • This patch replaces the zone id, which is pushed down into functions,
    with the actual zone object. It's a bigger one-time change, but needed
    for later extending zones with a direction parameter, thus decoupling
    this additional information from all call-sites.

    No functional changes in this patch.

    The default zone becomes a global const object, namely nf_ct_zone_dflt,
    and will be returned directly in various cases, one being when there is
    e.g. no zoning support.
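
    Schematically, call-sites now deal with a zone pointer and the shared
    default, roughly:

    const struct nf_conntrack_zone *zone;

    /* use the template's zone when there is one, else the global default */
    zone = tmpl ? nf_ct_zone(tmpl) : &nf_ct_zone_dflt;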

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
     

05 Aug, 2015

1 commit


30 Jul, 2015

1 commit


20 Jul, 2015

1 commit

  • Quoting Daniel Borkmann:

    "When adding connection tracking template rules to a netns, f.e. to
    configure netfilter zones, the kernel will endlessly busy-loop as soon
    as we try to delete the given netns in case there's at least one
    template present, which is problematic e.g. if there is such bravery that
    the privileged user inside the netns is assumed untrusted.

    Minimal example:

    ip netns add foo
    ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
    ip netns del foo

    What happens is that when nf_ct_iterate_cleanup() is being called from
    nf_conntrack_cleanup_net_list() for a provided netns, we always end up
    with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
    don't get a soft lockup as we still have a schedule() point, but the
    serving CPU spins at 100% from that point onwards.

    Since templates are normally allocated with nf_conntrack_alloc(), we
    also bump net->ct.count. The reason they have not yet been nf_ct_put()
    is that the per-netns .exit() handler from x_tables (which would
    eventually invoke xt_CT's xt_ct_tg_destroy(), which drops the reference
    on info->ct) is called in the dependency chain at a *later* point in
    time than the per-netns .exit() handler for the connection tracker.

    This is clearly a chicken'n'egg problem: after the connection tracker
    .exit() handler, we've already torn down all the connection tracking
    infrastructure, so rightfully, xt_ct_tg_destroy() cannot be
    invoked at a later point in time during the netns cleanup, as that would
    lead to a use-after-free. At the same time, we cannot make x_tables depend
    on the connection tracker module, so that the xt_ct_tg_destroy() would
    be invoked earlier in the cleanup chain."

    Daniel confirms this has to do with the order in which modules are loaded,
    or with having compiled nf_conntrack as a module while x_tables is
    built-in. So we have no guarantee regarding the order in which netns
    callbacks are executed.

    Fix this by allocating the templates through kmalloc() from the respective
    SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
    Then, release them via nf_ct_tmpl_free() from destroy_conntrack(). This branch
    is marked as unlikely since conntrack templates are rarely allocated and only
    from the configuration plane path.

    Note that templates are no longer kept in any list, to avoid further
    dependencies on nf_conntrack; thus, the tmpl larval list is removed.
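
    A rough sketch of the kmalloc-based template allocation (simplified; the
    exact signature of nf_ct_tmpl_alloc() at that time is assumed):

    struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone, gfp_t flags)
    {
        struct nf_conn *tmpl = kzalloc(sizeof(*tmpl), flags);

        if (!tmpl)
            return NULL;

        tmpl->status = IPS_TEMPLATE;
        write_pnet(&tmpl->ct_net, net);
        /* zone/extensions are set up as before, but nothing is added to
         * any list and net->ct.count is not bumped */
        return tmpl;
    }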

    Reported-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso
    Tested-by: Daniel Borkmann

    Pablo Neira Ayuso