13 Sep, 2016

1 commit

  • These counters sit in the hot path and do show up in perf; this is
    especially true for 'found' and 'searched', which get incremented for
    every packet processed.

    Information like

    searched=212030105
    new=623431
    found=333613
    delete=623327

    does not seem too helpful nowadays:

    - on busy systems 'found' and 'searched' will overflow every few hours
    (these are 32bit integers), less busy ones every few days.

    - for debugging there are better methods, such as iptables' TRACE target
    and the conntrack log sysctls. Nowadays we also have the perf tool.

    This removes packet path stat counters except those that
    are expected to be 0 (or close to 0) on a normal system, e.g.
    'insert_failed' (race happened) or 'invalid' (proto tracker rejects).

    The insert stat is retained for the ctnetlink case.
    The found stat is retained for the tuple-is-taken check when NAT has to
    determine if it needs to pick a different source address.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

07 Sep, 2016

2 commits


18 Aug, 2016

1 commit


19 Jul, 2016

1 commit

  • The dummy ruleset I used to test the original validation change was
    broken; most rules were unreachable and were not tested by
    mark_source_chains().

    In some cases rulesets that used to load in a few seconds now require
    several minutes.

    sample ruleset that shows the behaviour:

    echo "*filter"
    for i in $(seq 0 100000); do
        printf ":chain_%06x - [0:0]\n" "$i"
    done
    for i in $(seq 0 100000); do
        printf -- "-A INPUT -j chain_%06x\n" "$i"
        printf -- "-A INPUT -j chain_%06x\n" "$i"
        printf -- "-A INPUT -j chain_%06x\n" "$i"
    done
    echo COMMIT

    [ pipe result into iptables-restore ]

    This ruleset will be about 74 MByte in size, with ~500k searches
    through all 500k[1] rule entries. iptables-restore will take forever
    (gave up after 10 minutes).

    Instead of always searching the entire blob for a match, fill an
    array with the start offsets of every single ipt_entry struct,
    then do a binary search to check if the jump target is present or not.

    After this change ruleset restore times get again close to what one
    gets when reverting 36472341017529e (~3 seconds on my workstation).

    [1] every user-defined rule gets an implicit RETURN, so we get
    300k jumps + 100k userchains + 100k returns -> 500k rule entries

    Fixes: 36472341017529e ("netfilter: x_tables: validate targets of jumps")
    Reported-by: Jeff Wu
    Tested-by: Jeff Wu
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

03 Jul, 2016

1 commit

  • netfilter uses multiple FWINV #defines with identical form that hide a
    specific structure variable and dereference its invflags member.

    $ git grep "#define FWINV"
    include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg))
    net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg))
    net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg)))
    net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg)))
    net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg)))
    net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg)))

    Consolidate these macros into a single NF_INVF macro.

    Miscellanea:

    o Neaten the alignment around these uses
    o A few lines are > 80 columns for intelligibility

    Signed-off-by: Joe Perches
    Signed-off-by: Pablo Neira Ayuso

    Joe Perches
     

29 Apr, 2016

1 commit

  • This is a forward-port of the original patch from Andrzej Hajda,
    he said:

    "IS_ERR_VALUE should be used only with unsigned long type.
    Otherwise it can work incorrectly. To achieve this function
    xt_percpu_counter_alloc is modified to return unsigned long,
    and its result is assigned to temporary variable to perform
    error checking, before assigning to .pcnt field.

    The patch follows conclusion from discussion on LKML [1][2].

    [1]: http://permalink.gmane.org/gmane.linux.kernel/2120927
    [2]: http://permalink.gmane.org/gmane.linux.kernel/2150581"

    Original patch from Andrzej is here:

    http://patchwork.ozlabs.org/patch/582970/

    This patch has clashed with input validation fixes for x_tables.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

24 Apr, 2016

2 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree, mostly from Florian Westphal to sort out the lack of sufficient
    validation in x_tables and connlabel preparation patches to add
    nf_tables support. They are:

    1) Ensure we don't go over the ruleset blob boundaries in
    mark_source_chains().

    2) Validate that target jumps land on an existing xt_entry. This extra
    sanitization comes with a performance penalty when loading the ruleset.

    3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables.

    4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables.

    5) Make sure the minimal possible target size in x_tables.

    6) Similar to #3, add xt_compat_check_entry_offsets() for compat code.

    7) Check that standard target size is valid.

    8) More sanitization to ensure that the target_offset field is correct.

    9) Add xt_check_entry_match() to validate that matches are well-formed.

    10-12) Three patches to reduce the number of parameters in
    translate_compat_table() for {arp,ip,ip6}tables by using a container
    structure.

    13) No need to return value from xt_compat_match_from_user(), so make
    it void.

    14) Consolidate translate_table() so it can be used by compat code too.

    15) Remove obsolete check for compat code, so we keep consistent with
    what was already removed in the native layout code (back in 2007).

    16) Get rid of target jump validation from mark_source_chains(),
    obsoleted by #2.

    17) Introduce xt_copy_counters_from_user() to consolidate counter
    copying, and use it from {arp,ip,ip6}tables.

    18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump
    functions.

    19) Move nf_connlabel_match() to xt_connlabel.

    20) Skip event notification if connlabel did not change.

    21) Update of nf_connlabels_get() to make the upcoming nft connlabel
    support easier.

    23) Remove spinlock to read protocol state field in conntrack.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • nla_data() is now aligned on a 64-bit area.

    The temporary function nla_put_be64_32bit() is removed in this patch.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

14 Apr, 2016

5 commits


28 Mar, 2016

1 commit

  • This fix adds a new reference counter (ref_netlink) for the struct ip_set.
    The other reference counter (ref) can be swapped out by ip_set_swap and we
    need a separate counter to keep track of references for netlink events
    like dump. Using the same ref counter for dump causes a race condition
    which can be demonstrated by the following script:

    ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \
    counters
    ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \
    counters
    ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \
    counters

    ipset save &

    ipset swap hash_ip3 hash_ip2
    ipset destroy hash_ip3 /* will crash the machine */

    Swap will exchange the values of ref, so destroy will see ref = 0 instead
    of ref = 1. With this fix in place, swap will not succeed because ipset
    save still holds ref_netlink on the set (ip_set_swap doesn't swap
    ref_netlink).

    Both delete and swap will error out if ref_netlink != 0 on the set.

    Note: the changes to the *_head functions are because previously we would
    increment ref whenever we called these functions; we don't do that
    anymore.

    Reviewed-by: Joshua Hunt
    Signed-off-by: Vishwanath Pai
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Vishwanath Pai
     

03 Mar, 2016

1 commit

  • Delay hook registration until the table is being requested inside a
    namespace.

    Historically, a particular table (iptables mangle, ip6tables filter, etc)
    was registered on module load.

    When netns support was added to iptables only the ip/ip6tables ruleset was
    made namespace aware, not the actual hook points.

    This means e.g. that when the ipt_filter table/module is loaded on a
    system, each namespace on that system gets an (empty) iptables filter
    ruleset.

    In other words, if a namespace sends a packet, such an skb is 'caught' by
    the netfilter machinery and fed to the hook points for that table (i.e.
    INPUT, FORWARD, etc).

    Thanks to Eric Biederman, hooks are no longer global, but per namespace.

    This means that we can avoid allocation of empty ruleset in a namespace and
    defer hook registration until we need the functionality.

    We register a table's hook entry points ONLY in the initial namespace.
    When an iptables get/setsockopt is issued inside a given namespace, we
    check if the table is found in the per-namespace list.

    If not, we attempt to find it in the initial namespace, and, if found,
    create an empty default table in the requesting namespace and register the
    needed hooks.

    Hook points are destroyed only once the namespace is deleted; there is no
    'usage count' (it makes no sense since there is no 'remove table'
    operation in the xtables api).

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

19 Feb, 2016

1 commit


29 Dec, 2015

2 commits


19 Dec, 2015

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains the first batch of Netfilter updates for
    the upcoming 4.5 kernel. This batch contains userspace netfilter header
    compilation fixes, support for packet mangling in nf_tables, the new
    tracing infrastructure for nf_tables and cgroup2 support for iptables.
    More specifically, they are:

    1) Two patches to include dependencies in our netfilter userspace
    headers to resolve compilation problems, from Mikko Rapeli.

    2) Four cosmetic cleanup patches for the ebtables codebase, from Ian Morris.

    3) Remove duplicate include in the netfilter reject infrastructure,
    from Stephen Hemminger.

    4) Two patches to simplify the netfilter defragmentation code for IPv6,
    patch from Florian Westphal.

    5) Fix root ownership of /proc/net netfilter for unprivileged net
    namespaces, from Philip Whineray.

    6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.

    7) Add mangling support to our nf_tables payload expression, from
    Patrick McHardy.

    8) Introduce a new netlink-based tracing infrastructure for nf_tables,
    from Florian Westphal.

    9) Change setter functions in nfnetlink_log to be void, from
    Rami Rosen.

    10) Add netns support to the cttimeout infrastructure.

    11) Add cgroup2 support to iptables, from Tejun Heo.

    12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.

    13) Add support for mangling pkttype in the nf_tables meta expression,
    also from Florian.

    BTW, I need that you pull net into net-next, I have another batch that
    requires changes that I don't yet see in net.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Dec, 2015

1 commit


24 Nov, 2015

1 commit

  • ip_ct_sctp is an internal structure, embedded in the union
    nf_conntrack_proto to store sctp-specific information in conntrack
    entries. It has no business in UAPI.

    This patch moves it from UAPI to a saner place, together with similar
    structs for other protocols.

    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: Pablo Neira Ayuso

    Marcelo Ricardo Leitner
     

07 Nov, 2015

1 commit

  • The data extensions in ipset lacked the proper memory alignment and
    thus could lead to kernel crash on several architectures. Therefore
    the structures have been reorganized and alignment attributes added
    where needed. The patch was tested on armv7h by Gerhard Wiesinger and
    on x86_64, sparc64 by Jozsef Kadlecsik.

    Reported-by: Gerhard Wiesinger
    Tested-by: Gerhard Wiesinger
    Tested-by: Jozsef Kadlecsik
    Signed-off-by: Jozsef Kadlecsik

    Jozsef Kadlecsik
     

09 Oct, 2015

1 commit


19 Sep, 2015

1 commit


03 Sep, 2015

1 commit

  • Fengguang reported that some randconfig builds generate the following
    linker issue with the nf_ct_zone_dflt object involved:

    [...]
    CC init/version.o
    LD init/built-in.o
    net/built-in.o: In function `ipv4_conntrack_defrag':
    nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
    net/built-in.o: In function `ipv6_defrag':
    nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
    make: *** [vmlinux] Error 1

    Given that configurations exist where we have a built-in part, which is
    accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user()
    and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a
    module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in
    area when netfilter is configured in general.

    Therefore, split the more generic parts into a common header under
    include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in
    section that already holds parts related to CONFIG_NF_CONNTRACK in the
    netfilter core. This fixes the issue on my side.

    Fixes: 308ac9143ee2 ("netfilter: nf_conntrack: push zone object into functions")
    Reported-by: Fengguang Wu
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

07 Aug, 2015

1 commit

  • - Move the nfnl_acct_list into the network namespace; initialize
    and destroy it per namespace
    - Keep track of the refcnt on nfacct objects; the old logic no
    longer works with a per-namespace list
    - Adjust xt_nfacct to pass the namespace when registering objects

    Signed-off-by: Andreas Schultz
    Signed-off-by: Pablo Neira Ayuso

    Andreas Schultz
     

16 Jul, 2015

2 commits

  • Don't bother testing whether we need to switch to the alternate stack
    unless the TEE target is used.

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • In most cases there is no reentrancy into ip/ip6tables.

    For skbs sent by REJECT or SYNPROXY targets, there is one level of
    reentrancy, but it's not relevant as those targets issue an absolute
    verdict, i.e. the jumpstack can be clobbered since it is not used
    after the target issues an absolute verdict (ACCEPT, DROP, STOLEN, etc).

    So the only special case where it is relevant is the TEE target, which
    returns XT_CONTINUE.

    This patch changes ip(6)_do_table to always use the jump stack starting
    from 0.

    When we detect we're operating on an skb sent via TEE (percpu
    nf_skb_duplicated is 1) we switch to an alternate stack to leave
    the original one alone.

    Since there is no TEE support for arptables, it doesn't need to
    test if tee is active.

    The jump stack overflow tests are no longer needed as well --
    since ->stacksize is the largest call depth, we cannot exceed it.

    A much better alternative to the external jumpstack would be to just
    declare a jumps[32] stack on the local stack frame, but that would mean
    we'd have to reject iptables rulesets that used to work before.

    Another alternative would be to start rejecting rulesets with a larger
    call depth, e.g. 1000 -- in this case it would be feasible to allocate the
    entire stack in the percpu area which would avoid one dereference.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

19 Jun, 2015

1 commit

  • On 32bit archs gcc complains due to casts from void* to u64.
    Add intermediate casts to long to silence these warnings.

    include/linux/netfilter/x_tables.h:376:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    include/linux/netfilter/x_tables.h:384:15: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    include/linux/netfilter/x_tables.h:391:23: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    include/linux/netfilter/x_tables.h:400:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

    Fixes: 71ae0dff02d756e ("netfilter: xtables: use percpu rule counters")
    Reported-by: kbuild test robot
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

18 Jun, 2015

1 commit

  • Let's force a 16-byte alignment on xt_counter percpu allocations,
    so that bytes and packets sit in the same cache line.

    xt_counter being exported to user space, we cannot add __align(16) on
    the structure itself.

    Signed-off-by: Eric Dumazet
    Cc: Florian Westphal
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
     

16 Jun, 2015

1 commit

  • After Florian's patches, there is no need for XT_TABLE_INFO_SZ anymore:
    only one copy of the table is kept, instead of one copy per cpu.

    We can also avoid a dereference if we put the table data right after
    xt_table_info. It reduces register pressure and helps the compiler.

    Then we attempt a kmalloc() if the total size is under an order-3
    allocation, to reduce TLB pressure, as in many cases rules fit in 32 KB.

    Signed-off-by: Eric Dumazet
    Cc: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
     

14 Jun, 2015

5 commits


12 Jun, 2015

2 commits

  • We store the rule blob per (possible) cpu. Unfortunately this means we can
    waste a lot of memory on big smp machines. The ipt_entry structure
    ('rule head') is 112 bytes, so e.g. with maxcpu=64 one single rule eats
    close to 8k of RAM.

    Since previous patch made counters percpu it appears there is nothing
    left in the rule blob that needs to be percpu.

    On my test system (144 possible cpus, 400k dummy rules) this
    change saves close to 9 Gigabyte of RAM.

    Reported-by: Marcelo Ricardo Leitner
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The binary arp/ip/ip6tables ruleset is stored per cpu.

    The only reason left for the percpu duplication is the rule counters
    embedded in ipt_entry et al -- since each cpu has its own copy
    of the rules, all counters can be lockless.

    The downside is that the more cpus are supported, the more memory is
    required. Rules are not just duplicated per online cpu but for each
    possible cpu, i.e. if maxcpu is 144, then each rule is duplicated 144
    times, not just for the e.g. 64 cores actually present.

    To save some memory and also improve utilization of shared caches it
    would be preferable to only store the rule blob once.

    So we first need to separate counters and the rule blob.

    Instead of using entry->counters, allocate this percpu and store the
    percpu address in entry->counters.pcnt on CONFIG_SMP.

    This change makes no sense as-is; it is merely an intermediate step to
    remove the percpu duplication of the rule set in a followup patch.

    Suggested-by: Eric Dumazet
    Acked-by: Jesper Dangaard Brouer
    Reported-by: Marcelo Ricardo Leitner
    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

16 May, 2015

1 commit

  • Currently, we have four xtables extensions that cannot be used from the
    xt over nft compat layer. The problem is that they need real access to
    the full-blown xt_entry to validate that the rule comes with the right
    dependencies. This check was introduced to overcome the lack of
    sufficient userspace dependency validation in iptables.

    To resolve this problem, this patch introduces a new field to the
    xt_tgchk_param structure that tells us if the extension is run from
    the nft_compat context.

    The affected extensions are:

    1) CLUSTERIP, this target has been superseded by xt_cluster. So just
    bail out by returning -EINVAL.

    2) TCPMSS. Relax the checking when used from nft_compat. If used with
    the wrong configuration, it will corrupt !syn packets by adding TCP
    MSS option.

    3) ebt_stp. Relax the check to make sure it uses the reserved
    destination MAC address for STP.

    Signed-off-by: Pablo Neira Ayuso
    Tested-by: Arturo Borrero Gonzalez

    Pablo Neira Ayuso