Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

18 Feb, 2014

1 commit

0eba801b6 netfilter: ctnetlink: force null nat binding on insert ... Browse Code »
1

Quoting Andrey Vagin:
When a conntrack is created by kernel, it is initialized (sets
IPS_{DST,SRC}_NAT_DONE_BIT bits in nf_nat_setup_info) and only then it
is added in hashes (__nf_conntrack_hash_insert), so one conntract
can't be initialized from a few threads concurrently.

ctnetlink can add an uninitialized conntrack (w/o
IPS_{DST,SRC}_NAT_DONE_BIT) in hashes, then a few threads can look up
this conntrack and start initialize it concurrently. It's dangerous,
because BUG can be triggered from nf_nat_setup_info.

Fix this race by always setting up nat, even if no CTA_NAT_ attribute
was requested before inserting the ct into the hash table. In absence
of CTA_NAT_ attribute, a null binding is created.

This alters current behaviour: Before this patch, the first packet
matching the newly injected conntrack would be run through the nat
table since nf_nat_initialized() returns false. IOW, this forces
ctnetlink users to specify the desired nat transformation on ct
creation time.

Thanks for Florian Westphal, this patch is based on his original
patch to address this problem, including this patch description.

Reported-By: Andrey Vagin
Signed-off-by: Pablo Neira Ayuso
Acked-by: Florian Westphal

Pablo Neira Ayuso
2014-02-18 07:13:51 +0800

04 Jan, 2014

1 commit

34ce32401 netfilter: nf_nat: add full port randomization support ... Browse Code »

We currently use prandom_u32() for allocation of ports in tcp bind(0)
and udp code. In case of plain SNAT we try to keep the ports as is
or increment on collision.

SNAT --random mode does use per-destination incrementing port
allocation. As a recent paper pointed out in [1] that this mode of
port allocation makes it possible to an attacker to find the randomly
allocated ports through a timing side-channel in a socket overloading
attack conducted through an off-path attacker.

So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
in regard to the attack described in this paper. As we need to keep
compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
selection algorithm with a simple prandom_u32() in order to mitigate
this attack vector. Note that the lfsr113's internal state is
periodically reseeded by the kernel through a local secure entropy
source.

More details can be found in [1], the basic idea is to send bursts
of packets to a socket to overflow its receive queue and measure
the latency to detect a possible retransmit when the port is found.
Because of increasing ports to given destination and port, further
allocations can be predicted. This information could then be used by
an attacker for e.g. for cache-poisoning, NS pinning, and degradation
of service attacks against DNS servers [1]:

The best defense against the poisoning attacks is to properly
deploy and validate DNSSEC; DNSSEC provides security not only
against off-path attacker but even against MitM attacker. We hope
that our results will help motivate administrators to adopt DNSSEC.
However, full DNSSEC deployment make take significant time, and
until that happens, we recommend short-term, non-cryptographic
defenses. We recommend to support full port randomisation,
according to practices recommended in [2], and to avoid
per-destination sequential port allocation, which we show may be
vulnerable to derandomisation attacks.

Joint work between Hannes Frederic Sowa and Daniel Borkmann.

[1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
[2] http://arxiv.org/pdf/1205.5190v1.pdf

Signed-off-by: Hannes Frederic Sowa
Signed-off-by: Daniel Borkmann
Signed-off-by: Pablo Neira Ayuso

Daniel Borkmann
2014-01-04 06:41:26 +0800

14 Oct, 2013

1 commit

f59cb0453 netfilter: nf_nat: move alloc_null_binding to nf_nat_core.c ... Browse Code »

Similar to nat_decode_session, alloc_null_binding is needed for both
ip_tables and nf_tables, so move it to nf_nat_core.c. This change
is required by nf_tables.

This is an adapted version of the original patch from Patrick McHardy.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2013-10-14 17:29:39 +0800

28 Aug, 2013

1 commit

41d73ec05 netfilter: nf_conntrack: make sequence number adjustments usuable without NAT ... Browse Code »

Split out sequence number adjustments from NAT and move them to the conntrack
core to make them usable for SYN proxying. The sequence number adjustment
information is moved to a seperate extend. The extend is added to new
conntracks when a NAT mapping is set up for a connection using a helper.

As a side effect, this saves 24 bytes per connection with NAT in the common
case that a connection does not have a helper assigned.

Signed-off-by: Patrick McHardy
Tested-by: Martin Topholm
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Pablo Neira Ayuso

Patrick McHardy
2013-08-28 06:26:48 +0800

09 Aug, 2013

1 commit

c655bc689 netfilter: nf_conntrack: don't send destroy events from iterator ... Browse Code »

Let nf_ct_delete handle delivery of the DESTROY event.

Based on earlier patch from Pablo Neira.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2013-08-09 18:03:33 +0800

25 Apr, 2013

2 commits

d3734b049 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
The following patchset contains fixes for recently applied
Netfilter/IPVS updates to the net-next tree, most relevantly
they are:

* Fix sparse warnings introduced in the RCU conversion, from
Julian Anastasov.

* Fix wrong endianness in the size field of IPVS sync messages,
from Simon Horman.

* Fix missing if checking in nf_xfrm_me_harder, from Dan Carpenter.

* Fix off by one access in the IPVS SCTP tracking code, again from
Dan Carpenter.
====================

Signed-off-by: David S. Miller

David S. Miller
2013-04-25 12:53:40 +0800
e7e6f6300 netfilter: nf_nat: missing condition in nf_xfrm_me_harder() ... Browse Code »

This if statement was accidentally dropped in (aaa795a netfilter:
nat: propagate errors from xfrm_me_harder()) so now it returns
unconditionally.

Signed-off-by: Dan Carpenter
Signed-off-by: Pablo Neira Ayuso

Dan Carpenter
2013-04-25 07:58:16 +0800

23 Apr, 2013

1 commit

6e0895c2e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
drivers/net/ethernet/intel/igb/igb_main.c
drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
include/net/scm.h
net/batman-adv/routing.c
net/ipv4/tcp_input.c

The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.

The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.

An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.

Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.

Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.

Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.

Signed-off-by: David S. Miller

David S. Miller
2013-04-23 08:32:51 +0800

12 Apr, 2013

1 commit

c2d421e17 netfilter: nf_nat: fix race when unloading protocol modules ... Browse Code »
19

following oops was reported:
RIP: 0010:[] [] nf_nat_cleanup_conntrack+0x42/0x70 [nf_nat]
RSP: 0018:ffff880202c63d40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8801ac7bec28 RCX: ffff8801d0eedbe0
RDX: dead000000200200 RSI: 0000000000000011 RDI: ffffffffa03265b8
[..]
Call Trace:
[..]
[] destroy_conntrack+0xbd/0x110 [nf_conntrack]

Happens when a conntrack timeout expires right after first part
of the nat cleanup has completed (bysrc hash removal), but before
part 2 has completed (re-initialization of nat area).

[ destroy callback tries to delete bysrc again ]

Patrick suggested to just remove the affected conntracks -- the
connections won't work properly anyway without nat transformation.

So, lets do that.

Reported-by: CAI Qian
Cc: Patrick McHardy
Signed-off-by: Florian Westphal
Acked-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2013-04-12 17:46:31 +0800

08 Apr, 2013

1 commit

aaa795ad2 netfilter: nat: propagate errors from xfrm_me_harder() ... Browse Code »

Propagate errors from ip_xfrm_me_harder() instead of returning EPERM in
all cases.

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso

Patrick McHardy
2013-04-08 18:34:01 +0800

28 Feb, 2013

1 commit

b67bfe0d4 hlist: drop the node parameter from iterators ... Browse Code »
26

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sasha Levin
2013-02-28 11:10:24 +0800

21 Sep, 2012

2 commits

136251d02 netfilter: nf_nat: remove obsolete rcu_read_unlock call ... Browse Code »

hlist walk in find_appropriate_src() is not protected anymore by rcu_read_lock(),
so rcu_read_unlock() is unnecessary if in_range() matches.

This bug was added in (c7232c9 netfilter: add protocol independent NAT core).

Signed-off-by: Ulrich Weber
Signed-off-by: Pablo Neira Ayuso

Ulrich Weber
2012-09-21 18:09:25 +0800
b0cdb1d9a netfilter: nf_nat: fix oops when unloading protocol modules ... Browse Code »

When unloading a protocol module nf_ct_iterate_cleanup() is used to
remove all conntracks using the protocol from the bysource hash and
clean their NAT sections. Since the conntrack isn't actually killed,
the NAT callback is invoked twice, once for each direction, which
causes an oops when trying to delete it from the bysource hash for
the second time.

The same oops can also happen when removing both an L3 and L4 protocol
since the cleanup function doesn't check whether the conntrack has
already been cleaned up.

Pid: 4052, comm: modprobe Not tainted 3.6.0-rc3-test-nat-unload-fix+ #32 Red Hat KVM
RIP: 0010:[] [] nf_nat_proto_clean+0x73/0xd0 [nf_nat]
RSP: 0018:ffff88007808fe18 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800728550c0 RCX: ffff8800756288b0
RDX: dead000000200200 RSI: ffff88007808fe88 RDI: ffffffffa002f208
RBP: ffff88007808fe28 R08: ffff88007808e000 R09: 0000000000000000
R10: dead000000200200 R11: dead000000100100 R12: ffffffff81c6dc00
R13: ffff8800787582b8 R14: ffff880078758278 R15: ffff88007808fe88
FS: 00007f515985d700(0000) GS:ffff88007cd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f515986a000 CR3: 000000007867a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 4052, threadinfo ffff88007808e000, task ffff8800756288b0)
Stack:
ffff88007808fe68 ffffffffa002c290 ffff88007808fe78 ffffffff815614e3
ffffffff00000000 00000aeb00000246 ffff88007808fe68 ffffffff81c6dc00
ffff88007808fe88 ffffffffa00358a0 0000000000000000 000000000040f5b0
Call Trace:
[] ? nf_nat_net_exit+0x50/0x50 [nf_nat]
[] nf_ct_iterate_cleanup+0xc3/0x170
[] nf_nat_l3proto_unregister+0x8a/0x100 [nf_nat]
[] ? compat_prepare_timeout+0x13/0xb0
[] nf_nat_l3proto_ipv4_exit+0x10/0x23 [nf_nat_ipv4]
...

To fix this,

- check whether the conntrack has already been cleaned up in
nf_nat_proto_clean

- change nf_ct_iterate_cleanup() to only invoke the callback function
once for each conntrack (IP_CT_DIR_ORIGINAL).

The second change doesn't affect other callers since when conntracks are
actually killed, both directions are removed from the hash immediately
and the callback is already only invoked once. If it is not killed, the
second callback invocation will always return the same decision not to
kill it.

Reported-by: Jesper Dangaard Brouer
Signed-off-by: Patrick McHardy
Acked-by: Jesper Dangaard Brouer
Signed-off-by: Pablo Neira Ayuso

Patrick McHardy
2012-09-21 17:35:18 +0800

10 Sep, 2012

1 commit

5693d68df netfilter: nf_nat: fix out-of-bounds access in address selection ... Browse Code »

include/linux/jhash.h:138:16: warning: array subscript is above array bounds
[jhash2() expects the number of u32 in the key]

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2012-09-10 02:18:55 +0800

30 Aug, 2012

2 commits

58a317f10 netfilter: ipv6: add IPv6 NAT support ... Browse Code »

Signed-off-by: Patrick McHardy

Patrick McHardy
2012-08-30 09:00:17 +0800
c7232c997 netfilter: add protocol independent NAT core ... Browse Code »

Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.

Signed-off-by: Patrick McHardy

Patrick McHardy
2012-08-30 09:00:14 +0800