18 Feb, 2014

1 commit

  • Quoting Andrey Vagin:
    When a conntrack is created by the kernel, it is initialized (the
    IPS_{DST,SRC}_NAT_DONE_BIT bits are set in nf_nat_setup_info) and only
    then added to the hashes (__nf_conntrack_hash_insert), so one conntrack
    can't be initialized by several threads concurrently.

    ctnetlink can add an uninitialized conntrack (w/o
    IPS_{DST,SRC}_NAT_DONE_BIT) to the hashes; several threads can then
    look up this conntrack and start initializing it concurrently. This is
    dangerous, because a BUG can be triggered from nf_nat_setup_info.

    Fix this race by always setting up nat before inserting the ct into
    the hash table, even if no CTA_NAT_ attribute was requested. In the
    absence of a CTA_NAT_ attribute, a null binding is created.

    This alters current behaviour: Before this patch, the first packet
    matching the newly injected conntrack would be run through the nat
    table since nf_nat_initialized() returns false. IOW, this forces
    ctnetlink users to specify the desired nat transformation at ct
    creation time.

    Thanks to Florian Westphal; this patch is based on his original
    patch to address this problem, including this patch description.

    Reported-By: Andrey Vagin
    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Florian Westphal

    Pablo Neira Ayuso
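
    The ordering described in the commit above (finish NAT setup, with a
    null binding if nothing was requested, and only then publish the
    entry) can be sketched in userspace C. This is an illustrative model,
    not the kernel code: every model_* name and the MODEL_* bits are
    hypothetical stand-ins, and the -1 return merely marks the point where
    the commit message says a BUG can be triggered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's IPS_{SRC,DST}_NAT_DONE bits. */
enum {
    MODEL_SRC_NAT_DONE = 1u << 0,
    MODEL_DST_NAT_DONE = 1u << 1,
};

struct model_ct {
    unsigned int status;  /* NAT-done bits */
    bool null_binding;    /* true if no translation was requested */
    bool hashed;          /* visible to lookups from other threads? */
};

/* True once both directions have been set up. */
static bool model_nat_initialized(const struct model_ct *ct)
{
    unsigned int done = MODEL_SRC_NAT_DONE | MODEL_DST_NAT_DONE;

    return (ct->status & done) == done;
}

/*
 * Set up one direction. Setting up the same direction twice is the
 * condition the race could reach; this model reports it with -1.
 */
static int model_nat_setup(struct model_ct *ct, unsigned int bit, bool want_nat)
{
    if (ct->status & bit)
        return -1;
    ct->null_binding = !want_nat;
    ct->status |= bit;
    return 0;
}

/* Create an entry: NAT setup always completes before publication. */
static int model_ct_create(struct model_ct *ct, bool has_nat_attr)
{
    if (model_nat_setup(ct, MODEL_SRC_NAT_DONE, has_nat_attr) < 0 ||
        model_nat_setup(ct, MODEL_DST_NAT_DONE, has_nat_attr) < 0)
        return -1;
    ct->hashed = true;  /* publish only after initialization */
    return 0;
}
```

    Because the entry is never visible while uninitialized, a later lookup
    always finds model_nat_initialized() true and has no reason to run the
    setup a second time.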
     

04 Jan, 2014

1 commit

  • We currently use prandom_u32() for allocation of ports in tcp bind(0)
    and udp code. In case of plain SNAT we try to keep the ports as is
    or increment on collision.

    SNAT --random mode does use per-destination incrementing port
    allocation. A recent paper [1] pointed out that this mode of port
    allocation makes it possible for an off-path attacker to find the
    randomly allocated ports through a timing side-channel in a socket
    overloading attack.

    So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
    in regard to the attack described in this paper. As we need to keep
    compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
    that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
    selection algorithm with a simple prandom_u32() in order to mitigate
    this attack vector. Note that the lfsr113's internal state is
    periodically reseeded by the kernel through a local secure entropy
    source.

    More details can be found in [1]. The basic idea is to send bursts
    of packets to a socket to overflow its receive queue and measure
    the latency to detect a possible retransmit when the port is found.
    Because ports increase for a given destination address and port,
    further allocations can be predicted. This information could then be
    used by an attacker for, e.g., cache poisoning, NS pinning, and
    degradation of service attacks against DNS servers [1]:

    The best defense against the poisoning attacks is to properly
    deploy and validate DNSSEC; DNSSEC provides security not only
    against off-path attacker but even against MitM attacker. We hope
    that our results will help motivate administrators to adopt DNSSEC.
    However, full DNSSEC deployment may take significant time, and
    until that happens, we recommend short-term, non-cryptographic
    defenses. We recommend to support full port randomisation,
    according to practices recommended in [2], and to avoid
    per-destination sequential port allocation, which we show may be
    vulnerable to derandomisation attacks.

    Joint work between Hannes Frederic Sowa and Daniel Borkmann.

    [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
    [2] http://arxiv.org/pdf/1205.5190v1.pdf

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
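
    The difference between the two allocation modes discussed in the
    commit above can be sketched as follows. This is a simplified
    userspace model under stated assumptions: the model_* names, the port
    range, and the per-destination counter are hypothetical, and rand()
    only stands in for the kernel's prandom_u32() (it is not a secure
    generator and may not cover the full range on every platform).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PORT_MIN   1024u
#define MODEL_PORT_RANGE (65536u - MODEL_PORT_MIN)

/* Hypothetical per-destination allocation state. */
struct model_dst {
    uint32_t next_off;  /* offset of the next port to hand out */
};

/*
 * Per-destination incrementing allocation: once an observer learns one
 * port, the following ports for that destination are predictable.
 */
static uint16_t model_port_incrementing(struct model_dst *dst)
{
    uint32_t off = dst->next_off++ % MODEL_PORT_RANGE;

    return (uint16_t)(MODEL_PORT_MIN + off);
}

/*
 * Fully random allocation: each port is drawn independently, so an
 * earlier allocation reveals nothing about the next one.
 */
static uint16_t model_port_fully_random(uint32_t (*rnd)(void))
{
    return (uint16_t)(MODEL_PORT_MIN + rnd() % MODEL_PORT_RANGE);
}

/* Stand-in random source; the kernel uses prandom_u32() instead. */
static uint32_t model_rand(void)
{
    return (uint32_t)rand();
}
```

    In the incrementing model two consecutive calls for the same
    destination return adjacent ports, which is the property the attack
    exploits; the fully random variant has no such relation between calls.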
     

14 Oct, 2013

1 commit


28 Aug, 2013

1 commit

  • Split out sequence number adjustments from NAT and move them to the conntrack
    core to make them usable for SYN proxying. The sequence number adjustment
    information is moved to a separate extend. The extend is added to new
    conntracks when a NAT mapping is set up for a connection using a helper.

    As a side effect, this saves 24 bytes per connection with NAT in the common
    case that a connection does not have a helper assigned.

    Signed-off-by: Patrick McHardy
    Tested-by: Martin Topholm
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

09 Aug, 2013

1 commit


25 Apr, 2013

2 commits


23 Apr, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    drivers/net/ethernet/intel/igb/igb_main.c
    drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
    include/net/scm.h
    net/batman-adv/routing.c
    net/ipv4/tcp_input.c

    The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
    cleanup in net-next to now pass cred structs around.

    The be2net driver had a bug fix in 'net' that overlapped with the VLAN
    interface changes by Patrick McHardy in net-next.

    An IGB conflict existed because in 'net' the build_skb() support was
    reverted, and in 'net-next' there was a comment style fix within that
    code.

    Several batman-adv conflicts were resolved by making sure that all
    calls to batadv_is_my_mac() are changed to have a new bat_priv first
    argument.

    Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
    rewrite in 'net-next', mostly overlapping changes.

    Thanks to Stephen Rothwell and Antonio Quartulli for help with several
    of these merge resolutions.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Apr, 2013

1 commit

  • The following oops was reported:
    RIP: 0010:[] [] nf_nat_cleanup_conntrack+0x42/0x70 [nf_nat]
    RSP: 0018:ffff880202c63d40 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff8801ac7bec28 RCX: ffff8801d0eedbe0
    RDX: dead000000200200 RSI: 0000000000000011 RDI: ffffffffa03265b8
    [..]
    Call Trace:
    [..]
    [] destroy_conntrack+0xbd/0x110 [nf_conntrack]

    This happens when a conntrack timeout expires right after first part
    of the nat cleanup has completed (bysrc hash removal), but before
    part 2 has completed (re-initialization of nat area).

    [ destroy callback tries to delete bysrc again ]

    Patrick suggested to just remove the affected conntracks -- the
    connections won't work properly anyway without nat transformation.

    So, let's do that.

    Reported-by: CAI Qian
    Cc: Patrick McHardy
    Signed-off-by: Florian Westphal
    Acked-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

08 Apr, 2013

1 commit


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived
    differently from the list ones, which look like this:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
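
    The three-argument iterator form described in the commit above can be
    sketched in userspace C. This is a simplified model rather than the
    contents of linux/list.h: the model_* macros, struct item, and
    sum_items() are hypothetical, the hlist structures are minimal copies,
    and __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copies of the hlist structures. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Map a node pointer to its containing entry, or NULL for a NULL node. */
#define model_entry_safe(ptr, type, member) \
    ((ptr) ? (type *)((char *)(ptr) - offsetof(type, member)) : (type *)NULL)

/*
 * Three-argument iterator: the cursor is the entry itself, so no
 * separate struct hlist_node cursor is needed.
 */
#define model_hlist_for_each_entry(pos, head, member)                        \
    for (pos = model_entry_safe((head)->first, __typeof__(*(pos)), member);  \
         pos;                                                                \
         pos = model_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Hypothetical entry type used only for this example. */
struct item {
    int value;
    struct hlist_node node;
};

static int sum_items(struct hlist_head *head)
{
    struct item *it;
    int sum = 0;

    model_hlist_for_each_entry(it, head, node)
        sum += it->value;
    return sum;
}
```

    The call site in sum_items() now has the same shape as a
    list_for_each_entry(pos, head, member) loop, which is the point the
    commit message makes.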
     

21 Sep, 2012

2 commits

  • The hlist walk in find_appropriate_src() is no longer protected by
    rcu_read_lock(), so the rcu_read_unlock() call made when in_range()
    matches is unnecessary.

    This bug was introduced in c7232c9 (netfilter: add protocol independent NAT core).

    Signed-off-by: Ulrich Weber
    Signed-off-by: Pablo Neira Ayuso

    Ulrich Weber
     
  • When unloading a protocol module nf_ct_iterate_cleanup() is used to
    remove all conntracks using the protocol from the bysource hash and
    clean their NAT sections. Since the conntrack isn't actually killed,
    the NAT callback is invoked twice, once for each direction, which
    causes an oops when trying to delete it from the bysource hash for
    the second time.

    The same oops can also happen when removing both an L3 and L4 protocol
    since the cleanup function doesn't check whether the conntrack has
    already been cleaned up.

    Pid: 4052, comm: modprobe Not tainted 3.6.0-rc3-test-nat-unload-fix+ #32 Red Hat KVM
    RIP: 0010:[] [] nf_nat_proto_clean+0x73/0xd0 [nf_nat]
    RSP: 0018:ffff88007808fe18 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff8800728550c0 RCX: ffff8800756288b0
    RDX: dead000000200200 RSI: ffff88007808fe88 RDI: ffffffffa002f208
    RBP: ffff88007808fe28 R08: ffff88007808e000 R09: 0000000000000000
    R10: dead000000200200 R11: dead000000100100 R12: ffffffff81c6dc00
    R13: ffff8800787582b8 R14: ffff880078758278 R15: ffff88007808fe88
    FS: 00007f515985d700(0000) GS:ffff88007cd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f515986a000 CR3: 000000007867a000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process modprobe (pid: 4052, threadinfo ffff88007808e000, task ffff8800756288b0)
    Stack:
    ffff88007808fe68 ffffffffa002c290 ffff88007808fe78 ffffffff815614e3
    ffffffff00000000 00000aeb00000246 ffff88007808fe68 ffffffff81c6dc00
    ffff88007808fe88 ffffffffa00358a0 0000000000000000 000000000040f5b0
    Call Trace:
    [] ? nf_nat_net_exit+0x50/0x50 [nf_nat]
    [] nf_ct_iterate_cleanup+0xc3/0x170
    [] nf_nat_l3proto_unregister+0x8a/0x100 [nf_nat]
    [] ? compat_prepare_timeout+0x13/0xb0
    [] nf_nat_l3proto_ipv4_exit+0x10/0x23 [nf_nat_ipv4]
    ...

    To fix this,

    - check whether the conntrack has already been cleaned up in
    nf_nat_proto_clean

    - change nf_ct_iterate_cleanup() to only invoke the callback function
    once for each conntrack (IP_CT_DIR_ORIGINAL).

    The second change doesn't affect other callers since when conntracks are
    actually killed, both directions are removed from the hash immediately
    and the callback is already only invoked once. If it is not killed, the
    second callback invocation will always return the same decision not to
    kill it.

    Reported-by: Jesper Dangaard Brouer
    Signed-off-by: Patrick McHardy
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
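
    The two parts of the fix described in the commit above (a cleanup
    callback that tolerates repeated invocation, and an iterator that
    invokes it once per conntrack) can be sketched as a userspace model.
    All model_* names are hypothetical; the real code operates on the
    conntrack hash and the bysource hlist rather than on a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_conn {
    bool in_bysource;  /* still linked into the by-source hash? */
    int unlink_calls;  /* how many times the unlink actually ran */
};

/* Cleanup callback: a second invocation on the same entry is a no-op. */
static void model_nat_clean(struct model_conn *ct)
{
    if (!ct->in_bysource)
        return;  /* already cleaned up */
    ct->in_bysource = false;
    ct->unlink_calls++;
}

enum model_dir { MODEL_DIR_ORIGINAL, MODEL_DIR_REPLY };

/* Each connection appears in the table once per direction. */
struct model_hash_entry {
    struct model_conn *ct;
    enum model_dir dir;
};

/*
 * Walk the table and invoke the callback only for entries in the
 * original direction. Returns the number of callback invocations.
 */
static int model_iterate_cleanup(struct model_hash_entry *tbl, size_t n,
                                 void (*cb)(struct model_conn *))
{
    int calls = 0;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].dir != MODEL_DIR_ORIGINAL)
            continue;
        cb(tbl[i].ct);
        calls++;
    }
    return calls;
}
```

    Either guard alone avoids the double removal in this model; having
    both covers the L3-plus-L4 unload case as well, where the callback
    reaches the same entry through two separate walks.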
     

10 Sep, 2012

1 commit


30 Aug, 2012

2 commits