29 Aug, 2020

1 commit

  • It's possible that we have more than one packet with the same ct tuple
    simultaneously, e.g. when an application emits n packets on the same UDP
    socket from multiple threads.

    NAT rules might be applied to those packets. With the right set of rules,
    n packets will be mapped to m destinations, where at least two packets end
    up with the same destination.

    When this happens, the existing clash resolution may merge the skb that
    is processed after the first one has been received with the identical
    tuple already in the hash table.

    However, it's possible that this identical tuple is a NAT_CLASH tuple.
    In that case the second skb will be sent, but no reply can be received
    since the reply that is processed first removes the NAT_CLASH tuple.

    Do not auto-delete; this gives a 1-second window for replies to be passed
    back to the originator.

    Packets that are coming later (udp stream case) will not be affected:
    they match the original ct entry, not a NAT_CLASH one.

    Also prevent NAT_CLASH entries from getting offloaded.

    Fixes: 6a757c07e51f ("netfilter: conntrack: allow insertion of clashing entries")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
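A minimal userspace sketch of the behaviour described in this commit, assuming a simplified entry with a status bitmask and an absolute expiry time in whole seconds. All names (struct model_ct, MODEL_NAT_CLASH, model_insert_clash(), model_on_reply(), model_gc(), model_may_offload()) are hypothetical; this is not the kernel's conntrack code, only a model of the 1-second window and the offload exclusion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits, loosely modelled on conntrack's IPS_* flags. */
#define MODEL_SEEN_REPLY (1u << 0)
#define MODEL_NAT_CLASH  (1u << 1)

/* Simplified entry: status bits, absolute expiry (seconds), table membership. */
struct model_ct {
	uint32_t status;
	uint64_t expires;
	bool in_table;
};

/* A clashing entry is inserted with the NAT_CLASH bit and a 1 s lifetime. */
static void model_insert_clash(struct model_ct *ct, uint64_t now)
{
	ct->status = MODEL_NAT_CLASH;
	ct->expires = now + 1;
	ct->in_table = true;
}

/*
 * Reply handling after the fix: the entry is only marked as replied.
 * It is not removed, so further replies within the window still match.
 */
static void model_on_reply(struct model_ct *ct)
{
	ct->status |= MODEL_SEEN_REPLY;
}

/* Expiry: the entry leaves the table once its lifetime has elapsed. */
static void model_gc(struct model_ct *ct, uint64_t now)
{
	if (ct->in_table && now >= ct->expires)
		ct->in_table = false;
}

/* NAT_CLASH entries are never handed to the flow offload path. */
static bool model_may_offload(const struct model_ct *ct)
{
	return ct->in_table && !(ct->status & MODEL_NAT_CLASH);
}
```

Calling model_on_reply() repeatedly within the same second leaves the entry in place; only model_gc() removes it once the window has passed.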
     

17 Feb, 2020

1 commit

  • This patch further relaxes the need to drop an skb due to a clash with
    an existing conntrack entry.

    Current clash resolution handles the case where the clash occurs between
    two identical entries (distinct nf_conn objects with same tuples), i.e.:

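                      Original                      Reply
    existing: 10.2.3.4:42 -> 10.8.8.8:53    10.2.3.4:42 <- 10.0.0.6:5353
    clashing: 10.2.3.4:42 -> 10.8.8.8:53    10.2.3.4:42 <- 10.0.0.6:5353

    In this case, the clashing entry is discarded and skb->_nfct is made to
    point to the existing one. The skb can then be processed normally, just
    as if the clash had not existed in the first place.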

    For other clashes, the skb needs to be dropped.
    This frequently happens with DNS resolvers that send A and AAAA queries
    back-to-back when NAT rules are present that cause packets to get
    different DNAT transformations applied, for example:

    -m statistic --mode random ... -j DNAT --to-destination 10.0.0.6:5353
    -m statistic --mode random ... -j DNAT --to-destination 10.0.0.7:5353

    In this case, the A or AAAA query is dropped, which incurs a costly
    delay during name resolution.

    This patch also allows this collision type:

                      Original                      Reply
    existing: 10.2.3.4:42 -> 10.8.8.8:53    10.2.3.4:42 <- 10.0.0.6:5353
    clashing: 10.2.3.4:42 -> 10.8.8.8:53    10.2.3.4:42 <- 10.0.0.7:5353

    [...]

    1. 10.2.3.4:42 -> 10.8.8.8:53 (A)
    2. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
    3. Apply DNAT, reply changed to 10.0.0.6
    4. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
    5. Apply DNAT, reply changed to 10.0.0.7
    6. confirm/commit to conntrack table, no collisions
    7. commit clashing entry

    Reply comes in:

    10.2.3.4:42 <- 10.0.0.6:5353
     -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
    10.2.3.4:42 <- 10.0.0.7:5353
     -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
        The conntrack entry is deleted from the table, as it has the NAT_CLASH
        bit set.

    In case of a retransmit from ORIGINAL dir, all further packets will get
    the DNAT transformation to 10.0.0.6.

    I tried to come up with other solutions but they all have worse
    problems.

    Alternatives considered were:
    1. Confirm ct entries at allocation time, not in postrouting.
    a. will cause unnecessary work when the skb that creates the
    conntrack is dropped by ruleset.
    b. in case nat is applied, ct entry would need to be moved in
    the table, which requires another spinlock pair to be taken.
    c. breaks the 'unconfirmed entry is private to cpu' assumption:
    we would need to guard all nfct->ext allocation requests with
    ct->lock spinlock.

    2. Make the unconfirmed list a hash table instead of a pcpu list.
    Shares drawback c) of the first alternative.

    3. Document this is expected and force users to rearrange their
    ruleset (e.g. by using "-m cluster" instead of "-m statistic").
    nft has the 'jhash' expression which can be used instead of 'numgen'.

    Major drawback: doesn't fix what I consider a bug, is not very realistic,
    and I believe it's reasonable for existing rulesets to 'just work'.

    4. Document this is expected and force users to steer problematic
    packets to the same CPU -- this would serialize the "allocate new
    conntrack entry/nat table evaluation/perform nat/confirm entry", so
    no race can occur. Similar drawback to 3.

    Another advantage of this patch compared to 1) and 2) is that there are
    no changes to the hot path; things are handled in the udp tracker and
    the clash resolution path.

    Cc: rcu@vger.kernel.org
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Jozsef Kadlecsik
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
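The three outcomes described above (merge identical entries, insert a clashing UDP entry whose reply tuple differs, otherwise drop) can be modelled as a small decision function. This is a simplified sketch: struct model_tuple, struct model_entry, IP4() and model_resolve_clash() are hypothetical names, addresses are host-order integers, and restricting the NAT_CLASH path to UDP is an assumption based on this commit handling things in the udp tracker. The real clash resolution performs more checks than this.

```c
#include <stdbool.h>
#include <stdint.h>
#include <netinet/in.h>   /* IPPROTO_UDP, IPPROTO_TCP */

/* Build a host-order IPv4 address from four octets (illustrative helper). */
#define IP4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
			 ((uint32_t)(c) << 8) | (uint32_t)(d))

struct model_tuple {
	uint32_t saddr;
	uint16_t sport;
	uint32_t daddr;
	uint16_t dport;
};

struct model_entry {
	struct model_tuple orig;   /* ORIGINAL direction */
	struct model_tuple reply;  /* REPLY direction, after NAT */
};

enum model_verdict { CLASH_MERGE, CLASH_INSERT_NAT_CLASH, CLASH_DROP };

/* Field-by-field compare avoids depending on struct padding. */
static bool model_tuple_equal(const struct model_tuple *a, const struct model_tuple *b)
{
	return a->saddr == b->saddr && a->sport == b->sport &&
	       a->daddr == b->daddr && a->dport == b->dport;
}

static enum model_verdict model_resolve_clash(const struct model_entry *existing,
					      const struct model_entry *clashing,
					      int l4proto)
{
	bool orig_eq = model_tuple_equal(&existing->orig, &clashing->orig);
	bool reply_eq = model_tuple_equal(&existing->reply, &clashing->reply);

	if (orig_eq && reply_eq)
		return CLASH_MERGE;            /* reuse the existing entry */
	if (orig_eq && l4proto == IPPROTO_UDP)
		return CLASH_INSERT_NAT_CLASH; /* same request, different DNAT result */
	return CLASH_DROP;
}
```

With the addresses from the example, an entry whose reply comes from 10.0.0.7:5353 clashing with one whose reply comes from 10.0.0.6:5353 yields CLASH_INSERT_NAT_CLASH for UDP and CLASH_DROP otherwise.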
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

18 Jan, 2019

3 commits


21 Dec, 2018

2 commits

  • We have no explicit signal when a UDP stream has terminated; peers just
    stop sending.

    For suspected stream connections, a timeout of two minutes is sane to keep
    the NAT mapping alive a while longer.

    It matches the default value of TCP conntrack's 'timewait' timeout.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Currently, DNS resolvers that send both A and AAAA queries from the same source port
    can trigger stream mode prematurely, which results in a non-early-evictable conntrack entry
    for three minutes, even though DNS requests are done in a few milliseconds.

    Add a two-second grace period where we continue to use the ordinary
    30-second default timeout. It's enough for DNS request/response traffic,
    even if two request/reply packets are involved.

    ASSURED is still set; otherwise the conntrack (and thus a possible
    NAT mapping ...) gets zapped too when the conntrack table runs full.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
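The timeout behaviour of the two commits above can be modelled with the values they state: 30 seconds for the ordinary timeout, 120 seconds in stream mode, and a 2-second grace period. This sketch uses whole seconds instead of jiffies, and struct model_udp_ct, model_udp_new() and model_udp_packet() are hypothetical names. In this sketch the reply flag is recorded after the timeout decision, so the stream check first applies on the packet following the first reply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values stated in the commit messages above, in seconds. */
#define MODEL_UDP_TIMEOUT        30   /* ordinary default */
#define MODEL_UDP_STREAM_TIMEOUT 120  /* stream mode, two minutes */
#define MODEL_UDP_STREAM_GRACE   2    /* grace period before stream mode */

struct model_udp_ct {
	bool seen_reply;
	bool assured;
	uint64_t stream_ts;  /* creation time + grace period */
	uint64_t expires;
};

static void model_udp_new(struct model_udp_ct *ct, uint64_t now)
{
	ct->seen_reply = false;
	ct->assured = false;
	ct->stream_ts = now + MODEL_UDP_STREAM_GRACE;
	ct->expires = now + MODEL_UDP_TIMEOUT;
}

static void model_udp_packet(struct model_udp_ct *ct, bool reply_dir, uint64_t now)
{
	uint64_t extra = MODEL_UDP_TIMEOUT;

	if (ct->seen_reply) {
		/* Still active after the grace period? Extend to the stream timeout. */
		if (now > ct->stream_ts)
			extra = MODEL_UDP_STREAM_TIMEOUT;
		/* ASSURED either way, so the entry is not early-evicted. */
		ct->assured = true;
	}
	ct->expires = now + extra;

	if (reply_dir)
		ct->seen_reply = true;
}
```

A DNS exchange finishing within the grace period keeps the 30-second timeout; only traffic still flowing after two seconds gets the 120-second stream timeout.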
     

03 Nov, 2018

1 commit


21 Sep, 2018

3 commits

  • l4 protocols are demuxed by l3num, l4num pair.

    However, almost all l4 trackers are l3 agnostic.

    Only exceptions are:
    - gre, icmp (ipv4 only)
    - icmpv6 (ipv6 only)

    This commit gets rid of the l3 mapping; l4 trackers can now be looked up
    by their IPPROTO_XXX value alone, which removes the additional l3
    indirection.

    For icmp, icmpv6 and gre, add a check on state->pf and
    return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4;
    this seems more fitting than using the generic tracker.

    Additionally, we can kill the second l4proto definitions that were needed
    for the v4/v6 split -- they are now the same, so we can use a single
    l4proto struct for each protocol rather than two.

    The EXPORT_SYMBOLs can be removed as all these object files are
    part of nf_conntrack with no external references.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
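A sketch of demuxing trackers by IPPROTO_* value alone, with a family check for the l3-specific trackers that returns -NF_ACCEPT. The table, struct model_l4proto, model_find() and model_track() are hypothetical; MODEL_NF_ACCEPT uses the value 1, matching NF_ACCEPT in the kernel's netfilter headers. The linear scan keeps the example short and is not meant to reflect the kernel's lookup structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>   /* IPPROTO_* */
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* NF_ACCEPT is 1 in the kernel; a negative verdict means "not tracked here". */
#define MODEL_NF_ACCEPT 1

struct model_l4proto {
	uint8_t l4proto;
	int only_pf;       /* 0: family-agnostic, else the single supported family */
	const char *name;
};

/* Fallback tracker; its protocol field is unused. */
static const struct model_l4proto model_generic = { 0, 0, "generic" };

static const struct model_l4proto model_trackers[] = {
	{ IPPROTO_TCP,    0,        "tcp" },
	{ IPPROTO_UDP,    0,        "udp" },
	{ IPPROTO_ICMP,   AF_INET,  "icmp" },
	{ IPPROTO_ICMPV6, AF_INET6, "icmpv6" },
	{ IPPROTO_GRE,    AF_INET,  "gre" },
};

/* Lookup by protocol number alone; unknown protocols use the generic tracker. */
static const struct model_l4proto *model_find(uint8_t proto)
{
	size_t i;

	for (i = 0; i < sizeof(model_trackers) / sizeof(model_trackers[0]); i++)
		if (model_trackers[i].l4proto == proto)
			return &model_trackers[i];
	return &model_generic;
}

/* Family-specific trackers refuse e.g. icmpv6-in-ipv4 with -NF_ACCEPT. */
static int model_track(uint8_t proto, int pf)
{
	const struct model_l4proto *p = model_find(proto);

	if (p->only_pf != 0 && p->only_pf != pf)
		return -MODEL_NF_ACCEPT;
	return MODEL_NF_ACCEPT;
}
```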
     
  • It's unused; the next patch will remove l4proto->l3proto number to simplify
    l4 protocol demuxer lookup.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Only two protocols need the ->error() function: icmp and icmpv6.
    This is because icmp error messages might be RELATED to an existing
    connection (e.g. PMTUD, port unreachable and the like), and their
    ->error() handlers do this.

    The error callback is already optional, so remove it for
    udp and call the error checks from ->packet() instead.

    As the error() callback can call checksum functions that write to
    skb->csum*, the const qualifier has to be removed as well.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
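With ->error() gone for udp, validation runs at the start of ->packet(). A hypothetical sketch of that shape, modelling only header-length checks on a raw byte buffer; checksum verification is omitted. model_udp_error() and model_udp_packet() are illustrative names, not the kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NF_ACCEPT   1
#define MODEL_UDP_HDR_LEN 8

/* Length checks only; checksum verification is left out of this sketch. */
static int model_udp_error(const uint8_t *l4, size_t l4len)
{
	uint16_t ulen;

	if (l4len < MODEL_UDP_HDR_LEN)
		return -MODEL_NF_ACCEPT;  /* header is too small */

	/* UDP length field: bytes 4-5, network byte order. */
	ulen = (uint16_t)((l4[4] << 8) | l4[5]);
	if (ulen > l4len || ulen < MODEL_UDP_HDR_LEN)
		return -MODEL_NF_ACCEPT;  /* truncated or malformed */

	return MODEL_NF_ACCEPT;
}

static int model_udp_packet(const uint8_t *l4, size_t l4len)
{
	/* What used to be a separate ->error() callback now runs first. */
	int ret = model_udp_error(l4, l4len);

	if (ret != MODEL_NF_ACCEPT)
		return ret;

	/* Timeout refresh and state updates would follow here. */
	return MODEL_NF_ACCEPT;
}
```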
     

20 Sep, 2018

2 commits

  • ->new() gets invoked after ->error() and before ->packet() if
    a conntrack lookup has found no result for the tuple.

    We can fold it into ->packet() -- the packet() implementations
    can check whether the conntrack is confirmed (already in hash) or not
    (new).

    If it's unconfirmed, the conntrack isn't in the hash yet, so the current
    skb created a new conntrack entry.

    Only relevant side effect -- if packet() doesn't return NF_ACCEPT
    but -NF_ACCEPT (or drop) while the conntrack was just created,
    then the newly allocated conntrack is freed right away, rather than not
    being created in the first place.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • nf_hook_state contains all the hook meta-information: netns, protocol family,
    hook location, and so on.

    Instead of only passing selected information, pass a pointer to the entire
    structure.

    This will allow merging the error and packet handlers and removing
    the ->new() function in follow-up patches.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
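The two changes above, passing a single hook-state pointer and folding ->new() into ->packet(), can be sketched together. struct model_hook_state, struct model_ct, model_packet() and model_conntrack_in() are hypothetical; the now field is an illustrative clock, not a member of the real nf_hook_state. The caller frees an unconfirmed entry when packet() does not accept it, which is the side effect the commit describes.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NF_ACCEPT 1

/* Hypothetical stand-in for nf_hook_state: hook metadata in one struct. */
struct model_hook_state {
	int pf;        /* protocol family */
	int hook;      /* hook location */
	uint64_t now;  /* illustrative clock, not a real nf_hook_state field */
};

struct model_ct {
	bool confirmed;    /* true once the entry is in the hash table */
	uint64_t created;  /* set by the former ->new() work */
	bool freed;
};

static int model_packet(struct model_ct *ct, const struct model_hook_state *state,
			bool valid)
{
	if (!valid)
		return -MODEL_NF_ACCEPT;
	if (!ct->confirmed)
		ct->created = state->now;  /* new entry: what ->new() used to do */
	return MODEL_NF_ACCEPT;
}

static int model_conntrack_in(struct model_ct *ct, const struct model_hook_state *state,
			      bool valid)
{
	int ret = model_packet(ct, state, valid);

	/* A just-created entry that is not accepted is freed right away. */
	if (ret <= 0 && !ct->confirmed)
		ct->freed = true;
	return ret;
}
```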
     

11 Sep, 2018

1 commit

  • Now that cttimeout support for nft_ct is in place, these should depend
    on CONFIG_NF_CONNTRACK_TIMEOUT; otherwise we can crash when dumping the
    policy if this option is not enabled.

    [ 71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    [...]
    [ 71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246
    [...]
    [ 71.600188] Call Trace:
    [ 71.600201] ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct]

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

24 Aug, 2018

1 commit


16 Jul, 2018

3 commits


09 Jan, 2018

2 commits


25 Oct, 2017

2 commits


04 Sep, 2017

1 commit


25 Aug, 2017

2 commits


02 Feb, 2017

1 commit


03 Jan, 2017

1 commit

  • udplite was copied from udp; they are virtually 100% identical.

    This adds the udplite tracker to udp instead, removes the udplite module,
    and then makes the udplite tracker builtin.

    udplite will then simply re-use udp timeout settings.
    It makes little sense to add separate sysctls; nowadays we have
    fine-grained timeout policy support via the CT target.

    old:
       text    data     bss     dec     hex filename
       1633     672       0    2305     901 nf_conntrack_proto_udp.o
       1756     672       0    2428     97c nf_conntrack_proto_udplite.o
      69526   17937     268   87731   156b3 nf_conntrack.ko

    new:
       text    data     bss     dec     hex filename
       2442    1184       0    3626     e2a nf_conntrack_proto_udp.o
      68565   17721     268   86554   1521a nf_conntrack.ko

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
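A sketch of the "one tracker, shared timeouts" layout: both protocol descriptors point at the same timeout table, so changing a udp timeout changes the udplite one too. struct model_l4proto, model_udp_timeouts and model_timeout() are hypothetical names, and the values (30 s unreplied, 120 s replied) are taken from the later commits above rather than from this one.

```c
#include <stdint.h>
#include <netinet/in.h>   /* IPPROTO_UDP, IPPROTO_UDPLITE */

enum { MODEL_UDP_CT_UNREPLIED, MODEL_UDP_CT_REPLIED, MODEL_UDP_CT_MAX };

/* One timeout table (seconds) shared by both protocols. */
static unsigned int model_udp_timeouts[MODEL_UDP_CT_MAX] = { 30, 120 };

struct model_l4proto {
	uint8_t l4proto;
	const unsigned int *timeouts;
};

/* Two descriptors, one implementation: udplite re-uses udp's settings. */
static const struct model_l4proto model_udp = { IPPROTO_UDP, model_udp_timeouts };
static const struct model_l4proto model_udplite = { IPPROTO_UDPLITE, model_udp_timeouts };

static unsigned int model_timeout(const struct model_l4proto *p, int replied)
{
	return p->timeouts[replied ? MODEL_UDP_CT_REPLIED : MODEL_UDP_CT_UNREPLIED];
}
```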
     

13 Aug, 2016

1 commit

  • This backward compatibility has been around for more than ten years,
    since Yasuyuki Kozakai introduced IPv6 in conntrack. These days we have
    the alternative /proc/net/nf_conntrack* entries, and the ctnetlink
    interface and the conntrack utility have been adopted by many people in
    the user community, according to what I have observed on the netfilter
    user mailing list.

    So let's get rid of this.

    Note that nf_conntrack_htable_size and nf_conntrack_max no longer need
    to be exported as symbols.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

12 Aug, 2016

1 commit


05 May, 2016

1 commit

  • This patch introduces nf_ct_resolve_clash() to resolve race conditions on
    conntrack insertions.

    This is particularly a problem for connectionless protocols such as
    UDP, which have no initial handshake. Two or more packets may race to
    insert the entry, resulting in packet drops.

    Another problematic scenario is packets enqueued to userspace via
    NFQUEUE after the raw table, which makes it easier to trigger this
    race.

    To resolve this, the idea is to reset the conntrack entry to the one
    that won the race. Packet and byte counters are also merged.

    The 'insert_failed' stat still accounts for this situation; after
    this patch, the drop counter is bumped whenever we drop packets, so we
    can watch for unresolved clashes.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
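The resolution step can be modelled as follows: when the packet's own entry loses the insertion race to an identical one, the packet is attached to the winner and the loser's packet and byte counters are folded in; otherwise the packet is dropped. insert_failed is counted in both cases, drop only for unresolved clashes. The structs and model_resolve_race() are hypothetical, and the "identical" decision is passed in rather than computed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_counters {
	uint64_t packets;
	uint64_t bytes;
};

struct model_ct {
	struct model_counters acct;
	bool released;
};

struct model_stats {
	unsigned int insert_failed;
	unsigned int drop;
};

/*
 * 'loser' is the entry created for this packet; 'winner' was inserted first.
 * Returns the entry the packet should use, or NULL if it must be dropped.
 */
static struct model_ct *model_resolve_race(struct model_ct *winner, struct model_ct *loser,
					   bool identical, struct model_stats *st)
{
	st->insert_failed++;  /* counted whether or not the clash is resolved */

	if (identical) {
		winner->acct.packets += loser->acct.packets;
		winner->acct.bytes += loser->acct.bytes;
		loser->released = true;  /* loser entry is no longer used */
		return winner;
	}

	st->drop++;  /* unresolved clash */
	return NULL;
}
```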
     

19 Sep, 2015

1 commit


06 Nov, 2014

1 commit

  • Since the addition of a new function to seq_file (seq_has_overflowed()),
    there is no value in functions called from seq_show returning
    anything. Remove the int returns of the various
    print_tuple/_print_tuple functions.

    Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com

    Cc: Pablo Neira Ayuso
    Cc: Patrick McHardy
    Cc: Jozsef Kadlecsik
    Cc: netfilter-devel@vger.kernel.org
    Cc: coreteam@netfilter.org
    Signed-off-by: Joe Perches
    Signed-off-by: Steven Rostedt

    Joe Perches
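A userspace model of the pattern: print helpers return void, and the caller asks once whether the output overflowed. struct model_seq, model_seq_printf(), model_seq_has_overflowed() and model_print_tuple() are hypothetical stand-ins for seq_file, seq_printf(), seq_has_overflowed() and the tuple printers; the 64-byte buffer size is arbitrary.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct seq_file. */
struct model_seq {
	char buf[64];
	size_t count;
	bool overflow;
};

static void model_seq_printf(struct model_seq *m, const char *fmt, ...)
{
	size_t room = sizeof(m->buf) - m->count;
	va_list ap;
	int len;

	if (m->overflow)
		return;

	va_start(ap, fmt);
	len = vsnprintf(m->buf + m->count, room, fmt, ap);
	va_end(ap);

	/* Output did not fit: remember it instead of returning an error. */
	if (len < 0 || (size_t)len >= room) {
		m->overflow = true;
		return;
	}
	m->count += (size_t)len;
}

static bool model_seq_has_overflowed(const struct model_seq *m)
{
	return m->overflow;
}

/* Tuple printer with no return value. */
static void model_print_tuple(struct model_seq *m, unsigned int sport, unsigned int dport)
{
	model_seq_printf(m, "sport=%u dport=%u ", sport, dport);
}
```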
     

19 Apr, 2013

1 commit

  • Add copyright statements to all netfilter files which have had significant
    changes done by myself in the past.

    Some notes:

    - nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter
    Core Team when it got split out of nf_conntrack_core.c. The copyrights
    even state a date which lies six years before it was written. It was
    written in 2005 by Harald and myself.

    - net/ipv{4,6}/netfilter.c and net/netfilter/nf_queue.c were missing copyright
    statements. I've added the copyright statement from net/netfilter/core.c,
    where this code originated.

    - for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want
    it to give the wrong impression

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

06 Apr, 2013

1 commit

  • This patch adds netns support to nf_log and it prepares netns
    support for existing loggers. It is composed of four major
    changes.

    1) nf_log_register has been split into two functions: nf_log_register
    and nf_log_set. The new nf_log_register is used to globally
    register the nf_logger, and nf_log_set is used to enable
    per-netns support for nf_loggers.

    Per-netns support is not yet complete after this patch; it comes in
    separate follow-up patches.

    2) Add net as a parameter of nf_log_bind_pf. Per-netns support is not
    yet complete after this patch; it only allows binding the
    nf_logger to the protocol family from init_net and skips
    other cases.

    3) Adapt all nf_log_packet callers to pass netns as a parameter.
    After this patch, this function only works for init_net.

    4) Make the sysctl net/netfilter/nf_log pernet.

    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
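A rough model of the split: registration makes a logger known globally, while the per-namespace step binds it for one netns and one protocol family. In this sketch, binding only happens if the namespace has no logger for that family yet; that detail, struct model_net, MODEL_NPROTO and the function names are assumptions of the sketch, not the kernel API.

```c
#include <stddef.h>

#define MODEL_NPROTO 16  /* illustrative size of the per-family tables */

struct model_logger {
	const char *name;
};

/* Global registry: which logger is registered per family. */
static const struct model_logger *model_registered[MODEL_NPROTO];

/* Per-namespace binding table. */
struct model_net {
	const struct model_logger *bound[MODEL_NPROTO];
};

static int model_log_register(int pf, const struct model_logger *logger)
{
	if (pf < 0 || pf >= MODEL_NPROTO)
		return -1;
	model_registered[pf] = logger;
	return 0;
}

/* Bind for one netns; an existing binding is kept (assumption of this sketch). */
static int model_log_set(struct model_net *net, int pf, const struct model_logger *logger)
{
	if (pf < 0 || pf >= MODEL_NPROTO)
		return -1;
	if (net->bound[pf] == NULL)
		net->bound[pf] = logger;
	return 0;
}
```

Registering a logger does not bind it in any namespace; each netns gets its own binding through model_log_set().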
     

05 Jul, 2012

1 commit

  • This patch generalizes nf_ct_l4proto_net by splitting it into chunks and
    moving each protocol-specific part to where it really belongs.

    To clarify, note that we follow two different approaches to support per-net,
    depending on whether it's a built-in or a run-time loadable protocol tracker.

    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Gao feng

    Pablo Neira Ayuso
     

28 Jun, 2012

2 commits


12 Jun, 2012

1 commit

  • This patch fixes the compilation of the TCP and UDP trackers with sysctl
    compilation disabled:

    net/netfilter/nf_conntrack_proto_udp.c: In function ‘udp_init_net_data’:
    net/netfilter/nf_conntrack_proto_udp.c:279:13: error: ‘struct nf_proto_net’ has no member named ‘user’
    net/netfilter/nf_conntrack_proto_tcp.c:1606:9: error: ‘struct nf_proto_net’ has no member named ‘user’
    net/netfilter/nf_conntrack_proto_tcp.c:1643:9: error: ‘struct nf_proto_net’ has no member named ‘user’

    Reported-by: Fengguang Wu
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

07 Jun, 2012

1 commit