20 Sep, 2018

1 commit


11 Sep, 2018

1 commit

  • Now that cttimeout support for nft_ct is in place, these should depend
    on CONFIG_NF_CONNTRACK_TIMEOUT; otherwise we can crash when dumping the
    policy if this option is not enabled.

    [ 71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    [...]
    [ 71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246
    [...]
    [ 71.600188] Call Trace:
    [ 71.600201] ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct]

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

29 Aug, 2018

1 commit

  • tcp, sctp and dccp trackers re-use the userspace ctnetlink states
    to index their timeout arrays, which means timeout[0] is never
    used. Copy the 'new' state (syn-sent, dccp-request, ..) to 0 as well
    so external users can simply read it off timeouts[0] without need to
    differentiate dccp/sctp/tcp and udp/icmp/gre/generic.

    The alternative is to map all array accesses to 'i - 1', but that
    is a much more intrusive change.
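
    A minimal sketch of the idea, not the patch itself: the enum, the
    array layout and the helper name below are hypothetical stand-ins for
    a state-indexed timeout table whose slot 0 is otherwise unused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state numbering: 0 is never used by the tracker,
 * 1 is the 'new' state (e.g. syn-sent or dccp-request). */
enum demo_state { DEMO_NONE = 0, DEMO_NEW = 1, DEMO_ESTABLISHED = 2, DEMO_MAX };

/* Mirror the 'new' state timeout into slot 0 so that external users can
 * read timeouts[0] without knowing which protocol they deal with. */
static void demo_fixup_timeouts(unsigned int *timeouts, size_t n)
{
	assert(timeouts != NULL && n > DEMO_NEW);
	timeouts[DEMO_NONE] = timeouts[DEMO_NEW];
}
```

    The other slots stay untouched, which is why this is less intrusive
    than remapping every access to 'i - 1'.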

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

16 Jul, 2018

3 commits


27 Apr, 2018

1 commit

  • Dominique Martinet reported a TCP hang problem when simultaneous open was used.
    The problem is that the tcp_conntracks state table is not smart enough
    to handle the case. The state table could be fixed by introducing a new state,
    but that would require more lines of code compared to this patch, due to the
    required backward compatibility with ctnetlink.

    Signed-off-by: Jozsef Kadlecsik
    Reported-by: Dominique Martinet
    Tested-by: Dominique Martinet
    Signed-off-by: Pablo Neira Ayuso

    Jozsef Kadlecsik
     

09 Jan, 2018

3 commits

  • This new bit tells us that the conntrack entry is owned by the flow
    table offload infrastructure.

    # cat /proc/net/nf_conntrack
    ipv4 2 tcp 6 src=10.141.10.2 dst=147.75.205.195 sport=36392 dport=443 src=147.75.205.195 dst=192.168.2.195 sport=443 dport=36392 [OFFLOAD] mark=0 zone=0 use=2

    Note the [OFFLOAD] tag in the listing.

    From userspace, the timer of such conntrack entries appears to be
    stopped. In practice, to make sure the conntrack entry does not go
    away, the conntrack timer is periodically set to an arbitrarily large
    value that gets refreshed on every iteration of the garbage collector,
    so it never expires. These entries also display no internal state in
    the case of TCP flows. This allows us to save a bitcheck from the
    packet path via nf_ct_is_expired().

    Conntrack entries that have been offloaded to the flow table
    infrastructure cannot be deleted/flushed via ctnetlink. The flow table
    infrastructure is also responsible for releasing this conntrack entry.
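
    The timer refresh described above could be sketched as follows. This
    is an illustration under assumptions: the struct, the helper name and
    the one-day extension are hypothetical, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary large extension in seconds, chosen only for the demo. */
#define DEMO_OFFLOAD_EXTEND (24u * 60u * 60u)

struct demo_entry {
	bool offloaded;   /* owned by the flow table infrastructure */
	uint32_t expires; /* absolute expiry time in seconds */
};

/* One garbage-collector visit. Offloaded entries get their expiry pushed
 * far into the future and are never reaped here; other entries are
 * reaped once their expiry time has passed. Returns true if the entry
 * should be reaped. */
static bool demo_gc_visit(struct demo_entry *e, uint32_t now)
{
	if (e->offloaded) {
		e->expires = now + DEMO_OFFLOAD_EXTEND;
		return false;
	}
	return (int32_t)(e->expires - now) <= 0;
}
```

    Because the expiry is always kept in the future, the packet path needs
    no extra check for the offload bit when it tests for expiry.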

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Nowadays this is just the default template that is used when setting up
    the net namespace, so nothing writes to these locations.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Previous patches removed all writes to these structs, so we can
    now mark them as const.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

08 Jan, 2018

1 commit


20 Nov, 2017

1 commit

  • When a zero window is announced we can get into a situation where
    the connection stays around forever:

    1. One side announces zero window.
    2. Other side closes.

    In this case, no FIN is sent (stuck in send queue).

    Unless the other side opens the window up again, conntrack
    stays in ESTABLISHED state for a very long time.

    Let's alleviate this by lowering the timeout to RETRANS (5 minutes);
    the other end should be sending zero window probes to keep the
    connection established as long as a socket still exists.

    Cc: Jozsef Kadlecsik
    Signed-off-by: Florian Westphal
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

06 Nov, 2017

1 commit

  • We currently call ->nlattr_tuple_size() once at register time and
    cache result in l4proto->nla_size.

    nla_size is the only member that is written to; avoiding this would
    allow making the l4proto trackers const.

    We can use ->nlattr_tuple_size() at run time, and cache result in
    the individual trackers instead.

    This is an intermediate step; the next patch removes the nlattr_size()
    callback and computes the size at compile time, then removes nla_size.
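
    The run-time caching could look roughly like this. It is a sketch
    only: demo_compute_size() is a hypothetical stand-in for the
    ->nlattr_tuple_size() callback, and no locking is shown.

```c
#include <assert.h>

static unsigned int demo_compute_calls;

/* Hypothetical stand-in for a ->nlattr_tuple_size() style callback. */
static unsigned int demo_compute_size(void)
{
	demo_compute_calls++;
	return 16; /* arbitrary value for the demo */
}

/* Compute the size on first use and cache it in tracker-local storage
 * instead of writing it into the shared (would-be const) structure. */
static unsigned int demo_tuple_size(void)
{
	static unsigned int size;

	if (size == 0)
		size = demo_compute_size();
	return size;
}
```

    With the cache living in the tracker, the shared structure no longer
    has a member that is written after registration.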

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

25 Oct, 2017

3 commits


04 Sep, 2017

1 commit


25 Aug, 2017

3 commits


01 May, 2017

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree. A large bunch of code cleanups, simplify the conntrack extension
    codebase, get rid of the fake conntrack object, speed up netns by
    selective synchronize_net() calls. More specifically, they are:

    1) Check for ct->status bit instead of using nfct_nat() from IPVS and
    Netfilter codebase, patch from Florian Westphal.

    2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.

    3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.

    4) Introduce nft_is_base_chain() helper function.

    5) Enforce expectation limit from userspace conntrack helper,
    from Gao Feng.

    6) Add nf_ct_remove_expect() helper function, from Gao Feng.

    7) NAT mangle helper function return boolean, from Gao Feng.

    8) ctnetlink_alloc_expect() should only work for conntrack with
    helpers, from Gao Feng.

    9) Add nfnl_msg_type() helper function to nfnetlink to build the
    netlink message type.

    10) Get rid of unnecessary cast on void, from simran singhal.

    11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
    also from simran singhal.

    12) Use list_prev_entry() from nf_tables, from simran singhal.

    13) Remove unnecessary & on pointer function in the Netfilter and IPVS
    code.

    14) Remove obsolete comment on set of rules per CPU in ip6_tables,
    no longer true. From Arushi Singhal.

    15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.

    16) Remove unnecessary nested rcu_read_lock() in
    __nf_nat_decode_session(). Code running from hooks is already
    guaranteed to run under the RCU read side.

    17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.

    18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
    also from Aaron.

    19) Get rid of unused __ip_set_get_netlink(), from Aaron Conole.

    20) Don't propagate NF_DROP error to userspace via ctnetlink in
    __nf_nat_alloc_null_binding() function, from Gao Feng.

    21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
    from Gao Feng.

    22) Kill the fake untracked conntrack objects, use ctinfo instead to
    annotate that a conntrack object is untracked, from Florian Westphal.

    23) Remove nf_ct_is_untracked(), now obsolete since we have no
    conntrack template anymore, from Florian.

    24) Add event mask support to nft_ct, also from Florian.

    25) Move nf_conn_help structure to
    include/net/netfilter/nf_conntrack_helper.h.

    26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
    Thus, we don't deal with variable conntrack extensions anymore.
    Make sure userspace conntrack helper doesn't go over that size.
    Remove variable size ct extension infrastructure now that this code
    has no more clients. From Florian Westphal.

    27) Restore offset and length of nf_ct_ext structure to 8 bytes now
    that wraparound is not possible any longer, also from Florian.

    28) Allow to get rid of unassured flows under stress in conntrack,
    this applies to DCCP, SCTP and TCP protocols, from Florian.

    29) Shrink size of nf_conntrack_ecache structure, from Florian.

    30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
    from Gao Feng.

    31) Register SYNPROXY hooks on demand, from Florian Westphal.

    32) Use pernet hook whenever possible, instead of global hook
    registration, from Florian Westphal.

    33) Pass hook structure to ebt_register_table() to consolidate some
    infrastructure code, from Florian Westphal.

    34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
    SYNPROXY code, to make sure device stats are not fooled, patch
    from Gao Feng.

    35) Remove NF_CT_EXT_F_PREALLOC. This kills quite some code that we
    don't need anymore if we just select a fixed size instead of an
    expensive runtime calculation of this. From Florian.

    36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
    from Florian.

    37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
    Florian.

    38) Attach NAT extension on-demand from masquerade and pptp helper
    path, from Florian.

    39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.

    40) Speed up netns by selective calls of synchronize_net(), from
    Florian Westphal.

    41) Silence gcc stack size warning on 32-bit arch in snmp helper,
    from Florian.

    42) Unconditionally call nf_ct_ext_destroy(), even if we have no
    extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
    Liping Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Apr, 2017

2 commits

  • The window scale may be enlarged from 14 to 15 according to the IETF
    draft https://tools.ietf.org/html/draft-nishida-tcpm-maxwin-03.

    Use the macro TCP_MAX_WSCALE to support it easily with TCP stack in
    the future.
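
    As an illustration, clamping against a single macro rather than a
    literal 14 might look like this. The DEMO_ names are hypothetical and
    the value 14 is taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for the demo; raising it later changes one line only. */
#define DEMO_TCP_MAX_WSCALE 14U

/* Clamp a window scale option value to the maximum the stack accepts. */
static uint8_t demo_clamp_wscale(uint8_t wscale)
{
	return wscale > DEMO_TCP_MAX_WSCALE ? DEMO_TCP_MAX_WSCALE : wscale;
}
```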

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     
  • If insertion of a new conntrack fails because the table is full, the kernel
    searches the next buckets of the hash slot where the new connection
    was supposed to be inserted for an entry that hasn't seen traffic
    in the reply direction (non-assured). If it finds one, that entry is
    dropped and the new connection entry is allocated.

    Allow the conntrack gc worker to also remove *assured* conntracks if
    resources are low.

    Do this by querying the l4 tracker, e.g. tcp connections are now dropped
    if they are no longer established (e.g. in finwait).

    This could be refined further, e.g. by adding 'soft' established timeout
    (i.e., a timeout that is only used once we get close to resource
    exhaustion).
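
    A rough sketch of the l4 tracker query, under assumptions: the enum,
    the helper name and the exact set of droppable states are
    hypothetical; the text above only says that TCP flows which are no
    longer established (e.g. in finwait) qualify.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_tcp_state {
	DEMO_TCP_SYN_SENT,
	DEMO_TCP_ESTABLISHED,
	DEMO_TCP_FIN_WAIT,
	DEMO_TCP_TIME_WAIT,
	DEMO_TCP_CLOSE
};

/* Decide whether an entry may be evicted when resources are low.
 * Non-assured entries are always candidates; assured entries only when
 * the connection is already on its way out. */
static bool demo_can_early_drop(bool assured, enum demo_tcp_state state)
{
	if (!assured)
		return true;

	switch (state) {
	case DEMO_TCP_FIN_WAIT:
	case DEMO_TCP_TIME_WAIT:
	case DEMO_TCP_CLOSE:
		return true;
	default:
		return false;
	}
}
```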

    Cc: Jozsef Kadlecsik
    Signed-off-by: Florian Westphal
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

14 Apr, 2017

1 commit


02 Feb, 2017

1 commit


13 Aug, 2016

1 commit

  • This backward compatibility has been around for more than ten years,
    since Yasuyuki Kozakai introduced IPv6 in conntrack. These days, we have
    alternate /proc/net/nf_conntrack* entries, the ctnetlink interface and
    the conntrack utility got adopted by many people in the user community
    according to what I observed on the netfilter user mailing list.

    So let's get rid of this.

    Note that nf_conntrack_htable_size and unsigned int nf_conntrack_max do
    not need to be exported as symbols anymore.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

12 Aug, 2016

1 commit


24 Apr, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree, mostly from Florian Westphal to sort out the lack of sufficient
    validation in x_tables and connlabel preparation patches to add
    nf_tables support. They are:

    1) Ensure we don't go over the ruleset blob boundaries in
    mark_source_chains().

    2) Validate that target jumps land on an existing xt_entry. This extra
    sanitization comes with a performance penalty when loading the ruleset.

    3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables.

    4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables.

    5) Enforce the minimal possible target size in x_tables.

    6) Similar to #3, add xt_compat_check_entry_offsets() for compat code.

    7) Check that standard target size is valid.

    8) More sanitization to ensure that the target_offset field is correct.

    9) Add xt_check_entry_match() to validate that matches are well-formed.

    10-12) Three patches to reduce the number of parameters in
    translate_compat_table() for {arp,ip,ip6}tables by using a container
    structure.

    13) No need to return value from xt_compat_match_from_user(), so make
    it void.

    14) Consolidate translate_table() so it can be used by compat code too.

    15) Remove obsolete check for compat code, so we keep consistent with
    what was already removed in the native layout code (back in 2007).

    16) Get rid of target jump validation from mark_source_chains(),
    obsoleted by #2.

    17) Introduce xt_copy_counters_from_user() to consolidate counter
    copying, and use it from {arp,ip,ip6}tables.

    18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump
    functions.

    19) Move nf_connlabel_match() to xt_connlabel.

    20) Skip event notification if connlabel did not change.

    21) Update of nf_connlabels_get() to make the upcoming nft connlabel
    support easier.

    23) Remove spinlock to read protocol state field in conntrack.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Apr, 2016

1 commit


08 Apr, 2016

1 commit

  • Baozeng Ding reported a KASAN stack out of bounds issue - it uncovered that
    the TCP option parsing routines in netfilter TCP connection tracking could
    read one byte out of the buffer of the TCP options. Therefore in the patch
    we check that the available data length is large enough to parse both TCP
    option code and size.
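
    The kind of length check described above can be sketched as follows.
    This is a simplified, hypothetical option walker, not the conntrack
    routine: names and return convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TCPOPT_EOL 0
#define DEMO_TCPOPT_NOP 1

/* Walk a TCP options buffer looking for option 'kind'. Returns the
 * offset of the option, or -1 if it is absent or the buffer is
 * malformed. */
static int demo_find_tcp_option(const uint8_t *opt, size_t len, uint8_t kind)
{
	size_t i = 0;

	assert(opt != NULL || len == 0);
	while (i < len) {
		uint8_t code = opt[i];
		uint8_t size;

		if (code == DEMO_TCPOPT_EOL)
			return -1;
		if (code == DEMO_TCPOPT_NOP) {
			i++;
			continue;
		}
		/* Both the code byte and the size byte must be in the buffer. */
		if (len - i < 2)
			return -1;
		size = opt[i + 1];
		/* A size below 2 or past the end means a malformed option. */
		if (size < 2 || size > len - i)
			return -1;
		if (code == kind)
			return (int)i;
		i += size;
	}
	return -1;
}
```

    Without the "len - i < 2" test, an option code in the last byte of the
    buffer would cause a one byte read past the end when fetching its size.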

    Reported-by: Baozeng Ding
    Tested-by: Baozeng Ding
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Jozsef Kadlecsik
     

19 Sep, 2015

1 commit


16 May, 2015

1 commit

  • In compliance with RFC5961, the network stack sends a challenge ACK in
    response to spurious SYN packets, since commit 0c228e833c88 ("tcp:
    Restore RFC5961-compliant behavior for SYN packets").

    This poses a problem for netfilter conntrack in state LAST_ACK, because
    this challenge ACK is (falsely) seen as ACKing the last FIN, causing a
    false state transition (into TIME_WAIT).

    The challenge ACK is hard to distinguish from a real last ACK. Thus,
    the solution introduces a flag that tracks the potential for seeing a
    challenge ACK, in case a SYN packet is let through and the current
    state is LAST_ACK.

    When the conntrack transition from LAST_ACK to TIME_WAIT happens, this
    flag is used for determining if we are expecting a challenge ACK.
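
    A simplified sketch of how such a flag could gate the transition. The
    struct, the flag name and the two helpers are hypothetical, and the
    real tracker looks at more than is modeled here.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_ct_state { DEMO_CT_LAST_ACK, DEMO_CT_TIME_WAIT };

struct demo_ct {
	enum demo_ct_state state;
	bool maybe_challenge_ack; /* hypothetical name for the flag */
};

/* A SYN let through while in LAST_ACK may provoke a challenge ACK from
 * the peer, so remember that one might arrive. */
static void demo_see_syn(struct demo_ct *ct)
{
	if (ct->state == DEMO_CT_LAST_ACK)
		ct->maybe_challenge_ack = true;
}

/* An ACK in LAST_ACK normally moves the flow to TIME_WAIT. If a
 * challenge ACK is expected, consume the flag and keep the state. */
static void demo_see_ack(struct demo_ct *ct)
{
	if (ct->state != DEMO_CT_LAST_ACK)
		return;
	if (ct->maybe_challenge_ack) {
		ct->maybe_challenge_ack = false;
		return;
	}
	ct->state = DEMO_CT_TIME_WAIT;
}
```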

    A Scapy-based reproducer script is available here:
    https://github.com/netoptimizer/network-testing/blob/master/scapy/tcp_hacks_3WHS_LAST_ACK.py

    Fixes: 0c228e833c88 ("tcp: Restore RFC5961-compliant behavior for SYN packets")
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Jesper Dangaard Brouer
     

09 Dec, 2014

1 commit


06 Nov, 2014

2 commits

  • Since adding a new function to seq_file (seq_has_overflowed())
    there isn't any value for functions called from seq_show to
    return anything. Remove the int returns of the various
    print_tuple/_print_tuple functions.

    Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com

    Cc: Pablo Neira Ayuso
    Cc: Patrick McHardy
    Cc: Jozsef Kadlecsik
    Cc: netfilter-devel@vger.kernel.org
    Cc: coreteam@netfilter.org
    Signed-off-by: Joe Perches
    Signed-off-by: Steven Rostedt

    Joe Perches
     
  • The seq_printf() and friends are having their return values removed.
    The print_conntrack() returns the result of seq_printf(), which is
    meaningless when seq_printf() returns void. Might as well remove the
    return values of print_conntrack() as well.

    Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org
    Acked-by: Pablo Neira Ayuso
    Cc: Patrick McHardy
    Cc: Jozsef Kadlecsik
    Cc: netfilter-devel@vger.kernel.org
    Cc: coreteam@netfilter.org
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

22 Oct, 2014

1 commit

  • When a port that was used to listen for inbound connections gets closed
    and reused for outgoing connections (like rsh ends up doing for the
    stderr flow), currently we may reject the SYN/ACK packet for the new
    connection because the tcp_conntracks state table forbids a port from
    becoming a client while there is still a TIME_WAIT entry for it.

    As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
    for it is 120s, there is a ~60s window that the application can end up
    opening a port that conntrack will end up blocking.

    This patch fixes this by simply allowing such a state transition: if we
    see a SYN, in TIME_WAIT state, in the REPLY direction, move it to sSS.
    Note that the rest of the code already handles this situation, more
    specifically in tcp_packet(), first switch clause.

    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Marcelo Leitner
     

28 Aug, 2013

2 commits

  • Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
    core with common functions and an address family specific target.

    The SYNPROXY receives the connection request from the client, responds with
    a SYN/ACK containing a SYN cookie and announcing a zero window and checks
    whether the final ACK from the client contains a valid cookie.

    It then establishes a connection to the original destination and, if
    successful, sends a window update to the client with the window size
    announced by the server.

    Support for timestamps, SACK, window scaling and MSS options can be
    statically configured as target parameters if the features of the server
    are known. If timestamps are used, the timestamp value sent back to
    the client in the SYN/ACK will be different from the real timestamp of
    the server. In order to not break PAWS, the timestamps are translated in
    the direction server->client.

    Signed-off-by: Patrick McHardy
    Tested-by: Martin Topholm
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • Split out sequence number adjustments from NAT and move them to the conntrack
    core to make them usable for SYN proxying. The sequence number adjustment
    information is moved to a separate extend. The extend is added to new
    conntracks when a NAT mapping is set up for a connection using a helper.

    As a side effect, this saves 24 bytes per connection with NAT in the common
    case that a connection does not have a helper assigned.

    Signed-off-by: Patrick McHardy
    Tested-by: Martin Topholm
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

21 Aug, 2013

1 commit

  • Conflicts:
    net/netfilter/nf_conntrack_proto_tcp.c

    The conflict had to do with overlapping changes dealing with
    fixing the use of an "s32" to hold the value returned by
    NAT_OFFSET().

    Pablo Neira Ayuso says:

    ====================
    The following batch contains Netfilter/IPVS updates for your net-next tree.
    More specifically, they are:

    * Trivial typo fix in xt_addrtype, from Phil Oester.

    * Remove net_ratelimit in the conntrack logging for consistency with other
    logging subsystem, from Patrick McHardy.

    * Remove unneeded includes from the recently added xt_connlabel support, from
    Florian Westphal.

    * Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for
    this, from Florian Westphal.

    * Remove tproxy core, now that we have socket early demux, from Florian
    Westphal.

    * A couple of patches to refactor conntrack event reporting to save a good
    bunch of lines, from Florian Westphal.

    * Fix missing locking in NAT sequence adjustment, it has not manifested in
    any known bug so far, from Patrick McHardy.

    * Change sequence number adjustment variable to 32 bits, to delay the
    possible early overflow in long standing connections, also from Patrick.

    * Cosmetic cleanups for IPVS, from Dragos Foianu.

    * Fix possible null dereference in IPVS in the SH scheduler, from Daniel
    Borkmann.

    * Allow to attach conntrack expectations via nfqueue. Before this patch, you
    had to use ctnetlink instead, thus, we save the conntrack lookup.

    * Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

11 Aug, 2013

1 commit

  • Currently the conntrack checks if the ending sequence of a packet
    falls within the observed receive window. However, it does so even
    if it has not observed any packet from the remote yet, and so uses an
    uninitialized receive window (td_maxwin).

    If a connection uses Fast Open to send a SYN-data packet which is
    dropped afterward in the network, the subsequent SYN retransmits
    will all fail this check and be discarded, leading to a connection
    timeout. This is because the SYN retransmit does not contain a data
    payload, so

    end == initial sequence number (isn) + 1
    sender->td_end == isn + syn_data_len
    receiver->td_maxwin == 0

    The fix is to only apply this check after td_maxwin is initialized.
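
    The arithmetic above can be checked with a small sketch. The helper
    names are hypothetical, and the exact form of the comparison (end must
    lie after td_end - td_maxwin - 1) is an assumption made for the demo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" on 32-bit sequence numbers. */
static bool demo_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Receive-window test: skipped while the receiver's window is still
 * uninitialized (td_maxwin == 0), otherwise the packet end must lie
 * after td_end - td_maxwin - 1. */
static bool demo_in_recv_win(uint32_t end, uint32_t td_end, uint32_t td_maxwin)
{
	if (td_maxwin == 0)
		return true;
	return demo_after(end, td_end - td_maxwin - 1);
}
```

    With isn = 1000 and 100 bytes of SYN data, a retransmitted bare SYN has
    end = 1001 while td_end = 1100. Comparing against an uninitialized
    window of 0 would reject it, whereas the guarded check lets it pass.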

    Reported-by: Michael Chan
    Signed-off-by: Yuchung Cheng
    Acked-by: Eric Dumazet
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Yuchung Cheng