26 Apr, 2017

10 commits

  • make sure nat extension gets added if the master conntrack is subject to
    NAT. This will be required once the nat core stops adding it by default.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Currently the nat extension is always attached as soon as nat module is
    loaded. However, most NAT uses do not need the nat extension anymore.

    Prepare to remove the add-nat-by-default by making those places that need
    it attach it if it's not present yet.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • krealloc(NULL, ..) is the same as kmalloc(), so we can avoid special-casing
    the initial allocation after the prealloc removal (we had to use
    ->alloc_len as the initial allocation size).

    This also means we do not zero the preallocated memory anymore; only
    offsets[]. Existing code makes sure the new (used) extension space gets
    zeroed out.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
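
The allocation pattern described in this commit can be sketched in userspace C, where realloc(NULL, n) likewise behaves like malloc(n). This is only an illustration under stated assumptions: `struct ext_area` and `ext_grow()` are hypothetical names, not the kernel's structures or functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the conntrack extension area. */
struct ext_area {
    size_t len;            /* payload bytes currently in use */
    unsigned char data[];  /* extension payload */
};

/*
 * Grow the area by `add` bytes. Because realloc(NULL, n) acts like
 * malloc(n), the first allocation needs no special case. Only the
 * newly used bytes are zeroed, not the whole area.
 */
static struct ext_area *ext_grow(struct ext_area *old, size_t add)
{
    size_t oldlen = old ? old->len : 0;
    struct ext_area *area = realloc(old, sizeof(*area) + oldlen + add);

    if (!area)
        return NULL;    /* on failure the old area remains valid */

    memset(area->data + oldlen, 0, add);
    area->len = oldlen + add;
    return area;
}
```

Calling `ext_grow(NULL, 24)` and then `ext_grow(area, 16)` goes through the same code path both times; the second call preserves the first 24 bytes and zeroes only the added 16.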
     
  • Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • It was used by the nat extension, but since commit
    7c9664351980 ("netfilter: move nat hlist_head to nf_conn") it's only needed
    for connections that use the MASQUERADE target or a nat helper.

    Also it seems a lot easier to preallocate a fixed size instead.

    With default settings, conntrack first adds the ecache extension (sysctl
    defaults to 1), so we get 40 (ct extension header) + 24 (ecache) == 64 bytes
    on x86_64 for the initial allocation.

    Followup patches can constify the extension structs and avoid
    the initial zeroing of the entire extension area.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The current SYNPROXY code returns NF_DROP during normal TCP handshaking,
    which is unfriendly to the caller: nf_hook_slow treats NF_DROP as an
    error and returns -EPERM. As a result, the top-level caller may
    conclude that it hit an error.

    For example, the following code is from cfv_rx_poll():

        err = netif_receive_skb(skb);
        if (unlikely(err)) {
                ++cfv->ndev->stats.rx_dropped;
        } else {
                ++cfv->ndev->stats.rx_packets;
                cfv->ndev->stats.rx_bytes += skb_len;
        }

    When SYNPROXY returns NF_DROP, netif_receive_skb returns -EPERM, so
    the cfv driver treats it as an error and increments the rx_dropped
    counter.

    Since no error actually occurred, return NF_STOLEN instead of NF_DROP
    and free the skb directly.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
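
The difference between the two verdicts, as seen by a caller such as the driver quoted in this commit, can be modelled roughly as follows. The verdict values match the netfilter UAPI header; `model_hook_result()` and `model_receive()` are hypothetical helpers that simplify the real receive path considerably.

```c
#include <assert.h>
#include <errno.h>

/* Verdict values as defined in the netfilter UAPI header. */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2 };

/*
 * Simplified, hypothetical model of how a hook verdict is turned into
 * the value handed back to the code that invoked the hooks.
 */
static int model_hook_result(int verdict)
{
    switch (verdict) {
    case NF_DROP:
        return -EPERM;  /* reported upwards as an error */
    case NF_STOLEN:
        return 0;       /* the hook took ownership of the skb; no error */
    default:
        return 1;       /* NF_ACCEPT: continue normal processing */
    }
}

/* Model of the receive call: an accepted packet is delivered and yields 0. */
static int model_receive(int verdict)
{
    int ret = model_hook_result(verdict);

    return ret == 1 ? 0 : ret;
}
```

In this model a driver that checks `if (err)` counts an NF_DROP packet as dropped, while an NF_STOLEN packet is indistinguishable from a successfully delivered one.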
     
  • Similar to ip_register_table, pass nf_hook_ops to ebt_register_table().
    This allows hook registration to be handled via pernet_ops as well and
    lets us avoid the legacy register_hook API.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • It looks like decnet isn't namespace-aware in the first place, so restrict
    hook registration to the initial namespace.

    This prepares for the eventual removal of the legacy nf_register_hook() API.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • nf_(un)register_hooks has to maintain an internal hook list to add/remove
    those hooks from net namespaces as they are added/deleted.

    ipvs already uses pernet_ops, so we can switch to the (more recent)
    pernet hook api instead.

    Compile tested only.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Defer registration of the synproxy hooks until the first SYNPROXY rule is
    added. Also means we only register hooks in namespaces that need it.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
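
Deferred registration of this kind is commonly done with a per-namespace user count. The sketch below assumes that approach, and also assumes the hooks are removed again when the last rule goes away; `struct ns_state`, `rule_added()` and `rule_removed()` are hypothetical names, and the flag stands in for the real hook (un)registration calls.

```c
#include <assert.h>

/* Hypothetical per-namespace state; not the kernel's actual structure. */
struct ns_state {
    unsigned int rule_users;   /* number of rules that need the hooks */
    int hooks_registered;      /* 1 while the hooks are installed */
};

/* Called when a rule is added: install the hooks for the first user only. */
static void rule_added(struct ns_state *ns)
{
    if (ns->rule_users++ == 0)
        ns->hooks_registered = 1;   /* stand-in for hook registration */
}

/* Called when a rule is removed: remove the hooks with the last user. */
static void rule_removed(struct ns_state *ns)
{
    assert(ns->rule_users > 0);
    if (--ns->rule_users == 0)
        ns->hooks_registered = 0;   /* stand-in for hook unregistration */
}
```

A namespace that never adds a rule never calls `rule_added()`, so it never pays for the hooks.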
     

19 Apr, 2017

11 commits

  • The window scale may be enlarged from 14 to 15 according to the IETF
    draft https://tools.ietf.org/html/draft-nishida-tcpm-maxwin-03.

    Use the macro TCP_MAX_WSCALE to support it easily with TCP stack in
    the future.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
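
The benefit of a single macro is that a later change of the limit touches one definition only. A minimal sketch, assuming a local stand-in `MAX_WSCALE` for the kernel macro and a hypothetical `clamp_wscale()` helper:

```c
#include <assert.h>

/* Local stand-in for the kernel macro; 14 is the current RFC 7323 limit. */
#define MAX_WSCALE 14U

/* Clamp a window scale value parsed from a TCP option to the allowed maximum. */
static unsigned int clamp_wscale(unsigned int wscale)
{
    return wscale > MAX_WSCALE ? MAX_WSCALE : wscale;
}
```

If the limit were raised to 15, only the `#define` would need to change.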
     
  • The commit ab8bc7ed864b9c4f1fcb00a22bbe4e0f66ce8003
    ("netfilter: remove nf_ct_is_untracked")
    changed the line
    if (ct && !nf_ct_is_untracked(ct) && nfct_nat(ct)) {
    to
    if (ct && nfct_nat(ct)) {

    meanwhile, the commit 41390895e50bc4f28abe384c6b35ac27464a20ec
    ("netfilter: ipvs: don't check for presence of nat extension")
    from ipvs-next had changed the same line to

    if (ct && !nf_ct_is_untracked(ct) && (ct->status & IPS_NAT_MASK)) {

    When ipvs-next got merged into nf-next, the merge resolution took
    the first version, dropping the conversion of nfct_nat().

    While this doesn't cause a problem at the moment, it will once we stop
    adding the nat extension by default.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
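
The reason the status-bit test stays correct once the nat extension is no longer attached by default can be shown with a reduced model. The bit positions below are meant to follow the conntrack UAPI header (source NAT at bit 4, destination NAT at bit 5); `struct conn` and `conn_is_natted()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Status bits, assumed to match the conntrack UAPI header. */
#define IPS_SRC_NAT  (1UL << 4)
#define IPS_DST_NAT  (1UL << 5)
#define IPS_NAT_MASK (IPS_SRC_NAT | IPS_DST_NAT)

/* Hypothetical, trimmed-down conntrack entry. */
struct conn {
    unsigned long status;
    void *nat_ext;      /* may be NULL even when NAT is configured */
};

/* Decide from the status bits alone whether the connection is NATed. */
static int conn_is_natted(const struct conn *ct)
{
    return ct != NULL && (ct->status & IPS_NAT_MASK) != 0;
}
```

A check on `nat_ext` would report "no NAT" for a connection whose extension was never attached, whereas the status bits still describe it correctly.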
     
  • Only "cache" needs to use ulong (it's used with set_bit()); "missed" can use
    u16. Also add a build-time assertion to ensure the event bits fit.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • If insertion of a new conntrack fails because the table is full, the kernel
    searches the next buckets after the hash slot where the new connection
    was supposed to be inserted for an entry that hasn't seen traffic
    in the reply direction (non-assured). If it finds one, that entry is
    dropped and the new connection entry is allocated.

    Allow the conntrack gc worker to also remove *assured* conntracks if
    resources are low.

    Do this by querying the l4 tracker, e.g. tcp connections are now dropped
    if they are no longer established (e.g. in finwait).

    This could be refined further, e.g. by adding 'soft' established timeout
    (i.e., a timeout that is only used once we get close to resource
    exhaustion).

    Cc: Jozsef Kadlecsik
    Signed-off-by: Florian Westphal
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
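
The decision described in this commit could look roughly like the following. This is a sketch under assumptions: the state names, `struct conn` and `can_early_drop()` are hypothetical, and the real l4 tracker may consider a different set of states.

```c
#include <assert.h>

/* Hypothetical subset of TCP tracking states. */
enum conn_state { ST_SYN_SENT, ST_ESTABLISHED, ST_FIN_WAIT, ST_TIME_WAIT };

/* Hypothetical, reduced conntrack entry. */
struct conn {
    int assured;            /* traffic was seen in both directions */
    enum conn_state state;
};

/*
 * Return 1 if the entry may be evicted. Entries are only considered
 * when resources are low; assured entries additionally require that
 * the connection is no longer established.
 */
static int can_early_drop(const struct conn *ct, int low_resources)
{
    if (!low_resources)
        return 0;
    if (!ct->assured)
        return 1;
    return ct->state != ST_ESTABLISHED;
}
```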
     
  • commit 223b02d923ecd7c84cf9780bb3686f455d279279
    ("netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len")
    had to increase size of the extension offsets because total size of the
    extensions had increased to a point where u8 did overflow.

    3 years later we've managed to diet extensions a bit and we no longer
    need u16. Furthermore we can now add a compile-time assertion for this
    problem.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
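
A compile-time assertion of the kind mentioned in this commit can be sketched with C11 `_Static_assert`. The extension structs and their sizes are purely illustrative, and alignment padding between extensions is ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extension payloads; the sizes are illustrative only. */
struct ext_a { uint64_t x[3]; };
struct ext_b { uint32_t y[4]; };

enum { EXT_A, EXT_B, EXT_NUM };

struct ext_hdr {
    uint8_t offset[EXT_NUM];   /* byte offset of each extension */
    uint8_t len;               /* total bytes in use */
};

/* Worst case: header plus every extension attached at once. */
#define EXT_TOTAL (sizeof(struct ext_hdr) + \
                   sizeof(struct ext_a) + sizeof(struct ext_b))

/* Fail the build if the worst-case layout no longer fits 8-bit offsets. */
_Static_assert(EXT_TOTAL <= UINT8_MAX,
               "extension area too large for u8 offsets");

static unsigned int ext_total_size(void)
{
    return (unsigned int)EXT_TOTAL;
}
```

Adding a large new extension struct to `EXT_TOTAL` would then break the build instead of silently overflowing an offset at run time.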
     
  • get rid of the (now unused) nf_ct_ext_add_length define and also
    rename the function to plain nf_ct_ext_add().

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • No need to track this for inkernel helpers anymore as
    NF_CT_HELPER_BUILD_BUG_ON checks do this now.

    All inkernel helpers know what kind of structure they
    stored in helper->data.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Userspace should not abuse the kernel to store large amounts of data,
    reject requests larger than the private area can accommodate.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • add a 32 byte scratch area in the helper struct instead of relying
    on variable sized helpers plus compile-time asserts to let us know
    if 32 bytes aren't enough anymore.

    Not having variable sized helpers will later allow adding a BUILD_BUG_ON
    for the total size of conntrack extensions -- the helper extension is
    the only one that doesn't have a fixed size.

    The (useless!) NF_CT_HELPER_BUILD_BUG_ON(0); are added so that in case
    someone adds a new helper and copy-pastes from one that doesn't store
    private data at least some indication that this macro should be used
    somehow is there...

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • its definition is not needed in nf_conntrack.h.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • By default the kernel emits all ctnetlink events for a connection.
    This change allows selecting the types of events to generate.

    This can be used to e.g. only send DESTROY events but no NEW/UPDATE ones
    and will work even if sysctl net.netfilter.nf_conntrack_events is set to 0.

    This was already possible via iptables' CT target, but the nft version has
    the advantage that it can also be used with already-established conntracks.

    The added nf_ct_is_template() check isn't a bug fix as we only support
    mark and labels (and unlike ecache the conntrack core doesn't copy those).

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

15 Apr, 2017

6 commits

  • This function is now obsolete and always returns false.
    This change has no effect on generated code.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • resurrect an old patch from Pablo Neira to remove the untracked objects.

    Currently, there are four possible states of an skb wrt. conntrack.

    1. No conntrack attached, ct is NULL.
    2. Normal (kmem cache allocated) ct attached.
    3. a template (kmalloc'd), not in any hash tables at any point in time
    4. the 'untracked' conntrack, a percpu nf_conn object, tagged via
    IPS_UNTRACKED_BIT in ct->status.

    Untracked is supposed to be identical to case 1. It exists only
    so users can check

    -m conntrack --ctstate UNTRACKED vs.
    -m conntrack --ctstate INVALID

    e.g. an attempt to set connmark on INVALID or UNTRACKED conntracks is
    supposed to be a no-op.

    Thus currently we need to check
    ct == NULL || nf_ct_is_untracked(ct)

    in a lot of places in order to avoid altering untracked objects.

    The other consequence of the percpu untracked object is that all
    -j NOTRACK (and, later, kfree_skb of such skbs) result in an atomic op
    (inc/dec the untracked conntracks refcount).

    This adds a new kernel-private ctinfo state, IP_CT_UNTRACKED, to
    make the distinction instead.

    The (few) places that care about packet invalid (ct is NULL) vs.
    packet untracked now need to test ct == NULL vs. ctinfo == IP_CT_UNTRACKED,
    but all other places can omit the nf_ct_is_untracked() check.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
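
The distinction this commit introduces can be illustrated with a reduced model. It assumes that an untracked packet carries a NULL conntrack pointer plus the new ctinfo value; all type and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of per-packet conntrack info values. */
enum ctinfo { CT_ESTABLISHED, CT_NEW, CT_UNTRACKED };

struct conn;    /* opaque conntrack entry */

struct pkt {
    struct conn *ct;        /* NULL for both invalid and untracked packets */
    enum ctinfo ctinfo;
};

enum pkt_class { PKT_TRACKED, PKT_INVALID, PKT_UNTRACKED };

/* Only the few callers that care about invalid vs. untracked need this. */
static enum pkt_class classify(const struct pkt *p)
{
    if (p->ct)
        return PKT_TRACKED;
    return p->ctinfo == CT_UNTRACKED ? PKT_UNTRACKED : PKT_INVALID;
}

/* Everyone else gets by with a single NULL test. */
static int may_modify(const struct pkt *p)
{
    return p->ct != NULL;
}
```

With a shared untracked object, `may_modify()` would have needed a second check; here the NULL test covers both the invalid and the untracked case.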
     
  • 1. Remove the lone !events check so that a missed event is delivered
    even when no new event has happened.

    Consider this case:
    1) nf_ct_deliver_cached_events is invoked for the first time, the
    event fails to be delivered, and missed is set.
    2) nf_ct_deliver_cached_events is invoked again, but no new event
    has happened.
    The missed event is then lost for good.

    With the check removed, the missed event is sent again. This is
    safe when there is no missed event, because the later check
    !((events | missed) & e->ctmask) catches that case.

    2. Correct the return value check of notify->fcn.
    When the event is sent successfully it returns 0, not a positive value.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
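
The retry behaviour described in this commit can be sketched as below. The structure and function are simplified, hypothetical versions of what the message describes; `notify_fail()` and `notify_ok()` are test doubles added only for illustration.

```c
#include <assert.h>

/* Hypothetical event cache, loosely modelled on the commit description. */
struct ecache {
    unsigned long cache;    /* events recorded since the last delivery */
    unsigned int missed;    /* events whose earlier delivery failed */
    unsigned int ctmask;    /* events the listener asked for */
};

/* Delivery callback: returns 0 on success, a negative value on failure. */
typedef int (*notify_fn)(unsigned int events);

static unsigned int last_delivered;

static int notify_fail(unsigned int events) { (void)events; return -1; }
static int notify_ok(unsigned int events) { last_delivered = events; return 0; }

static void deliver_cached_events(struct ecache *e, notify_fn notify)
{
    unsigned int events = (unsigned int)e->cache;

    e->cache = 0;

    /* Bail out only if neither new nor missed events are of interest. */
    if (!((events | e->missed) & e->ctmask))
        return;

    if (notify(events | e->missed) < 0)
        e->missed |= events;    /* keep them for the next attempt */
    else
        e->missed = 0;          /* success is 0, not a positive value */
}
```

A first call with `notify_fail` leaves the event in `missed`; a second call with no new events still reaches `notify_ok`, which is the case the removed check used to skip.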
     
  • __nf_nat_alloc_null_binding invokes nf_nat_setup_info, which may
    return NF_DROP when memory is exhausted, so convert NF_DROP to -ENOMEM
    to make ctnetlink happy. Otherwise ctnetlink_setup_nat treats the call
    as a success even though the NF_DROP error actually happened.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     
  • Simon Horman says:

    ====================
    Second Round of IPVS Updates for v4.12

    please consider these clean-ups and enhancements to IPVS for v4.12.

    * Removal of unused variable
    * Use kzalloc where appropriate
    * More efficient detection of presence of NAT extension
    ====================

    Signed-off-by: Pablo Neira Ayuso

    Conflicts:
    net/netfilter/ipvs/ip_vs_ftp.c

    Pablo Neira Ayuso
     
  • There are no in-tree callers.

    Signed-off-by: Aaron Conole
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Aaron Conole
     

14 Apr, 2017

3 commits


09 Apr, 2017

2 commits


08 Apr, 2017

1 commit


07 Apr, 2017

7 commits