18 Jan, 2019

1 commit


03 Nov, 2018

1 commit


21 Sep, 2018

4 commits

  • l4 protocols are demuxed by the (l3num, l4num) pair.

    However, almost all l4 trackers are l3 agnostic.

    The only exceptions are:
    - gre, icmp (ipv4 only)
    - icmpv6 (ipv6 only)

    This commit gets rid of the l3 mapping: l4 trackers can now be looked up
    by their IPPROTO_XXX value alone, which removes the additional l3
    indirection.

    For icmp, icmpv6 and gre, add a check on state->pf and
    return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4;
    this seems more fitting than using the generic tracker.

    Additionally, we can kill the second l4proto definitions that were
    needed for the v4/v6 split: they are now the same, so we can use a
    single l4proto struct for each protocol rather than two.

    The EXPORT_SYMBOLs can be removed as all these object files are
    part of nf_conntrack with no external references.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • It's unused; the next patch will remove the l4proto->l3proto number
    to simplify the l4 protocol demuxer lookup.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The error() handler gets called before allocating or looking up a
    connection tracking entry.

    We can instead use direct calls from the ->packet() handlers, which get
    invoked for every packet anyway.

    The only exceptions are icmp and icmpv6; these two special cases will be
    handled in the next patch.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Only two protocols need the ->error() function: icmp and icmpv6.
    This is because icmp error messages might be RELATED to an existing
    connection (e.g. PMTUD, port unreachable and the like), and their
    ->error() handlers do this.

    The error callback is already optional, so remove it for
    udp and call them from ->packet() instead.

    As the error() callback can call checksum functions that write to
    skb->csum*, the const qualifier has to be removed as well.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
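
The first commit in this group replaces the two-level (l3num, l4num) demux with a lookup on the l4 protocol number alone, and moves the family check into the few family-specific trackers. The following is a minimal user-space sketch of that idea, not kernel code: struct l4proto, find_l4proto(), the callbacks and the locally defined constants are hypothetical simplifications, and the linear scan only keeps the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for kernel constants (values as used on Linux). */
#define NF_ACCEPT	1
#define PF_INET		2
#define PF_INET6	10
#define IPPROTO_ICMP	1
#define IPPROTO_TCP	6
#define IPPROTO_UDP	17
#define IPPROTO_GRE	47
#define IPPROTO_ICMPV6	58

/* Hypothetical, simplified tracker: protocol number plus a packet callback. */
struct l4proto {
	unsigned char l4proto;
	int (*packet)(unsigned char pf);
};

/* Most trackers do not care about the l3 family at all. */
static int l3_agnostic_packet(unsigned char pf)
{
	(void)pf;
	return NF_ACCEPT;
}

/* icmp and gre are ipv4-only: refuse to track them inside ipv6. */
static int ipv4_only_packet(unsigned char pf)
{
	return pf == PF_INET ? NF_ACCEPT : -NF_ACCEPT;
}

/* icmpv6 is ipv6-only: refuse e.g. icmpv6-in-ipv4. */
static int ipv6_only_packet(unsigned char pf)
{
	return pf == PF_INET6 ? NF_ACCEPT : -NF_ACCEPT;
}

/* One entry per protocol, no longer one per (l3, l4) pair. */
static const struct l4proto trackers[] = {
	{ IPPROTO_ICMP,   ipv4_only_packet },
	{ IPPROTO_TCP,    l3_agnostic_packet },
	{ IPPROTO_UDP,    l3_agnostic_packet },
	{ IPPROTO_GRE,    ipv4_only_packet },
	{ IPPROTO_ICMPV6, ipv6_only_packet },
};

/* Fallback for protocols without a dedicated tracker. */
static const struct l4proto generic = { 0, l3_agnostic_packet };

/* Single-level lookup keyed by the IPPROTO_* value alone. */
static const struct l4proto *find_l4proto(unsigned char l4num)
{
	for (size_t i = 0; i < sizeof(trackers) / sizeof(trackers[0]); i++)
		if (trackers[i].l4proto == l4num)
			return &trackers[i];
	return &generic;
}
```

A tracker asked to handle the wrong family answers -NF_ACCEPT instead of falling back to the generic tracker, which is the behaviour the commit message argues for.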
     

20 Sep, 2018

2 commits

  • ->new() gets invoked after ->error() and before ->packet() if
    a conntrack lookup has found no result for the tuple.

    We can fold it into ->packet(): the packet() implementations
    can check whether the conntrack is confirmed (already in the hash)
    or not (new).

    If it's unconfirmed, the conntrack isn't in the hash yet, so the
    current skb created a new conntrack entry.

    The only relevant side effect: if packet() returns -NF_ACCEPT (or
    drop) instead of NF_ACCEPT while the conntrack was just created,
    the newly allocated conntrack is freed right away, rather than not
    being created in the first place.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • nf_hook_state contains all the hook meta-information: netns, protocol family,
    hook location, and so on.

    Instead of only passing selected information, pass a pointer to the
    entire structure.

    This will allow merging the error and packet handlers and removing
    the ->new() function in follow-up patches.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
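
Removing ->new() means ->packet() has to recognise a freshly created entry by itself. A minimal sketch of that check, assuming a hypothetical struct conn whose confirmed flag stands in for "already inserted in the conntrack hash"; the state names and the may_start parameter are illustrative and not taken from any kernel tracker.

```c
#include <assert.h>
#include <stdbool.h>

#define NF_ACCEPT 1	/* local stand-in for the netfilter verdict */

/* Hypothetical protocol states; ST_NONE means "not initialised yet". */
enum sketch_state { ST_NONE, ST_NEW, ST_ESTABLISHED };

/* Hypothetical, heavily simplified conntrack entry. */
struct conn {
	bool confirmed;		/* true once the entry is in the conntrack hash */
	enum sketch_state state;
};

/* Work that used to live in a separate ->new() callback. */
static bool init_new_conn(struct conn *ct, bool may_start)
{
	if (!may_start)
		return false;	/* this packet cannot open a connection */
	ct->state = ST_NEW;
	return true;
}

/* ->packet() now covers both the new and the existing case. */
static int packet(struct conn *ct, bool may_start)
{
	if (!ct->confirmed) {
		/* Not in the hash yet: this packet just created ct. */
		if (!init_new_conn(ct, may_start))
			return -NF_ACCEPT;	/* caller frees the fresh entry */
		return NF_ACCEPT;
	}
	/* Already in the hash: ordinary per-packet tracking. */
	ct->state = ST_ESTABLISHED;
	return NF_ACCEPT;
}
```

The -NF_ACCEPT path corresponds to the side effect noted in the commit message: the entry was already allocated, so it is freed instead of never being created.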
     

11 Sep, 2018

1 commit

  • Now that cttimeout support for nft_ct is in place, these should depend
    on CONFIG_NF_CONNTRACK_TIMEOUT; otherwise we can crash when dumping the
    policy if this option is not enabled.

    [ 71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    [...]
    [ 71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246
    [...]
    [ 71.600188] Call Trace:
    [ 71.600201] ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct]

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

29 Aug, 2018

1 commit

  • tcp, sctp and dccp trackers re-use the userspace ctnetlink states
    to index their timeout arrays, which means timeouts[0] is never
    used. Copy the 'new' state (syn-sent, dccp-request, ...) to 0 as well,
    so external users can simply read it off timeouts[0] without needing to
    differentiate dccp/sctp/tcp and udp/icmp/gre/generic.

    The alternative is to map all array accesses to 'i - 1', but that
    is a much more intrusive change.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
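
To illustrate the timeouts[0] change, here is a small sketch assuming a hypothetical state enum in which index 0 is not a real state, as with the ctnetlink states mentioned above. The enum, the array contents and both helpers are invented for illustration, and the timeout values are placeholders rather than kernel defaults.

```c
#include <assert.h>

/* Hypothetical states indexed like the ctnetlink ones: 0 is not a real state. */
enum sketch_tcp_state {
	SK_TCP_NONE,		/* 0: never used as a state */
	SK_TCP_SYN_SENT,	/* the 'new' state */
	SK_TCP_ESTABLISHED,
	SK_TCP_CLOSE,
	SK_TCP_MAX
};

/* Placeholder timeouts in seconds, not kernel defaults. */
static unsigned int tcp_timeouts[SK_TCP_MAX] = {
	[SK_TCP_SYN_SENT]	= 120,
	[SK_TCP_ESTABLISHED]	= 3600,
	[SK_TCP_CLOSE]		= 10,
};

/* A udp-like tracker already keeps its 'new' timeout at index 0. */
static unsigned int udp_timeouts[2] = { 30, 180 };

/* The change: copy the 'new' state's timeout into the otherwise unused slot 0. */
static void mirror_new_timeout(unsigned int *timeouts, unsigned int new_state)
{
	timeouts[0] = timeouts[new_state];
}

/* External users then read the initial timeout without knowing the protocol. */
static unsigned int initial_timeout(const unsigned int *timeouts)
{
	return timeouts[0];
}
```

The alternative the commit rejects, shifting every access to i - 1, would touch every lookup; mirroring one value keeps the existing indexing intact.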
     

16 Jul, 2018

3 commits


09 Jan, 2018

2 commits


08 Jan, 2018

1 commit


25 Oct, 2017

2 commits


04 Sep, 2017

1 commit


25 Aug, 2017

3 commits


06 Jul, 2017

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains two Netfilter fixes for your net tree,
    they are:

    1) Fix memleak from netns release path of conntrack protocol trackers,
    patch from Liping Zhang.

    2) Uninitialized flags field in ebt_log, that results in unpredictable
    logging format in ebtables, also from Liping.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Jul, 2017

3 commits

  • This patch removes the typedef sctp_inithdr_t and replaces it
    with struct sctp_inithdr in the places where the typedef was used.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch removes the typedef sctp_chunkhdr_t and replaces it
    with struct sctp_chunkhdr in the places where the typedef was used.

    It also fixes some indentation and uses sizeof(variable) instead
    of sizeof(type), especially in sctp_new.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch removes the typedef sctp_sctphdr_t and replaces it
    with struct sctphdr in the places where the typedef was used.

    It also fixes some indentation and uses sizeof(variable) instead
    of sizeof(type).

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
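
The three typedef removals above follow one pattern. A minimal before/after sketch, assuming a simplified user-space mirror of the SCTP chunk header built from <stdint.h> types instead of the kernel's __u8/__be16; chunk_hdr_len() is a hypothetical helper showing sizeof applied to a variable rather than to a type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of the SCTP chunk header (type, flags, length). */
struct sctp_chunkhdr {
	uint8_t  type;
	uint8_t  flags;
	uint16_t length;	/* big-endian on the wire */
};

/*
 * Before: typedef struct sctp_chunkhdr sctp_chunkhdr_t;
 *         ... sizeof(sctp_chunkhdr_t) ...
 * After:  the struct tag is spelled out, and sizeof is applied to the
 *         variable, so the size stays correct if its declared type changes.
 */
static size_t chunk_hdr_len(const struct sctp_chunkhdr *sch)
{
	return sizeof(*sch);
}
```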
     

30 Jun, 2017

1 commit

  • After running the following commands for a while, kmemleak reported that
    "1879 new suspected memory leaks" happened:
    # while : ; do
        ip netns add test
        ip netns delete test
      done

    unreferenced object 0xffff88006342fa38 (size 1024):
    comm "ip", pid 15477, jiffies 4295982857 (age 957.836s)
    hex dump (first 32 bytes):
    b8 b0 4d a0 ff ff ff ff c0 34 c3 59 00 88 ff ff ..M......4.Y....
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x4a/0xa0
    [] __kmalloc_track_caller+0x150/0x300
    [] kmemdup+0x20/0x50
    [] dccp_init_net+0x8a/0x160 [nf_conntrack]
    [] nf_ct_l4proto_pernet_register_one+0x25/0x90
    ...
    unreferenced object 0xffff88006342da58 (size 1024):
    comm "ip", pid 15477, jiffies 4295982857 (age 957.836s)
    hex dump (first 32 bytes):
    10 b3 4d a0 ff ff ff ff 04 35 c3 59 00 88 ff ff ..M......5.Y....
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x4a/0xa0
    [] __kmalloc_track_caller+0x150/0x300
    [] kmemdup+0x20/0x50
    [] sctp_init_net+0x5d/0x130 [nf_conntrack]
    [] nf_ct_l4proto_pernet_register_one+0x25/0x90
    ...

    This is because we forgot to implement get_net_proto for sctp and
    dccp, so nf_ct_unregister_sysctl is never invoked to free the
    ctl_table during netns cleanup. Also note that, due to the missing
    get_net_proto, we fail to register the sysctl for dccp/sctp as well.

    Fixes: c51d39010a1b ("netfilter: conntrack: built-in support for DCCP")
    Fixes: a85406afeb3e ("netfilter: conntrack: built-in support for SCTP")
    Cc: Davide Caratti
    Signed-off-by: Liping Zhang
    Acked-by: Davide Caratti
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
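
The leak comes down to netns teardown having no way to reach the per-netns copy of the table. A minimal sketch of that failure mode, assuming hypothetical types (struct pernet_proto, struct net_sketch, struct l4proto_ops) and a plain malloc()/free() pair standing in for kmemdup() and the sysctl unregister path; it is loosely modelled on the backtrace above, not on the actual kernel code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-netns protocol data holding a duplicated sysctl table. */
struct pernet_proto {
	int *ctl_table;
};

/* Hypothetical network namespace carrying one protocol's data. */
struct net_sketch {
	struct pernet_proto dccp;
};

/* Hypothetical subset of tracker callbacks. */
struct l4proto_ops {
	int (*init_net)(struct pernet_proto *pn);
	struct pernet_proto *(*get_net_proto)(struct net_sketch *net);
};

static const int template_table[4] = { 1, 2, 3, 4 };
static int live_tables;		/* crude leak counter for the sketch */

/* Duplicate the template table for this netns (kmemdup() stand-in). */
static int dccp_like_init_net(struct pernet_proto *pn)
{
	pn->ctl_table = malloc(sizeof(template_table));
	if (!pn->ctl_table)
		return -1;
	memcpy(pn->ctl_table, template_table, sizeof(template_table));
	live_tables++;
	return 0;
}

/* The missing callback: lets generic code find the per-netns data. */
static struct pernet_proto *dccp_like_get_net_proto(struct net_sketch *net)
{
	return &net->dccp;
}

/* Netns teardown: the table is only reachable through get_net_proto(). */
static void pernet_unregister(const struct l4proto_ops *ops,
			      struct net_sketch *net)
{
	struct pernet_proto *pn;

	if (!ops->get_net_proto)
		return;		/* nothing to look up: the table leaks */
	pn = ops->get_net_proto(net);
	if (pn->ctl_table) {
		free(pn->ctl_table);
		pn->ctl_table = NULL;
		live_tables--;
	}
}
```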
     

24 May, 2017

1 commit

  • sctp_compute_cksum() implementation assumes that at least the SCTP header
    is in the linear part of skb: modify conntrack error callback to avoid
    false CRC32c mismatch, if the transport header is partially/entirely paged.

    Fixes: cf6e007eef83 ("netfilter: conntrack: validate SCTP crc32c in PREROUTING")
    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
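
The fix above depends on the whole SCTP common header (12 bytes) being in the linear area before the checksum is computed. The kernel has its own skb helpers for that, which this sketch does not reproduce; instead it assumes a hypothetical two-part buffer (struct pkt) and hypothetical make_linear()/header_is_linear() helpers that pull bytes from the "paged" part into the head.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCTP_COMMON_HDR_LEN 12	/* ports (2+2), verification tag (4), checksum (4) */

/* Hypothetical two-part buffer: a linear head plus one "paged" fragment. */
struct pkt {
	unsigned char head[64];
	size_t head_len;
	unsigned char frag[64];
	size_t frag_len;
};

/*
 * Pull bytes from the fragment into the head until at least @len bytes are
 * linear. Returns 0 on success, -1 if the packet is too short or the head
 * has no room.
 */
static int make_linear(struct pkt *p, size_t len)
{
	size_t need;

	if (len <= p->head_len)
		return 0;
	need = len - p->head_len;
	if (need > p->frag_len || len > sizeof(p->head))
		return -1;
	memcpy(p->head + p->head_len, p->frag, need);
	memmove(p->frag, p->frag + need, p->frag_len - need);
	p->head_len += need;
	p->frag_len -= need;
	return 0;
}

/* Only checksum once the whole header at @dataoff sits in the head. */
static int header_is_linear(struct pkt *p, size_t dataoff)
{
	return make_linear(p, dataoff + SCTP_COMMON_HDR_LEN) == 0;
}
```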
     

01 May, 2017

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree: a large batch of code cleanups, a simplified conntrack extension
    codebase, removal of the fake conntrack object, and faster netns
    handling through selective synchronize_net() calls. More specifically, they are:

    1) Check for ct->status bit instead of using nfct_nat() from IPVS and
    Netfilter codebase, patch from Florian Westphal.

    2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.

    3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.

    4) Introduce nft_is_base_chain() helper function.

    5) Enforce expectation limit from userspace conntrack helper,
    from Gao Feng.

    6) Add nf_ct_remove_expect() helper function, from Gao Feng.

    7) NAT mangle helper functions return boolean, from Gao Feng.

    8) ctnetlink_alloc_expect() should only work for conntrack with
    helpers, from Gao Feng.

    9) Add nfnl_msg_type() helper function to nfnetlink to build the
    netlink message type.

    10) Get rid of unnecessary cast on void, from simran singhal.

    11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
    also from simran singhal.

    12) Use list_prev_entry() from nf_tables, from simran singhal.

    13) Remove unnecessary & on function pointers in the Netfilter and IPVS
    code.

    14) Remove obsolete comment on set of rules per CPU in ip6_tables,
    no longer true. From Arushi Singhal.

    15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.

    16) Remove unnecessary nested rcu_read_lock() in
    __nf_nat_decode_session(). Code running from hooks is already
    guaranteed to run under the RCU read side.

    17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.

    18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
    also from Aaron.

    19) Get rid of unused __ip_set_get_netlink(), from Aaron Conole.

    20) Don't propagate NF_DROP error to userspace via ctnetlink in
    __nf_nat_alloc_null_binding() function, from Gao Feng.

    21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
    from Gao Feng.

    22) Kill the fake untracked conntrack objects, use ctinfo instead to
    annotate that a conntrack object is untracked, from Florian Westphal.

    23) Remove nf_ct_is_untracked(), now obsolete since we have no
    conntrack template anymore, from Florian.

    24) Add event mask support to nft_ct, also from Florian.

    25) Move nf_conn_help structure to
    include/net/netfilter/nf_conntrack_helper.h.

    26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
    Thus, we don't deal with variable conntrack extensions anymore.
    Make sure userspace conntrack helper doesn't go over that size.
    Remove the variable size ct extension infrastructure now that this code
    has no more clients. From Florian Westphal.

    27) Restore offset and length of nf_ct_ext structure to 8 bytes now
    that wraparound is not possible any longer, also from Florian.

    28) Allow getting rid of unassured flows under stress in conntrack;
    this applies to the DCCP, SCTP and TCP protocols, from Florian.

    29) Shrink size of nf_conntrack_ecache structure, from Florian.

    30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
    from Gao Feng.

    31) Register SYNPROXY hooks on demand, from Florian Westphal.

    32) Use pernet hook whenever possible, instead of global hook
    registration, from Florian Westphal.

    33) Pass hook structure to ebt_register_table() to consolidate some
    infrastructure code, from Florian Westphal.

    34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
    SYNPROXY code, to make sure device stats are not fooled, patch
    from Gao Feng.

    35) Remove NF_CT_EXT_F_PREALLOC; this kills quite some code that we
    don't need anymore if we just select a fixed size instead of an
    expensive runtime calculation. From Florian.

    36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
    from Florian.

    37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
    Florian.

    38) Attach NAT extension on-demand from masquerade and pptp helper
    path, from Florian.

    39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.

    40) Speed up netns by selective calls of synchronize_net(), from
    Florian Westphal.

    41) Silence gcc stack size warning on 32-bit arches in the snmp helper,
    from Florian.

    42) Unconditionally call nf_ct_ext_destroy(), even if we have no
    extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
    Liping Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Apr, 2017

1 commit

  • If insertion of a new conntrack fails because the table is full, the kernel
    searches the next buckets of the hash slot where the new connection
    was supposed to be inserted for an entry that hasn't seen traffic
    in the reply direction (non-assured). If it finds one, that entry
    is dropped and the new connection entry is allocated.

    Allow the conntrack gc worker to also remove *assured* conntracks if
    resources are low.

    Do this by querying the l4 tracker, e.g. tcp connections are now dropped
    if they are no longer established (e.g. in finwait).

    This could be refined further, e.g. by adding 'soft' established timeout
    (i.e., a timeout that is only used once we get close to resource
    exhaustion).

    Cc: Jozsef Kadlecsik
    Signed-off-by: Florian Westphal
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
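
The eviction rule described above can be sketched as a single scan that picks the first candidate which is either non-assured or, according to the l4 tracker, no longer established. This assumes a hypothetical struct entry with only an assured flag and a tcp-like state, and a tcp_can_early_drop() stand-in that treats two winding-down states as droppable; the real tracker hook and its exact set of states may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of tcp-like states. */
enum sk_tcp_state { SK_SYN_SENT, SK_ESTABLISHED, SK_FIN_WAIT, SK_CLOSE };

/* Hypothetical, simplified conntrack entry. */
struct entry {
	bool assured;		/* has seen traffic in the reply direction */
	enum sk_tcp_state state;
};

/* Stand-in for the l4 tracker query: no longer established, so droppable. */
static bool tcp_can_early_drop(const struct entry *e)
{
	return e->state == SK_FIN_WAIT || e->state == SK_CLOSE;
}

/*
 * Scan @n candidate entries near the target hash slot; return the index of
 * the first one that may be evicted, or -1 if none qualifies.
 */
static int pick_victim(const struct entry *cand, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!cand[i].assured)
			return (int)i;		/* never saw a reply */
		if (tcp_can_early_drop(&cand[i]))
			return (int)i;		/* assured, but winding down */
	}
	return -1;
}
```

An established, assured connection is never picked, which matches the intent that only flows already on their way out are sacrificed under resource pressure.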
     

14 Apr, 2017

1 commit


02 Feb, 2017

1 commit


05 Jan, 2017

1 commit

  • Implement sctp_error to let nf_conntrack_in validate crc32c on the packet
    transport header. Set skb->ip_summed to CHECKSUM_UNNECESSARY and return
    NF_ACCEPT in case of successful validation; otherwise, return -NF_ACCEPT to
    let netfilter skip connection tracking, like other protocols do.

    Besides preventing corrupted packets from matching conntrack entries, this
    fixes functionality of REJECT target: it was not generating any ICMP upon
    reception of SCTP packets, because it was computing RFC 1624 checksum on
    the packet and systematically mismatching crc32c in the SCTP header.

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
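
The validation added here rests on CRC32c (Castagnoli). A minimal user-space sketch, assuming a flat buffer holding the SCTP common header plus payload, with the 4-byte checksum field at offset 8 counted as zero while computing. The helper names are hypothetical, and storing the result least-significant byte first is an assumption of this sketch that should be checked against RFC 4960 Appendix B. The bitwise loop is slow but short; real implementations use tables or hardware instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NF_ACCEPT	1	/* local stand-in for the netfilter verdict */
#define SCTP_HDR_LEN	12	/* SCTP common header length */
#define SCTP_CSUM_OFF	8	/* offset of the 4-byte checksum field */

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

static uint32_t crc32c(const unsigned char *buf, size_t len)
{
	return ~crc32c_update(0xFFFFFFFFu, buf, len);
}

/* CRC over the whole packet with the checksum field counted as zero. */
static uint32_t sctp_packet_crc(const unsigned char *pkt, size_t len)
{
	static const unsigned char zero[4];
	uint32_t crc = crc32c_update(0xFFFFFFFFu, pkt, SCTP_CSUM_OFF);

	crc = crc32c_update(crc, zero, sizeof(zero));
	crc = crc32c_update(crc, pkt + SCTP_HDR_LEN, len - SCTP_HDR_LEN);
	return ~crc;
}

/* Assumption of this sketch: least-significant byte stored first. */
static void sctp_store_csum(unsigned char *pkt, uint32_t csum)
{
	for (int i = 0; i < 4; i++)
		pkt[SCTP_CSUM_OFF + i] = (unsigned char)(csum >> (8 * i));
}

static uint32_t sctp_load_csum(const unsigned char *pkt)
{
	uint32_t csum = 0;

	for (int i = 0; i < 4; i++)
		csum |= (uint32_t)pkt[SCTP_CSUM_OFF + i] << (8 * i);
	return csum;
}

/* Verdict as in the commit message: NF_ACCEPT if valid, -NF_ACCEPT if not. */
static int sctp_validate(const unsigned char *pkt, size_t len)
{
	if (len < SCTP_HDR_LEN)
		return -NF_ACCEPT;
	return sctp_load_csum(pkt) == sctp_packet_crc(pkt, len) ? NF_ACCEPT
								 : -NF_ACCEPT;
}
```

A mismatch maps to -NF_ACCEPT so that conntrack skips the packet, as the commit describes; setting skb->ip_summed on success has no counterpart in this user-space sketch.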
     

05 Dec, 2016

1 commit

  • CONFIG_NF_CT_PROTO_SCTP is no longer a tristate. When set to y, connection
    tracking support for the SCTP protocol is built into nf_conntrack.ko.

    footprint test:
    $ ls -l net/netfilter/nf_conntrack{_proto_sctp,}.ko \
            net/ipv4/netfilter/nf_conntrack_ipv4.ko \
            net/ipv6/netfilter/nf_conntrack_ipv6.ko

    (builtin)|| sctp   | ipv4   | ipv6   | nf_conntrack
    ---------++--------+--------+--------+--------------
    none     || 498243 | 828755 | 828676 | 6141434
    SCTP     ||   -    | 829254 | 829175 | 6547872

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
     

18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    thus is an unsigned entity. Using a negative value is out-of-bounds
    access by definition.

    2)
    On x86_64, unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preferred to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
            g(p[i]);
    }

    roughly translates to

            movsx   rsi, esi
            mov     rdi, [rsi+...]
            call    g

    MOVSX is a 3-byte instruction, which isn't necessary if the variable is
    unsigned because x86_64 zero-extends by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
            ...
            ptr = ng->ptr[id - 1];
            ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a seemingly random artefact of code generation, with the register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires a REX
    prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to
    be used, which is longer than [r8].

    However, the overall balance is in the negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function                          old    new  delta
    nfsd4_lock                       3886   3959    +73
    tipc_link_build_proto_msg        1096   1140    +44
    mac80211_hwsim_new_radio         2776   2808    +32
    tipc_mon_rcv                     1032   1058    +26
    svcauth_gss_legacy_init          1413   1429    +16
    tipc_bcbase_select_primary        379    392    +13
    nfsd4_exchange_id                1247   1260    +13
    nfsd4_setclientid_confirm         782    793    +11
    ...
    put_client_renew_locked           494    480    -14
    ip_set_sockfn_get                 730    716    -14
    geneve_sock_add                   829    813    -16
    nfsd4_sequence_done               721    703    -18
    nlmclnt_lookup_host               708    686    -22
    nfsd4_lockt                      1085   1063    -22
    nfs_get_client                   1077   1050    -27
    tcf_bpf_init                     1106   1076    -30
    nfsd4_encode_fattr               5997   5930    -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
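
The argument above hinges on how a 32-bit index is widened to 64 bits before it can be added to a pointer. A small sketch of the two conversions, plus a hypothetical toy_net_generic() accessor that takes an unsigned, 1-based id the way the patch changes pernet_operations::id. Which instructions the compiler actually emits is not something this C code can show; the comments only restate the conversion rules.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widening a 32-bit index to 64 bits for pointer arithmetic: a signed value
 * must be sign-extended, an unsigned one is zero-extended.
 */
static int64_t widen_signed(int32_t i)
{
	return (int64_t)i;
}

static uint64_t widen_unsigned(uint32_t i)
{
	return (uint64_t)i;
}

/* Hypothetical net_generic()-style accessor; callers must pass id >= 1. */
static void *toy_net_generic(void *const *ptrs, unsigned int id)
{
	return ptrs[id - 1];
}
```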
     

10 Nov, 2016

1 commit

  • Modify registration and deregistration of layer-4 protocol trackers to
    facilitate inclusion of new elements into the current list of builtin
    protocols. Both builtin (TCP, UDP, ICMP) and non-builtin (DCCP, GRE, SCTP,
    UDPlite) layer-4 protocol trackers usually register/deregister themselves
    using consecutive calls to nf_ct_l4proto_{,pernet}_{,un}register(...).
    This sequence is interrupted and rolled back in case of error; in order to
    simplify addition of builtin protocols, the input of the above functions
    has been modified to allow registering/unregistering multiple protocols.

    Signed-off-by: Davide Caratti
    Signed-off-by: Pablo Neira Ayuso

    Davide Caratti
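
Registering several trackers in one call only helps if a failure leaves nothing half-registered. A minimal sketch of that all-or-nothing pattern, assuming a hypothetical struct tracker with a test-only failure flag and hypothetical register_many()/unregister_many() helpers; the kernel functions named in the commit take their own argument types, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tracker descriptor; not the kernel's structure. */
struct tracker {
	const char *name;
	int fail_on_register;	/* test hook: make registration fail */
	int registered;
};

static int register_one(struct tracker *t)
{
	if (t->fail_on_register)
		return -1;
	t->registered = 1;
	return 0;
}

static void unregister_one(struct tracker *t)
{
	t->registered = 0;
}

/*
 * Register an array of trackers; on failure, undo the ones already done
 * in reverse order so the call is all-or-nothing.
 */
static int register_many(struct tracker *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = register_one(list[i]);

		if (ret < 0) {
			while (i-- > 0)
				unregister_one(list[i]);
			return ret;
		}
	}
	return 0;
}

/* Tear down in reverse registration order. */
static void unregister_many(struct tracker *const *list, size_t n)
{
	while (n-- > 0)
		unregister_one(list[n]);
}
```

Rolling back in reverse order mirrors the "interrupted and rolled back in case of error" sequence the commit message describes, just moved inside a single call.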
     

13 Aug, 2016

1 commit

  • This backward compatibility has been around for more than ten years,
    since Yasuyuki Kozakai introduced IPv6 in conntrack. These days, we have
    alternate /proc/net/nf_conntrack* entries, the ctnetlink interface and
    the conntrack utility got adopted by many people in the user community
    according to what I observed on the netfilter user mailing list.

    So let's get rid of this.

    Note that nf_conntrack_htable_size and nf_conntrack_max do not
    need to be exported as symbols anymore.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

12 Aug, 2016

1 commit


20 Apr, 2016

1 commit


19 Sep, 2015

1 commit