18 Mar, 2020

1 commit

  • commit dc15af8e9dbd039ebb06336597d2c491ef46ab74 upstream.

    If the .next function does not change the position index,
    the following .show function will repeat the output related
    to the current position index.

    Cc: stable@vger.kernel.org
    Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin
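
The effect described in this commit can be illustrated with a small userspace model. This is only a sketch, not the kernel's seq_file code: the table, the `demo_*` names and the one-record-per-read walker are all hypothetical, and the real seq_file core differs in detail. It contrasts a `next` callback that always advances the position with one that returns NULL at the end without touching it.

```c
#include <stddef.h>

/* Hypothetical three-record table standing in for whatever is being dumped. */
static const int demo_items[] = { 10, 20, 30 };
#define DEMO_COUNT ((long)(sizeof(demo_items) / sizeof(demo_items[0])))

/* start(): return the record at *pos, or NULL once *pos is past the end. */
static const int *demo_start(const long *pos)
{
    return (*pos >= 0 && *pos < DEMO_COUNT) ? &demo_items[*pos] : NULL;
}

/* Correct next(): always advances *pos, even when no record follows. */
static const int *demo_next(const int *v, long *pos)
{
    (void)v;
    ++*pos;
    return demo_start(pos);
}

/* Buggy next(): returns NULL at the end without touching *pos. */
static const int *demo_next_buggy(const int *v, long *pos)
{
    (void)v;
    if (*pos + 1 >= DEMO_COUNT)
        return NULL;
    ++*pos;
    return demo_start(pos);
}

/*
 * Model a reader whose buffer holds one record per read(): every read
 * restarts at the saved position, shows one record and calls next().
 * Returns the number of records shown, giving up after `limit`.
 */
static int demo_walk(const int *(*next)(const int *, long *), int limit)
{
    long pos = 0;
    int shown = 0;

    while (shown < limit) {
        const int *v = demo_start(&pos);

        if (v == NULL)
            break;
        shown++;        /* stands in for show() */
        next(v, &pos);  /* the walker trusts next() to move pos */
    }
    return shown;
}
```

With `demo_next` the walk ends after the three records. With `demo_next_buggy` the position stays on the last record, so the walker keeps showing it until the limit is reached.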
     

13 Sep, 2019

1 commit


03 Sep, 2019

1 commit


27 Aug, 2019

1 commit

  • When I merged the extension sysctl tables with the main one I forgot to
    reset them on netns creation. They currently read/write init_net settings.

    Fixes: d912dec12428 ("netfilter: conntrack: merge acct and helper sysctl table with main one")
    Fixes: cb2833ed0044 ("netfilter: conntrack: merge ecache and timestamp sysctl tables with main one")
    Reported-by: Shmulik Ladkani
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

04 Aug, 2019

1 commit

  • Use shared sysctl variables for zero and one constants, as in commit
    eec4844fae7c ("proc/sysctl: add shared variables for range check")

    Fixes: 8f14c99c7eda ("netfilter: conntrack: limit sysctl setting for boolean options")
    Signed-off-by: Matteo Croce
    Signed-off-by: Pablo Neira Ayuso

    Matteo Croce
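
As a rough illustration of range-checking a boolean setting against shared constants, the sketch below models a control entry whose bounds point at one shared zero and one shared one. It is a userspace approximation under stated assumptions: `demo_zero`, `demo_one`, `struct demo_ctl` and `demo_ctl_write` are hypothetical names, not the kernel's sysctl API.

```c
#include <errno.h>
#include <stddef.h>

/* Shared bounds: every boolean entry points at these two objects instead
 * of carrying its own private copies (hypothetical stand-ins). */
static const int demo_zero = 0;
static const int demo_one = 1;

/* Simplified control entry: a value plus optional lower/upper bounds. */
struct demo_ctl {
    int *data;
    const int *min;
    const int *max;
};

/* Store val only if it lies within [min, max]; otherwise reject it. */
static int demo_ctl_write(const struct demo_ctl *ctl, int val)
{
    if (ctl->min != NULL && val < *ctl->min)
        return -EINVAL;
    if (ctl->max != NULL && val > *ctl->max)
        return -EINVAL;
    *ctl->data = val;
    return 0;
}
```

An entry initialised as `{ &flag, &demo_zero, &demo_one }` accepts only 0 and 1 and leaves the stored value unchanged on a rejected write.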
     

30 Apr, 2019

1 commit


28 Jan, 2019

1 commit

  • When nf_ct_netns_get() fails, it should clean up after itself;
    its caller doesn't need to call nf_conntrack_fini_net().

    nf_conntrack_init_net() is called after registering sysctl
    and proc, so its cleanup function should be called before
    unregistering sysctl and proc.

    Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks")
    Fixes: b884fa461776 ("netfilter: conntrack: unify sysctl handling")
    Reported-and-tested-by: syzbot+fcee88b2d87f0539dfe9@syzkaller.appspotmail.com
    Signed-off-by: Cong Wang
    Signed-off-by: Pablo Neira Ayuso

    Cong Wang
     

18 Jan, 2019

3 commits

  • The connection tracking hooks can be optionally registered per netns
    when conntrack is specifically invoked from the ruleset since
    0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed
    by ruleset"). Then, since 4d3a57f23dec ("netfilter: conntrack: do not
    enable connection tracking unless needed"), the default behaviour is
    changed to always register them on demand.

    This patch provides a toggle that allows users to always register them.
    Without this toggle, in order to use conntrack for statistics
    collection, you need a dummy rule that refers to conntrack, e.g.

    iptables -I INPUT -m state --state NEW

    This patch allows users to restore the original behaviour via modparam,
    i.e. always register connection tracking, e.g.

    modprobe nf_conntrack enable_hooks=1

    Hence, no dummy rule is required.

    Reported-by: Laura Garcia
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • It's now the same as __nf_ct_l4proto_find(), so rename that to
    nf_ct_l4proto_find() and use it everywhere.

    It never returns NULL and doesn't need locks or reference counts.

    Before this series:
    302824 net/netfilter/nf_conntrack.ko
     21504 net/netfilter/nf_conntrack_proto_gre.ko

      text    data     bss     dec     hex filename
      6281    1732       4    8017    1f51 nf_conntrack_proto_gre.ko
    108356   20613     236  129205   1f8b5 nf_conntrack.ko

    After:
    294864 net/netfilter/nf_conntrack.ko

      text    data     bss     dec     hex filename
    106979   19557     240  126776   1ef38 nf_conntrack.ko

    so, even with builtin gre, total size got reduced.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • For historical reasons, all l4 trackers register their own
    sysctls.

    This leads to copy-and-pasted boilerplate code that does exactly the
    same thing, just with a different data structure.

    Place all of this in a single file.

    This allows removing the various ctl_table pointers from the ct_netns
    structure and reduces overall code size.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

21 Dec, 2018

4 commits


21 Sep, 2018

1 commit

  • l4 protocols are demuxed by l3num, l4num pair.

    However, almost all l4 trackers are l3 agnostic.

    Only exceptions are:
    - gre, icmp (ipv4 only)
    - icmpv6 (ipv6 only)

    This commit gets rid of the l3 mapping, l4 trackers can now be looked up
    by their IPPROTO_XXX value alone, which gets rid of the additional l3
    indirection.

    For icmp, icmpv6 and gre, add a check on state->pf and
    return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4;
    this seems more fitting than using the generic tracker.

    Additionally we can kill the 2nd l4proto definitions that were needed
    for the v4/v6 split -- they are now the same, so we can use a single
    l4proto struct for each protocol, rather than two.

    The EXPORT_SYMBOLs can be removed as all these object files are
    part of nf_conntrack with no external references.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
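
The lookup scheme described in this commit can be sketched in plain C as follows. This is a minimal model, not the conntrack source: the `demo_*` names, the `pf` field and the table layout are assumptions made for illustration, and `DEMO_NF_ACCEPT` merely stands in for the kernel's verdict constant. Trackers are indexed by protocol number alone, unknown protocols fall back to a generic tracker, and the family-specific trackers reject packets from the wrong family.

```c
#include <stddef.h>

/* Stand-ins for the address family and verdict values (hypothetical). */
enum { DEMO_PF_ANY = 0, DEMO_PF_IPV4, DEMO_PF_IPV6 };
#define DEMO_NF_ACCEPT 1

/* IANA protocol numbers for the trackers modelled here. */
enum { DEMO_ICMP = 1, DEMO_TCP = 6, DEMO_UDP = 17, DEMO_GRE = 47,
       DEMO_ICMPV6 = 58 };

struct demo_l4proto {
    const char *name;
    int pf;             /* DEMO_PF_ANY, or the only family it accepts */
};

static const struct demo_l4proto demo_generic = { "generic", DEMO_PF_ANY };
static const struct demo_l4proto demo_tcp     = { "tcp",     DEMO_PF_ANY };
static const struct demo_l4proto demo_udp     = { "udp",     DEMO_PF_ANY };
static const struct demo_l4proto demo_icmp    = { "icmp",    DEMO_PF_IPV4 };
static const struct demo_l4proto demo_gre     = { "gre",     DEMO_PF_IPV4 };
static const struct demo_l4proto demo_icmpv6  = { "icmpv6",  DEMO_PF_IPV6 };

/* One table indexed by protocol number; no per-family (l3) level. */
static const struct demo_l4proto *const demo_protos[256] = {
    [DEMO_ICMP]   = &demo_icmp,
    [DEMO_TCP]    = &demo_tcp,
    [DEMO_UDP]    = &demo_udp,
    [DEMO_GRE]    = &demo_gre,
    [DEMO_ICMPV6] = &demo_icmpv6,
};

/* Never returns NULL: unknown protocols get the generic tracker. */
static const struct demo_l4proto *demo_l4proto_find(unsigned char l4num)
{
    const struct demo_l4proto *p = demo_protos[l4num];

    return p != NULL ? p : &demo_generic;
}

/* Family-bound trackers refuse packets from the other family. */
static int demo_track(int pf, unsigned char l4num)
{
    const struct demo_l4proto *p = demo_l4proto_find(l4num);

    if (p->pf != DEMO_PF_ANY && p->pf != pf)
        return -DEMO_NF_ACCEPT;
    return DEMO_NF_ACCEPT;
}
```

Because the lookup always yields a tracker, callers in this model need no NULL check, and an icmpv6 packet seen in an IPv4 context gets the negative verdict instead of being handed to the generic tracker.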
     

17 Sep, 2018

1 commit


17 Jul, 2018

1 commit

  • This unifies ipv4 and ipv6 protocol trackers and removes the l3proto
    abstraction.

    This gets rid of all l3proto indirect calls and the need to do
    a lookup on the function to call for l3 demux.

    It increases the size of nf_conntrack.ko by only a small amount
    (12kbyte), so overall size is reduced, because nf_conntrack.ko is
    useless without either the nf_conntrack_ipv4 or nf_conntrack_ipv6 module.

    before:
       text    data     bss     dec     hex filename
       7357    1088       0    8445    20fd nf_conntrack_ipv4.ko
       7405    1084       4    8493    212d nf_conntrack_ipv6.ko
      72614   13689     236   86539   1520b nf_conntrack.ko
        19K nf_conntrack_ipv4.ko
        19K nf_conntrack_ipv6.ko
       179K nf_conntrack.ko

    after:
       text    data     bss     dec     hex filename
      79277   13937     236   93450   16d0a nf_conntrack.ko
       191K nf_conntrack.ko

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

16 Jul, 2018

1 commit

  • handle everything from ctnetlink directly.

    After all these years we still only support ipv4 and ipv6, so it
    seems reasonable to remove l3 protocol tracker support and instead
    handle ipv4/ipv6 from a common, always builtin inet tracker.

    Step 1: Get rid of all the l3proto->func() calls.

    Start with ctnetlink, then move on to packet-path ones.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

16 May, 2018

1 commit

  • Variants of proc_create{,_data} that directly take a struct seq_operations
    and deal with network namespaces in ->open and ->release. All callers of
    proc_create + seq_open_net are converted over, and seq_{open,release}_net
    are removed entirely.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     

28 Mar, 2018

1 commit


27 Mar, 2018

1 commit

  • Prefer the direct use of octal for permissions.

    Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
    and some typing.

    Miscellanea:

    o Whitespace neatening around these conversions.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

05 Mar, 2018

1 commit

  • These pernet_operations register and unregister sysctl and /proc
    entries. The exit batch method also waits until all per-net conntracks
    are dead. Thus, they are safe to be marked as async.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

19 Jan, 2018

1 commit

  • /proc has been ignoring struct file_operations::owner field for 10 years.
    Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
    ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
    inode->i_fop is initialized with proxy struct file_operations for
    regular files:

    -       if (de->proc_fops)
    -               inode->i_fop = de->proc_fops;
    +       if (de->proc_fops) {
    +               if (S_ISREG(inode->i_mode))
    +                       inode->i_fop = &proc_reg_file_ops;
    +               else
    +                       inode->i_fop = de->proc_fops;
    +       }

    VFS stopped pinning module at this point.

    # ipvs
    Acked-by: Julian Anastasov
    Signed-off-by: Alexey Dobriyan
    Acked-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Alexey Dobriyan
     

09 Jan, 2018

1 commit

  • This new bit tells us that the conntrack entry is owned by the flow
    table offload infrastructure.

    # cat /proc/net/nf_conntrack
    ipv4 2 tcp 6 src=10.141.10.2 dst=147.75.205.195 sport=36392 dport=443 src=147.75.205.195 dst=192.168.2.195 sport=443 dport=36392 [OFFLOAD] mark=0 zone=0 use=2

    Note the [OFFLOAD] tag in the listing.

    From userspace, the timer of such conntrack entries appears to be
    stopped. In practice, to make sure the conntrack entry does not go away,
    the conntrack timer is periodically set to an arbitrarily large value
    that gets refreshed on every iteration of the garbage collector, so it
    never expires. Such entries also display no internal state in the case
    of TCP flows. This allows us to save a bitcheck from the packet path via
    nf_ct_is_expired().

    Conntrack entries that have been offloaded to the flow table
    infrastructure cannot be deleted/flushed via ctnetlink. The flow table
    infrastructure is also responsible for releasing this conntrack entry.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

04 Sep, 2017

1 commit


25 Aug, 2017

3 commits


01 Aug, 2017

1 commit

  • Discussion during NFWS 2017 in Faro has shown that the current
    conntrack behaviour is unreasonable.

    Even if the conntrack module is loaded on behalf of a single net
    namespace, it's turned on for all namespaces, which is expensive. Commit
    481fa373476 ("netfilter: conntrack: add nf_conntrack_default_on sysctl")
    attempted to provide an alternative to the 'default on' behaviour by
    adding a sysctl to change it.

    However, as Eric points out, the sysctl only becomes available
    once the module is loaded, and then it's too late.

    So we either have to move the sysctl to the core, or, alternatively,
    change conntrack to become active only once the rule set requires this.

    This does the latter, conntrack is only enabled when a rule needs it.

    Reported-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

07 Apr, 2017

1 commit


02 Feb, 2017

1 commit

  • After this change conntrack operations (lookup, creation, matching from
    ruleset) only access one instead of two sk_buff cache lines.

    This works for normal conntracks because those are allocated from a slab
    that guarantees hw cacheline or 8byte alignment (whichever is larger)
    so the 3 bits needed for ctinfo won't overlap with nf_conn addresses.

    Template allocation now does manual address alignment (see previous change)
    on arches that don't have sufficient kmalloc min alignment.

    Some spots intentionally use skb->_nfct instead of skb_nfct() helpers,
    this is to avoid undoing the skb_nfct() use when we remove untracked
    conntrack object in the future.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
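
The idea of keeping a 3-bit value in the low bits of an aligned pointer can be sketched in userspace C as below. This is an illustration only: `struct demo_conn` and the `demo_*` helpers are hypothetical and do not reproduce the sk_buff or nf_conn layout. It relies on the object being at least 8-byte aligned, which leaves the three lowest address bits zero.

```c
#include <stdint.h>

/* The three low bits of an 8-byte-aligned address are always zero. */
#define DEMO_CTINFO_MASK ((uintptr_t)7)

/* Stand-in for the tracked object; forced to 8-byte alignment. */
struct demo_conn {
    _Alignas(8) int id;
};

/* Combine the object address and a 3-bit state into one word. */
static uintptr_t demo_pack(const struct demo_conn *ct, unsigned int ctinfo)
{
    return (uintptr_t)ct | ((uintptr_t)ctinfo & DEMO_CTINFO_MASK);
}

/* Recover the object pointer by clearing the low bits. */
static struct demo_conn *demo_unpack_ct(uintptr_t nfct)
{
    return (struct demo_conn *)(nfct & ~DEMO_CTINFO_MASK);
}

/* Recover the 3-bit state from the low bits. */
static unsigned int demo_unpack_ctinfo(uintptr_t nfct)
{
    return (unsigned int)(nfct & DEMO_CTINFO_MASK);
}
```

Storing both values in one word is what lets a reader fetch the pointer and the state with a single load. An object without the required alignment would corrupt the pointer, which is why the commit mentions manual alignment for template allocation.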
     

05 Dec, 2016

1 commit

  • This switch (default on) can be used to disable automatic registration
    of connection tracking functionality in newly created network
    namespaces.

    This means that when net namespace goes down (or the tracker protocol
    module is unloaded) we *might* have to unregister the hooks.

    We can either add another per-netns variable that tells if
    the hooks got registered by default, or, alternatively, just call
    the protocol _put() function and have the callee deal with a possible
    'extra' put() operation that doesn't pair with a get() one.

    This uses the latter approach, i.e. a put() without a get has no effect.

    Conntrack is still enabled automatically regardless of the new sysctl
    setting if the new net namespace requires connection tracking, e.g. when
    NAT rules are created.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
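
The "put() without a get() has no effect" approach can be modelled with a small per-namespace user count. This is a sketch under assumptions: `struct demo_netns`, `demo_netns_get` and `demo_netns_put` are hypothetical names, and the real code also has to deal with locking and per-protocol hooks.

```c
/* Per-namespace state: how many users need the hooks, and whether the
 * hooks are currently registered (hypothetical model). */
struct demo_netns {
    unsigned int users;
    int hooks_registered;
};

/* First user registers the hooks; later users only bump the count. */
static int demo_netns_get(struct demo_netns *net)
{
    if (net->users++ == 0)
        net->hooks_registered = 1;
    return 0;
}

/* Last user unregisters the hooks; an unpaired put is silently ignored. */
static void demo_netns_put(struct demo_netns *net)
{
    if (net->users == 0)
        return;
    if (--net->users == 0)
        net->hooks_registered = 0;
}
```

With this shape the teardown path can call put unconditionally, without tracking whether the hooks were registered by default.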
     

25 Sep, 2016

1 commit

  • Fabian reports a possible conntrack memory leak (could not reproduce so
    far), however, one minor issue can be easily resolved:

    > cat /proc/net/nf_conntrack | wc -l = 5
    > 4 minutes required to clean up the table.

    We should not report those timed-out entries to the user in the first
    place. And instead of just skipping those timed-out entries while
    iterating over the table, we can also zap them (we already do this during
    ctnetlink walks, but I forgot about the /proc interface).

    Fixes: f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer")
    Reported-by: Fabian Frederick
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
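
A minimal model of "skip and zap" during a table walk might look like the following. It is not the conntrack code: the flat array, `struct demo_ct` and the `demo_*` functions are hypothetical, and the cast from unsigned to signed assumes the usual two's-complement behaviour so that the comparison survives counter wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified entry: an absolute expiry timestamp and a dead flag. */
struct demo_ct {
    uint32_t timeout;
    bool dead;
};

/* Wraparound-safe check: expired once `now` has reached `timeout`. */
static bool demo_is_expired(const struct demo_ct *ct, uint32_t now)
{
    return (int32_t)(ct->timeout - now) <= 0;
}

/*
 * Walk the table as a dump would. Expired entries are not reported and
 * are marked dead on the spot. Returns the number of entries reported.
 */
static int demo_dump(struct demo_ct *tab, int n, uint32_t now)
{
    int shown = 0;

    for (int i = 0; i < n; i++) {
        if (tab[i].dead)
            continue;
        if (demo_is_expired(&tab[i], now)) {
            tab[i].dead = true;     /* zap instead of merely skipping */
            continue;
        }
        shown++;
    }
    return shown;
}
```

Reaping during the dump means the reader never sees stale entries, and the table shrinks as a side effect of being listed.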
     

13 Sep, 2016

1 commit

  • These counters sit in hot path and do show up in perf, this is especially
    true for 'found' and 'searched' which get incremented for every packet
    processed.

    Information like

    searched=212030105
    new=623431
    found=333613
    delete=623327

    does not seem too helpful nowadays:

    - on busy systems, found and searched will overflow every few hours
    (these are 32bit integers); on less busy ones, every few days.

    - for debugging there are better methods, such as iptables' trace target
    or the conntrack log sysctls. Nowadays we also have the perf tool.

    This removes packet path stat counters except those that
    are expected to be 0 (or close to 0) on a normal system, e.g.
    'insert_failed' (race happened) or 'invalid' (proto tracker rejects).

    The insert stat is retained for the ctnetlink case.
    The found stat is retained for the tuple-is-taken check when NAT has to
    determine if it needs to pick a different source address.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
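
The overflow claim is simple arithmetic: a 32-bit counter wraps after 2^32 increments. The helper below is a hypothetical sketch that turns an assumed increment rate into the time until wraparound. The rates mentioned afterwards are illustrative, not measurements from the commit.

```c
#include <stdint.h>

/* Whole seconds until a 32-bit counter wraps at a steady increment rate. */
static uint64_t demo_seconds_until_wrap(uint64_t events_per_second)
{
    if (events_per_second == 0)
        return UINT64_MAX;      /* never increments, never wraps */
    return (UINT64_C(1) << 32) / events_per_second;
}
```

At an assumed 100,000 increments per second the counter wraps after about 42,949 seconds, a little under 12 hours. At 10,000 per second it takes about 429,496 seconds, close to five days.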
     

07 Sep, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree. Most relevant updates are the removal of per-conntrack timers to
    use a workqueue/garbage collection approach instead from Florian
    Westphal, the hash and numgen expression for nf_tables from Laura
    Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
    removal of ip_conntrack sysctl and many other incremental updates on our
    Netfilter codebase.

    More specifically, they are:

    1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
    transport area in dccp, sctp, tcp, udp and udplite protocol
    conntrackers, from Gao Feng.

    2) Missing whitespace on error message in physdev match, from Hangbin Liu.

    3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.

    4) Add nf_ct_expires() helper function and use it, from Florian Westphal.

    5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
    from Florian.

    6) Rename nf_tables set implementation to nft_set_{name}.c

    7) Introduce the hash expression to allow arbitrary hashing of selector
    concatenations, from Laura Garcia Liebana.

    8) Remove ip_conntrack sysctl backward compatibility code, this code has
    been around for a long time already, and we have two interfaces to do
    this already: nf_conntrack sysctl and ctnetlink.

    9) Use nf_conntrack_get_ht() helper function whenever possible, instead
    of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.

    10) Add quota expression for nf_tables.

    11) Add number generator expression for nf_tables, this supports
    incremental and random generators that can be combined with maps,
    very useful for load balancing purposes, again from Laura Garcia Liebana.

    12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.

    13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
    configuration, this is used by a follow up patch to perform better chain
    update validation.

    14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
    nft_set_hash implementation to honor the NLM_F_EXCL flag.

    15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
    patch from Florian Westphal.

    16) Don't use the DYING bit to know if the conntrack event has been already
    delivered, instead use a state variable to track event re-delivery
    states, also from Florian.

    17) Remove the per-conntrack timer, use the workqueue approach that was
    discussed during the NFWS, from Florian Westphal.

    18) Use the netlink conntrack table dump path to kill stale entries,
    again from Florian.

    19) Add a garbage collector to get rid of stale conntracks, from
    Florian.

    20) Reschedule garbage collector if eviction rate is high.

    21) Get rid of the __nf_ct_kill_acct() helper.

    22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.

    23) Make nf_log_set() interface assertive on unsupported families.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

17 Aug, 2016

1 commit


12 Aug, 2016

1 commit


11 Jul, 2016

1 commit

  • When we do "cat /proc/net/nf_conntrack" while resizing the conntrack
    hash table via /sys/module/nf_conntrack/parameters/hashsize, a race can
    happen, because the reader can observe a newly allocated hash but the old
    size (or vice versa). An oops like the following then occurs:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
    IP: [] seq_print_acct+0x11/0x50 [nf_conntrack]
    Call Trace:
    [] ? ct_seq_show+0x14e/0x340 [nf_conntrack]
    [] seq_read+0x2cc/0x390
    [] proc_reg_read+0x42/0x70
    [] __vfs_read+0x37/0x130
    [] ? security_file_permission+0xa0/0xc0
    [] vfs_read+0x95/0x140
    [] SyS_read+0x55/0xc0
    [] entry_SYSCALL_64_fastpath+0x1a/0xa4

    It is very easy to reproduce this kernel crash.
    1. Open one shell and enter the following commands:
       while : ; do
         echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize
       done
    2. Open more shells and enter the following commands:
       while : ; do
         cat /proc/net/nf_conntrack
       done
    3. Wait a moment; the oops will happen soon.

    The solution in this patch is based on Florian's commit 5e3c61f98175
    ("netfilter: conntrack: fix lookup race during hash resize"). It also
    adds a wrapper function, nf_conntrack_get_ht(), to get the hash and
    hsize, as suggested by Florian Westphal.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
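
The race comes from reading two related values, the table pointer and its size, at different moments. One way to get a consistent pair is a sequence counter that readers re-check. The following is a userspace sketch with C11 atomics, assuming a single writer at a time. `struct demo_ht`, `demo_get_ht` and `demo_resize` are hypothetical names and this is not the kernel's seqcount implementation.

```c
#include <stdatomic.h>

/* Table pointer and size guarded by a generation counter.
 * An even generation means stable; odd means a resize is in progress. */
struct demo_ht {
    atomic_uint gen;
    _Atomic(int *) hash;
    atomic_uint size;
};

/* Reader: retry until pointer and size come from the same generation. */
static void demo_get_ht(struct demo_ht *ht, int **hash, unsigned int *size)
{
    unsigned int seq;

    do {
        /* Wait out any resize that is currently in progress. */
        while ((seq = atomic_load_explicit(&ht->gen,
                                           memory_order_acquire)) & 1u)
            ;
        *hash = atomic_load_explicit(&ht->hash, memory_order_relaxed);
        *size = atomic_load_explicit(&ht->size, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while (atomic_load_explicit(&ht->gen, memory_order_relaxed) != seq);
}

/* Writer (assumed to be serialised elsewhere): publish a new table. */
static void demo_resize(struct demo_ht *ht, int *new_hash,
                        unsigned int new_size)
{
    atomic_fetch_add_explicit(&ht->gen, 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ht->hash, new_hash, memory_order_relaxed);
    atomic_store_explicit(&ht->size, new_size, memory_order_relaxed);
    atomic_fetch_add_explicit(&ht->gen, 1, memory_order_release); /* even */
}
```

A reader that overlaps a resize sees the generation change and retries, so it never pairs a new table with the old size or the reverse.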
     

07 Jul, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next,
    they are:

    1) Don't use userspace datatypes in bridge netfilter code, from
    Tobin Harding.

    2) Iterate only once over the expectation table when removing the
    helper module, instead of once per-netns, from Florian Westphal.

    3) Extra sanitization in xt_hook_ops_alloc() to return an error in case
    we ever pass zero hooks.

    4) Handle NFPROTO_INET from the logging core infrastructure, from
    Liping Zhang.

    5) Autoload loggers when TRACE target is used from rules, this doesn't
    change the behaviour in case the user already selected nfnetlink_log
    as preferred way to print tracing logs, also from Liping Zhang.

    6) Allocate conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging
    fields by cache lines; this increases the size of entries by 11%.
    From Florian Westphal.

    7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.

    8) Remove useless defensive check in nf_logger_find_get() from Shivani
    Bhardwaj.

    9) Remove the zone extension and place it in the conntrack object, since
    it is always included in the hashing and we expect more intensive use of
    zones now that containers are in place. Also from Florian Westphal.

    10) Owner match now works from any namespace, from Eric Biederman.

    11) Make sure we only reply with TCP reset to TCP traffic from
    nf_reject_ipv4, patch from Liping Zhang.

    12) Introduce --nflog-size to indicate amount of network packet bytes
    that are copied to userspace via log message, from Vishwanath Pai.
    This obsoletes --nflog-range, which was designed to achieve this but
    has never worked.

    13) Introduce generic macros for nf_tables object generation masks.

    14) Use generation mask in table, chain and set objects in nf_tables.
    This fixes interference between the ongoing preparation phase of
    the commit protocol and object listings going on at the same time.
    This update is introduced in three patches, one per object.

    15) Check if the object is active in the next generation for element
    deactivation in the rbtree implementation, given that deactivation
    happens from the commit phase path we have to observe the future
    status of the object.

    16) Support for deletion of just added elements in the hash set type.

    17) Allow to resize hashtable from /proc entry, not only from the
    obscure /sys entry that maps to the module parameter, from Florian
    Westphal.

    18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
    anymore since we tear down the ruleset whenever the netdevice
    goes away.

    19) Support for matching inverted set lookups, from Arturo Borrero.

    20) Simplify the iptables_mangle_hook() by removing a superfluous
    extra branch.

    21) Introduce ether_addr_equal_masked() and use it from the netfilter
    codebase, from Joe Perches.

    22) Remove references to "Use netfilter MARK value as routing key"
    from the Netfilter Kconfig description given that this toggle
    has not existed for 10 years, from Moritz Sichert.

    23) Introduce generic NF_INVF() and use it from the xtables codebase,
    from Joe Perches.

    24) Setting logger to NONE via /proc was not working unless explicit
    nul-termination was included in the string. This fix seems to
    leave the former behaviour there, so we don't break backward
    compatibility.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2016

1 commit

  • No need to restrict this to module parameter.

    We export a copy of the real hash size -- when user alters the value we
    allocate the new table, copy entries etc before we update the real size
    to the requested one.

    This is also needed because the real size is used by concurrent readers
    and cannot be changed without synchronizing the conntrack generation
    seqcnt.

    We only allow changing this value from the initial net namespace.

    Tested using http-client-benchmark vs. httpterm with concurrent

    while true; do
      echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
    done

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal