12 Oct, 2020

3 commits


16 Apr, 2020

1 commit

  • nf_remove_net_hook() uses WRITE_ONCE() to assign a 'const' pointer to a
    'non-const' pointer. Cleanups to the implementation of WRITE_ONCE() mean
    that this will give rise to a compiler warning, just like a plain old
    assignment would do:

    | In file included from ./include/linux/export.h:43,
    | from ./include/linux/linkage.h:7,
    | from ./include/linux/kernel.h:8,
    | from net/netfilter/core.c:9:
    | net/netfilter/core.c: In function ‘nf_remove_net_hook’:
    | ./include/linux/compiler.h:216:30: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
    | *(volatile typeof(x) *)&(x) = (val); \
    | ^
    | net/netfilter/core.c:379:3: note: in expansion of macro ‘WRITE_ONCE’
    | WRITE_ONCE(orig_ops[i], &dummy_ops);
    | ^~~~~~~~~~

    Follow the pattern used elsewhere in this file and add a cast to 'void *'
    to squash the warning.
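
    The effect of the cast can be reproduced outside the kernel. The following is a minimal user-space sketch, assuming a simplified WRITE_ONCE() modelled on the macro line quoted in the warning; struct hook_ops and remove_hook() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel macro quoted in the warning above. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct hook_ops { int priority; };            /* hypothetical stand-in type */

static const struct hook_ops dummy_ops = { .priority = 0 };

/* Replace slot i with a pointer to the const dummy entry. */
static void remove_hook(struct hook_ops **orig_ops, size_t i)
{
    /*
     * Without the cast, storing a 'const struct hook_ops *' into a
     * 'struct hook_ops *' slot triggers -Wdiscarded-qualifiers.
     */
    WRITE_ONCE(orig_ops[i], (void *)&dummy_ops);
}
```

    The cast only silences the qualifier check; the slot must still never be written through, since it now points at a const object.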

    Cc: Pablo Neira Ayuso
    Cc: Jozsef Kadlecsik
    Cc: Florian Westphal
    Cc: "David S. Miller"
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Will Deacon

    Will Deacon
     

17 Oct, 2019

1 commit

  • Currently, the NF_HOOK_LIST() macro iterates the list and calls
    nf_hook() for each individual skb.

    This change passes the entire list into the netfilter core instead.
    The advantage is that we only need to fetch the rule blob once per list
    instead of per-skb.

    NF_HOOK_LIST now only works for ipv4 and ipv6, as those are the only
    callers.
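
    The saving described above can be sketched in plain C. This is an illustration only: struct pkt, struct rule_blob, fetch_blob() and hook_list() are hypothetical names, and the single "minimum mark" rule stands in for real hook evaluation.

```c
#include <stddef.h>

struct pkt { int mark; struct pkt *next; };   /* hypothetical stand-in for an skb */
struct rule_blob { int min_mark; };           /* hypothetical stand-in for the rules */

static struct rule_blob active_blob = { .min_mark = 10 };
static unsigned int blob_fetches;             /* counts lookups of the rule blob */

static const struct rule_blob *fetch_blob(void)
{
    blob_fetches++;
    return &active_blob;
}

/*
 * Run the rules over a whole list, fetching the blob once per list.
 * Packets that fail the check are unlinked; returns the filtered list head.
 */
static struct pkt *hook_list(struct pkt *head)
{
    const struct rule_blob *blob = fetch_blob();
    struct pkt **link = &head;

    while (*link) {
        struct pkt *p = *link;

        if (p->mark < blob->min_mark) {
            *link = p->next;                  /* drop: unlink from the list */
            p->next = NULL;
        } else {
            link = &p->next;                  /* accept: keep and advance */
        }
    }
    return head;
}
```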

    v2: use skb_list_del_init() instead of list_del (Edward Cree)

    Signed-off-by: Florian Westphal
    Acked-by: Edward Cree
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

04 Jul, 2019

1 commit


01 Jun, 2019

1 commit


15 May, 2019

1 commit

  • CONFIG_DEBUG_KERNEL should not impact code generation. Use the newly
    defined CONFIG_DEBUG_MISC instead to keep the current code.

    Link: http://lkml.kernel.org/r/20190413224438.10802-6-okaya@kernel.org
    Signed-off-by: Sinan Kaya
    Acked-by: Florian Westphal
    Reviewed-by: Josh Triplett
    Reviewed-by: Kees Cook
    Cc: Florian Westphal
    Cc: Pablo Neira Ayuso
    Cc: Jozsef Kadlecsik
    Cc: "David S. Miller"
    Cc: Anders Roxell
    Cc: Benjamin Herrenschmidt
    Cc: Christophe Leroy
    Cc: Chris Zankel
    Cc: Greg Kroah-Hartman
    Cc: James Hogan
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Mike Rapoport
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Thomas Bogendoerfer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sinan Kaya
     

12 Apr, 2019

1 commit

  • Replace NF_HOOK() based invocation of the netfilter hooks with a private
    copy of nf_hook_slow().

    This copy has one difference: it can return the rx handler value expected
    by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS.

    This is needed by the next patch to invoke the ebtables
    "broute" table via the standard netfilter hooks rather than the custom
    "br_should_route_hook" indirection that is used now.

    When the skb is to be "brouted", we must return RX_HANDLER_PASS from the
    bridge rx input handler, but there is no way to indicate this via
    NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the
    netfilter core or a percpu flag.

    text    data     bss     dec filename
    3369      56       0    3425 net/bridge/br_input.o.before
    3458      40       0    3498 net/bridge/br_input.o.after

    This allows removal of the "br_should_route_hook" in the next patch.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

06 Jan, 2019

1 commit

  • Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".

    The jump label is controlled by HAVE_JUMP_LABEL, which is defined
    like this:

    #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
    # define HAVE_JUMP_LABEL
    #endif

    We can improve this by testing 'asm goto' support in Kconfig, then
    make JUMP_LABEL depend on CC_HAS_ASM_GOTO.

    The ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
    match the real kernel capability.

    Signed-off-by: Masahiro Yamada
    Acked-by: Michael Ellerman (powerpc)
    Tested-by: Sedat Dilek

    Masahiro Yamada
     

11 Jul, 2018

1 commit

  • This adds a global netfilter function to extract a conntrack tuple from an
    skb. The function uses a new function added to nf_ct_hook, which will try
    to get the tuple from skb->_nfct, and do a full lookup if that fails. This
    makes it possible to use the lookup function before the skb has passed
    through the conntrack init hooks (e.g., in an ingress qdisc). The tuple is
    copied to the caller to avoid issues with reference counting.

    The function returns false if conntrack is not loaded, allowing it to be
    used without incurring a module dependency on conntrack. This is used by
    the NAT mode in sch_cake.
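
    The "returns false if conntrack is not loaded" behaviour follows from calling through an optional hook pointer. A rough sketch of that pattern, assuming hypothetical simplified types (struct tuple, struct pkt, struct ct_hook) and leaving out the RCU protection a kernel implementation would need:

```c
#include <stdbool.h>
#include <stddef.h>

struct tuple { unsigned int src, dst; };      /* hypothetical, simplified */
struct pkt { const struct tuple *cached; unsigned int src, dst; };

struct ct_hook {
    bool (*get_tuple)(struct tuple *dst, const struct pkt *p);
};

/* NULL until the optional tracking module registers itself. */
static const struct ct_hook *ct_hook;

/* Core-side helper: usable whether or not the module is present. */
static bool get_tuple_from_pkt(struct tuple *dst, const struct pkt *p)
{
    const struct ct_hook *hook = ct_hook;

    if (!hook)
        return false;                         /* module not loaded */
    return hook->get_tuple(dst, p);
}

/* Module-side implementation: use the cached tuple, else derive it. */
static bool module_get_tuple(struct tuple *dst, const struct pkt *p)
{
    if (p->cached) {
        *dst = *p->cached;                    /* copy, so no reference is kept */
        return true;
    }
    dst->src = p->src;                        /* stands in for a full lookup */
    dst->dst = p->dst;
    return true;
}

static const struct ct_hook module_hook = { .get_tuple = module_get_tuple };
```

    Because the caller only references the pointer and the helper, it carries no link-time dependency on the module that provides module_hook.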

    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller

    Toke Høiland-Jørgensen
     

24 May, 2018

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree, they are:

    1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal.

    2) Add support for map lookups to numgen, random and hash expressions,
    from Laura Garcia.

    3) Allow to register nat hooks for iptables and nftables at the same
    time. Patchset from Florian Westphal.

    4) Timeout support for rbtree sets.

    5) ip6_rpfilter needs the interface for link-local addresses, from
    Vincent Bernat.

    6) Add nf_ct_hook and nf_nat_hook structures and use them.

    7) Do not drop packets when packets race to insert conntrack entries
    into hashes; this is particularly a problem in nfqueue setups.

    8) Address fallout from xt_osf separation to nf_osf, patches
    from Florian Westphal and Fernando Mancera.

    9) Remove reference to struct nft_af_info, which doesn't exist anymore.
    From Taehee Yoo.

    This batch comes with a conflict between 25fd386e0bc0 ("netfilter:
    core: add missing __rcu annotation") in your tree and 2c205dd3981f
    ("netfilter: add struct nf_nat_hook and use it") coming in this batch.
    This conflict can be solved by leaving the __rcu tag on
    __netfilter_net_init() - added by 25fd386e0bc0 - and remove all code
    related to nf_nat_decode_session_hook - which is gone after
    2c205dd3981f, as described by:

    diff --cc net/netfilter/core.c
    index e0ae4aae96f5,206fb2c4c319..168af54db975
    --- a/net/netfilter/core.c
    +++ b/net/netfilter/core.c
    @@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo
    EXPORT_SYMBOL_GPL(nf_ct_zone_dflt);
    #endif /* CONFIG_NF_CONNTRACK */

    - static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max)
    -#ifdef CONFIG_NF_NAT_NEEDED
    -void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
    -EXPORT_SYMBOL(nf_nat_decode_session_hook);
    -#endif
    -
    + static void __net_init
    + __netfilter_net_init(struct nf_hook_entries __rcu **e, int max)
    {
    int h;

    I can also merge your net-next tree into nf-next, solve the conflict and
    resend the pull request if you prefer so.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

23 May, 2018

4 commits


08 May, 2018

1 commit

  • Removes the following sparse warning:
    net/netfilter/core.c:598:30: warning: incorrect type in argument 1 (different address spaces)
    net/netfilter/core.c:598:30: expected struct nf_hook_entries **e
    net/netfilter/core.c:598:30: got struct nf_hook_entries [noderef] **

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

10 Jan, 2018

2 commits


09 Jan, 2018

13 commits

  • This abstraction has no clients anymore, remove it.

    This is what remains from previous authors, so correct copyright
    statement after recent modifications and code removal.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Expand NFPROTO_INET in two hook registrations, one for NFPROTO_IPV4 and
    another for NFPROTO_IPV6. Hence, we handle NFPROTO_INET from the core.
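
    A minimal sketch of such an expansion, under stated assumptions: the PROTO_* constants, the counters and register_family() are hypothetical, and the rollback on a failed second registration is an assumption of this sketch rather than something the message states.

```c
#include <errno.h>

enum { PROTO_INET = 1, PROTO_IPV4 = 2, PROTO_IPV6 = 10 };   /* illustrative values */

struct hook_ops { int pf; };

static int registered_v4, registered_v6;      /* per-family registration counts */

static int register_family(int pf)
{
    switch (pf) {
    case PROTO_IPV4: registered_v4++; return 0;
    case PROTO_IPV6: registered_v6++; return 0;
    default:         return -EINVAL;          /* family not handled here */
    }
}

static void unregister_family(int pf)
{
    if (pf == PROTO_IPV4)
        registered_v4--;
    else if (pf == PROTO_IPV6)
        registered_v6--;
}

/* The core expands the combined family into one registration per real family. */
static int register_hook(const struct hook_ops *ops)
{
    int err;

    if (ops->pf != PROTO_INET)
        return register_family(ops->pf);

    err = register_family(PROTO_IPV4);
    if (err)
        return err;
    err = register_family(PROTO_IPV6);
    if (err)
        unregister_family(PROTO_IPV4);        /* assumed: undo the first half */
    return err;
}
```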

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • So static_key_slow_dec applies to the family behind NFPROTO_INET.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Instead of passing struct nf_hook_ops, this is needed by follow up
    patches to handle NFPROTO_INET from the core.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Just a cleanup, __nf_unregister_net_hook() is used by a follow up patch
    when handling NFPROTO_INET as a real family from the core.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • The netfilter NAT core cannot deal with more than one NAT hook per hook
    location (prerouting, input ...), because the NAT hooks install a NAT null
    binding in case the iptables nat table (iptable_nat hooks) or the
    corresponding nftables chain (nft nat hooks) doesn't specify a nat
    transformation.

    Null bindings are needed to detect port collisions between NAT-ed and
    non-NAT-ed connections.

    This causes nftables NAT rules to not work when iptable_nat module is
    loaded, and vice versa because nat binding has already been attached
    when the second nat hook is consulted.

    The netfilter core is not really the correct location to handle this
    (hooks are just hooks, the core has no notion of what kinds of side
    effects a hook implements), but it's the only place where we can check
    for conflicts between both iptables hooks and nftables hooks without
    adding dependencies.

    So add a nat annotation to hook_ops to describe those hooks that will
    add NAT bindings, and then make the core reject a registration if such
    a hook already exists. The annotation fills a padding hole; in case
    further restrictions appear we might change this to a 'u8 type' instead
    of bool.

    iptables error if nft nat hook active:
    iptables -t nat -A POSTROUTING -j MASQUERADE
    iptables v1.4.21: can't initialize iptables table `nat': File exists
    Perhaps iptables or your kernel needs to be upgraded.

    nftables error if iptables nat table present:
    nft -f /etc/nftables/ipv4-nat
    /usr/etc/nftables/ipv4-nat:3:1-2: Error: Could not process rule: File exists
    table nat {
    ^^
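
    The "File exists" text in both errors is the strerror() string for EEXIST. A simplified sketch of the rejection logic, assuming a hypothetical fixed-size struct hook_point rather than the kernel's real hook storage:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOOKS 8                           /* arbitrary size for the sketch */

struct hook_ops { int priority; bool nat_hook; };

struct hook_point {
    const struct hook_ops *ops[MAX_HOOKS];
    size_t count;
};

/* Register ops at one hook location; at most one NAT hook is allowed. */
static int register_hook(struct hook_point *hp, const struct hook_ops *reg)
{
    if (hp->count >= MAX_HOOKS)
        return -E2BIG;

    if (reg->nat_hook) {
        for (size_t i = 0; i < hp->count; i++)
            if (hp->ops[i]->nat_hook)
                return -EEXIST;               /* surfaces as "File exists" */
    }

    hp->ops[hp->count++] = reg;
    return 0;
}
```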

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • no need to define hook points if the family isn't supported.
    Because we need these hooks for either nftables, arp/ebtables
    or the 'call-iptables' hack we have in the bridge layer add two
    new dependencies, NETFILTER_FAMILY_{ARP,BRIDGE}, and have the
    users select them.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • no need to define hook points if the family isn't supported.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Not all families share the same hook count, adjust sizes to what is
    needed.

    struct net before:
    /* size: 6592, cachelines: 103, members: 46 */
    after:
    /* size: 5952, cachelines: 93, members: 46 */

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • struct net contains:

    struct nf_hook_entries __rcu *hooks[NFPROTO_NUMPROTO][NF_MAX_HOOKS];

    which stores the hook entry point locations for the various protocol
    families and the hooks.

    Using an array results in compact C code when doing accesses, i.e.
    x = rcu_dereference(net->nf.hooks[pf][hook]);

    but it's also wasting a lot of memory, as most families are
    not used.

    So split the array into those families that are used, which
    are only 5 (instead of 13). In most cases, the 'pf' argument is
    constant, i.e. gcc removes switch statement.

    struct net before:
    /* size: 5184, cachelines: 81, members: 46 */
    after:
    /* size: 4672, cachelines: 73, members: 46 */
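
    The split can be pictured as one small array per used family plus a switch that selects among them. A sketch with hypothetical names and only three families shown; the PROTO_* values and MAX_HOOKS are illustrative:

```c
#include <stddef.h>

enum { PROTO_IPV4 = 2, PROTO_ARP = 3, PROTO_IPV6 = 10 };   /* illustrative values */
enum { MAX_HOOKS = 8 };                       /* illustrative hook count */

struct hook_entries;                          /* opaque for this sketch */

/* One array per family that is actually used, instead of [families][hooks]. */
struct net_hooks {
    struct hook_entries *hooks_ipv4[MAX_HOOKS];
    struct hook_entries *hooks_ipv6[MAX_HOOKS];
    struct hook_entries *hooks_arp[MAX_HOOKS];
};

/* Map (family, hook) to its slot; NULL for unused families or bad indices. */
static struct hook_entries **hook_slot(struct net_hooks *nf, int pf,
                                       unsigned int hook)
{
    if (hook >= MAX_HOOKS)
        return NULL;

    switch (pf) {
    case PROTO_IPV4:
        return &nf->hooks_ipv4[hook];
    case PROTO_IPV6:
        return &nf->hooks_ipv6[hook];
    case PROTO_ARP:
        return &nf->hooks_arp[hook];
    default:
        return NULL;                          /* family has no hook storage */
    }
}
```

    When pf is a compile-time constant at the call site, an optimizing compiler can fold the switch down to the single matching case.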

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Giuseppe Scrivano says:
    "SELinux, if enabled, registers for each new network namespace 6
    netfilter hooks."

    Cost for this is high. With synchronize_net() removed:
    "The net benefit on an SMP machine with two cores is that creating a
    new network namespace takes -40% of the original time."

    This patch replaces synchronize_net+kvfree with call_rcu().
    We store rcu_head at the tail of a structure that has no fixed layout,
    i.e. we cannot use offsetof() to compute the start of the original
    allocation. Thus store this information right after the rcu head.

    We could simplify this by just placing the rcu_head at the start
    of struct nf_hook_entries. However, this structure is used in
    packet processing hotpath, so only place what is needed for that
    at the beginning of the struct.
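
    The tail placement can be sketched in user space. Everything here is a stand-in: struct cb_head plays the role of rcu_head, the callback is invoked directly instead of through call_rcu(), and the tail follows the entry array immediately, which simplifies the real layout.

```c
#include <stddef.h>
#include <stdlib.h>

struct cb_head { void (*func)(struct cb_head *); };   /* stand-in for rcu_head */

struct entry { int (*hook)(void *priv); void *priv; };

/* Hot-path data first; the variable-length array follows. */
struct entries {
    unsigned short num;
    struct entry hooks[];
};

/* Lives after hooks[num]; 'allocation' lets the callback find the start. */
struct entries_tail {
    struct cb_head head;
    void *allocation;
};

/* Byte offset of the tail, rounded up to the tail's alignment. */
static size_t tail_offset(unsigned short num)
{
    size_t off = offsetof(struct entries, hooks) + num * sizeof(struct entry);
    size_t align = _Alignof(struct entries_tail);

    return (off + align - 1) / align * align;
}

static struct entries_tail *get_tail(struct entries *e)
{
    return (struct entries_tail *)((char *)e + tail_offset(e->num));
}

static struct entries *entries_alloc(unsigned short num)
{
    struct entries *e = calloc(1, tail_offset(num) + sizeof(struct entries_tail));

    if (e)
        e->num = num;
    return e;
}

static unsigned int freed;                    /* counts completed deferred frees */

/* Callback: only the head is known here, so use the stored start pointer. */
static void free_cb(struct cb_head *h)
{
    struct entries_tail *t = (struct entries_tail *)
        ((char *)h - offsetof(struct entries_tail, head));

    freed++;
    free(t->allocation);
}

/* Prepare the deferred free; in the kernel, call_rcu() would take this head. */
static struct cb_head *entries_free_deferred(struct entries *e)
{
    struct entries_tail *t = get_tail(e);

    t->allocation = e;
    t->head.func = free_cb;
    return &t->head;
}
```

    The stored pointer is needed because the distance from the tail back to the start depends on num, which the callback cannot derive from the head alone.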

    Reported-by: Giuseppe Scrivano
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • since commit 960632ece6949b ("netfilter: convert hook list to an array")
    nfqueue no longer stores a pointer to the hook that caused the packet
    to be queued. Therefore no extra synchronize_net() call is needed after
    dropping the packets enqueued by the old rule blob.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This reverts commit d3ad2c17b4047
    ("netfilter: core: batch nf_unregister_net_hooks synchronize_net calls").

    Nothing wrong with it. However, a follow-up patch will delay freeing of
    hooks with call_rcu, so all synchronize_net() calls become obsolete and
    there is no need anymore for this batching.

    This revert causes a temporary performance degradation when destroying
    a network namespace, but it's resolved with the upcoming call_rcu
    conversion.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

09 Sep, 2017

1 commit

  • kernel test robot reported:

    WARNING: CPU: 0 PID: 1244 at net/netfilter/core.c:218 __nf_hook_entries_try_shrink+0x49/0xcd
    [..]

    After allowing batching in nf_unregister_net_hooks it's possible that an earlier
    call to __nf_hook_entries_try_shrink already compacted the list.
    If this happens we don't need to do anything.

    Fixes: d3ad2c17b4047 ("netfilter: core: batch nf_unregister_net_hooks synchronize_net calls")
    Reported-by: kernel test robot
    Signed-off-by: Florian Westphal
    Acked-by: Aaron Conole
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

28 Aug, 2017

3 commits

  • re-add batching in nf_unregister_net_hooks().

    Similar to before, just store an array with to-be-free'd rule arrays
    on stack, then call synchronize_net once per batch.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Make sure our grow/shrink routine places them in the correct order.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This converts the storage and layout of netfilter hook entries from a
    linked list to an array. After this commit, hook entries will be
    stored adjacent in memory. The next pointer is no longer required.

    The ops pointers are stored at the end of the array as they are only
    used in the register/unregister path and in the legacy br_netfilter code.

    nf_unregister_net_hooks() is slower than needed as it just calls
    nf_unregister_net_hook in a loop (i.e. at least n synchronize_net()
    calls); this will be addressed in a follow-up patch.
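
    A rough user-space picture of the layout described above, with the ops pointers placed directly behind the entry array. The type and function names are hypothetical simplifications, and add_priv() exists only to give the example a hook to run.

```c
#include <stddef.h>
#include <stdlib.h>

struct hook_ops { int priority; };            /* registration-time data only */
struct entry { int (*hook)(void *priv, int v); void *priv; };

struct entries {
    unsigned short num;
    struct entry hooks[];                     /* hot path: contiguous entries */
    /* followed in memory by: const struct hook_ops *ops[num] */
};

static struct entries *entries_alloc(unsigned short num)
{
    size_t sz = offsetof(struct entries, hooks)
              + num * (sizeof(struct entry) + sizeof(struct hook_ops *));
    struct entries *e = calloc(1, sz);

    if (e)
        e->num = num;
    return e;
}

/* The ops pointers start right after the last entry. */
static const struct hook_ops **entries_ops(struct entries *e)
{
    return (const struct hook_ops **)&e->hooks[e->num];
}

/* Hot path: walk the array front to back, no 'next' pointers involved. */
static int run_hooks(const struct entries *e, int v)
{
    for (unsigned short i = 0; i < e->num; i++)
        v = e->hooks[i].hook(e->hooks[i].priv, v);
    return v;
}

/* Example hook: adds the integer that priv points to. */
static int add_priv(void *priv, int v)
{
    return v + *(int *)priv;
}
```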

    Test setup:
    - ixgbe 10gbit
    - netperf UDP_STREAM, 64 byte packets
    - 5 hooks: (raw + mangle prerouting, mangle+filter input, inet filter):
      empty mangle and raw prerouting, mangle and filter input hooks: 353.9
      this patch: 364.2

    Signed-off-by: Aaron Conole
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Aaron Conole
     

19 Jul, 2017

1 commit


17 Jul, 2017

1 commit

  • no more users in the tree, remove this.

    The old api is racy wrt. module removal, all users have been converted
    to the netns-aware api.

    The old api pretended we still have global hooks but that has not been
    true for a long time.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

01 May, 2017

2 commits

  • nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
    provided there is no nfqueue active in that net namespace (which is
    the common case).

    This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
    this gets called during netns cleanup so no packets should be queued.

    For the rare case of base chain being unregistered or module removal
    while nfqueue is in use the extra hiccup due to the packet drops isn't
    a big deal.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • synchronize_net is expensive and slows down netns cleanup a lot.

    We have two APIs to unregister a hook:
    nf_unregister_net_hook (which calls synchronize_net())
    and
    nf_unregister_net_hooks (calls nf_unregister_net_hook in a loop)

    Make nf_unregister_net_hook a wrapper around new helper
    __nf_unregister_net_hook, which unlinks the hook but does not free it.

    Then, we can call that helper in nf_unregister_net_hooks and then
    call synchronize_net() only once.

    Andrey Konovalov reports this change at least doubles syzkaller fuzzing
    speed.
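
    The batching idea, sketched with stand-ins: wait_grace_period() merely counts calls where the kernel would block in synchronize_net(), and struct hook is a hypothetical placeholder for a registered hook.

```c
#include <stddef.h>

static unsigned int grace_periods;            /* simulated synchronize_net() calls */

static void wait_grace_period(void)           /* stand-in for synchronize_net() */
{
    grace_periods++;
}

struct hook { int registered; int freed; };   /* hypothetical placeholder */

/* Unlink only: readers may still see the hook until a grace period passes. */
static void hook_unlink(struct hook *h)
{
    h->registered = 0;
}

static void hook_free(struct hook *h)
{
    h->freed = 1;
}

/* Single unregister: one grace period per hook. */
static void unregister_one(struct hook *h)
{
    hook_unlink(h);
    wait_grace_period();
    hook_free(h);
}

/* Batched unregister: unlink everything, wait once, then free everything. */
static void unregister_many(struct hook *hooks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        hook_unlink(&hooks[i]);
    if (n)
        wait_grace_period();
    for (size_t i = 0; i < n; i++)
        hook_free(&hooks[i]);
}
```

    With n hooks, the loop-of-singles approach pays for n grace periods while the batched version pays for one.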

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal