28 May, 2020

1 commit

  • Conntrack dump does not support kernel-side filtering (only "get" exists,
    but it returns a single entry and the user has to supply a full, valid tuple).

    This means that userspace has to filter after receiving many irrelevant
    entries, which wastes resources (the conntrack table can be very large,
    much larger than a routing table, for example).

    This patch adds kernel-side filtering. To achieve this, we:

    * Add a new CTA_FILTER netlink attribute, a flag list that
    parametrizes filtering
    * Convert some *nlattr_to_tuple() functions to allow partial parsing
    of CTA_TUPLE_ORIG and CTA_TUPLE_REPLY (so nf_conntrack_tuple is not
    fully set)

    Filtering is now possible on:
    * IP SRC/DST values
    * Ports for TCP and UDP flows
    * ICMP(v6) codes, types and IDs

    Filters are combined with a logical AND. For example, when flags
    PROTO_SRC_PORT, PROTO_NUM and IP_SRC are set, only entries matching all
    three values are dumped.
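
    The AND semantics described above can be sketched in plain userspace C.
    This is only an illustration, not the kernel code: the flag names, the
    IPv4-only struct tuple and tuple_matches() are hypothetical stand-ins
    for the CTA_FILTER flags and nf_conntrack_tuple.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names, modelled on the flags named in the commit message. */
enum {
    FILTER_IP_SRC         = 1u << 0,
    FILTER_IP_DST         = 1u << 1,
    FILTER_PROTO_NUM      = 1u << 2,
    FILTER_PROTO_SRC_PORT = 1u << 3,
    FILTER_PROTO_DST_PORT = 1u << 4,
};

/* Simplified, IPv4-only stand-in for a conntrack tuple. */
struct tuple {
    uint32_t ip_src;
    uint32_t ip_dst;
    uint8_t  proto_num;
    uint16_t src_port;
    uint16_t dst_port;
};

/*
 * Return true only if every field selected in 'flags' matches (logical AND).
 * Fields whose flag is clear are ignored, so 'filter' may be partially set.
 */
static bool tuple_matches(const struct tuple *entry,
                          const struct tuple *filter, uint32_t flags)
{
    if ((flags & FILTER_IP_SRC) && entry->ip_src != filter->ip_src)
        return false;
    if ((flags & FILTER_IP_DST) && entry->ip_dst != filter->ip_dst)
        return false;
    if ((flags & FILTER_PROTO_NUM) && entry->proto_num != filter->proto_num)
        return false;
    if ((flags & FILTER_PROTO_SRC_PORT) && entry->src_port != filter->src_port)
        return false;
    if ((flags & FILTER_PROTO_DST_PORT) && entry->dst_port != filter->dst_port)
        return false;
    return true;
}
```

    With flags FILTER_PROTO_SRC_PORT | FILTER_PROTO_NUM | FILTER_IP_SRC, an
    entry is kept only when all three fields agree with the filter; an empty
    flag set matches every entry.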

    Changes since v1:
    Set NLM_F_DUMP_FILTERED in nlm flags if entries are filtered

    Changes since v2:
    Move several constants to nf_internals.h
    Move a fix for the netlink value check to a separate patch
    Add a check for unsupported flags
    Return EOPNOTSUPP if CTA_FILTER is set in ctnetlink_flush_conntrack
    (not yet implemented)
    Fix code style issues

    Changes since v3:
    Fix compilation warning reported by kbuild test robot

    Changes since v4:
    Fix a regression introduced in v3 (returned EINVAL for valid netlink
    messages without CTA_MARK)

    Changes since v5:
    Change definition of CTA_FILTER_F_ALL
    Fix a regression when CTA_TUPLE_ZONE is not set

    Signed-off-by: Romain Bellan
    Signed-off-by: Florent Fourcot
    Signed-off-by: Pablo Neira Ayuso

    Romain Bellan
     

12 Apr, 2019

1 commit

  • Replace NF_HOOK() based invocation of the netfilter hooks with a private
    copy of nf_hook_slow().

    This copy has one difference: it can return the rx handler value expected
    by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS.

    This is needed by the next patch to invoke the ebtables
    "broute" table via the standard netfilter hooks rather than the custom
    "br_should_route_hook" indirection that is used now.

    When the skb is to be "brouted", we must return RX_HANDLER_PASS from the
    bridge rx input handler, but there is no way to indicate this via
    NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the
    netfilter core or a percpu flag.

       text    data     bss     dec filename
       3369      56       0    3425 net/bridge/br_input.o.before
       3458      40       0    3498 net/bridge/br_input.o.after

    This allows removal of the "br_should_route_hook" in the next patch.
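
    A rough userspace sketch of the idea: a private hook walk that reports
    an rx-handler style result instead of a plain verdict. All names here
    (enum verdict, enum rx_result, struct packet, the "brouted" flag and
    run_hooks_rx()) are hypothetical simplifications and not the kernel's
    definitions; how the real code marks a packet as brouted is not shown
    in this message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for netfilter verdicts and rx handler results. */
enum verdict   { VERDICT_DROP, VERDICT_ACCEPT, VERDICT_STOLEN };
enum rx_result { RX_CONSUMED, RX_PASS };

struct packet {
    bool brouted;    /* set by a hook that wants the packet routed, not bridged */
    bool freed;      /* packet was dropped */
    bool delivered;  /* packet reached the normal bridge input path */
};

typedef enum verdict (*hook_fn)(struct packet *pkt);

/* Example hooks, used only to exercise the walk below. */
static enum verdict hook_accept(struct packet *pkt) { (void)pkt; return VERDICT_ACCEPT; }
static enum verdict hook_drop(struct packet *pkt)   { (void)pkt; return VERDICT_DROP; }
static enum verdict hook_broute(struct packet *pkt) { pkt->brouted = true; return VERDICT_ACCEPT; }

/*
 * Walk the hooks in order. Unlike a generic hook runner, the return value
 * is what the rx handler caller needs: RX_PASS hands the packet back to the
 * stack, RX_CONSUMED means the bridge dealt with it.
 */
static enum rx_result run_hooks_rx(struct packet *pkt,
                                   const hook_fn *hooks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        switch (hooks[i](pkt)) {
        case VERDICT_ACCEPT:
            if (pkt->brouted)
                return RX_PASS;
            break;
        case VERDICT_DROP:
            pkt->freed = true;
            return RX_CONSUMED;
        case VERDICT_STOLEN:
            return RX_CONSUMED;
        }
    }
    pkt->delivered = true;
    return RX_CONSUMED;
}
```

    The point of the sketch is the extra return channel: a generic runner
    could only say "accepted" or "dropped", while this walk can also say
    "give the packet back to the stack".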

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

23 May, 2018

1 commit


09 Jan, 2018

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to
    a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis, with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to each file. She confirmed any determination that was
    not immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

29 Aug, 2017

1 commit


28 Aug, 2017

1 commit

  • This converts the storage and layout of netfilter hook entries from a
    linked list to an array. After this commit, hook entries will be
    stored adjacent in memory. The next pointer is no longer required.

    The ops pointers are stored at the end of the array as they are only
    used in the register/unregister path and in the legacy br_netfilter code.
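
    The layout described above (hook entries packed back to back, with the
    ops pointers appended after them in the same allocation) can be sketched
    in userspace C as follows. The struct and function names are
    hypothetical simplifications, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Registration-time metadata; only needed on the slow path. */
struct hook_ops {
    int priority;
};

/* What the fast path needs for each hook: function and private data. */
struct hook_entry {
    int (*fn)(void *priv);
    void *priv;
};

/*
 * One allocation holds:
 *   [ header | hooks[0..num-1] | ops pointers [0..num-1] ]
 * so traversal touches adjacent memory and needs no 'next' pointer.
 */
struct hook_entries {
    unsigned short num;
    struct hook_entry hooks[];   /* flexible array member */
};

static struct hook_entries *entries_alloc(unsigned short num)
{
    size_t size = sizeof(struct hook_entries)
                + (size_t)num * sizeof(struct hook_entry)
                + (size_t)num * sizeof(struct hook_ops *);
    struct hook_entries *e = calloc(1, size);

    if (e)
        e->num = num;
    return e;
}

/* The ops pointers start right after the last hook entry. */
static struct hook_ops **entries_ops(struct hook_entries *e)
{
    return (struct hook_ops **)&e->hooks[e->num];
}
```

    Keeping the ops pointers in a trailer means the per-packet loop over
    hooks[] never pulls them into cache, while register and unregister can
    still find them through entries_ops().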

    nf_unregister_net_hooks() is slower than needed because it just calls
    nf_unregister_net_hook() in a loop (i.e. at least n synchronize_net()
    calls). This will be addressed in a follow-up patch.

    Test setup:
    - ixgbe 10gbit
    - netperf UDP_STREAM, 64 byte packets
    - 5 hooks: (raw + mangle prerouting, mangle+filter input, inet filter):
    empty mangle and raw prerouting, mangle and filter input hooks: 353.9
    this patch: 364.2

    Signed-off-by: Aaron Conole
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Aaron Conole
     

19 Aug, 2017

1 commit


01 May, 2017

1 commit

  • nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
    provided there is no nfqueue active in that net namespace (which is
    the common case).

    This also gets rid of the extra arg to nf_queue_nf_hook_drop(). Normally
    this is called during netns cleanup, so no packets should be queued.

    For the rare case of a base chain being unregistered or a module being
    removed while nfqueue is in use, the extra hiccup due to the packet drops
    isn't a big deal.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

03 Nov, 2016

1 commit

  • nf_iterate() has become rather simple, we can integrate this code into
    nf_hook_slow() to reduce the amount of LOC in the core path.

    However, we still need nf_iterate() for nf_queue packet handling, so
    move this function there, the only place where we need it. I think it
    should be possible to refactor the nf_queue code to get rid of it
    entirely, but given this is the slow path anyway, let's look at this later.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

21 Oct, 2016

1 commit

  • nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace
    list_head with single linked list") for two reasons:

    1) If the bypass flag is set, there are no userspace listeners and
    we still have more hook entries to iterate over, then jump to the
    next hook. Otherwise accept the packet. On the nf_reinject() path, the
    okfn() needs to be invoked.

    2) We should not re-enter the same hook on packet reinjection. If the
    packet is accepted, we have to skip the current hook from where the
    packet was enqueued, otherwise the packet gets enqueued over and
    over again.

    This restores the previous list_for_each_entry_continue() behaviour
    happening from nf_iterate() that was dealing with these two cases.
    This patch introduces a new nf_queue() wrapper function so this fix
    becomes simpler.
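
    The two cases can be illustrated with a small userspace model. It uses
    an index into an array rather than the linked list the kernel used at
    the time, and every name in it is hypothetical. Dropping the packet when
    there is no listener and no bypass flag is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

enum verdict { V_DROP, V_ACCEPT, V_QUEUE };
enum outcome { OUT_DROPPED, OUT_QUEUED, OUT_DELIVERED };

struct hook {
    enum verdict verdict;  /* what this hook decides for every packet */
    bool bypass;           /* queue-bypass flag, only meaningful with V_QUEUE */
};

/*
 * Walk the hooks starting at index 'start'. If a hook queues the packet to
 * a listener, its index is stored in *queued_at so reinjection can resume
 * after it.
 */
static enum outcome run_from(const struct hook *hooks, size_t n, size_t start,
                             bool have_listener, size_t *queued_at)
{
    for (size_t i = start; i < n; i++) {
        switch (hooks[i].verdict) {
        case V_ACCEPT:
            break;
        case V_DROP:
            return OUT_DROPPED;
        case V_QUEUE:
            if (have_listener) {
                *queued_at = i;
                return OUT_QUEUED;
            }
            if (hooks[i].bypass)
                break;          /* case 1: no listener, bypass set: next hook */
            return OUT_DROPPED; /* assumption: no listener and no bypass */
        }
    }
    return OUT_DELIVERED;       /* all hooks passed: okfn() would run here */
}

/* Case 2: an accepted packet resumes AFTER the hook that queued it. */
static enum outcome reinject_accept(const struct hook *hooks, size_t n,
                                    size_t queued_at, bool have_listener,
                                    size_t *requeued_at)
{
    return run_from(hooks, n, queued_at + 1, have_listener, requeued_at);
}
```

    Resuming at queued_at rather than queued_at + 1 would hand the packet
    to the same queueing hook again, which is the endless re-enqueue loop
    described in case 2.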

    Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

25 Sep, 2016

1 commit

  • The netfilter hook list never uses the prev pointer, and so can be trimmed to
    be a simple singly-linked list.

    In addition to having a more lightweight structure for hook traversal,
    struct net becomes 5568 bytes (down from 6400) and struct net_device becomes
    2176 bytes (down from 2240).
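
    The saving comes from each list head shrinking from two pointers to
    one. A minimal sketch of that arithmetic follows; the struct names are
    hypothetical, and the table dimensions (13 x 8) are an assumption chosen
    because, with 8-byte pointers, they reproduce the 832-byte difference
    between 6400 and 5568.

```c
#include <stddef.h>

/* Doubly linked head: two pointers per slot. */
struct dlist_head {
    struct dlist_head *next, *prev;
};

/* Singly linked head: one pointer per slot. */
struct slist_head {
    struct slist_head *next;
};

/* Assumed table dimensions (protocol families x hook points). */
#define N_PROTO 13
#define N_HOOKS 8

struct table_doubly {
    struct dlist_head hooks[N_PROTO][N_HOOKS];
};

struct table_singly {
    struct slist_head hooks[N_PROTO][N_HOOKS];
};

/* Bytes saved by dropping the unused 'prev' pointer from every slot. */
static size_t table_savings(void)
{
    return sizeof(struct table_doubly) - sizeof(struct table_singly);
}
```

    On a build with 8-byte pointers, table_savings() evaluates to
    13 * 8 * 8 = 832 bytes.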

    Signed-off-by: Aaron Conole
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Aaron Conole
     

23 Jul, 2015

1 commit

  • nf_queue_nf_hook_drop() reacquires rtnl_lock(), which is already held by
    nf_unregister_hook(), so the calling task deadlocks.

    This can be triggered via: modprobe nf_conntrack_ipv4 && rmmod nf_conntrack_ipv4

    [ 720.628746] INFO: task rmmod:3578 blocked for more than 120 seconds.
    [ 720.628749] Not tainted 4.2.0-rc2+ #113
    [ 720.628752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 720.628754] rmmod D ffff8800ca46fd58 0 3578 3571 0x00000080
    [...]
    [ 720.628783] Call Trace:
    [ 720.628790] schedule+0x6b/0x90
    [ 720.628795] schedule_preempt_disabled+0x13/0x20
    [ 720.628799] mutex_lock_nested+0x1f5/0x380
    [ 720.628803] ? rtnl_lock+0x12/0x20
    [ 720.628807] ? rtnl_lock+0x12/0x20
    [ 720.628812] rtnl_lock+0x12/0x20
    [ 720.628817] nf_queue_nf_hook_drop+0x15/0x160
    [ 720.628825] nf_unregister_net_hook+0x168/0x190
    [ 720.628831] nf_unregister_hook+0x64/0x80
    [ 720.628837] nf_unregister_hooks+0x20/0x30
    [...]

    Moreover, nf_unregister_net_hook() should only destroy the queue for this
    netns, not for every netns.

    Reported-by: Fengguang Wu
    Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.")
    Signed-off-by: Pablo Neira Ayuso
    Acked-by: "Eric W. Biederman"

    Pablo Neira Ayuso
     

23 Jun, 2015

1 commit

  • Add code to nf_unregister_hook to flush the nf_queue when a hook is
    unregistered. This guarantees that the pointer that the nf_queue code
    retains into the nf_hook list will remain valid while a packet is
    queued.

    I tested what would happen if we do not flush queued packets and was
    trivially able to obtain the oops below. All that was required was
    to stop the nf_queue listening process, to delete all of the nf_tables,
    and to awaken the nf_queue listening process.

    > BUG: unable to handle kernel paging request at 0000000100000001
    > IP: [] 0x100000001
    > PGD b9c35067 PUD 0
    > Oops: 0010 [#1] SMP
    > Modules linked in:
    > CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted
    > task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000
    > RIP: 0010:[] [] 0x100000001
    > RSP: 0018:ffff8800ba9dba40 EFLAGS: 00010a16
    > RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90
    > RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00
    > RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28
    > R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900
    > R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000
    > FS: 00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
    > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    > CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0
    > Stack:
    > ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8
    > ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128
    > ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0
    > Call Trace:
    > [] ? nf_iterate+0x4f/0xa0
    > [] ? nf_reinject+0x125/0x190
    > [] ? nfqnl_recv_verdict+0x255/0x360
    > [] ? nla_parse+0x80/0xf0
    > [] ? nfnetlink_rcv_msg+0x13c/0x240
    > [] ? __memcg_kmem_get_cache+0x4c/0x150
    > [] ? nfnl_lock+0x20/0x20
    > [] ? netlink_rcv_skb+0xa9/0xc0
    > [] ? netlink_unicast+0x12f/0x1c0
    > [] ? netlink_sendmsg+0x28e/0x650
    > [] ? sock_sendmsg+0x44/0x50
    > [] ? ___sys_sendmsg+0x2ab/0x2c0
    > [] ? __wake_up+0x43/0x70
    > [] ? tty_write+0x1c4/0x2a0
    > [] ? __sys_sendmsg+0x44/0x80
    > [] ? system_call_fastpath+0x12/0x6a
    > Code: Bad RIP value.
    > RIP [] 0x100000001
    > RSP
    > CR2: 0000000100000001
    > ---[ end trace 08eb65d42362793f ]---

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

05 Apr, 2015

1 commit

  • Instead of passing a large number of arguments down into the nf_hook()
    entry points, create a structure which carries this state down through
    the hook processing layers.

    This makes it so that if we want to change the types or signatures of
    any of these pieces of state, there are fewer places that need to be
    changed.
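
    As an illustration of the pattern (not the kernel's actual structure),
    the arguments can be bundled like this. The field set, struct packet and
    all function names below are hypothetical.

```c
#include <stddef.h>

struct packet {
    int len;
};

/* Everything the hook layers need, carried in one place. */
struct hook_state {
    unsigned int hook;                /* which hook point is running */
    int pf;                           /* protocol family */
    const char *in;                   /* input device name, may be NULL */
    const char *out;                  /* output device name, may be NULL */
    int (*okfn)(struct packet *pkt);  /* continuation once hooks accept */
};

static void hook_state_init(struct hook_state *s, unsigned int hook, int pf,
                            const char *in, const char *out,
                            int (*okfn)(struct packet *pkt))
{
    s->hook = hook;
    s->pf = pf;
    s->in = in;
    s->out = out;
    s->okfn = okfn;
}

/* Example continuation: pretend to deliver and report the length. */
static int deliver(struct packet *pkt)
{
    return pkt->len;
}

/*
 * Lower layers take the packet plus one state pointer. Adding or retyping
 * a field changes the struct and its initializer, not every signature.
 * The hook traversal itself is elided; the packet is treated as accepted.
 */
static int hook_slow(struct packet *pkt, const struct hook_state *state)
{
    return state->okfn(pkt);
}
```

    A caller fills the state once with hook_state_init() and passes the
    same pointer down through every layer.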

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Oct, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.
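
    A tiny example of the equivalence: both declarations below refer to the
    same function with external linkage, and a compiler accepts them side by
    side. The function name add() is made up for this illustration.

```c
/* With extern: redundant on a function prototype. */
extern int add(int a, int b);

/* Without extern: same declaration, external linkage by default. */
int add(int a, int b);

/* A single definition satisfies both declarations. */
int add(int a, int b)
{
    return a + b;
}
```

    This only holds for function prototypes. For variables at file scope,
    extern still matters because it distinguishes a declaration from a
    definition.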

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

03 Sep, 2012

2 commits


13 May, 2010

1 commit


08 Oct, 2008

1 commit


16 Oct, 2007

1 commit


13 Feb, 2007

1 commit


23 Sep, 2006

1 commit


01 Jul, 2006

1 commit


30 Aug, 2005

1 commit