23 Sep, 2020

1 commit

  • Two minor conflicts:

    1) net/ipv4/route.c, adding a new local variable while
    moving another local variable and removing its
    initial assignment.

    2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
    One pretty prints the port mode differently, whilst another
    changes the driver to try and obtain the port mode from
    the port node rather than the switch node.

    Signed-off-by: David S. Miller

    David S. Miller
     

08 Sep, 2020

2 commits

  • conntrack mark based dump filtering may falsely skip entries if a mask
    is given: If the mask-based check does not filter out the entry, the
    else-if check is always true and compares the mark without considering
    the mask. The if/else-if logic seems wrong.

    Given that the mask during filter setup is implicitly set to 0xffffffff
    if not specified explicitly, the mark filtering flags seem to just
    complicate things. Restore the previously used approach by always
    matching against a zero mask if no filter mark is given.
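
    The difference between the two approaches can be modelled in user space.
    The following is a minimal Python sketch, not kernel code: the function
    names and the has_mark/has_mask booleans are hypothetical stand-ins for
    the filter flags the message refers to.

```python
def buggy_keeps_entry(ct_mark, val, mask, has_mark, has_mask):
    """Model of the if/else-if logic that the message calls wrong."""
    if has_mask and (ct_mark & mask) != val:
        return False
    elif has_mark and ct_mark != val:
        # Evaluated whenever the masked check did not skip the entry,
        # and it compares the raw mark without applying the mask.
        return False
    return True


def fixed_keeps_entry(ct_mark, val=0, mask=0):
    """One masked comparison; val=0 and mask=0 stand for "no filter"."""
    return (ct_mark & mask) == val
```

    With mark 0x12, value 0x10 and mask 0xf0 the entry should be dumped: the
    single masked comparison keeps it, while the if/else-if model skips it.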

    Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump")
    Signed-off-by: Martin Willi
    Signed-off-by: Pablo Neira Ayuso

    Martin Willi
     
  • The indexes to the nf_nat_l[34]protos arrays come from userspace. So
    check the tuple's family, e.g. l3num, when creating the conntrack in
    order to prevent an OOB memory access during setup. Here is an example
    kernel panic on 4.14.180 when userspace passes in an index greater than
    NFPROTO_NUMPROTO.

    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    Modules linked in:...
    Process poc (pid: 5614, stack limit = 0x00000000a3933121)
    CPU: 4 PID: 5614 Comm: poc Tainted: G S W O 4.14.180-g051355490483
    Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM
    task: 000000002a3dfffe task.stack: 00000000a3933121
    pc : __cfi_check_fail+0x1c/0x24
    lr : __cfi_check_fail+0x1c/0x24
    ...
    Call trace:
    __cfi_check_fail+0x1c/0x24
    name_to_dev_t+0x0/0x468
    nfnetlink_parse_nat_setup+0x234/0x258
    ctnetlink_parse_nat_setup+0x4c/0x228
    ctnetlink_new_conntrack+0x590/0xc40
    nfnetlink_rcv_msg+0x31c/0x4d4
    netlink_rcv_skb+0x100/0x184
    nfnetlink_rcv+0xf4/0x180
    netlink_unicast+0x360/0x770
    netlink_sendmsg+0x5a0/0x6a4
    ___sys_sendmsg+0x314/0x46c
    SyS_sendmsg+0xb4/0x108
    el0_svc_naked+0x34/0x38

    This crash no longer happens on 5.4 and later; however, ctnetlink still
    allows creating entries with an unsupported layer 3 protocol number.
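
    The idea of validating a userspace-supplied family before it is used as
    an index can be sketched as follows. This is a Python model under stated
    assumptions: NAT_HANDLERS, check_l3num() and nat_handler_for() are
    hypothetical names, and the EOPNOTSUPP return value is an assumption
    rather than something this message states.

```python
import errno

NFPROTO_IPV4 = 2    # same value as AF_INET
NFPROTO_IPV6 = 10   # same value as AF_INET6

# Hypothetical stand-in for a per-family table indexed by l3num.
NAT_HANDLERS = {NFPROTO_IPV4: "ipv4-nat", NFPROTO_IPV6: "ipv6-nat"}


def check_l3num(l3num):
    """Return 0 for a supported family, a negative errno otherwise."""
    if l3num not in (NFPROTO_IPV4, NFPROTO_IPV6):
        return -errno.EOPNOTSUPP
    return 0


def nat_handler_for(l3num):
    """Look up the handler only after the family has been validated."""
    err = check_l3num(l3num)
    if err:
        raise OSError(-err, "unsupported layer 3 protocol")
    return NAT_HANDLERS[l3num]
```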

    Fixes: c1d10adb4a521 ("[NETFILTER]: Add ctnetlink port for nf_conntrack")
    Signed-off-by: Will McVicker
    [pablo@netfilter.org: rebased original patch on top of nf.git]
    Signed-off-by: Pablo Neira Ayuso

    Will McVicker
     

29 Aug, 2020

2 commits

  • There is a misconception about what "insert_failed" means.

    We increment this even when a clash got resolved, so it might not indicate
    a problem.

    Add a dedicated counter for clash resolution and only increment
    insert_failed if a clash cannot be resolved.

    For the old /proc interface, export this in place of an older stat
    that got removed a while back.
    For ctnetlink, export this with a new attribute.

    Also correct an outdated comment that implies we add a duplicate tuple --
    we only add the (unique) reply direction.
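
    The accounting change can be illustrated with a small Python model. The
    class and function names are hypothetical, and the field name
    clash_resolve is an assumption chosen to mirror the description above.

```python
from dataclasses import dataclass


@dataclass
class ConntrackStats:
    insert_failed: int = 0   # only clashes that could not be resolved
    clash_resolve: int = 0   # clashes that were resolved successfully


def account_clash(stats, resolved):
    """Count a clash in exactly one of the two counters."""
    if resolved:
        stats.clash_resolve += 1
    else:
        stats.insert_failed += 1
```

    After two resolved clashes and one unresolved clash, insert_failed reads
    1 rather than 3, so a non-zero value now points at a real failure.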

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This counter increments when nf_conntrack_in sees a packet that already
    has a conntrack attached or when the packet is marked as UNTRACKED.
    Neither is an error.

    The former is normal for loopback traffic. The latter happens for
    certain ICMPv6 packets or when nftables/ip(6)tables rules are in place.

    In case someone needs to count UNTRACKED packets, or packets
    that are marked as untracked before conntrack_in, this can be done with
    both nftables and ip(6)tables rules.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

11 Jun, 2020

1 commit


28 May, 2020

1 commit

    Conntrack dump does not support kernel side filtering. Only get exists,
    but it returns a single entry and the user has to give a full valid tuple.

    This means that userspace has to implement filtering after receiving many
    irrelevant entries, which consumes resources (the conntrack table is
    sometimes huge, much larger than a routing table for example).

    This patch adds filtering in kernel side. To achieve this goal, we:

    * Add a new CTA_FILTER netlink attribute, a flag list that
    parameterizes filtering
    * Convert some *nlattr_to_tuple() functions, to allow a partial parsing
    of CTA_TUPLE_ORIG and CTA_TUPLE_REPLY (so nf_conntrack_tuple is not
    fully set)

    Filtering is now possible on:
    * IP SRC/DST values
    * Ports for TCP and UDP flows
    * ICMP(v6) codes, types and IDs

    Filtering is done as an "AND" operator. For example, when flags
    PROTO_SRC_PORT, PROTO_NUM and IP_SRC are set, only entries matching all
    values are dumped.
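
    The "AND" semantics can be sketched in Python as follows. This is an
    illustrative model only: the FilterFlag values, the FIELD_OF mapping and
    the dictionary layout of an entry are hypothetical and do not reflect the
    kernel's actual flag encoding.

```python
from enum import IntFlag


class FilterFlag(IntFlag):
    IP_SRC = 1
    IP_DST = 2
    PROTO_NUM = 4
    PROTO_SRC_PORT = 8
    PROTO_DST_PORT = 16


# Which entry field each flag selects (hypothetical layout).
FIELD_OF = {
    FilterFlag.IP_SRC: "ip_src",
    FilterFlag.IP_DST: "ip_dst",
    FilterFlag.PROTO_NUM: "proto",
    FilterFlag.PROTO_SRC_PORT: "sport",
    FilterFlag.PROTO_DST_PORT: "dport",
}


def entry_matches(entry, wanted, flags):
    """True only if every flagged field equals the requested value."""
    return all(entry[field] == wanted[field]
               for flag, field in FIELD_OF.items() if flags & flag)


def dump(table, wanted, flags):
    """Return the entries a filtered dump would report."""
    return [e for e in table if entry_matches(e, wanted, flags)]
```

    Fields whose flag is not set are ignored, which is what allows a
    partially filled tuple to act as a filter.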

    Changes since v1:
    Set NLM_F_DUMP_FILTERED in nlm flags if entries are filtered

    Changes since v2:
    Move several constants to nf_internals.h
    Move a fix on netlink values check in a separate patch
    Add a check on not-supported flags
    Return EOPNOTSUPP if CTA_FILTER is set in ctnetlink_flush_conntrack
    (not yet implemented)
    Code style issues

    Changes since v3:
    Fix compilation warning reported by kbuild test robot

    Changes since v4:
    Fix a regression introduced in v3 (returned EINVAL for valid netlink
    messages without CTA_MARK)

    Changes since v5:
    Change definition of CTA_FILTER_F_ALL
    Fix a regression when CTA_TUPLE_ZONE is not set

    Signed-off-by: Romain Bellan
    Signed-off-by: Florent Fourcot
    Signed-off-by: Pablo Neira Ayuso

    Romain Bellan
     

30 Mar, 2020

1 commit


28 Mar, 2020

1 commit


29 Nov, 2019

1 commit

  • Curtis Taylor and Jon Maxwell reported and debugged a crash on a 3.10
    based kernel.

    Crash occurs in ctnetlink_conntrack_events because net->nfnl socket is
    NULL. The nfnl socket was set to NULL by netns destruction running on
    another cpu.

    The exiting network namespace calls the relevant destructors in the
    following order:

    1. ctnetlink_net_exit_batch

    This nulls out the event callback pointer in struct netns.

    2. nfnetlink_net_exit_batch

    This nulls net->nfnl socket and frees it.

    3. nf_conntrack_cleanup_net_list

    This removes all remaining conntrack entries.

    This order is correct. The only explanation for the crash so far is:

    cpu1: conntrack is dying, eviction occurs:
    -> nf_ct_delete()
    -> nf_conntrack_event_report \
    -> nf_conntrack_eventmask_report
    -> notify->fcn() (== ctnetlink_conntrack_events).

    cpu1: a. fetches rcu protected pointer to obtain ctnetlink event callback.
    b. gets interrupted.
    cpu2: runs netns exit handlers:
    a. runs ctnetlink destructor, event cb pointer set to NULL.
    b. runs nfnetlink destructor, nfnl socket is closed and set to NULL.
    cpu1: c. resumes and trips over NULL net->nfnl.

    The problem appears to be that ctnetlink_net_exit_batch only prevents
    future callers of nf_conntrack_eventmask_report() from obtaining the
    callback. It doesn't wait for other cpus that might have already obtained
    the callback's address.

    I don't see anything in upstream kernels that would prevent a similar
    crash: We need to wait for all cpus to have exited the event callback.
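
    The interleaving above can be replayed deterministically in a small
    Python model. It is a sketch only: Net, deliver_event() and replay() are
    hypothetical names, there are no real threads or RCU here, and the
    "wait" is modelled by letting cpu1 finish the callback before cpu2
    clears the socket.

```python
def deliver_event(net):
    """Stands in for the event callback, which needs net.nfnl."""
    if net.nfnl is None:
        raise RuntimeError("NULL net->nfnl dereference")
    return "event sent"


class Net:
    def __init__(self):
        self.nfnl = "socket"           # stands in for net->nfnl
        self.event_cb = deliver_event  # stands in for the protected pointer


def replay(wait_for_readers):
    net = Net()
    cb = net.event_cb        # cpu1 a: fetch the callback pointer
    #                          cpu1 b: interrupted here
    net.event_cb = None      # cpu2 a: ctnetlink destructor
    result = None
    if wait_for_readers:
        # Modelled grace period: cpu2 does not continue until cpu1
        # has left the callback it already obtained.
        result = cb(net)
    net.nfnl = None          # cpu2 b: nfnetlink destructor
    if result is None:
        result = cb(net)     # cpu1 c: resumes with a stale callback
    return result
```

    replay(False) raises in the model, which corresponds to the reported
    crash; replay(True) completes because the socket is cleared only after
    the in-flight caller has finished.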

    Fixes: 9592a5c01e79dbc59eb56fa ("netfilter: ctnetlink: netns support")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

17 Oct, 2019

1 commit

  • When dumping the unconfirmed lists, the cpu that is processing the ct
    entry can reallocate ct->ext at any time.

    Right now accessing the extensions from another CPU is ok provided
    we're holding rcu read lock: extension reallocation does use rcu.

    Once RCU isn't used anymore this becomes unsafe, so skip extensions for
    the unconfirmed list.

    Dumping the extension area for confirmed or dying conntracks is fine:
    no reallocations are allowed and list iteration holds appropriate
    locks that prevent ct (and thus ct->ext) from getting free'd.

    v2: fix compiler warnings due to misuse of 'const' and missing return
    statement (kbuild robot).

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

04 Sep, 2019

1 commit


16 Jul, 2019

1 commit

  • When conntracks change during a dialog, SDP messages may be sent from
    different conntracks to establish expects with identical tuples. In this
    case an expect conflict may be detected for the 2nd SDP message, and
    processing of that message fails.

    The fix here is to reuse an existing expect that has the same tuple for a
    different conntrack, if any.
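
    The intended behaviour can be modelled with a table keyed by tuple. This
    is a hedged Python sketch: ExpectTable, relate() and the
    reuse_other_master parameter are hypothetical names, and the model does
    not claim to reproduce how the kernel implements the reuse.

```python
class ExpectTable:
    """Maps an expect tuple to the conntrack (master) that created it."""

    def __init__(self):
        self.by_tuple = {}

    def relate(self, tuple_, master, reuse_other_master):
        owner = self.by_tuple.get(tuple_)
        if owner is None:
            self.by_tuple[tuple_] = master
            return "created"
        if owner == master:
            return "refreshed"
        if reuse_other_master:
            # Same tuple, different conntrack: reuse instead of failing.
            return "reused"
        return "conflict"
```

    In the scenarios that follow, the residual RTCP expect belongs to one
    conntrack and the new request arrives on another; without reuse the
    second relate() call reports a conflict.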

    Here are two scenarios for the case.

    1)
    SERVER CPE

    | INVITE SDP |
    5060 ||5060
    | 183 SDP |
    5060 |---------------------->|5060 ===> Conntrack 1
    | PRACK |
    50601 ||5060
    | 200 OK (INVITE) |
    5060 |---------------------->|5060
    | ACK |
    50601 ||
    | |
    | INVITE SDP (t38) |
    50601 |---------------------->|5060 ===> Conntrack 2

    With a certain configuration in the CPE, SIP messages "183 with SDP" and
    "re-INVITE with SDP t38" will go through the sip helper to create
    expects for RTP and RTCP.

    It is okay to create RTP and RTCP expects for "183", whose master
    connection source port is 5060, and destination port is 5060.

    In the "183" message, port in Contact header changes to 50601 (from the
    original 5060). So the following requests e.g. PRACK and ACK are sent to
    port 50601. It is a different conntrack (let's call it Conntrack 2) from
    the original INVITE (let's call it Conntrack 1) due to the port difference.

    In this example, after the call is established, there is RTP stream but no
    RTCP stream for Conntrack 1, so the RTP expect created upon "183" is
    cleared, and RTCP expect created for Conntrack 1 retains.

    When "re-INVITE with SDP t38" arrives to create RTP&RTCP expects, current
    ALG implementation will call nf_ct_expect_related() for RTP and RTCP. The
    expects tuples are identical to those for Conntrack 1. RTP expect for
    Conntrack 2 succeeds in creation as the one for Conntrack 1 has been
    removed. RTCP expect for Conntrack 2 fails in creation because it has
    identical tuples and 'conflicts' with the one retained for Conntrack 1.
    This results in a failure in processing of the re-INVITE.

    2)

    SERVER A CPE

    | REGISTER |
    5060 | CT1
    | 200 |
    5060 |------------------>| 5060
    | |
    | INVITE SDP(1) |
    5060 || 5060 SERVER B
    | ACK |
    5060 || 5060 ==> CT2
    | 100 |
    5060 || 50601 ==> CT3
    | |
    ||
    | |
    | BYE |
    5060 || 50601
    | INVITE SDP(3) |
    5060 | CT1

    CPE sends an INVITE request(1) to Server A, and creates a RTP&RTCP expect
    pair for this Conntrack 1 (CT1). Server A responds 300 to redirect to
    Server B. The RTP&RTCP expect pairs created on CT1 are removed upon 300
    response.

    CPE sends the INVITE request(2) to Server B, and creates an expect pair
    for the new conntrack (due to destination address difference), let's call
    it CT2. Server B changes the port to 50601 in 200 OK response, and the
    following requests ACK and BYE from CPE are sent to 50601. The call is
    established. There is RTP stream and no RTCP stream. So RTP expect is
    removed and RTCP expect for CT2 remains.

    As the BYE request is sent from port 50601, it is another conntrack, let's
    call it CT3, different from CT2 due to the port difference. So the BYE
    request will not remove the RTCP expect for CT2.

    Then another outgoing call is made, possibly (though not necessarily) with
    the same RTP port being used. CPE first sends the INVITE request(3) to
    Server A, and tries to create an RTP&RTCP expect pair for this CT1. In the
    current ALG implementation, the RTCP expect for CT1 fails in creation
    because it 'conflicts' with the residual one for CT2. As a result the
    INVITE request fails to send.

    Signed-off-by: xiao ruizhu
    Signed-off-by: Pablo Neira Ayuso

    xiao ruizhu
     

26 Jun, 2019

1 commit

  • Commit f8e608982022 ("netfilter: ctnetlink: Resolve conntrack
    L3-protocol flush regression") introduced a regression in which deletion
    of conntrack entries would fail because the L3 protocol information
    is replaced by AF_UNSPEC. As a result the search for the entry to be
    deleted would turn up empty because the tuple used to perform the search
    is now different from the tuple used to initially set up the entry.

    For flushing the conntrack table we do however want to keep the option
    for nfgenmsg->version to have a non-zero value to allow for newer
    user-space tools to request treatment under the new behavior. With that
    it is possible to independently flush tables for a defined L3 protocol.
    This was introduced with the enhancements in commit 59c08c69c278
    ("netfilter: ctnetlink: Support L3 protocol-filter on flush").

    Older user-space tools will retain the behavior of flushing all tables
    regardless of defined L3 protocol.

    Fixes: f8e608982022 ("netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression")
    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Felix Kaechele
    Signed-off-by: Pablo Neira Ayuso

    Felix Kaechele
     

13 May, 2019

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Postpone chain policy update to drop after transaction is complete,
    from Florian Westphal.

    2) Add entry to flowtable after confirmation to fix UDP flows with
    packets going in one single direction.

    3) Reference count leak in dst object, from Taehee Yoo.

    4) Check for TTL field in flowtable datapath, from Taehee Yoo.

    5) Fix h323 conntrack helper due to incorrect boundary check,
    from Jakub Jankowski.

    6) Fix incorrect rcu dereference when fetching basechain stats,
    from Florian Westphal.

    7) Missing error check when adding new entries to flowtable,
    from Taehee Yoo.

    8) Use version field in nfnetlink message to honor the nfgen_family
    field, from Kristian Evensen.

    9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6,
    from Subash Abhinov Kasiviswanathan.

    10) Prevent dying entries from being added to the flowtable,
    from Taehee Yoo.

    11) Don't hit WARN_ON() with malformed blob in ebtables with
    trailing data after last rule, reported by syzbot, patch
    from Florian Westphal.

    12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel
    code.

    13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian
    Westphal.

    This batch comes with a conflict that can be fixed with this patch:

    diff --cc include/uapi/linux/netfilter/nf_tables.h
    index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959
    --- a/include/uapi/linux/netfilter/nf_tables.h
    +++ b/include/uapi/linux/netfilter/nf_tables.h
    @@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys
    * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address)
    * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
    * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
    - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack
    + * @NFT_CT_ID: conntrack id
    */
    enum nft_ct_keys {
    NFT_CT_STATE,
    @@@ -991,6 -993,8 +992,7 @@@
    NFT_CT_DST_IP,
    NFT_CT_SRC_IP6,
    NFT_CT_DST_IP6,
    - NFT_CT_TIMEOUT,
    + NFT_CT_ID,
    __NFT_CT_MAX
    };
    #define NFT_CT_MAX (__NFT_CT_MAX - 1)

    That replaces the unused NFT_CT_TIMEOUT definition by NFT_CT_ID. If you prefer,
    I can also solve this conflict here, just let me know.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

06 May, 2019

1 commit

  • Commit 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter
    on flush") introduced a user-space regression when flushing connection
    track entries. Before this commit, the nfgen_family field was not used
    by the kernel and all entries were removed. Since this commit,
    nfgen_family is used to filter out entries that should not be removed.
    One example of a broken tool is conntrack. conntrack always sets
    nfgen_family to AF_INET, so after 59c08c69c278 only IPv4 entries were
    removed with the -F parameter.

    Pablo Neira Ayuso suggested using nfgenmsg->version to resolve the
    regression, and this commit implements his suggestion. nfgenmsg->version
    is so far set to zero, so it is well-suited to be used as a flag for
    selecting old or new flush behavior. If version is 0, nfgen_family is
    ignored and all entries are flushed. If user-space sets the version to one
    (or any other value than 0), then the new behavior is used. As version
    can only have two valid values, I chose not to add a new
    NFNETLINK_VERSION-constant.
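
    The selection rule can be written down as a short Python model. It is a
    sketch under assumptions: flush_family() and flush() are hypothetical
    names, entries are plain dictionaries, and treating AF_UNSPEC as "match
    every family" in the new behavior is an assumption of this example.

```python
AF_UNSPEC, AF_INET, AF_INET6 = 0, 2, 10


def flush_family(version, nfgen_family):
    """Return the family to restrict the flush to, or None for all."""
    if version == 0:
        return None          # old behavior: nfgen_family is ignored
    if nfgen_family == AF_UNSPEC:
        return None          # assumed: no family given, match all
    return nfgen_family      # new behavior: filter on the family


def flush(table, version, nfgen_family):
    """Return the entries that survive the flush."""
    family = flush_family(version, nfgen_family)
    return [e for e in table
            if family is not None and e["family"] != family]
```

    A tool that always sends AF_INET with version 0 therefore still empties
    the whole table, while a tool that sets a non-zero version keeps the
    IPv6 entries.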

    Fixes: 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter on flush")
    Reported-by: Nicolas Dichtel
    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Kristian Evensen
    Tested-by: Nicolas Dichtel
    Signed-off-by: Pablo Neira Ayuso

    Kristian Evensen
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.
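
    How the split-out options combine can be sketched with a flag set. This
    is an illustrative Python model: the bit values, the Validate class and
    check_attr() are assumptions of the example, and only three of the four
    options (MAXTYPE, UNSPEC, STRICT_ATTRS) are exercised.

```python
from enum import IntFlag


class Validate(IntFlag):
    LIBERAL = 0
    TRAILING = 1
    MAXTYPE = 2
    UNSPEC = 4
    STRICT_ATTRS = 8


# The combinations described above.
DEPRECATED_STRICT = Validate.TRAILING | Validate.MAXTYPE
STRICT = (Validate.TRAILING | Validate.MAXTYPE |
          Validate.UNSPEC | Validate.STRICT_ATTRS)


def check_attr(attr_type, attr_len, expected_len, max_type, flags,
               has_policy=True):
    """Classify one attribute under the given validation flags."""
    if attr_type > max_type:
        if flags & Validate.MAXTYPE:
            return "rejected: unknown type"
        return "accepted"
    if not has_policy:
        if flags & Validate.UNSPEC:
            return "rejected: no policy"
        return "accepted"
    if attr_len < expected_len:
        return "rejected: too short"
    if attr_len > expected_len and flags & Validate.STRICT_ATTRS:
        return "rejected: wrong length"
    return "accepted"
```

    An over-long attribute passes under the liberal and deprecated-strict
    combinations and is rejected only once STRICT_ATTRS is part of the set.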

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().
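
    The compatibility concern can be shown with a few lines of Python. The
    flag and mask values follow the netlink UAPI definitions; EXAMPLE_ATTR
    and the nla_type() helper written here are hypothetical and exist only
    for this illustration.

```python
NLA_F_NESTED = 1 << 15
NLA_F_NET_BYTEORDER = 1 << 14
# nla_type is a 16-bit field; the two top bits carry the flags.
NLA_TYPE_MASK = ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER) & 0xFFFF

EXAMPLE_ATTR = 1  # hypothetical attribute type


def nla_type(raw):
    """Return the attribute type with the flag bits masked out."""
    return raw & NLA_TYPE_MASK


raw_noflag = EXAMPLE_ATTR                 # as emitted without the flag
raw_flag = EXAMPLE_ATTR | NLA_F_NESTED    # as emitted with the flag
```

    A parser that compares the raw field against EXAMPLE_ATTR stops
    recognizing the attribute once the flag is set, whereas one that masks
    the field first keeps working in both cases.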

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

26 Apr, 2019

1 commit


15 Apr, 2019

1 commit

  • else, we leak the addresses to userspace via ctnetlink events
    and dumps.

    Compute an ID on demand based on the immutable parts of nf_conn struct.

    Another advantage compared to using an address is that there is no
    immediate re-use of the same ID in case the conntrack entry is freed and
    reallocated again immediately.
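
    The general technique, deriving an opaque ID from stable data with a
    secret key instead of exposing an address, can be sketched as below.
    This Python example swaps in keyed BLAKE2b from the standard library as
    the hash; the choice of hash, the 32-bit width, the field tuple and the
    names SECRET and conntrack_id() are all assumptions of the sketch.

```python
import hashlib
import os

# Per-boot secret so that IDs cannot be predicted or reversed.
SECRET = os.urandom(16)


def conntrack_id(immutable_fields, key=SECRET):
    """Derive a 32-bit ID from fields that never change for an entry."""
    data = repr(immutable_fields).encode()
    digest = hashlib.blake2b(data, key=key, digest_size=4).digest()
    return int.from_bytes(digest, "big")
```

    The same fields always yield the same ID for a given key, so the ID is
    stable for the lifetime of an entry without revealing where it lives in
    memory.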

    Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID")
    Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

09 Apr, 2019

1 commit


27 Feb, 2019

1 commit

  • The l3proto name is gone, its header file is the last trace.
    While at it, also remove nf_nat_core.h, it's very small and all users
    include nf_nat.h too.

    before:
    text data bss dec hex filename
    22948 1612 4136 28696 7018 nf_nat.ko

    after removal of l3proto register/unregister functions:
    text data bss dec hex filename
    22196 1516 4136 27848 6cc8 nf_nat.ko

    checkpatch complains about overly long lines, but line breaks
    do not make things more readable and the line length gets smaller
    here, not larger.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

12 Feb, 2019

1 commit


18 Jan, 2019

1 commit

  • It's now the same as __nf_ct_l4proto_find(), so rename that to
    nf_ct_l4proto_find and use it everywhere.

    It never returns NULL and doesn't need locks or reference counts.

    Before this series:
    302824 net/netfilter/nf_conntrack.ko
    21504 net/netfilter/nf_conntrack_proto_gre.ko

    text data bss dec hex filename
    6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko
    108356 20613 236 129205 1f8b5 nf_conntrack.ko

    After:
    294864 net/netfilter/nf_conntrack.ko
    text data bss dec hex filename
    106979 19557 240 126776 1ef38 nf_conntrack.ko

    so, even with builtin gre, total size got reduced.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

18 Dec, 2018

1 commit

  • This removes the (now empty) nf_nat_l4proto struct, all its instances
    and all the no longer needed runtime (un)register functionality.

    nf_nat_need_gre() can be axed as well: the module that calls it (to
    load the no-longer-existing nat_gre module) also calls other nat core
    functions. GRE nat is now always available if kernel is built with it.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

12 Nov, 2018

1 commit

  • Useful to only set a particular range of the conntrack mark while
    leaving existing parts of the value alone, e.g. when updating
    conntrack marks via netlink from userspace.

    For NFQUEUE it was already implemented in commit 534473c6080e
    ("netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark").

    This now adds the same functionality also for the other netlink
    conntrack mark changes.
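
    A masked update of a 32-bit mark can be sketched as follows. This is a
    generic Python illustration of the idea: update_mark() is a hypothetical
    name, and the exact way the kernel combines the old and new values is
    not stated in this message, so the expression below is an assumption.

```python
def update_mark(old, new, mask=0xFFFFFFFF):
    """Replace only the bits selected by mask, keep all other bits."""
    return ((old & ~mask) | (new & mask)) & 0xFFFFFFFF
```

    For example, updating 0xAABBCCDD with value 0x00001100 and mask
    0x0000FF00 changes only the second-lowest byte and gives 0xAABB11DD.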

    Signed-off-by: Andreas Jaggi
    Signed-off-by: Pablo Neira Ayuso

    Andreas Jaggi
     

21 Sep, 2018

2 commits

  • else we will oops (null deref) when the attributes aren't present.

    Also add back the EOPNOTSUPP in case MARK filtering is requested but
    kernel doesn't support it.

    Fixes: 59c08c69c2788 ("netfilter: ctnetlink: Support L3 protocol-filter on flush")
    Reported-by: syzbot+e45eda8eda6e93a03959@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • l4 protocols are demuxed by l3num, l4num pair.

    However, almost all l4 trackers are l3 agnostic.

    Only exceptions are:
    - gre, icmp (ipv4 only)
    - icmpv6 (ipv6 only)

    This commit gets rid of the l3 mapping, l4 trackers can now be looked up
    by their IPPROTO_XXX value alone, which gets rid of the additional l3
    indirection.

    For icmp, icmpv6 and gre, add a check on state->pf and
    return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4,
    this seems more fitting than using the generic tracker.

    Additionally we can kill the 2nd l4proto definitions that were needed
    for v4/v6 split -- they are now the same so we can use single l4proto
    struct for each protocol, rather than two.
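
    The resulting lookup can be modelled in Python. The protocol numbers are
    the standard IPPROTO_* and NFPROTO_* values; the TRACKERS table, the
    string names, the generic fallback and the track() helper are
    hypothetical and serve only to illustrate the single-key lookup plus the
    family check.

```python
NFPROTO_IPV4, NFPROTO_IPV6 = 2, 10
IPPROTO_ICMP, IPPROTO_TCP, IPPROTO_UDP = 1, 6, 17
IPPROTO_GRE, IPPROTO_ICMPV6 = 47, 58
NF_ACCEPT = 1

# One tracker per layer 4 protocol, no layer 3 dimension.
TRACKERS = {
    IPPROTO_ICMP: "icmp",
    IPPROTO_TCP: "tcp",
    IPPROTO_UDP: "udp",
    IPPROTO_GRE: "gre",
    IPPROTO_ICMPV6: "icmpv6",
}

# Trackers that are only valid for one address family.
REQUIRED_PF = {
    IPPROTO_ICMP: NFPROTO_IPV4,
    IPPROTO_GRE: NFPROTO_IPV4,
    IPPROTO_ICMPV6: NFPROTO_IPV6,
}


def l4proto_find(l4num):
    """Look up by protocol number alone, falling back to a generic one."""
    return TRACKERS.get(l4num, "generic")


def track(pf, l4num):
    """Return the tracker name, or -NF_ACCEPT on a family mismatch."""
    required = REQUIRED_PF.get(l4num)
    if required is not None and pf != required:
        return -NF_ACCEPT
    return l4proto_find(l4num)
```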

    The EXPORT_SYMBOLs can be removed as all these object files are
    part of nf_conntrack with no external references.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

17 Sep, 2018

1 commit

  • The same connection mark can be set on flows belonging to different
    address families. This commit adds support for filtering on the L3
    protocol when flushing connection track entries. If no protocol is
    specified, then all L3 protocols match.

    In order to avoid code duplication and a redundant check, the protocol
    comparison in ctnetlink_dump_table() has been removed. Instead, a filter
    is created if the GET-message triggering the dump contains an address
    family. ctnetlink_filter_match() is then used to compare the L3
    protocols.

    Signed-off-by: Kristian Evensen
    Signed-off-by: Pablo Neira Ayuso

    Kristian Evensen
     

17 Aug, 2018

1 commit

  • Shaochun Chen points out we leak dumper filter state allocations
    stored in dump_control->data in case there is an error before netlink sets
    cb_running (after which ->done will be called at some point).

    In order to fix this, add .start functions and move allocations there.

    Same pattern as used in commit 90fd131afc565159c9e0ea742f082b337e10f8c6
    ("netfilter: nf_tables: move dumper state allocation into ->start").

    Reported-by: shaochun chen
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

18 Jul, 2018

1 commit


16 Jul, 2018

1 commit

  • handle everything from ctnetlink directly.

    After all these years we still only support ipv4 and ipv6, so it
    seems reasonable to remove l3 protocol tracker support and instead
    handle ipv4/ipv6 from a common, always builtin inet tracker.

    Step 1: Get rid of all the l3proto->func() calls.

    Start with ctnetlink, then move on to packet-path ones.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

13 Jun, 2018

1 commit


23 May, 2018

1 commit


07 May, 2018

1 commit

  • The IPCTNL_MSG_CT_GET_STATS netlink command allows monitoring the current
    number of conntrack entries. However, if one wants to compare it with the
    maximum (and detect exhaustion), the only solution is currently to read
    the sysctl value.

    This patch adds the nf_conntrack_max value to the netlink message, which
    simplifies monitoring for applications built on the netlink API.

    Signed-off-by: Florent Fourcot
    Signed-off-by: Pablo Neira Ayuso

    Florent Fourcot
     

30 Mar, 2018

2 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for your net-next
    tree. This batch comes with more input sanitization for xtables to
    address bug reports from fuzzers, preparation works to the flowtable
    infrastructure and assorted updates. In no particular order, they are:

    1) Make sure userspace provides a valid standard target verdict, from
    Florian Westphal.

    2) Sanitize error target size, also from Florian.

    3) Validate that last rule in basechain matches underflow/policy since
    userspace assumes this when decoding the ruleset blob that comes
    from the kernel, from Florian.

    4) Consolidate hook entry checks through xt_check_table_hooks(),
    patch from Florian.

    5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
    very large compat offset arrays, so we have a reasonable upper limit
    and fuzzers don't exercise the oom-killer. Patches from Florian.

    6) Several WARN_ON checks on xtables mutex helper, from Florian.

    7) xt_rateest now has a hashtable per net, from Cong Wang.

    8) Consolidate counter allocation in xt_counters_alloc(), from Florian.

    9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
    from Xin Long.

    10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
    Felix Fietkau.

    11) Consolidate code through flow_offload_fill_dir(), also from Felix.

    12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
    to remove a dependency with flowtable and ipv6.ko, from Felix.

    13) Cache mtu size in flow_offload_tuple object, this is safe for
    forwarding as f87c10a8aa1e describes, from Felix.

    14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too
    modular infrastructure, from Felix.

    15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
    Ahmed Abdelsalam.

    16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.

    17) Support for counting only to nf_conncount infrastructure, patch
    from Yi-Hung Wei.

    18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
    to nft_ct.

    19) Use boolean as return value from ipt_ah and from IPVS too, patch
    from Gustavo A. R. Silva.

    20) Remove useless parameters in nfnl_acct_overquota() and
    nf_conntrack_broadcast_help(), from Taehee Yoo.

    21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.

    22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.

    23) Fix typo in xt_limit, from Geert Uytterhoeven.

    24) Do not use VLAs in Netfilter code, again from Gustavo.

    25) Use ADD_COUNTER from ebtables, from Taehee Yoo.

    26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.

    27) Use pr_*() and add pr_fmt(), from Arushi Singhal.

    28) Add synproxy support to ctnetlink.

    29) ICMP type and IGMP matching support for ebtables, patches from
    Matthias Schiffer.

    30) Support for the revision infrastructure to ebtables, from
    Bernie Harris.

    31) String match support for ebtables, also from Bernie.

    32) Documentation for the new flowtable infrastructure.

    33) Use generic comparison functions in ebt_stp, from Joe Perches.

    34) Demodularize filter chains in nftables.

    35) Register conntrack hooks in case nftables NAT chain is added.

    36) Merge assignments with return in a couple of spots in the
    Netfilter codebase, also from Arushi.

    37) Document that xtables percpu counters are stored in the same
    memory area, from Ben Hutchings.

    38) Revert mark_source_chains() sanity checks that break existing
    rulesets, from Florian Westphal.

    39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Merge assignment with return statement to directly return the value.

    Signed-off-by: Arushi Singhal
    Signed-off-by: Pablo Neira Ayuso

    Arushi Singhal
     

28 Mar, 2018

1 commit


20 Mar, 2018

1 commit

  • This patch exposes synproxy information per-conntrack. Moreover, send
    sequence adjustment events once server sends us the SYN,ACK packet, so
    we can synchronize the sequence adjustment too for packets going as
    reply from the server, as part of the synproxy logic.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

05 Mar, 2018

1 commit

  • These pernet_operations register and unregister
    two conntrack notifiers, and they seem to be safe
    to be executed in parallel.

    General/not related to async pernet_operations JFI:
    ctnetlink_net_exit_batch() actions are grouped in batch,
    and this could look like a synchronize_rcu() call
    was forgotten. But there is synchronize_rcu() on the module
    exit path (in ctnetlink_exit()), so this batch may
    be reworked as a simple .exit method.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai