13 May, 2019

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Postpone chain policy update to drop after transaction is complete,
    from Florian Westphal.

    2) Add entry to flowtable after confirmation to fix UDP flows with
    packets going in one single direction.

    3) Reference count leak in dst object, from Taehee Yoo.

    4) Check for TTL field in flowtable datapath, from Taehee Yoo.

    5) Fix h323 conntrack helper due to incorrect boundary check,
    from Jakub Jankowski.

    6) Fix incorrect rcu dereference when fetching basechain stats,
    from Florian Westphal.

    7) Missing error check when adding new entries to flowtable,
    from Taehee Yoo.

    8) Use version field in nfnetlink message to honor the nfgen_family
    field, from Kristian Evensen.

    9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6,
    from Subash Abhinov Kasiviswanathan.

    10) Prevent dying entries from being added to the flowtable,
    from Taehee Yoo.

    11) Don't hit WARN_ON() with malformed blob in ebtables with
    trailing data after last rule, reported by syzbot, patch
    from Florian Westphal.

    12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel
    code.

    13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian
    Westphal.

    This batch comes with a conflict that can be fixed with this patch:

    diff --cc include/uapi/linux/netfilter/nf_tables.h
    index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959
    --- a/include/uapi/linux/netfilter/nf_tables.h
    +++ b/include/uapi/linux/netfilter/nf_tables.h
    @@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys
    * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address)
    * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
    * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
    - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack
    + * @NFT_CT_ID: conntrack id
    */
    enum nft_ct_keys {
    NFT_CT_STATE,
    @@@ -991,6 -993,8 +992,7 @@@
    NFT_CT_DST_IP,
    NFT_CT_SRC_IP6,
    NFT_CT_DST_IP6,
    - NFT_CT_TIMEOUT,
    + NFT_CT_ID,
    __NFT_CT_MAX
    };
    #define NFT_CT_MAX (__NFT_CT_MAX - 1)

    That replaces the unused NFT_CT_TIMEOUT definition by NFT_CT_ID. If you prefer,
    I can also solve this conflict here, just let me know.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

06 May, 2019

7 commits

  • Conntrack entries can be deleted by the masquerade module. In that case,
    flow offload should be deleted too, but GC and data-path of flow offload
    do not check for conntrack status bits, hence flow offload entries will
    be removed only by the timeout.

    Update garbage collector and data-path to check for ct->status. If
    IPS_DYING_BIT is set, garbage collector removes flow offload entries and
    data-path routine ignores them.

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • CONFIG_NF_CONNTRACK_IPV6 has been deprecated so replace it with a check
    for IPV6 instead.

    Use nf_ip6_route6() instead of v6ops->route() and keep the IS_MODULE()
    in nf_ipv6_ops as mentioned by Florian so that direct calls are used
    when IPV6 is builtin and indirect calls are used only when IPV6 is a
    module.

    Fixes: a0ae2562c6c4b2 ("netfilter: conntrack: remove l3proto abstraction")
    Signed-off-by: Subash Abhinov Kasiviswanathan
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Subash Abhinov Kasiviswanathan
     
  • Commit 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter
    on flush") introduced a user-space regression when flushing connection
    track entries. Before this commit, the nfgen_family field was not used
    by the kernel and all entries were removed. Since this commit,
    nfgen_family is used to filter out entries that should not be removed.
    One example a broken tool is conntrack. conntrack always sets
    nfgen_family to AF_INET, so after 59c08c69c278 only IPv4 entries were
    removed with the -F parameter.

    Pablo Neira Ayuso suggested using nfgenmsg->version to resolve the
    regression, and this commit implements his suggestion. nfgenmsg->version
    is so far set to zero, so it is well-suited to be used as a flag for
    selecting old or new flush behavior. If version is 0, nfgen_family is
    ignored and all entries are used. If user-space sets the version to one
    (or any other value than 0), then the new behavior is used. As version
    only can have two valid values, I chose not to add a new
    NFNETLINK_VERSION-constant.

    Fixes: 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter on flush")
    Reported-by: Nicolas Dichtel
    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Kristian Evensen
    Tested-by: Nicolas Dichtel
    Signed-off-by: Pablo Neira Ayuso

    Kristian Evensen
     
  • Make use of the struct_size() helper instead of an open-coded version
    in order to avoid any potential type mistakes, in particular in the
    context in which this code is being used.

    So, replace code of the following form:

    sizeof(struct xt_hashlimit_htable) + sizeof(struct hlist_head) * size

    with:

    struct_size(hinfo, hash, size)

    This code was detected with the help of Coccinelle.

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Pablo Neira Ayuso

    Gustavo A. R. Silva
     
  • rhashtable_insert_fast() may return an error value when memory
    allocation fails, but flow_offload_add() does not check for errors.
    This patch just adds missing error checking.

    Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • Following splat gets triggered when nfnetlink monitor is running while
    xtables-nft selftests are running:

    net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage!
    other info that might help us debug this:

    1 lock held by xtables-nft-mul/27006:
    #0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50
    Call Trace:
    nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0
    nf_tables_chain_notify+0xf8/0x1a0
    nf_tables_commit+0x165c/0x1740

    nf_tables_fill_chain_info() can be called both from dumps (rcu read locked)
    or from the transaction path if a userspace process subscribed to nftables
    notifications.

    In the 'table dump' case, rcu_access_pointer() cannot be used: We do not
    hold transaction mutex so the pointer can be NULLed right after the check.
    Just unconditionally fetch the value, then have the helper return
    immediately if its NULL.

    In the notification case we don't hold the rcu read lock, but updates are
    prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep
    aware of this.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Since commit bc7d811ace4a ("netfilter: nf_ct_h323: Convert
    CHECK_BOUND macro to function"), NAT traversal for H.323
    doesn't work, failing to parse H323-UserInformation.
    nf_h323_error_boundary() compares contents of the bitstring,
    not the addresses, preventing valid H.323 packets from being
    conntrack'd.

    This looks like an oversight from when CHECK_BOUND macro was
    converted to a function.

    To fix it, stop dereferencing bs->cur and bs->end.

    Fixes: bc7d811ace4a ("netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function")
    Signed-off-by: Jakub Jankowski
    Signed-off-by: Pablo Neira Ayuso

    Jakub Jankowski
     

30 Apr, 2019

12 commits

  • There is a spelling mistake in the module description. Fix this.

    Signed-off-by: Colin Ian King
    Reviewed-by: Mukesh Ojha
    Signed-off-by: Pablo Neira Ayuso

    Colin Ian King
     
  • The 'id' key returns the unique id of the conntrack entry as returned
    by nf_ct_get_id().

    Signed-off-by: Brett Mastbergen
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Brett Mastbergen
     
  • Register amanda, ftp, irc, sip and tftp NAT helpers.

    Signed-off-by: Flavio Leitner
    Signed-off-by: Pablo Neira Ayuso

    Flavio Leitner
     
  • The API allows a conntrack helper to indicate its corresponding
    NAT helper which then can be loaded and reference counted.

    Signed-off-by: Flavio Leitner
    Signed-off-by: Pablo Neira Ayuso

    Flavio Leitner
     
  • Each NAT helper creates a module alias which follows a pattern.
    Use macros for consistency.

    Signed-off-by: Flavio Leitner
    Signed-off-by: Pablo Neira Ayuso

    Flavio Leitner
     
  • We use the zero and one to limit the boolean options setting.
    After this patch we only set 0 or 1 to boolean options for nf
    conntrack sysctl.

    Signed-off-by: Tonghao Zhang
    Signed-off-by: Pablo Neira Ayuso

    Tonghao Zhang
     
  • nf_flow_offload_ip_hook() and nf_flow_offload_ipv6_hook() do not check
    ttl value. So, ttl value overflow may occur.

    Fixes: 97add9f0d66d ("netfilter: flow table support for IPv4")
    Fixes: 0995210753a2 ("netfilter: flow table support for IPv6")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • flow_offload_alloc() calls nf_route() to get a dst_entry. Internally,
    nf_route() calls ip_route_output_key() that allocates a dst_entry and
    holds it. So, a dst_entry should be released by dst_release() if
    nf_route() is successful.

    Otherwise, netns exit routine cannot be finished and the following
    message is printed:

    [ 257.490952] unregister_netdevice: waiting for lo to become free. Usage count = 1

    Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • This is fixing flow offload for UDP traffic where packets only follow
    one single direction.

    The flow_offload_fixup_tcp() mechanism works fine in case that the
    offloaded entry remains in SYN_RECV state, given sequence tracking is
    reset and that conntrack handles syn+ack packets as a retransmission, ie.

    sES + synack => sIG

    for reply traffic.

    Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • When we process a long ruleset of the form

    chain input {
    type filter hook input priority filter; policy drop;
    ...
    }

    Then the base chain gets registered early on, we then continue to
    process/validate the next messages coming in the same transaction.

    Problem is that if the base chain policy is 'drop', it will take effect
    immediately, which causes all traffic to get blocked until the
    transaction completes or is aborted.

    Fix this by deferring the policy until the transaction has been
    processed and all of the rules have been flagged as active.

    Reported-by: Jann Haber
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This file clearly uses modular infrastructure but does not call
    out the inclusion of explicitly. We add that
    include explicitly here, so we can tidy up some header usage
    elsewhere w/o causing build breakage.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Pablo Neira Ayuso

    Paul Gortmaker
     
  • The nf_tables.h header is used in a lot of files, but it turns out
    that there is only one actual user of nft_expr_clone().

    Hence we relocate that function to be with the one consumer of it
    and avoid having to process it with CPP for all the other files.

    This will also enable a reduction in the other headers that the
    nf_tables.h itself has to include just to be stand-alone, hence
    a pending further significant reduction in the CPP content that
    needs to get processed for each netfilter file.

    Note that the explicit "inline" has been dropped as part of this
    relocation. In similar changes to this, I believe Dave has asked
    this be done, so we free up gcc to make the choice of whether to
    inline or not.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Pablo Neira Ayuso

    Paul Gortmaker
     

28 Apr, 2019

4 commits

  • Add options to strictly validate messages and dump messages,
    sometimes perhaps validating dump messages non-strictly may
    be required, so add an option for that as well.

    Since none of this can really be applied to existing commands,
    set the options everwhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ...,
    {
    .cmd = X,
    + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
    ...
    },
    ...
    };

    For new commands one should just not copy the .validate 'opt-out'
    flags and thus get strict validation.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • After the previous commit, both ipset_nest_start() and ipset_nest_end() are
    just aliases for nla_nest_start() and nla_nest_end() so that there is no
    need to keep them.

    Signed-off-by: Michal Kubecek
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: David S. Miller

    Michal Kubecek
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

26 Apr, 2019

1 commit


22 Apr, 2019

2 commits

  • It doesn't log a packet if sysctl_log_invalid isn't equal to protonum
    OR sysctl_log_invalid isn't equal to IPPROTO_RAW. This sentence is
    always true. I believe we need to replace OR to AND.

    Cc: Florian Westphal
    Fixes: c4f3db1595827 ("netfilter: conntrack: add and use nf_l4proto_log_invalid")
    Signed-off-by: Andrei Vagin
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Andrei Vagin
     
  • setting net.netfilter.nf_conntrack_timestamp=1 breaks xmit with fq
    scheduler. skb->tstamp might be "refreshed" using ktime_get_real(),
    but fq expects CLOCK_MONOTONIC.

    This patch removes all places in netfilter that check/set skb->tstamp:

    1. To fix the bogus "start" time seen with conntrack timestamping for
    outgoing packets, never use skb->tstamp and always use current time.
    2. In nfqueue and nflog, only use skb->tstamp for incoming packets,
    as determined by current hook (prerouting, input, forward).
    3. xt_time has to use system clock as well rather than skb->tstamp.
    We could still use skb->tstamp for prerouting/input/foward, but
    I see no advantage to make this conditional.

    Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
    Cc: Eric Dumazet
    Reported-by: Michal Soltys
    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

15 Apr, 2019

3 commits

  • Sven Auhagen reported that a 2nd ping request will fail if 'fully-random'
    mode is used.

    Reason is that if no proto information is given, min/max are both 0,
    so we set the icmp id to 0 instead of chosing a random value between
    0 and 65535.

    Update test case as well to catch this, without fix this yields:
    [..]
    ERROR: cannot ping ns1 from ns2 with ip masquerade fully-random (attempt 2)
    ERROR: cannot ping ns1 from ns2 with ipv6 masquerade fully-random (attempt 2)

    ... becaus 2nd ping clashes with existing 'id 0' icmp conntrack and gets
    dropped.

    Fixes: 203f2e78200c27e ("netfilter: nat: remove l4proto->unique_tuple")
    Reported-by: Sven Auhagen
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • I believe that "hook->num" can be up to UINT_MAX. Shifting more than
    31 bits would is undefined in C but in practice it would lead to shift
    wrapping. That would lead to an array overflow in nf_tables_addchain():

    ops->hook = hook.type->hooks[ops->hooknum];

    Fixes: fe19c04ca137 ("netfilter: nf_tables: remove nhooks field from struct nft_af_info")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Pablo Neira Ayuso

    Dan Carpenter
     
  • else, we leak the addresses to userspace via ctnetlink events
    and dumps.

    Compute an ID on demand based on the immutable parts of nf_conn struct.

    Another advantage compared to using an address is that there is no
    immediate re-use of the same ID in case the conntrack entry is freed and
    reallocated again immediately.

    Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID")
    Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

13 Apr, 2019

3 commits

  • We can receive ICMP errors from client or from
    tunneling real server. While the former can be
    scheduled to real server, the latter should
    not be scheduled, they are decapsulated only when
    existing connection is found.

    Fixes: 6044eeffafbe ("ipvs: attempt to schedule icmp packets")
    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Julian Anastasov
     
  • KMSAN started reporting an error when accessing ct->timeout for the
    first time without initialization:

    BUG: KMSAN: uninit-value in __nf_ct_refresh_acct+0x1ae/0x470 net/netfilter/nf_conntrack_core.c:1765
    ...
    dump_stack+0x173/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x131/0x2a0 mm/kmsan/kmsan.c:624
    __msan_warning+0x7a/0xf0 mm/kmsan/kmsan_instr.c:310
    __nf_ct_refresh_acct+0x1ae/0x470 net/netfilter/nf_conntrack_core.c:1765
    nf_ct_refresh_acct ./include/net/netfilter/nf_conntrack.h:201
    nf_conntrack_udp_packet+0xb44/0x1040 net/netfilter/nf_conntrack_proto_udp.c:122
    nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1605
    nf_conntrack_in+0x1250/0x26c9 net/netfilter/nf_conntrack_core.c:1696
    ...
    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205
    kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:159
    kmsan_kmalloc+0xa9/0x130 mm/kmsan/kmsan_hooks.c:173
    kmem_cache_alloc+0x554/0xb10 mm/slub.c:2789
    __nf_conntrack_alloc+0x16f/0x690 net/netfilter/nf_conntrack_core.c:1342
    init_conntrack+0x6cb/0x2490 net/netfilter/nf_conntrack_core.c:1421

    Signed-off-by: Alexander Potapenko
    Fixes: cc16921351d8ba1 ("netfilter: conntrack: avoid same-timeout update")
    Cc: Florian Westphal
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Alexander Potapenko
     
  • Luca Moro says:
    ------
    The issue lies in the filtering of ICMP and ICMPv6 errors that include an
    inner IP datagram.
    For these packets, icmp_error_message() extract the ICMP error and inner
    layer to search of a known state.
    If a state is found the packet is tagged as related (IP_CT_RELATED).

    The problem is that there is no correlation check between the inner and
    outer layer of the packet.
    So one can encapsulate an error with an inner layer matching a known state,
    while its outer layer is directed to a filtered host.
    In this case the whole packet will be tagged as related.
    This has various implications from a rule bypass (if a rule to related
    trafic is allow), to a known state oracle.

    Unfortunately, we could not find a real statement in a RFC on how this case
    should be filtered.
    The closest we found is RFC5927 (Section 4.3) but it is not very clear.

    A possible fix would be to check that the inner IP source is the same than
    the outer destination.

    We believed this kind of attack was not documented yet, so we started to
    write a blog post about it.
    You can find it attached to this mail (sorry for the extract quality).
    It contains more technical details, PoC and discussion about the identified
    behavior.
    We discovered later that
    https://www.gont.com.ar/papers/filtering-of-icmp-error-messages.pdf
    described a similar attack concept in 2004 but without the stateful
    filtering in mind.
    -----

    This implements above suggested fix:
    In icmp(v6) error handler, take outer destination address, then pass
    that into the common function that does the "related" association.

    After obtaining the nf_conn of the matching inner-headers connection,
    check that the destination address of the opposite direction tuple
    is the same as the outer address and only set RELATED if thats the case.

    Reported-by: Luca Moro
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

12 Apr, 2019

4 commits

  • Replace NF_HOOK() based invocation of the netfilter hooks with a private
    copy of nf_hook_slow().

    This copy has one difference: it can return the rx handler value expected
    by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS.

    This is needed by the next patch to invoke the ebtables
    "broute" table via the standard netfilter hooks rather than the custom
    "br_should_route_hook" indirection that is used now.

    When the skb is to be "brouted", we must return RX_HANDLER_PASS from the
    bridge rx input handler, but there is no way to indicate this via
    NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the
    netfilter core or a percpu flag.

    text data bss dec filename
    3369 56 0 3425 net/bridge/br_input.o.before
    3458 40 0 3498 net/bridge/br_input.o.after

    This allows removal of the "br_should_route_hook" in the next patch.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Only reason for having two different register functions was because of
    ipt_MASQUERADE and ip6t_MASQUERADE being two different modules.

    Previous patch merged those into xt_MASQUERADE, so we can merge this too.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • No need to have separate modules for this.
    before:
    text data bss dec filename
    2038 1168 0 3206 net/ipv4/netfilter/ipt_MASQUERADE.ko
    1526 1024 0 2550 net/ipv6/netfilter/ip6t_MASQUERADE.ko
    after:
    text data bss dec filename
    2521 1296 0 3817 net/netfilter/xt_MASQUERADE.ko

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Both are now implemented by nf_nat_masquerade.c, so no need to keep
    different headers.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

09 Apr, 2019

3 commits