24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation or any later at your
    option

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 5 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Richard Fontana
    Reviewed-by: Armijn Hemel
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520075210.769496418@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would, going
    forward, prevent forgetting attribute entries. The same can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
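
    The split into independent strictness options described above can be
    pictured as a bitmask. The following is only a minimal userspace sketch:
    every TOY_* name, the toy_policy struct and both helper functions are
    hypothetical and are not the kernel's actual definitions. An
    expected_len of 0 stands in for an NLA_UNSPEC policy entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag names mirroring the four options in the commit text. */
enum toy_validate_flags {
    TOY_VALIDATE_TRAILING     = 1 << 0, /* reject trailing data */
    TOY_VALIDATE_MAXTYPE      = 1 << 1, /* reject attrs > max known type */
    TOY_VALIDATE_UNSPEC       = 1 << 2, /* reject attrs with unspecified policy */
    TOY_VALIDATE_STRICT_ATTRS = 1 << 3, /* require the exact attribute size */
};

#define TOY_VALIDATE_LIBERAL 0u
#define TOY_VALIDATE_DEPRECATED_STRICT \
    ((unsigned)(TOY_VALIDATE_TRAILING | TOY_VALIDATE_MAXTYPE))
#define TOY_VALIDATE_STRICT \
    ((unsigned)(TOY_VALIDATE_TRAILING | TOY_VALIDATE_MAXTYPE | \
                TOY_VALIDATE_UNSPEC | TOY_VALIDATE_STRICT_ATTRS))

/* Toy policy entry: expected_len == 0 plays the role of NLA_UNSPEC. */
struct toy_policy {
    size_t expected_len;
};

/* Returns 0 if the attribute is accepted, -1 if it is rejected. */
static int toy_validate_attr(unsigned type, size_t len,
                             const struct toy_policy *policy,
                             unsigned maxtype, unsigned flags)
{
    if (type > maxtype)
        return (flags & TOY_VALIDATE_MAXTYPE) ? -1 : 0;
    if (policy[type].expected_len == 0)
        return (flags & TOY_VALIDATE_UNSPEC) ? -1 : 0;
    if (len < policy[type].expected_len)
        return -1; /* too short is rejected at every level */
    if (len > policy[type].expected_len &&
        (flags & TOY_VALIDATE_STRICT_ATTRS))
        return -1; /* longer than expected only fails when strict */
    return 0;
}

/* Returns -1 if leftover bytes remain and TRAILING checking is enabled. */
static int toy_check_trailing(size_t leftover, unsigned flags)
{
    return (leftover > 0 && (flags & TOY_VALIDATE_TRAILING)) ? -1 : 0;
}
```

    In this sketch the old liberal behaviour is simply "no flags set", the
    old strict behaviour is TRAILING | MAXTYPE, and new callers would pass
    all four flags.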
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
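
    The compatibility concern above (userspace comparing nlattr::nla_type
    directly instead of masking out the flags) can be illustrated with a
    small userspace sketch. The TOY_* constants copy the bit positions of
    NLA_F_NESTED and NLA_F_NET_BYTEORDER so that the example builds without
    kernel headers; the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the flag bit layout so the sketch builds anywhere. */
#define TOY_NLA_F_NESTED        (1u << 15)
#define TOY_NLA_F_NET_BYTEORDER (1u << 14)
#define TOY_NLA_TYPE_MASK       (~(TOY_NLA_F_NESTED | TOY_NLA_F_NET_BYTEORDER))

struct toy_nlattr {
    uint16_t nla_len;
    uint16_t nla_type;
};

/* Old behaviour: the nest header carries the bare type. */
static struct toy_nlattr toy_nest_start_noflag(uint16_t type)
{
    struct toy_nlattr a = { sizeof(struct toy_nlattr), type };
    return a;
}

/* New behaviour: a wrapper that also sets the nested flag. */
static struct toy_nlattr toy_nest_start(uint16_t type)
{
    return toy_nest_start_noflag((uint16_t)(type | TOY_NLA_F_NESTED));
}

/* Fragile parser: compares the raw field, so a flag bit breaks the match. */
static int toy_type_matches_raw(const struct toy_nlattr *a, uint16_t want)
{
    return a->nla_type == want;
}

/* Robust parser: masks out the flag bits before comparing. */
static int toy_type_matches_masked(const struct toy_nlattr *a, uint16_t want)
{
    return (a->nla_type & TOY_NLA_TYPE_MASK) == want;
}
```

    A parser written like toy_type_matches_raw() keeps working only while
    the flag is absent, which is why existing callers had to be moved to
    the _noflag variant rather than silently gaining the flag.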
     

18 Jan, 2019

3 commits

  • It's now the same as __nf_ct_l4proto_find(), so rename that to
    nf_ct_l4proto_find and use it everywhere.

    It never returns NULL and doesn't need locks or reference counts.

    Before this series:
    302824 net/netfilter/nf_conntrack.ko
    21504 net/netfilter/nf_conntrack_proto_gre.ko

      text    data     bss     dec     hex filename
      6281    1732       4    8017    1f51 nf_conntrack_proto_gre.ko
    108356   20613     236  129205   1f8b5 nf_conntrack.ko

    After:
    294864 net/netfilter/nf_conntrack.ko
      text    data     bss     dec     hex filename
    106979   19557     240  126776   1ef38 nf_conntrack.ko

    so, even with builtin gre, total size got reduced.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • No need to get/put module owner reference, none of these can be removed
    anymore.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This makes the last of the modular l4 trackers 'bool'.

    After this, all infrastructure to handle dynamic l4 protocol registration
    becomes obsolete and can be removed in followup patches.

    Old:
    302824 net/netfilter/nf_conntrack.ko
    21504 net/netfilter/nf_conntrack_proto_gre.ko

    New:
    313728 net/netfilter/nf_conntrack.ko

    Old:
      text    data     bss     dec     hex filename
      6281    1732       4    8017    1f51 nf_conntrack_proto_gre.ko
    108356   20613     236  129205   1f8b5 nf_conntrack.ko
    New:
    112095   21381     240  133716   20a54 nf_conntrack.ko

    The size increase is only temporary.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

26 Nov, 2018

1 commit

  • syzbot was able to trigger the WARN in cttimeout_default_get() by
    passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use
    same timeout values.

    Furthermore, also fetch GRE timeouts. GRE is a bit more complicated,
    as it still can be a module and its netns_proto_gre struct layout isn't
    visible outside of the gre module. We can't move the timeouts around: it
    appears conntrack sysctl unregister assumes net_generic() returns
    nf_proto_net, so we would get a crash. Expose the layout of
    netns_proto_gre instead.

    A followup nf-next patch could make the gre tracker built-in as well
    if needed, it's not that large.

    Last, make the WARN() mention the missing protocol value in case
    anything else is missing.

    Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
    Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

03 Nov, 2018

1 commit

  • Otherwise, we hit a NULL pointer dereference since handlers always assume
    default timeout policy is passed.

    netlink: 24 bytes leftover after parsing attributes in process `syz-executor2'.
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 9575 Comm: syz-executor1 Not tainted 4.19.0+ #312
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:icmp_timeout_obj_to_nlattr+0x77/0x170 net/netfilter/nf_conntrack_proto_icmp.c:297

    Fixes: c779e849608a ("netfilter: conntrack: remove get_timeout() indirection")
    Reported-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

16 Oct, 2018

1 commit

  • Fixes gcc '-Wunused-but-set-variable' warning:

    net/netfilter/nfnetlink_cttimeout.c: In function 'cttimeout_default_set':
    net/netfilter/nfnetlink_cttimeout.c:353:8: warning:
    variable 'l3num' set but not used [-Wunused-but-set-variable]

    It is not used any more after
    commit dd2934a95701 ("netfilter: conntrack: remove l3->l4 mapping information")

    Signed-off-by: YueHaibing
    Signed-off-by: Pablo Neira Ayuso

    YueHaibing
     

21 Sep, 2018

1 commit

  • l4 protocols are demuxed by l3num, l4num pair.

    However, almost all l4 trackers are l3 agnostic.

    Only exceptions are:
    - gre, icmp (ipv4 only)
    - icmpv6 (ipv6 only)

    This commit gets rid of the l3 mapping, l4 trackers can now be looked up
    by their IPPROTO_XXX value alone, which gets rid of the additional l3
    indirection.

    For icmp, icmpv6 and gre, add a check on state->pf and
    return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4,
    this seems more fitting than using the generic tracker.

    Additionally we can kill the 2nd l4proto definitions that were needed
    for v4/v6 split -- they are now the same so we can use single l4proto
    struct for each protocol, rather than two.

    The EXPORT_SYMBOLs can be removed as all these object files are
    part of nf_conntrack with no external references.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
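
    The lookup "by IPPROTO_XXX value alone" described above can be sketched
    as a single switch on the layer 4 protocol number. This is a userspace
    illustration only: the toy_l4proto struct, the three tracker objects and
    toy_l4proto_find() are hypothetical names, and the fallback to a generic
    entry is an assumption of the sketch. The numeric values 6 and 17 are
    the standard protocol numbers for TCP and UDP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_l4proto {
    unsigned char l4num; /* IP protocol number */
    const char *name;
};

static const struct toy_l4proto toy_generic = { 0,  "generic" };
static const struct toy_l4proto toy_tcp     = { 6,  "tcp" };
static const struct toy_l4proto toy_udp     = { 17, "udp" };

/*
 * Look up a tracker by protocol number only; no layer 3 family is needed.
 * Unknown protocols get the generic tracker, so the result is never NULL.
 */
static const struct toy_l4proto *toy_l4proto_find(unsigned char l4num)
{
    switch (l4num) {
    case 6:
        return &toy_tcp;
    case 17:
        return &toy_udp;
    default:
        return &toy_generic;
    }
}
```

    Because every tracker in the sketch is a static const object, callers
    need neither locking nor reference counting to use the returned pointer.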
     

17 Sep, 2018

1 commit


11 Sep, 2018

1 commit

  • Compiler did not catch incorrect typing in the rcu hook assignment.

    % nfct add timeout test-tcp inet tcp established 100 close 10 close_wait 10
    % iptables -I OUTPUT -t raw -p tcp -j CT --timeout test-tcp
    dmesg - xt_CT: Timeout policy `test-tcp' can only be used by L3 protocol number 25000

    The CT target bails out with incorrect layer 3 protocol number.

    Fixes: 6c1fd7dc489d ("netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout object")
    Reported-by: Harsha Sharma
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

07 Aug, 2018

2 commits


04 Aug, 2018

1 commit


16 Jul, 2018

2 commits


20 Mar, 2018

1 commit

  • In preparation to enabling -Wvla, remove VLA and replace it
    with dynamic memory allocation.

    From a security viewpoint, the use of Variable Length Arrays can be
    a vector for stack overflow attacks. Also, in general, as the code
    evolves it is easy to lose track of how big a VLA can get. Thus, we
    can end up having segfaults that are hard to debug.

    Also, fixed as part of the directive to remove all VLAs from
    the kernel: https://lkml.org/lkml/2018/3/7/621

    While at it, remove likely() notation which is not necessary from the
    control plane code.

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Pablo Neira Ayuso

    Gustavo A. R. Silva
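
    The kind of change described above, replacing a variable length array
    with a dynamic allocation, might look like the following userspace
    sketch. calloc() and free() stand in for the kernel allocator here, and
    toy_sum_squares() is a hypothetical function invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sum the squares 0^2 .. (n-1)^2 using a heap-allocated scratch buffer.
 * Returns 0 on success and -1 if the allocation fails.
 */
static int toy_sum_squares(size_t n, unsigned long *out)
{
    unsigned long *scratch;
    unsigned long sum = 0;
    size_t i;

    /* A VLA version would have declared: unsigned long scratch[n]; */
    scratch = calloc(n ? n : 1, sizeof(*scratch));
    if (!scratch)
        return -1; /* allocation failure is reported, not a stack overflow */

    for (i = 0; i < n; i++)
        scratch[i] = (unsigned long)i * (unsigned long)i;
    for (i = 0; i < n; i++)
        sum += scratch[i];

    free(scratch);
    *out = sum;
    return 0;
}
```

    Unlike the VLA, the heap version has an explicit failure path, so an
    unexpectedly large n produces an error return instead of overrunning
    the stack.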
     

20 Jan, 2018

1 commit

  • Several reasons for this:

    * Several modules maintain internal version numbers, that they print at
    boot/module load time, that are not exposed to userspace, as a
    primitive mechanism to make revision number control from the earlier
    days of Netfilter.

    * IPset shows the protocol version at boot/module load time, instead
    display this via module description, as Jozsef suggested.

    * Remove copyright notice at boot/module load time in two spots, the
    Netfilter codebase is a collective development effort, if we would
    have to display copyrights for each contributor at boot/module load
    time for each extensions we have, we would probably fill up logs with
    lots of useless information - from a technical standpoint.

    So let's be consistent and remove them all.

    Acked-by: Florian Westphal
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

25 Aug, 2017

1 commit


02 Aug, 2017

1 commit

  • When a nf_conntrack_l3/4proto parameter is not on the left hand side
    of an assignment, its address is not taken, and it is not passed to a
    function that may modify its fields, then it can be declared as const.

    This change is useful from a documentation point of view, and can
    possibly facilitate making some nf_conntrack_l3/4proto structures const
    subsequently.

    Done with the help of Coccinelle.

    Signed-off-by: Julia Lawall
    Signed-off-by: Pablo Neira Ayuso

    Julia Lawall
     

01 Aug, 2017

1 commit


24 Jul, 2017

1 commit

  • This patch removes duplicate rcu_read_lock().

    1. IPVS part:

    According to Julian Anastasov's mention, contexts of ipvs are described
    at: http://marc.info/?l=netfilter-devel&m=149562884514072&w=2, in summary:

    - packet RX/TX: does not need locks because packets come from hooks.
    - sync msg RX: backup server uses RCU locks while registering new
    connections.
    - ip_vs_ctl.c: configuration get/set, RCU locks needed.
    - xt_ipvs.c: It is a netfilter match, running from hook context.

    As result, rcu_read_lock and rcu_read_unlock can be removed from:

    - ip_vs_core.c: all
    - ip_vs_ctl.c:
    - only from ip_vs_has_real_service
    - ip_vs_ftp.c: all
    - ip_vs_proto_sctp.c: all
    - ip_vs_proto_tcp.c: all
    - ip_vs_proto_udp.c: all
    - ip_vs_xmit.c: all (contains only packet processing)

    2. Netfilter part:

    There are three types of functions that are guaranteed to run under
    rcu_read_lock(). First, functions that are only called by nf_hook():

    - nf_conntrack_broadcast_help(), pptp_expectfn(), set_expected_rtp_rtcp().
    - tcpmss_reverse_mtu(), tproxy_laddr4(), tproxy_laddr6().
    - match_lookup_rt6(), check_hlist(), hashlimit_mt_common().
    - xt_osf_match_packet().

    Second, functions whose caller already holds rcu_read_lock():
    - destroy_conntrack(), ctnetlink_conntrack_event().
    - ctnl_timeout_find_get(), nfqnl_nf_hook_drop().

    Third, functions that mix type 1 and type 2.

    These functions are called by nf_hook() and also by
    ordinary functions that already hold rcu_read_lock():

    - __ctnetlink_glue_build(), ctnetlink_expect_event().
    - ctnetlink_proto_size().

    Applied files are below:

    - nf_conntrack_broadcast.c, nf_conntrack_core.c, nf_conntrack_netlink.c.
    - nf_conntrack_pptp.c, nf_conntrack_sip.c, nfnetlink_cttimeout.c.
    - nfnetlink_queue.c, xt_TCPMSS.c, xt_TPROXY.c, xt_addrtype.c.
    - xt_connlimit.c, xt_hashlimit.c, xt_osf.c

    Detailed calltrace can be found at:
    http://marc.info/?l=netfilter-devel&m=149667610710350&w=2

    Signed-off-by: Taehee Yoo
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     

20 Jun, 2017

1 commit

  • Pass down struct netlink_ext_ack as parameter to all of our nfnetlink
    subsystem callbacks, so we can work on follow up patches to provide
    finer grain error reporting using the new infrastructure that
    2d4bc93368f5 ("netlink: extended ACK reporting") provides.

    No functional change, just pass down this new object to callbacks.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

29 May, 2017

1 commit


01 May, 2017

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree. A large bunch of code cleanups, simplify the conntrack extension
    codebase, get rid of the fake conntrack object, speed up netns by
    selective synchronize_net() calls. More specifically, they are:

    1) Check for ct->status bit instead of using nfct_nat() from IPVS and
    Netfilter codebase, patch from Florian Westphal.

    2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.

    3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.

    4) Introduce nft_is_base_chain() helper function.

    5) Enforce expectation limit from userspace conntrack helper,
    from Gao Feng.

    6) Add nf_ct_remove_expect() helper function, from Gao Feng.

    7) NAT mangle helper function return boolean, from Gao Feng.

    8) ctnetlink_alloc_expect() should only work for conntrack with
    helpers, from Gao Feng.

    9) Add nfnl_msg_type() helper function to nfnetlink to build the
    netlink message type.

    10) Get rid of unnecessary cast on void, from simran singhal.

    11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
    also from simran singhal.

    12) Use list_prev_entry() from nf_tables, from simran singhal.

    13) Remove unnecessary & on pointer function in the Netfilter and IPVS
    code.

    14) Remove obsolete comment on set of rules per CPU in ip6_tables,
    no longer true. From Arushi Singhal.

    15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.

    16) Remove unnecessary nested rcu_read_lock() in
    __nf_nat_decode_session(). Code running from hooks is already
    guaranteed to run under RCU read side.

    17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.

    18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
    also from Aaron.

    19) Get rid of unused __ip_set_get_netlink(), from Aaron Conole.

    20) Don't propagate NF_DROP error to userspace via ctnetlink in
    __nf_nat_alloc_null_binding() function, from Gao Feng.

    21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
    from Gao Feng.

    22) Kill the fake untracked conntrack objects, use ctinfo instead to
    annotate a conntrack object is untracked, from Florian Westphal.

    23) Remove nf_ct_is_untracked(), now obsolete since we have no
    conntrack template anymore, from Florian.

    24) Add event mask support to nft_ct, also from Florian.

    25) Move nf_conn_help structure to
    include/net/netfilter/nf_conntrack_helper.h.

    26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
    Thus, we don't deal with variable conntrack extensions anymore.
    Make sure userspace conntrack helper doesn't go over that size.
    Remove variable size ct extension infrastructure now this code
    got no more clients. From Florian Westphal.

    27) Restore offset and length of nf_ct_ext structure to 8 bytes now
    that wraparound is not possible any longer, also from Florian.

    28) Allow to get rid of unassured flows under stress in conntrack,
    this applies to DCCP, SCTP and TCP protocols, from Florian.

    29) Shrink size of nf_conntrack_ecache structure, from Florian.

    30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
    from Gao Feng.

    31) Register SYNPROXY hooks on demand, from Florian Westphal.

    32) Use pernet hook whenever possible, instead of global hook
    registration, from Florian Westphal.

    33) Pass hook structure to ebt_register_table() to consolidate some
    infrastructure code, from Florian Westphal.

    34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
    SYNPROXY code, to make sure device stats are not fooled, patch
    from Gao Feng.

    35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we
    don't need anymore if we just select a fixed size instead of
    expensive runtime time calculation of this. From Florian.

    36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
    from Florian.

    37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
    Florian.

    38) Attach NAT extension on-demand from masquerade and pptp helper
    path, from Florian.

    39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.

    40) Speed up netns by selective calls of synchronize_net(), from
    Florian Westphal.

    41) Silence stack size warning gcc in 32-bit arch in snmp helper,
    from Florian.

    42) Unconditionally call nf_ct_ext_destroy(), even if we have no
    extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
    Liping Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

14 Apr, 2017

1 commit


07 Apr, 2017

1 commit


06 Apr, 2017

1 commit


27 Mar, 2017

1 commit

  • Otherwise, another CPU may access the invalid pointer. For example:
    CPU0                    CPU1
    -                       rcu_read_lock();
    -                       pfunc = _hook_;
    _hook_ = NULL;          -
    mod unload              -
    -                       pfunc(); // invalid, panic
    -                       rcu_read_unlock();

    So we must call synchronize_rcu() to wait for the RCU readers to finish.

    Also note, in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked
    by the later nf_conntrack_helper_unregister, but I'm inclined to add an
    explicit synchronize_rcu after setting nf_nat_snmp_hook to NULL.
    Depending on such obscure assumptions is not a good idea.

    Last, in nfnetlink_cttimeout, we use kfree_rcu to free the timeout object,
    so in cttimeout_exit, invoking rcu_barrier() is not necessary at all;
    remove it too.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

17 Mar, 2017

1 commit

  • refcount_t type and corresponding API (see include/linux/refcount.h)
    should be used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Pablo Neira Ayuso

    Reshetova, Elena
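
    The overflow concern above can be illustrated with a toy saturating
    counter built on C11 atomics. This is a userspace sketch and not the
    kernel's refcount_t implementation: the toy_refcount type, its helper
    names and the choice of UINT_MAX as the saturation value are all
    assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_REFCOUNT_SATURATED UINT_MAX

struct toy_refcount {
    atomic_uint refs;
};

static void toy_refcount_set(struct toy_refcount *r, unsigned v)
{
    atomic_store(&r->refs, v);
}

/* Take a reference unless the count is already 0; never wraps around. */
static bool toy_refcount_inc_not_zero(struct toy_refcount *r)
{
    unsigned old = atomic_load(&r->refs);

    do {
        if (old == 0)
            return false; /* object is already on its way out */
        if (old == TOY_REFCOUNT_SATURATED)
            return true;  /* stay pinned instead of overflowing to 0 */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
    return true;
}

/* Drop a reference; returns true only for the caller that reaches 0. */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
    unsigned old = atomic_load(&r->refs);

    do {
        if (old == 0 || old == TOY_REFCOUNT_SATURATED)
            return false; /* underflow or saturated: leave it alone */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
    return old == 1;
}
```

    A plain atomic counter would wrap from UINT_MAX to 0 on one more
    increment and let the object be freed while still in use; the saturated
    counter leaks the object instead, which is the safer failure.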
     

25 Aug, 2016

3 commits

  • KASAN reported this bug:
    BUG: KASAN: use-after-free in icmp_packet+0x25/0x50 [nf_conntrack_ipv4] at
    addr ffff880002db08c8
    Read of size 4 by task lt-nf-queue/19041
    Call Trace:
    [] dump_stack+0x63/0x88
    [] kasan_report_error+0x528/0x560
    [] kasan_report+0x58/0x60
    [] ? icmp_packet+0x25/0x50 [nf_conntrack_ipv4]
    [] __asan_load4+0x61/0x80
    [] icmp_packet+0x25/0x50 [nf_conntrack_ipv4]
    [] nf_conntrack_in+0x550/0x980 [nf_conntrack]
    [] ? __nf_conntrack_confirm+0xb10/0xb10 [nf_conntrack]
    [ ... ]

    The main reason is that we failed to unlink the timeout objects in the
    unconfirmed ct lists, so we access timeout objects that have
    already been freed.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • We forget to call nf_ct_l4proto_put when replacing the existing
    timeout policy. Actually, there's no need to get the ct l4proto
    before doing the replace, so we can move it to a later position.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • cttimeout and acct objects are deleted from the list while traversing
    it, so using list_for_each_entry is unsafe here.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
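
    Why deleting during traversal needs a "safe" iterator can be shown on a
    plain singly linked list. The sketch below is userspace C with
    hypothetical names (toy_node, toy_push, toy_remove_negative); it saves
    the next pointer before freeing the current node, which is the same
    idea the kernel's list_for_each_entry_safe() macro applies.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_node {
    int value;
    struct toy_node *next;
};

/* Push a node at the head of the list and return the new head. */
static struct toy_node *toy_push(struct toy_node *head, int value)
{
    struct toy_node *n = malloc(sizeof(*n));

    assert(n != NULL); /* sketch only: abort on allocation failure */
    n->value = value;
    n->next = head;
    return n;
}

/* Unlink and free every node with a negative value; returns the count. */
static size_t toy_remove_negative(struct toy_node **head)
{
    struct toy_node **link = head; /* pointer that points at cur */
    struct toy_node *cur, *next;
    size_t removed = 0;

    for (cur = *head; cur; cur = next) {
        next = cur->next; /* saved before cur may be freed */
        if (cur->value < 0) {
            *link = next;
            free(cur);
            removed++;
        } else {
            link = &cur->next;
        }
    }
    return removed;
}
```

    Advancing with cur = cur->next after free(cur) would read freed memory,
    which is the bug class the commit above fixes.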
     

18 Aug, 2016

1 commit

  • In general, when we want to delete a netns, cttimeout_net_exit will
    be called before ipt_unregister_table, i.e. before ctnl_timeout_put.

    But after calling kfree_rcu in cttimeout_net_exit, we still decrease
    the timeout object's refcnt in ctnl_timeout_put. This is incorrect
    and causes a use-after-free error.

    It is easy to reproduce this problem:
    # while : ; do
    ip netns add xxx
    ip netns exec xxx nfct add timeout testx inet icmp timeout 200
    ip netns exec xxx iptables -t raw -p icmp -I OUTPUT -j CT --timeout testx
    ip netns del xxx
    done

    =======================================================================
    BUG kmalloc-96 (Tainted: G B E ): Poison overwritten
    -----------------------------------------------------------------------
    INFO: 0xffff88002b5161e8-0xffff88002b5161e8. First byte 0x6a instead of
    0x6b
    INFO: Allocated in cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
    age=104 cpu=0 pid=3330
    ___slab_alloc+0x4da/0x540
    __slab_alloc+0x20/0x40
    __kmalloc+0x1c8/0x240
    cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
    nfnetlink_rcv_msg+0x21a/0x230 [nfnetlink]
    [ ... ]

    So call kfree_rcu to free the timeout object only when the refcnt
    drops to 0. And, as nfnetlink_acct does, use atomic_cmpxchg to
    avoid a race between ctnl_timeout_try_del and ctnl_timeout_put.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
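
    The compare-and-exchange idea above can be sketched in userspace with
    C11 atomics. All names here (toy_timeout and its helpers) are
    hypothetical, the freed flag stands in for the kfree_rcu() call, and
    the convention that the list itself holds one reference is an
    assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_timeout {
    atomic_int refcnt; /* starts at 1: the reference held by the list */
    bool freed;        /* stand-in for the object having been freed */
};

static void toy_timeout_init(struct toy_timeout *t)
{
    atomic_init(&t->refcnt, 1);
    t->freed = false;
}

/* Take a reference unless the object is already being destroyed. */
static bool toy_timeout_get(struct toy_timeout *t)
{
    int old = atomic_load(&t->refcnt);

    do {
        if (old == 0)
            return false;
    } while (!atomic_compare_exchange_weak(&t->refcnt, &old, old + 1));
    return true;
}

/* Drop a reference and free only when the count reaches 0. */
static void toy_timeout_put(struct toy_timeout *t)
{
    if (atomic_fetch_sub(&t->refcnt, 1) == 1)
        t->freed = true;
}

/* Delete only if nobody else holds a reference: swap 1 -> 0 atomically. */
static int toy_timeout_try_del(struct toy_timeout *t)
{
    int expected = 1;

    if (!atomic_compare_exchange_strong(&t->refcnt, &expected, 0))
        return -1; /* still in use */
    t->freed = true;
    return 0;
}
```

    Because the 1 -> 0 transition is a single atomic step, a concurrent
    get cannot slip in between "check that the count is 1" and "free".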
     

11 Jul, 2016

1 commit

  • Imagine this situation: nf_conntrack_htable_size is 4096, we are doing
    ctnl_untimeout, and are iterating at bucket #3000.

    Meanwhile, another user tries to reduce the hash size to 2048, and all
    nf_conn entries are moved to the new hashtable. When this hash resize
    operation has finished, we still try to iterate from bucket #3000, find
    nothing to do and just return.

    We may miss unlinking some timeout objects, and later we will end up with
    invalid references to timeout objects that are already gone.

    So when we find that a hash resize happened, try to unlink timeout objects
    starting from bucket #0 again.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
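
    The restart-on-resize approach above can be sketched with a generation
    counter that the walker compares before and after each bucket. This is
    a single-threaded userspace illustration: toy_table, toy_walk() and the
    callback that simulates one resize at bucket 3000 are all hypothetical,
    and the real code additionally needs locking that is omitted here.

```c
#include <assert.h>
#include <stddef.h>

struct toy_table {
    size_t nbuckets;
    unsigned generation; /* bumped on every resize */
};

/* Per-bucket callback; it may resize the table as a side effect. */
typedef void (*toy_visit_fn)(struct toy_table *t, size_t bucket, void *ctx);

/*
 * Visit every bucket, starting over from bucket 0 whenever the table was
 * resized during the walk. Returns the number of restarts.
 */
static unsigned toy_walk(struct toy_table *t, toy_visit_fn visit, void *ctx)
{
    unsigned restarts = 0;
    unsigned seen;
    size_t i;

restart:
    seen = t->generation;
    for (i = 0; i < t->nbuckets; i++) {
        visit(t, i, ctx);
        if (t->generation != seen) {
            restarts++;
            goto restart;
        }
    }
    return restarts;
}

struct toy_ctx {
    int resized;
    size_t visits_after_resize;
};

/* Simulates another user shrinking the table once, at bucket 3000. */
static void toy_visit_resize_once(struct toy_table *t, size_t bucket,
                                  void *ctx)
{
    struct toy_ctx *c = ctx;

    if (c->resized)
        c->visits_after_resize++;
    if (!c->resized && bucket == 3000) {
        t->nbuckets = 2048;
        t->generation++;
        c->resized = 1;
    }
}
```

    Without the generation check, the walk would continue at bucket 3001 of
    a 2048-bucket table and finish without visiting any of the moved
    entries.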
     

05 May, 2016

1 commit

  • We already include netns address in the hash and compare the netns pointers
    during lookup, so even if namespaces have overlapping addresses entries
    will be spread across the table.

    Assuming 64k bucket size, this change saves 0.5 mbyte per namespace on a
    64bit system.

    The NAT bysrc and expectation hashes are still per namespace; those will
    be changed soon too.

    Future patch will also make conntrack object slab cache global again.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

01 Feb, 2016

1 commit


20 Jan, 2016

1 commit

  • When we need to lock all buckets in the connection hashtable we'd attempt to
    lock 1024 spinlocks, which is way more preemption levels than supported by
    the kernel. Furthermore, this behavior was hidden by checking if lockdep is
    enabled, and if it was - use only 8 buckets(!).

    Fix this by using a global lock and synchronize all buckets on it when we
    need to lock them all. This is pretty heavyweight, but is only done when we
    need to resize the hashtable, and that doesn't happen often enough (or at all).

    Signed-off-by: Sasha Levin
    Acked-by: Jesper Dangaard Brouer
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Sasha Levin
     

29 Dec, 2015

1 commit