01 May, 2017

2 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree. A large bunch of code cleanups, simplify the conntrack extension
    codebase, get rid of the fake conntrack object, speed up netns by
    selective synchronize_net() calls. More specifically, they are:

    1) Check for ct->status bit instead of using nfct_nat() from IPVS and
    Netfilter codebase, patch from Florian Westphal.

    2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.

    3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.

    4) Introduce nft_is_base_chain() helper function.

    5) Enforce expectation limit from userspace conntrack helper,
    from Gao Feng.

    6) Add nf_ct_remove_expect() helper function, from Gao Feng.

    7) NAT mangle helper function return boolean, from Gao Feng.

    8) ctnetlink_alloc_expect() should only work for conntrack with
    helpers, from Gao Feng.

    9) Add nfnl_msg_type() helper function to nfnetlink to build the
    netlink message type.

    10) Get rid of unnecessary cast on void, from simran singhal.

    11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
    also from simran singhal.

    12) Use list_prev_entry() from nf_tables, from simran singhal.

    13) Remove unnecessary & on pointer function in the Netfilter and IPVS
    code.

    14) Remove obsolete comment on set of rules per CPU in ip6_tables,
    no longer true. From Arushi Singhal.

    15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.

    16) Remove unnecessary nested rcu_read_lock() in
    __nf_nat_decode_session(). Code running from hooks is already
    guaranteed to run under the RCU read side.

    17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.

    18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
    also from Aaron.

    19) Get rid of unused __ip_set_get_netlink(), from Aaron Conole.

    20) Don't propagate NF_DROP error to userspace via ctnetlink in
    __nf_nat_alloc_null_binding() function, from Gao Feng.

    21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
    from Gao Feng.

    22) Kill the fake untracked conntrack objects, use ctinfo instead to
    annotate that a conntrack object is untracked, from Florian Westphal.

    23) Remove nf_ct_is_untracked(), now obsolete since we have no
    conntrack template anymore, from Florian.

    24) Add event mask support to nft_ct, also from Florian.

    25) Move nf_conn_help structure to
    include/net/netfilter/nf_conntrack_helper.h.

    26) Add a fixed 32-byte scratchpad area for conntrack helpers.
    Thus, we don't deal with variable conntrack extensions anymore.
    Make sure the userspace conntrack helper doesn't go over that size.
    Remove the variable size ct extension infrastructure now that this
    code has no more clients. From Florian Westphal.

    27) Restore offset and length of nf_ct_ext structure to 8 bytes now
    that wraparound is not possible any longer, also from Florian.

    28) Allow to get rid of unassured flows under stress in conntrack,
    this applies to DCCP, SCTP and TCP protocols, from Florian.

    29) Shrink size of nf_conntrack_ecache structure, from Florian.

    30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
    from Gao Feng.

    31) Register SYNPROXY hooks on demand, from Florian Westphal.

    32) Use pernet hook whenever possible, instead of global hook
    registration, from Florian Westphal.

    33) Pass hook structure to ebt_register_table() to consolidate some
    infrastructure code, from Florian Westphal.

    34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
    SYNPROXY code, to make sure device stats are not fooled, patch
    from Gao Feng.

    35) Remove NF_CT_EXT_F_PREALLOC. This kills quite some code that we
    don't need anymore if we just select a fixed size instead of an
    expensive runtime calculation. From Florian.

    36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
    from Florian.

    37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
    Florian.

    38) Attach NAT extension on-demand from masquerade and pptp helper
    path, from Florian.

    39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.

    40) Speed up netns by selective calls of synchronize_net(), from
    Florian Westphal.

    41) Silence gcc stack size warning on 32-bit arches in the snmp helper,
    from Florian.

    42) Unconditionally call nf_ct_ext_destroy(), even if we have no
    extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
    Liping Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size
    of 1160 bytes is larger than 1024 bytes

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

26 Apr, 2017

5 commits

  • nowadays the NAT extension only stores the interface index
    (used to purge connections that got masqueraded when interface goes down)
    and pptp nat information.

    Previous patches moved nf_ct_nat_ext_add to those places that need it.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • make sure nat extension gets added if the master conntrack is subject to
    NAT. This will be required once the nat core stops adding it by default.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Currently the nat extension is always attached as soon as nat module is
    loaded. However, most NAT uses do not need the nat extension anymore.

    Prepare to remove the add-nat-by-default by making those places that need
    it attach it if it's not present yet.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The current SYNPROXY code returns NF_DROP during a normal TCP handshake,
    which is unfriendly to the caller: nf_hook_slow() treats NF_DROP as an
    error and returns -EPERM. As a result, the top-level caller may believe
    that it hit an error.

    For example, the following code is from cfv_rx_poll():

        err = netif_receive_skb(skb);
        if (unlikely(err)) {
                ++cfv->ndev->stats.rx_dropped;
        } else {
                ++cfv->ndev->stats.rx_packets;
                cfv->ndev->stats.rx_bytes += skb_len;
        }

    When SYNPROXY returns NF_DROP, netif_receive_skb() returns -EPERM.
    As a result, the cfv driver treats it as an error and increments
    the rx_dropped counter.

    So use NF_STOLEN instead of NF_DROP, because no error actually
    happened, and free the skb directly.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
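
The accounting problem described in this commit can be modelled in user space. The following is a minimal sketch, not kernel code: the `verdict` enum, `receive_result()` and `account_rx()` are hypothetical stand-ins, and the only behaviour taken from the commit message is that a drop verdict surfaces to the driver as -EPERM while a stolen packet does not.

```c
#include <errno.h>

/* Hypothetical stand-ins for the netfilter verdicts named in the commit. */
enum verdict { V_DROP, V_ACCEPT, V_STOLEN };

struct rx_stats {
	unsigned long rx_packets;
	unsigned long rx_dropped;
};

/* Simplified model of what the driver sees from the receive path:
 * a drop verdict is reported as -EPERM, anything else as success. */
static int receive_result(enum verdict v)
{
	return v == V_DROP ? -EPERM : 0;
}

/* Driver-style accounting, mirroring the cfv_rx_poll() excerpt. */
static void account_rx(struct rx_stats *st, enum verdict v)
{
	if (receive_result(v))
		st->rx_dropped++;
	else
		st->rx_packets++;
}
```

Calling `account_rx()` with `V_DROP` bumps `rx_dropped`, while `V_STOLEN` is counted as a normally received packet, which is the effect the commit is after.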
     
  • Defer registration of the synproxy hooks until the first SYNPROXY rule is
    added. This also means we only register hooks in namespaces that need them.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

16 Apr, 2017

1 commit


15 Apr, 2017

2 commits

  • This function is now obsolete and always returns false.
    This change has no effect on generated code.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • resurrect an old patch from Pablo Neira to remove the untracked objects.

    Currently, there are four possible states of an skb wrt. conntrack.

    1. No conntrack attached, ct is NULL.
    2. Normal (kmem cache allocated) ct attached.
    3. a template (kmalloc'd), not in any hash tables at any point in time
    4. the 'untracked' conntrack, a percpu nf_conn object, tagged via
    IPS_UNTRACKED_BIT in ct->status.

    Untracked is supposed to be identical to case 1. It exists only
    so users can check

    -m conntrack --ctstate UNTRACKED vs.
    -m conntrack --ctstate INVALID

    e.g. an attempt to set connmark on INVALID or UNTRACKED conntracks is
    supposed to be a no-op.

    Thus currently we need to check
    ct == NULL || nf_ct_is_untracked(ct)

    in a lot of places in order to avoid altering untracked objects.

    The other consequence of the percpu untracked object is that all
    -j NOTRACK (and, later, kfree_skb of such skbs) result in an atomic op
    (inc/dec of the untracked conntrack's refcount).

    This adds a new kernel-private ctinfo state, IP_CT_UNTRACKED, to
    make the distinction instead.

    The (few) places that care about packet invalid (ct is NULL) vs.
    packet untracked now need to test ct == NULL vs. ctinfo == IP_CT_UNTRACKED,
    but all other places can omit the nf_ct_is_untracked() check.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

14 Apr, 2017

1 commit


08 Apr, 2017

1 commit


07 Apr, 2017

2 commits


06 Apr, 2017

1 commit


27 Mar, 2017

2 commits

  • In commit 93557f53e1fb ("netfilter: nf_conntrack: nf_conntrack snmp
    helper"), the snmp_helper was replaced by nf_nat_snmp_hook, so the
    snmp_helper is never registered. The code still tries to unregister
    it, however, which could cause a panic.

    Now remove the useless snmp_helper and the unregister call in the
    error handler.

    Fixes: 93557f53e1fb ("netfilter: nf_conntrack: nf_conntrack snmp helper")
    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     
  • Otherwise, another CPU may access the invalid pointer. For example:

        CPU0                CPU1
         -                  rcu_read_lock();
         -                  pfunc = _hook_;
        _hook_ = NULL;       -
        mod unload           -
         -                  pfunc(); // invalid, panic
         -                  rcu_read_unlock();

    So we must call synchronize_rcu() to wait for the RCU readers to finish.

    Also note that in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked
    by the later nf_conntrack_helper_unregister, but I'm inclined to add an
    explicit synchronize_rcu() after setting nf_nat_snmp_hook to NULL.
    Depending on such obscure assumptions is not a good idea.

    Last, in nfnetlink_cttimeout, we use kfree_rcu to free the time object,
    so invoking rcu_barrier() in cttimeout_exit is not necessary at all;
    remove it too.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

24 Mar, 2017

1 commit


17 Mar, 2017

1 commit

  • The refcount_t type and corresponding API (see include/linux/refcount.h)
    should be used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Pablo Neira Ayuso

    Reshetova, Elena
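
To illustrate why a dedicated reference-counter type helps against overflow, here is a minimal, single-threaded sketch of a saturating counter. It is an assumption-laden model and not the kernel's refcount_t implementation: `struct ref`, `ref_inc()` and `ref_dec_and_test()` are hypothetical names, and the real API uses atomic operations.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, non-atomic model of a saturating reference counter. */
struct ref {
	unsigned int count;
};

/* Saturating increment: once the counter reaches UINT_MAX it stays
 * there instead of wrapping around to zero. */
static void ref_inc(struct ref *r)
{
	if (r->count != UINT_MAX)
		r->count++;
}

/* Returns true only when the last reference was dropped. A saturated
 * (or already zero) counter is left untouched, so the object is leaked
 * rather than freed while still in use. */
static bool ref_dec_and_test(struct ref *r)
{
	if (r->count == UINT_MAX || r->count == 0)
		return false;
	return --r->count == 0;
}
```

With a plain wrapping counter, enough increments would bring the value back to a small number and a later decrement could free an object that is still referenced. Saturation trades that use-after-free for a leak.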
     

13 Mar, 2017

2 commits

  • Instead of the actual interface index or name, set destination register
    to just 1 or 0 depending on whether the lookup succeeded or not if
    NFTA_FIB_F_PRESENT was set in userspace.

    Signed-off-by: Phil Sutter
    Signed-off-by: Pablo Neira Ayuso

    Phil Sutter
     
  • Currently, there are two different methods to store a u16 integer in
    the u32 data register. For example:

        u32 *dest = &regs->data[priv->dreg];
        1. *dest = 0; *(u16 *) dest = val_u16;
        2. *dest = val_u16;

    With method 1, the u16 value is stored like this on both
    big-endian and little-endian systems:

        0               15              31
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |     Value     |       0       |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    With method 2, on a little-endian system the u16 value is stored the
    same way as shown above. But on a big-endian system, the u16 value is
    stored like this:

        0               15              31
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |       0       |     Value     |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    So when we later use "memcmp(&regs->data[priv->sreg], data, 2);" to do
    the comparison in nft_cmp, nft_lookup expr ..., method 2 gets the wrong
    result on a big-endian system, as bits 0~15 will always be zero.

    For a similar reason, when loading a u16 value from the u32 data
    register, we should use "*(u16 *) sreg;" instead of "(u16)*sreg;";
    the second method gets the wrong value on a big-endian system.

    So introduce some wrapper functions to store/load a u8 or u16
    integer to/from the u32 data register, and use them in the right
    places.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
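
The store/load wrappers described in this commit can be sketched in portable user-space C. This is a minimal illustration under assumptions, not the kernel's code: `reg_store16()`, `reg_load16()` and `reg_matches16()` are hypothetical names, and memcpy() is used in place of the pointer casts from the commit message to stay within standard C aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero the 32-bit register, then place the 16-bit
 * value in its first two bytes (method 1 in the commit message). */
static void reg_store16(uint32_t *dreg, uint16_t val)
{
	*dreg = 0;
	memcpy(dreg, &val, sizeof(val));
}

/* Hypothetical helper: read the 16-bit value back from the first two
 * bytes of the register, regardless of host endianness. */
static uint16_t reg_load16(const uint32_t *sreg)
{
	uint16_t val;

	memcpy(&val, sreg, sizeof(val));
	return val;
}

/* Compare the first two bytes of the register with a 16-bit key, the
 * way a memcmp()-based comparison would. Returns 1 on a match. */
static int reg_matches16(const uint32_t *sreg, uint16_t key)
{
	return memcmp(sreg, &key, sizeof(key)) == 0;
}
```

Because both the store and the comparison operate on the first two bytes of the register, a value written with `reg_store16()` matches under `reg_matches16()` on little-endian and big-endian hosts alike.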
     

09 Mar, 2017

2 commits

  • variable oiph is not used.

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     
  • Andrey reports syzkaller splat caused by

    NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));

    in ipv4 nat. But this assertion (and the comment) are wrong, this function
    does see fragments when IP_NODEFRAG setsockopt is used.

    As conntrack doesn't track packets without complete l4 header, only the
    first fragment is tracked.

    Because applying nat to first packet but not the rest makes no sense this
    also turns off tracking of all fragments.

    Reported-by: Andrey Konovalov
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

07 Mar, 2017

2 commits

  • ret is initialized to zero and if it is set to non-zero in the
    xt_entry_foreach loop then we exit via the out_free label. Hence
    the check for ret being non-zero is redundant and can be removed.

    Detected by CoverityScan, CID#1357132 ("Logically Dead Code")

    Signed-off-by: Colin Ian King
    Signed-off-by: Pablo Neira Ayuso

    Colin Ian King
     
  • Logging output changed: simple printks without KERN_CONT are now
    emitted on a new line, and KERN_CONT is required to continue
    lines, so use pr_cont.

    Miscellanea:

    o realign arguments
    o use print_hex_dump instead of a local variant

    Signed-off-by: Joe Perches
    Signed-off-by: Pablo Neira Ayuso

    Joe Perches
     

28 Feb, 2017

1 commit

  • Now that %z is standardised in C99 there is no reason to support %Z.
    Unlike %L it doesn't even make format strings smaller.

    Use BUILD_BUG_ON in a couple of ATM drivers.

    In case anyone didn't notice, lib/vsprintf.o is about half the size of
    SLUB, which in my opinion is quite an achievement. Hopefully this patch
    inspires someone else to trim vsprintf.c more.

    Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
    Signed-off-by: Alexey Dobriyan
    Cc: Andy Shevchenko
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

04 Feb, 2017

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree, they are:

    1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
    sk_buff so we only access one single cacheline in the conntrack
    hotpath. Patchset from Florian Westphal.

    2) Don't leak pointer to internal structures when exporting x_tables
    ruleset back to userspace, from Willem DeBruijn. This includes new
    helper functions to copy data to userspace such as xt_data_to_user()
    as well as conversions of our ip_tables, ip6_tables and arp_tables
    clients to use it. Not surprisingly, ebtables requires an ad-hoc
    update. There is also a new field in x_tables extensions to indicate
    the amount of bytes that we copy to userspace.

    3) Add nf_log_all_netns sysctl: This new knob allows you to enable
    logging via nf_log infrastructure for all existing netnamespaces.
    Given the effort to provide pernet syslog has been discontinued,
    let's provide a way to restore logging using netfilter kernel logging
    facilities in trusted environments. Patch from Michal Kubecek.

    4) Validate SCTP checksum from conntrack helper, from Davide Caratti.

    5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
    a copy&paste from the original helper, from Florian Westphal.

    6) Reset netfilter state when duplicating packets, also from Florian.

    7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
    nft_meta, from Liping Zhang.

    8) Add missing code to deal with loopback packets from nft_meta when
    used by the netdev family, also from Liping.

    9) Several cleanups on nf_tables, one to remove unnecessary check from
    the netlink control plane path to add table, set and stateful objects
    and code consolidation when unregister chain hooks, from Gao Feng.

    10) Fix harmless reference counter underflow in IPVS that, however,
    results in problems with the introduction of the new refcount_t
    type, from David Windsor.

    11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
    from Davide Caratti.

    12) Missing documentation on nf_tables uapi header, from Liping Zhang.

    13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Feb, 2017

5 commits

  • Commit 69b34fb996b2 ("netfilter: xt_LOG: add net namespace support for
    xt_LOG") disabled logging packets using the LOG target from non-init
    namespaces. The motivation was to prevent containers from flooding
    kernel log of the host. The plan was to keep it that way until syslog
    namespace implementation allows containers to log in a safe way.

    However, the work on syslog namespace seems to have hit a dead end
    somewhere in 2013 and there are users who want to use xt_LOG in all
    network namespaces. This patch allows them to do so by setting

    /proc/sys/net/netfilter/nf_log_all_netns

    to a nonzero value. This sysctl is only accessible from init_net so that
    one cannot switch the behaviour from inside a container.

    Signed-off-by: Michal Kubecek
    Signed-off-by: Pablo Neira Ayuso

    Michal Kubeček
     
  • Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff.
    This avoids changing code in followup patch that merges skb->nfct and
    skb->nfctinfo into skb->_nfct.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
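
Merging a pointer and a small state field into one word, as planned for skb->_nfct, relies on the low bits of an aligned pointer being zero. The following is a minimal sketch of that idea only: `struct conn`, `pack_ct()`, `unpack_ct()` and `unpack_info()` are hypothetical names, and the code assumes objects are at least 8-byte aligned so that three low bits are free.

```c
#include <stdint.h>

/* Low three bits of the packed word hold the ctinfo-style value. */
#define CTINFO_MASK ((uintptr_t)7)

/* Hypothetical object; the uint64_t member gives it 8-byte alignment
 * on common 64-bit ABIs, which this sketch assumes. */
struct conn {
	uint64_t id;
};

/* Pack an aligned pointer and a 3-bit info value into one word. */
static uintptr_t pack_ct(struct conn *ct, unsigned int info)
{
	return (uintptr_t)ct | ((uintptr_t)info & CTINFO_MASK);
}

/* Recover the pointer by masking off the low three bits. */
static struct conn *unpack_ct(uintptr_t packed)
{
	return (struct conn *)(packed & ~CTINFO_MASK);
}

/* Recover the 3-bit info value. */
static unsigned int unpack_info(uintptr_t packed)
{
	return (unsigned int)(packed & CTINFO_MASK);
}
```

A helper that hides this packing lets callers assign both pieces at once, so later changes to the field layout stay confined to the helper.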
     
  • Followup patch renames skb->nfct and changes its type so add a helper to
    avoid intrusive rename change later.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • We should also toss nf_bridge_info, if any -- packet is leaving via
    ip_local_out, also, this skb isn't bridged -- it is a locally generated
    copy. Also this avoids the need to touch this later when skb->nfct is
    replaced with 'unsigned long _nfct' in followup patch.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • It is never accessed for reading and the only places that write to it
    are the icmp(6) handlers, which also set skb->nfct (and skb->nfctinfo).

    The conntrack core specifically checks for attached skb->nfct after
    ->error() invocation and returns early in this case.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

19 Jan, 2017

1 commit

  • We can't access c->pde if CONFIG_PROC_FS is disabled:

    net/ipv4/netfilter/ipt_CLUSTERIP.c: In function 'clusterip_config_find_get':
    net/ipv4/netfilter/ipt_CLUSTERIP.c:147:9: error: 'struct clusterip_config' has no member named 'pde'

    This moves the check inside of another #ifdef.

    Fixes: 6c5d5cfbe3c5 ("netfilter: ipt_CLUSTERIP: check duplicate config when initializing")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Pablo Neira Ayuso

    Arnd Bergmann
     

16 Jan, 2017

1 commit

  • Currently, we check the existing rtable in PREROUTING hook, if RTCF_LOCAL
    is set, we assume that the packet is loopback.

    But this assumption is incorrect, for example, a packet encapsulated
    in ipsec transport mode was received and routed to local, after
    decapsulation, it would be delivered to local again, and the rtable
    was not dropped, so RTCF_LOCAL check would trigger. But actually, the
    packet was not loopback.

    So for normal loopback packets, we can check whether the in device
    is IFF_LOOPBACK or not. For locally generated broadcast/multicast
    packets, we can check whether skb->pkt_type is PACKET_LOOPBACK or not.

    Finally, there's a subtle difference between the nft fib expr and the
    xtables rpfilter extension: a user can add the following nft rule to do
    a strict rpfilter check:
    # nft add rule x y meta iif eth0 fib saddr . iif oif != eth0 drop

    So when the packet is loopback, it's better to store the in device
    instead of the LOOPBACK_IFINDEX, otherwise, after adding the above
    nft rule, locally generated broad/multicast packets will be dropped
    incorrectly.

    Fixes: f83a7ea2075c ("netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too")
    Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     

10 Jan, 2017

4 commits


06 Jan, 2017

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains accumulated Netfilter fixes for your
    net tree:

    1) Ensure quota dump and reset happens iff we can deliver numbers to
    userspace.

    2) Silence splat on incorrect use of smp_processor_id() from nft_queue.

    3) Fix an out-of-bound access reported by KASAN in
    nf_tables_rule_destroy(), patch from Florian Westphal.

    4) Fix layer 4 checksum mangling in the nf_tables payload expression
    with IPv6.

    5) Fix a race in the CLUSTERIP target from control plane path when two
    threads run to add a new configuration object. Serialize invocations
    of clusterip_config_init() using spin_lock. From Xin Long.

    6) Call br_nf_pre_routing_finish_bridge_finish() once we are done with
    the br_nf_pre_routing_finish() hook. From Artur Molchanov.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Dec, 2016

1 commit