01 May, 2017

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree. A large bunch of code cleanups, simplify the conntrack extension
    codebase, get rid of the fake conntrack object, speed up netns by
    selective synchronize_net() calls. More specifically, they are:

    1) Check for ct->status bit instead of using nfct_nat() from IPVS and
    Netfilter codebase, patch from Florian Westphal.

    2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.

    3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.

    4) Introduce nft_is_base_chain() helper function.

    5) Enforce expectation limit from userspace conntrack helper,
    from Gao Feng.

    6) Add nf_ct_remove_expect() helper function, from Gao Feng.

    7) NAT mangle helper function return boolean, from Gao Feng.

    8) ctnetlink_alloc_expect() should only work for conntrack with
    helpers, from Gao Feng.

    9) Add nfnl_msg_type() helper function to nfnetlink to build the
    netlink message type.

    10) Get rid of unnecessary cast on void, from simran singhal.

    11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
    also from simran singhal.

    12) Use list_prev_entry() from nf_tables, from simran signhal.

    13) Remove unnecessary & on pointer function in the Netfilter and IPVS
    code.

    14) Remove obsolete comment on set of rules per CPU in ip6_tables,
    no longer true. From Arushi Singhal.

    15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.

    16) Remove unnecessary nested rcu_read_lock() in
    __nf_nat_decode_session(). Code running from hooks are already
    guaranteed to run under RCU read side.

    17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.

    18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
    also from Aaron.

    19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole.

    20) Don't propagate NF_DROP error to userspace via ctnetlink in
    __nf_nat_alloc_null_binding() function, from Gao Feng.

    21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
    from Gao Feng.

    22) Kill the fake untracked conntrack objects, use ctinfo instead to
    annotate a conntrack object is untracked, from Florian Westphal.

    23) Remove nf_ct_is_untracked(), now obsolete since we have no
    conntrack template anymore, from Florian.

    24) Add event mask support to nft_ct, also from Florian.

    25) Move nf_conn_help structure to
    include/net/netfilter/nf_conntrack_helper.h.

    26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
    Thus, we don't deal with variable conntrack extensions anymore.
    Make sure userspace conntrack helper doesn't go over that size.
    Remove variable size ct extension infrastructure now this code
    got no more clients. From Florian Westphal.

    27) Restore offset and length of nf_ct_ext structure to 8 bytes now
    that wraparound is not possible any longer, also from Florian.

    28) Allow to get rid of unassured flows under stress in conntrack,
    this applies to DCCP, SCTP and TCP protocols, from Florian.

    29) Shrink size of nf_conntrack_ecache structure, from Florian.

    30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
    from Gao Feng.

    31) Register SYNPROXY hooks on demand, from Florian Westphal.

    32) Use pernet hook whenever possible, instead of global hook
    registration, from Florian Westphal.

    33) Pass hook structure to ebt_register_table() to consolidate some
    infrastructure code, from Florian Westphal.

    34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
    SYNPROXY code, to make sure device stats are not fooled, patch
    from Gao Feng.

    35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we
    don't need anymore if we just select a fixed size instead of
    expensive runtime time calculation of this. From Florian.

    36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
    from Florian.

    37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
    Florian.

    38) Attach NAT extension on-demand from masquerade and pptp helper
    path, from Florian.

    39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.

    40) Speed up netns by selective calls of synchronize_net(), from
    Florian Westphal.

    41) Silence stack size warning gcc in 32-bit arch in snmp helper,
    from Florian.

    42) Inconditionally call nf_ct_ext_destroy(), even if we have no
    extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
    Liping Zhang.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

15 Apr, 2017

1 commit


14 Apr, 2017

2 commits

  • Pass the new extended ACK reporting struct to all of the generic
    netlink parsing functions. For now, pass NULL in almost all callers
    (except for some in the core.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Add the base infrastructure and UAPI for netlink extended ACK
    reporting. All "manual" calls to netlink_ack() pass NULL for now and
    thus don't get extended ACK reporting.

    Big thanks goes to Pablo Neira Ayuso for not only bringing up the
    whole topic at netconf (again) but also coming up with the nlattr
    passing trick and various other ideas.

    Signed-off-by: Johannes Berg
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Johannes Berg
     

08 Apr, 2017

1 commit


07 Apr, 2017

2 commits


20 Feb, 2017

2 commits


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into an zero based array and
    thus is unsigned entity. Using negative value is out-of-bound
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preffered to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
    g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is 3 byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
    ...
    ptr = ng->ptr[id - 1];
    ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a semmingly random artefact of code generation with register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8]

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

10 Nov, 2016

16 commits


03 Nov, 2016

1 commit


28 Mar, 2016

1 commit

  • This fix adds a new reference counter (ref_netlink) for the struct ip_set.
    The other reference counter (ref) can be swapped out by ip_set_swap and we
    need a separate counter to keep track of references for netlink events
    like dump. Using the same ref counter for dump causes a race condition
    which can be demonstrated by the following script:

    ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \
    counters
    ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \
    counters
    ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \
    counters

    ipset save &

    ipset swap hash_ip3 hash_ip2
    ipset destroy hash_ip3 /* will crash the machine */

    Swap will exchange the values of ref so destroy will see ref = 0 instead of
    ref = 1. With this fix in place swap will not succeed because ipset save
    still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink).

    Both delete and swap will error out if ref_netlink != 0 on the set.

    Note: The changes to *_head functions is because previously we would
    increment ref whenever we called these functions, we don't do that
    anymore.

    Reviewed-by: Joshua Hunt
    Signed-off-by: Vishwanath Pai
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Vishwanath Pai
     

09 Mar, 2016

1 commit


25 Feb, 2016

1 commit


13 Jan, 2016

1 commit

  • Jozsef says:
    The correct behaviour is that if we have
    ipset create test1 hash:net,iface
    ipset add test1 0.0.0.0/0,eth0
    iptables -A INPUT -m set --match-set test1 src,src

    then the rule should match for any traffic coming in through eth0.

    This removes the -EINVAL runtime test to make matching work
    in case packet arrived via the specified interface.

    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1297092
    Signed-off-by: Florian Westphal
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

29 Dec, 2015

1 commit


07 Nov, 2015

3 commits


24 Oct, 2015

1 commit

  • Conflicts:
    net/ipv6/xfrm6_output.c
    net/openvswitch/flow_netlink.c
    net/openvswitch/vport-gre.c
    net/openvswitch/vport-vxlan.c
    net/openvswitch/vport.c
    net/openvswitch/vport.h

    The openvswitch conflicts were overlapping changes. One was
    the egress tunnel info fix in 'net' and the other was the
    vport ->send() op simplification in 'net-next'.

    The xfrm6_output.c conflicts was also a simplification
    overlapping a bug fix.

    Signed-off-by: David S. Miller

    David S. Miller
     

17 Oct, 2015

1 commit

  • Commit 00590fdd5be0 introduced RCU locking in list type and in
    doing so introduced a memory allocation in list_set_add, which
    is done in an atomic context, due to the fact that ipset rcu
    list modifications are serialised with a spin lock. The reason
    why we can't use a mutex is that in addition to modifying the
    list with ipset commands, it's also being modified when a
    particular ipset rule timeout expires aka garbage collection.
    This gc is triggered from set_cleanup_entries, which in turn
    is invoked from a timer thus requiring the lock to be bh-safe.

    Concretely the following call chain can lead to "sleeping function
    called in atomic context" splat:
    call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
    And since GFP_KERNEL allows initiating direct reclaim thus
    potentially sleeping in the allocation path.

    To fix the issue change the allocation type to GFP_ATOMIC, to
    correctly reflect that it is occuring in an atomic context.

    Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type")
    Signed-off-by: Nikolay Borisov
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Nikolay Borisov
     

19 Sep, 2015

1 commit


29 Aug, 2015

2 commits

  • In continue to proposed Vinson Lee's post [1], this patch fixes compilation
    issues founded at gcc 4.4.7. The initialization of .cidr field of unnamed
    unions causes compilation error in gcc 4.4.x.

    References

    Visible links
    [1] https://lkml.org/lkml/2015/7/5/74

    Signed-off-by: Elad Raz
    Signed-off-by: Pablo Neira Ayuso

    Elad Raz
     
  • Dave Jones reported that KASan detected out of bounds access in hash:net*
    types:

    [ 23.139532] ==================================================================
    [ 23.146130] BUG: KASan: out of bounds access in hash_net4_add_cidr+0x1db/0x220 at addr ffff8800d4844b58
    [ 23.152937] Write of size 4 by task ipset/457
    [ 23.159742] =============================================================================
    [ 23.166672] BUG kmalloc-512 (Not tainted): kasan: bad access detected
    [ 23.173641] -----------------------------------------------------------------------------
    [ 23.194668] INFO: Allocated in hash_net_create+0x16a/0x470 age=7 cpu=1 pid=456
    [ 23.201836] __slab_alloc.constprop.66+0x554/0x620
    [ 23.208994] __kmalloc+0x2f2/0x360
    [ 23.216105] hash_net_create+0x16a/0x470
    [ 23.223238] ip_set_create+0x3e6/0x740
    [ 23.230343] nfnetlink_rcv_msg+0x599/0x640
    [ 23.237454] netlink_rcv_skb+0x14f/0x190
    [ 23.244533] nfnetlink_rcv+0x3f6/0x790
    [ 23.251579] netlink_unicast+0x272/0x390
    [ 23.258573] netlink_sendmsg+0x5a1/0xa50
    [ 23.265485] SYSC_sendto+0x1da/0x2c0
    [ 23.272364] SyS_sendto+0xe/0x10
    [ 23.279168] entry_SYSCALL_64_fastpath+0x12/0x6f

    The bug is fixed in the patch and the testsuite is extended in ipset
    to check cidr handling more thoroughly.

    Signed-off-by: Jozsef Kadlecsik

    Jozsef Kadlecsik