19 Aug, 2017

1 commit

  • The root4 variable is used only when the connlimit extension module has
    been registered by the iptables command, and the root6 variable is used
    only when the connlimit extension module has been registered by the
    ip6tables command. So root4 and root6 are never used at the same time.
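    A hedged sketch of what this observation allows (the struct names,
    the CONNLIMIT_SLOTS constant and the merge into a single array are
    assumptions for illustration, not the exact kernel code):

    struct rb_node_example { struct rb_node_example *left, *right; };
    struct rb_root_example { struct rb_node_example *rb_node; };

    #define CONNLIMIT_SLOTS 256

    /* before: one tree array per address family */
    struct xt_connlimit_data_before {
        struct rb_root_example root4[CONNLIMIT_SLOTS]; /* iptables rules only  */
        struct rb_root_example root6[CONNLIMIT_SLOTS]; /* ip6tables rules only */
    };

    /* after: a single array, since any given match instance is either v4 or v6 */
    struct xt_connlimit_data_after {
        struct rb_root_example root[CONNLIMIT_SLOTS];
    };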

    Signed-off-by: Taehee Yoo
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     

24 Jul, 2017

1 commit

  • This patch removes duplicate rcu_read_lock().

    1. IPVS part:

    According to Julian Anastasov, the ipvs contexts are described at
    http://marc.info/?l=netfilter-devel&m=149562884514072&w=2; in summary:

    - packet RX/TX: does not need locks because packets come from hooks.
    - sync msg RX: backup server uses RCU locks while registering new
    connections.
    - ip_vs_ctl.c: configuration get/set, RCU locks needed.
    - xt_ipvs.c: It is a netfilter match, running from hook context.

    As a result, rcu_read_lock and rcu_read_unlock can be removed from:

    - ip_vs_core.c: all
    - ip_vs_ctl.c:
    - only from ip_vs_has_real_service
    - ip_vs_ftp.c: all
    - ip_vs_proto_sctp.c: all
    - ip_vs_proto_tcp.c: all
    - ip_vs_proto_udp.c: all
    - ip_vs_xmit.c: all (contains only packet processing)

    2. Netfilter part:

    There are three types of functions for which rcu_read_lock() is already
    guaranteed. First, functions that are only called from nf_hook():

    - nf_conntrack_broadcast_help(), pptp_expectfn(), set_expected_rtp_rtcp().
    - tcpmss_reverse_mtu(), tproxy_laddr4(), tproxy_laddr6().
    - match_lookup_rt6(), check_hlist(), hashlimit_mt_common().
    - xt_osf_match_packet().

    Second, functions whose callers already hold rcu_read_lock():
    - destroy_conntrack(), ctnetlink_conntrack_event().
    - ctnl_timeout_find_get(), nfqnl_nf_hook_drop().

    Third, functions that mix type 1 and type 2.

    These functions are called both from nf_hook() and from ordinary
    functions that already hold rcu_read_lock():

    - __ctnetlink_glue_build(), ctnetlink_expect_event().
    - ctnetlink_proto_size().

    The affected files are listed below:

    - nf_conntrack_broadcast.c, nf_conntrack_core.c, nf_conntrack_netlink.c.
    - nf_conntrack_pptp.c, nf_conntrack_sip.c, nfnetlink_cttimeout.c.
    - nfnetlink_queue.c, xt_TCPMSS.c, xt_TPROXY.c, xt_addrtype.c.
    - xt_connlimit.c, xt_hashlimit.c, xt_osf.c

    Detailed calltrace can be found at:
    http://marc.info/?l=netfilter-devel&m=149667610710350&w=2
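    A hedged before/after sketch of the pattern being removed (function
    names and the stub primitives are illustrative, not the actual kernel
    symbols): a helper that only runs from nf_hook() context, or whose
    caller already holds rcu_read_lock(), does not need to take the lock
    again.

    /* stand-ins for the kernel primitives, for illustration only */
    static void rcu_read_lock(void)   { }
    static void rcu_read_unlock(void) { }

    /* before: the helper nests a second rcu_read_lock() */
    static int helper_before(void)
    {
        rcu_read_lock();
        /* ... walk an RCU-protected list ... */
        rcu_read_unlock();
        return 0;
    }

    /* after: the surrounding nf_hook()/caller context already holds the
     * RCU read lock, so the helper just does the protected work */
    static int helper_after(void)
    {
        /* ... walk an RCU-protected list ... */
        return 0;
    }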

    Signed-off-by: Taehee Yoo
    Acked-by: Julian Anastasov
    Signed-off-by: Pablo Neira Ayuso

    Taehee Yoo
     

10 Jan, 2017

1 commit

  • In matches and targets that define a kernel-only tail to their
    xt_match and xt_target data structs, add a field .usersize that
    specifies up to where data is to be shared with userspace.

    Performed a search for comment "Used internally by the kernel" to find
    relevant matches and targets. Manually inspected the structs to derive
    a valid offsetof.
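    A hedged sketch of the resulting registration (the struct layout and
    names are assumptions for illustration, not copied from the kernel):
    .usersize points at the start of the kernel-only tail, so copies back
    to userspace stop there.

    #include <stddef.h>

    struct connlimit_info_example {
        /* user-visible configuration */
        unsigned int limit;
        unsigned int flags;
        /* kernel-only tail ("Used internally by the kernel") */
        void *data;
    };

    struct xt_match_example {
        unsigned int matchsize;  /* total size of the match data     */
        unsigned int usersize;   /* bytes shared back with userspace */
    };

    static const struct xt_match_example connlimit_mt_reg_example = {
        .matchsize = sizeof(struct connlimit_info_example),
        .usersize  = offsetof(struct connlimit_info_example, data),
    };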

    Signed-off-by: Willem de Bruijn
    Signed-off-by: Pablo Neira Ayuso

    Willem de Bruijn
     

05 Jan, 2017

1 commit


05 Dec, 2016

1 commit

  • currently aliased to try_module_get/_put.
    Will be changed in the next patch, when we add functions that make use of
    the ->net argument to store a usercount per l3proto tracker.

    This is needed to avoid registering the conntrack hooks in all netns, so
    that connection tracking can later be enabled only in those netns that
    need it.
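    A hedged sketch of the aliasing at this stage (the helper names and
    the struct stand-ins are assumptions; only the forwarding to
    try_module_get()/module_put() comes from the description above):

    #include <stdbool.h>

    struct module { int refcnt; };
    struct net;     /* the ->net argument, unused until the follow-up patch */

    /* stand-ins for the kernel primitives */
    static bool try_module_get(struct module *m) { m->refcnt++; return true; }
    static void module_put(struct module *m)     { m->refcnt--; }

    static bool l3proto_net_get_example(struct net *net, struct module *owner)
    {
        (void)net;                      /* not used yet */
        return try_module_get(owner);
    }

    static void l3proto_net_put_example(struct net *net, struct module *owner)
    {
        (void)net;
        module_put(owner);
    }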

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

03 Nov, 2016

1 commit


23 Sep, 2016

1 commit


19 Sep, 2015

2 commits


11 Aug, 2015

1 commit

  • This patch replaces the zone id which is pushed down into functions
    with the actual zone object. It's a bigger one-time change, but
    needed for later on extending zones with a direction parameter, and
    thus decoupling this additional information from all call-sites.

    No functional changes in this patch.

    The default zone becomes a global const object, namely nf_ct_zone_dflt
    and will be returned directly in various cases, one being, when there's
    f.e. no zoning support.
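    A hedged sketch of the default-zone fallback (the struct layout and
    helper name are illustrative; only nf_ct_zone_dflt itself is named in
    the text above):

    struct nf_conntrack_zone_example {
        unsigned short id;
    };

    /* global, constant default zone */
    static const struct nf_conntrack_zone_example nf_ct_zone_dflt = { .id = 0 };

    /* call-sites now receive a zone object; without zoning support (or
     * when the conntrack carries no zone extension) the default object
     * is returned directly */
    static const struct nf_conntrack_zone_example *
    nf_ct_zone_example(const struct nf_conntrack_zone_example *ct_zone)
    {
        return ct_zone ? ct_zone : &nf_ct_zone_dflt;
    }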

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
     

17 Nov, 2014

1 commit


04 Apr, 2014

2 commits

  • Eric points out that the locks can be global.
    Moreover, both Jesper and Eric note that using only 32 locks increases
    false sharing as only two cache lines are used.

    This increases the lock count to 256 (16 cache lines, assuming a 64-byte
    cacheline and 4 bytes per spinlock).
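    A hedged sketch of the layout and the arithmetic behind the numbers
    above (the type and names are stand-ins): 256 locks of 4 bytes each
    span 1024 bytes, i.e. 16 cache lines of 64 bytes.

    #define CONNLIMIT_LOCK_SLOTS 256U

    typedef struct { unsigned int raw; } spinlock_example_t;   /* 4 bytes */

    /* global, shared by all connlimit rules */
    static spinlock_example_t connlimit_locks[CONNLIMIT_LOCK_SLOTS];

    /* pick the lock guarding a given hash slot */
    static spinlock_example_t *connlimit_lock_example(unsigned int hash)
    {
        return &connlimit_locks[hash % CONNLIMIT_LOCK_SLOTS];
    }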

    Suggested-by: Jesper Dangaard Brouer
    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • cannot use ARRAY_SIZE() if spinlock_t is an empty struct.
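    A hedged sketch of why ARRAY_SIZE() breaks here and what the fix looks
    like (names are illustrative): ARRAY_SIZE() divides by the element
    size, which is zero when spinlock_t is an empty struct, so the named
    slot constant is used instead.

    #define CONNLIMIT_LOCK_SLOTS 256U
    #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

    struct empty_spinlock { };  /* zero-sized under !CONFIG_SMP-style configs */
    static struct empty_spinlock locks_example[CONNLIMIT_LOCK_SLOTS];

    static unsigned int lock_index_example(unsigned int hash)
    {
        /* before: hash % ARRAY_SIZE(locks_example)  -- division by zero */
        return hash % CONNLIMIT_LOCK_SLOTS;
    }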

    Fixes: 1442e7507dd597 ("netfilter: connlimit: use keyed locks")
    Reported-by: kbuild test robot
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

17 Mar, 2014

3 commits

  • With current match design every invocation of the connlimit_match
    function means we have to perform (number_of_conntracks % 256) lookups
    in the conntrack table [ to perform GC/delete stale entries ].
    This is also the reason why ____nf_conntrack_find() in perf top has
    > 20% cpu time per core.

    This patch changes the storage to rbtree which cuts down the number of
    ct objects that need testing.

    When looking up a new tuple, we only test the connections of the host
    objects we visit while searching for the wanted host/network (or
    the leaf we need to insert at).

    The slot count is reduced to 32. Increasing slot count doesn't
    speed up things much because of rbtree nature.

    before patch (50kpps rx, 10kpps tx):
    + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
    + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
    + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
    + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw
    + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
    + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw

    after (90kpps, 51kpps tx):
    + 17.24% swapper [nf_conntrack] [k] ____nf_conntrack_find
    + 6.60% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
    + 2.73% swapper [nf_conntrack] [k] hash_conntrack_raw
    + 2.36% swapper [xt_connlimit] [k] count_tree

    Obvious disadvantages to previous version are the increase in code
    complexity and the increased memory cost.

    Partially based on Eric Dumazet's fq scheduler.
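    A hedged sketch of the lookup idea (structures and names are
    illustrative, not the xt_connlimit code): one tree node per source
    host/network, so a search only touches the nodes along its path.

    #include <stddef.h>

    struct tree_node_example {
        struct tree_node_example *left, *right;
        unsigned int addr;      /* host/network key, IPv4 for brevity */
        unsigned int count;     /* connections tracked for this host  */
    };

    static struct tree_node_example *
    host_lookup_example(struct tree_node_example *root, unsigned int addr)
    {
        while (root) {
            if (addr < root->addr)
                root = root->left;
            else if (addr > root->addr)
                root = root->right;
            else
                return root;    /* only this path was examined */
        }
        return NULL;            /* not found: caller inserts a new leaf here */
    }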

    Reviewed-by: Jesper Dangaard Brouer
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • currently returns 1 if they're the same. Make it work like mem/strcmp
    so it can be used as an rbtree search function.
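    A hedged sketch of the new return convention (the function name is
    borrowed loosely and the arguments are simplified): <0, 0 or >0 like
    memcmp(), so the same comparison can steer an rbtree descent.

    static int same_source_net_example(unsigned int a, unsigned int b)
    {
        if (a < b)
            return -1;
        if (a > b)
            return 1;
        return 0;       /* equal: this is the node we are looking for */
    }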

    Reviewed-by: Jesper Dangaard Brouer
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • connlimit currently suffers from spinlock contention, example for
    4-core system with rps enabled:

    + 20.84% ksoftirqd/2 [kernel.kallsyms] [k] _raw_spin_lock_bh
    + 20.76% ksoftirqd/1 [kernel.kallsyms] [k] _raw_spin_lock_bh
    + 20.42% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh
    + 6.07% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
    + 6.07% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
    + 5.97% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
    + 2.47% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
    + 2.45% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
    + 2.44% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw

    Keyed locks may allow parallel lookup/insert/delete if the entry is
    hashed to another slot. With the patch:

    + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
    + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
    + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
    + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw
    + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
    + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
    + 2.00% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock

    Improved rx processing rate from ~35 kpps to ~50 kpps.
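    A hedged sketch of keyed locking (the names and stub lock primitives
    are illustrative): each hash slot takes its own lock, so flows hashing
    to different slots no longer serialise on one global spinlock.

    #define SLOTS_EXAMPLE 256U

    typedef struct { unsigned int raw; } spinlock_example_t;
    static void spin_lock_example(spinlock_example_t *l)   { l->raw = 1; }
    static void spin_unlock_example(spinlock_example_t *l) { l->raw = 0; }

    static spinlock_example_t slot_locks_example[SLOTS_EXAMPLE];
    static unsigned int slot_counts_example[SLOTS_EXAMPLE];

    static void count_one_example(unsigned int hash)
    {
        unsigned int slot = hash % SLOTS_EXAMPLE;

        spin_lock_example(&slot_locks_example[slot]);   /* per-slot only */
        slot_counts_example[slot]++;
        spin_unlock_example(&slot_locks_example[slot]);
    }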

    Reviewed-by: Jesper Dangaard Brouer
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

12 Mar, 2014

4 commits


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;
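    A hedged, self-contained illustration of what the signature change
    means for callers (the reduced hlist types, the container_of macro and
    the function names are stand-ins for the real <linux/list.h>
    definitions):

    #include <stddef.h>

    struct hlist_node_example { struct hlist_node_example *next; };
    struct hlist_head_example { struct hlist_node_example *first; };

    struct item_example {
        int value;
        struct hlist_node_example list;
    };

    #define container_of_example(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* old form: a separate struct hlist_node cursor rides along */
    static int sum_old_example(struct hlist_head_example *head)
    {
        struct hlist_node_example *pos;
        struct item_example *obj;
        int sum = 0;

        for (pos = head->first; pos; pos = pos->next) {
            obj = container_of_example(pos, struct item_example, list);
            sum += obj->value;
        }
        return sum;
    }

    /* new form: only the entry pointer is carried through the loop,
     * as with list_for_each_entry() */
    static int sum_new_example(struct hlist_head_example *head)
    {
        struct item_example *obj =
            head->first ?
            container_of_example(head->first, struct item_example, list) : NULL;
        int sum = 0;

        while (obj) {
            sum += obj->value;
            obj = obj->list.next ?
                  container_of_example(obj->list.next, struct item_example, list) :
                  NULL;
        }
        return sum;
    }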

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

07 Jun, 2012

1 commit


15 Mar, 2011

4 commits


14 Feb, 2011

1 commit


12 Feb, 2011

1 commit

  • The patch below introduces an early termination of the loop that is
    counting matches. It terminates once the counter has exceeded the
    threshold provided by the user. There's no point in continuing the loop
    afterwards and looking at other entries.

    It plays together with the following code further below:

    return (connections > info->limit) ^ info->inverse;

    where connections is the result of counting the connections, which in turn
    is the matches variable in the loop. So once

    -> matches = info->limit + 1
    alias -> matches > info->limit
    alias -> matches > threshold

    we can terminate the loop.
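    A hedged sketch of the early exit (the data layout and names are
    illustrative; the final expression is the one quoted above): once
    matches exceeds info->limit the overall result cannot change, so the
    remaining entries are skipped.

    struct connlimit_info_example {
        unsigned int limit;
        unsigned int inverse;
    };

    static int count_and_match_example(const unsigned int *per_entry,
                                       unsigned int nentries,
                                       const struct connlimit_info_example *info)
    {
        unsigned int matches = 0;
        unsigned int i;

        for (i = 0; i < nentries; i++) {
            matches += per_entry[i];
            if (matches > info->limit)
                break;          /* result is already decided */
        }
        return (matches > info->limit) ^ info->inverse;
    }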

    Signed-off-by: Stefan Berger
    Signed-off-by: Patrick McHardy

    Stefan Berger
     

26 Jan, 2011

1 commit

  • xt_connlimit normally records the "original" tuples in a hashlist
    (such as "1.2.3.4 -> 5.6.7.8"), and looks in this list for iph->daddr
    when counting.

    When the user however uses DNAT in PREROUTING, looking for
    iph->daddr -- which is now 192.168.9.10 -- will not match. Thus in
    daddr mode, we need to record the reverse direction tuple
    ("192.168.9.10 -> 1.2.3.4") instead. In the reverse tuple, the dst
    addr is on the src side, which is convenient, as count_them still uses
    &conn->tuple.src.u3.
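    A hedged sketch of the direction choice (the enum and field names are
    illustrative): in daddr mode the reply-direction tuple is recorded, so
    the pre-DNAT destination sits in the tuple's source slot that
    count_them already compares against.

    enum { DIR_ORIGINAL_EXAMPLE = 0, DIR_REPLY_EXAMPLE = 1 };

    struct tuple_example {
        unsigned int src;
        unsigned int dst;
    };

    static const struct tuple_example *
    tuple_to_record_example(const struct tuple_example dirs[2], int daddr_mode)
    {
        /* saddr mode: original tuple; daddr mode: reply tuple, whose src
         * field carries the original (pre-DNAT) destination address */
        return daddr_mode ? &dirs[DIR_REPLY_EXAMPLE]
                          : &dirs[DIR_ORIGINAL_EXAMPLE];
    }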

    Signed-off-by: Jan Engelhardt

    Jan Engelhardt
     

20 Jan, 2011

1 commit

  • This adds destination address-based selection. The old "inverse"
    member is overloaded (memory-wise) with a new "flags" variable,
    similar to how J.Park did it with xt_string rev 1. Since revision 0
    userspace only sets flag 0x1, no great changes are made to explicitly
    test for different revisions.
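    A hedged sketch of the revision-1 data layout (field names are
    simplified; the 0x1 bit comes from the text above, the daddr bit value
    is an assumption for illustration):

    #define CONNLIMIT_INVERT_EXAMPLE 0x1    /* what revision-0 userspace sets */
    #define CONNLIMIT_DADDR_EXAMPLE  0x2    /* new: select by destination     */

    struct connlimit_info_v1_example {
        unsigned int limit;
        unsigned int flags;     /* overlays the old "inverse" member */
    };

    static int is_inverted_example(const struct connlimit_info_v1_example *info)
    {
        return info->flags & CONNLIMIT_INVERT_EXAMPLE;
    }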

    Signed-off-by: Jan Engelhardt

    Jan Engelhardt
     

18 Jan, 2011

1 commit


12 May, 2010

3 commits


20 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given that I had only a couple of failures from tests on step 6, I'm
    fairly confident about the coverage of this conversion patch. If there
    is a breakage, it's likely to be something in one of the arch headers,
    which should be easily discoverable on most builds of the specific
    arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

3 commits

  • When extended status codes are available, such as ENOMEM on failed
    allocations, or from subsequent functions (e.g. nf_ct_get_l3proto), passing
    them up to userspace seems like a good idea compared to always just
    returning EINVAL.
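    A hedged sketch of the idea (the names are illustrative; only the
    ENOMEM-instead-of-EINVAL point comes from the text above):

    #include <errno.h>
    #include <stdlib.h>

    static int checkentry_example(void **priv)
    {
        *priv = calloc(1, 64);
        if (*priv == NULL)
            return -ENOMEM;     /* propagate the real failure reason     */
        return 0;               /* success, instead of a blanket -EINVAL */
    }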

    Signed-off-by: Jan Engelhardt

    Jan Engelhardt
     
  • The following semantic patch does part of the transformation:
    //
    @ rule1 @
    struct xt_match ops;
    identifier check;
    @@
    ops.checkentry = check;

    @@
    identifier rule1.check;
    @@
    check(...) { }

    @@
    identifier rule1.check;
    @@
    check(...) { }
    //

    Signed-off-by: Jan Engelhardt

    Jan Engelhardt
     
  • Restore function signatures from bool to int so that we can report
    memory allocation failures or similar using -ENOMEM rather than
    always having to pass -EINVAL back.

    This semantic patch may not be too precise (checking for functions
    that use xt_mtchk_param rather than functions referenced by
    xt_match.checkentry), but reviewed, it produced the intended result.

    //
    @@
    type bool;
    identifier check, par;
    @@
    -bool check
    +int check
    (struct xt_mtchk_param *par) { ... }
    //
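    A hedged before/after of a checkentry function under the new signature
    (names are illustrative): boolean success turns into 0 / negative
    errno, which is what makes the richer error reporting above possible.

    #include <errno.h>
    #include <stdbool.h>

    /* before */
    static bool check_old_example(int params_ok)
    {
        return params_ok != 0;          /* true = accept, false = reject */
    }

    /* after */
    static int check_new_example(int params_ok)
    {
        return params_ok ? 0 : -EINVAL; /* 0 = accept, -errno = reject   */
    }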

    Signed-off-by: Jan Engelhardt

    Jan Engelhardt
     

18 Mar, 2010

1 commit