19 Apr, 2013

1 commit

  • Add copyright statements to all netfilter files which have had significant
    changes done by myself in the past.

    Some notes:

    - nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter
    Core Team when it got split out of nf_conntrack_core.c. The copyrights
    even state a date which lies six years before it was written. It was
    written in 2005 by Harald and myself.

    - net/ipv{4,6}/netfilter.c, net/netfitler/nf_queue.c were missing copyright
    statements. I've added the copyright statement from net/netfilter/core.c,
    where this code originated

    - for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want
    it to give the wrong impression

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

23 Jan, 2013

1 commit


19 Nov, 2012

1 commit

  • In preparation for supporting the creation of network namespaces
    by unprivileged users, modify all of the per net sysctl exports
    and refuse to allow them to unprivileged users.

    This makes it safe for unprivileged users in general to access
    per net sysctls, and allows sysctls to be exported to unprivileged
    users on an individual basis as they are deemed safe.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

09 May, 2012

1 commit


21 Apr, 2012

1 commit


08 Mar, 2012

2 commits


13 Jan, 2012

1 commit

  • commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
    RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
    complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
    y).

    We miss needed barriers, even on x86, when y is not NULL.

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: Paul E. McKenney
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Nov, 2011

1 commit

  • This patch fixes an oops that can be triggered following this recipe:

    0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
    1) container is started.
    2) connect to it via lxc-console.
    3) generate some traffic with the container to create some conntrack
    entries in its table.
    4) stop the container: you hit one oops because the conntrack table
    cleanup tries to report the destroy event to user-space but the
    per-netns nfnetlink socket has already gone (as the nfnetlink
    socket is per-netns but event callback registration is global).

    To fix this situation, we make the ctnl_notifier per-netns so the
    callback is registered/unregistered if the container is
    created/destroyed.

    Alex Bligh and Alexey Dobriyan originally proposed one small patch to
    check if the nfnetlink socket is gone in nfnetlink_has_listeners,
    but this is a very visited path for events, thus, it may reduce
    performance and it looks a bit hackish to check for the nfnetlink
    socket only to workaround this situation. As a result, I decided
    to follow the bigger path choice, which seems to look nicer to me.

    Cc: Alexey Dobriyan
    Reported-by: Alex Bligh
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

01 Nov, 2011

1 commit


02 Aug, 2011

1 commit

  • When assigning a NULL value to an RCU protected pointer, no barrier
    is needed. The rcu_assign_pointer, used to handle that but will soon
    change to not handle the special case.

    Convert all rcu_assign_pointer of NULL value.

    //smpl
    @@ expression P; @@

    - rcu_assign_pointer(P, NULL)
    + RCU_INIT_POINTER(P, NULL)

    //

    Signed-off-by: Stephen Hemminger
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

01 Feb, 2011

1 commit

  • For the following rule:

    iptables -I PREROUTING -t raw -j CT --ctevents assured

    The event delivered looks like the following:

    [UPDATE] tcp 6 src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

    Note that the TCP protocol state is not included. For that reason
    the CT event filtering is not very useful for conntrackd.

    To resolve this issue, instead of conditionally setting the CT events
    bits based on the ctmask, we always set them and perform the filtering
    in the late stage, just before the delivery.

    Thus, the event delivered looks like the following:

    [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

20 Aug, 2010

1 commit


11 May, 2010

1 commit


20 Apr, 2010

1 commit


09 Apr, 2010

1 commit

  • The CONFIG_PROVE_RCU option discovered a few invalid uses of
    rcu_dereference() in netfilter. In all these cases, the code code
    intends to check whether a pointer is already assigned when
    performing registration or whether the assigned pointer matches
    when performing unregistration. The entire registration/
    unregistration is protected by a mutex, so we don't need the
    rcu_dereference() calls.

    Reported-by: Valdis Kletnieks
    Tested-by: Valdis Kletnieks
    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

12 Nov, 2009

1 commit

  • Now that sys_sysctl is a compatiblity wrapper around /proc/sys
    all sysctl strategy routines, and all ctl_name and strategy
    entries in the sysctl tables are unused, and can be
    revmoed.

    In addition neigh_sysctl_register has been modified to no longer
    take a strategy argument and it's callers have been modified not
    to pass one.

    Cc: "David Miller"
    Cc: Hideaki YOSHIFUJI
    Cc: netdev@vger.kernel.org
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

13 Jun, 2009

2 commits

  • This patch improves ctnetlink event reliability if one broadcast
    listener has set the NETLINK_BROADCAST_ERROR socket option.

    The logic is the following: if an event delivery fails, we keep
    the undelivered events in the missed event cache. Once the next
    packet arrives, we add the new events (if any) to the missed
    events in the cache and we try a new delivery, and so on. Thus,
    if ctnetlink fails to deliver an event, we try to deliver them
    once we see a new packet. Therefore, we may lose state
    transitions but the userspace process gets in sync at some point.

    At worst case, if no events were delivered to userspace, we make
    sure that destroy events are successfully delivered. Basically,
    if ctnetlink fails to deliver the destroy event, we remove the
    conntrack entry from the hashes and we insert them in the dying
    list, which contains inactive entries. Then, the conntrack timer
    is added with an extra grace timeout of random32() % 15 seconds
    to trigger the event again (this grace timeout is tunable via
    /proc). The use of a limited random timeout value allows
    distributing the "destroy" resends, thus, avoiding accumulating
    lots "destroy" events at the same time. Event delivery may
    re-order but we can identify them by means of the tuple plus
    the conntrack ID.

    The maximum number of conntrack entries (active or inactive) is
    still handled by nf_conntrack_max. Thus, we may start dropping
    packets at some point if we accumulate a lot of inactive conntrack
    entries that did not successfully report the destroy event to
    userspace.

    During my stress tests consisting of setting a very small buffer
    of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
    flag, and generating lots of very small connections, I noticed
    very few destroy entries on the fly waiting to be resend.

    A simple way to test this patch consist of creating a lot of
    entries, set a very small Netlink buffer in conntrackd (+ a patch
    which is not in the git tree to set the BROADCAST_ERROR flag)
    and invoke `conntrack -F'.

    For expectations, no changes are introduced in this patch.
    Currently, event delivery is only done for new expectations (no
    events from expectation expiration, removal and confirmation).
    In that case, they need a per-expectation event cache to implement
    the same idea that is exposed in this patch.

    This patch can be useful to provide reliable flow-accouting. We
    still have to add a new conntrack extension to store the creation
    and destroy time.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     
  • This patch reworks the per-cpu event caching to use the conntrack
    extension infrastructure.

    The main drawback is that we consume more memory per conntrack
    if event delivery is enabled. This patch is required by the
    reliable event delivery that follows to this patch.

    BTW, this patch allows you to enable/disable event delivery via
    /proc/sys/net/netfilter/nf_conntrack_events in runtime, although
    you can still disable event caching as compilation option.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

03 Jun, 2009

1 commit

  • This patch removes the notify chain infrastructure and replace it
    by a simple function pointer. This issue has been mentioned in the
    mailing list several times: the use of the notify chain adds
    too much overhead for something that is only used by ctnetlink.

    This patch also changes nfnetlink_send(). It seems that gfp_any()
    returns GFP_KERNEL for user-context request, like those via
    ctnetlink, inside the RCU read-side section which is not valid.
    Using GFP_KERNEL is also evil since netlink may schedule(),
    this leads to "scheduling while atomic" bug reports.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

18 Nov, 2008

1 commit

  • As for now, the creation and update of conntracks via ctnetlink do not
    propagate an event to userspace. This can result in inconsistent situations
    if several userspace processes modify the connection tracking table by means
    of ctnetlink at the same time. Specifically, using the conntrack command
    line tool and conntrackd at the same time can trigger unconsistencies.

    This patch also modifies the event cache infrastructure to pass the
    process PID and the ECHO flag to nfnetlink_send() to report back
    to userspace if the process that triggered the change needs so.
    Based on a suggestion from Patrick McHardy.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

08 Oct, 2008

1 commit

  • Heh, last minute proof-reading of this patch made me think,
    that this is actually unneeded, simply because "ct" pointers will be
    different for different conntracks in different netns, just like they
    are different in one netns.

    Not so sure anymore.

    [Patrick: pointers will be different, flushing can only be done while
    inactive though and thus it needs to be per netns]

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Patrick McHardy

    Alexey Dobriyan
     

11 Jul, 2007

1 commit


26 Apr, 2007

1 commit


03 Dec, 2006

4 commits