22 Nov, 2011

1 commit

  • This patch fixes an oops that can be triggered following this recipe:

    0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
    1) container is started.
    2) connect to it via lxc-console.
    3) generate some traffic with the container to create some conntrack
    entries in its table.
    4) stop the container: you hit one oops because the conntrack table
    cleanup tries to report the destroy event to user-space but the
    per-netns nfnetlink socket has already gone (as the nfnetlink
    socket is per-netns but event callback registration is global).

    To fix this situation, we make the ctnl_notifier per-netns so the
    callback is registered/unregistered if the container is
    created/destroyed.

    Alex Bligh and Alexey Dobriyan originally proposed one small patch to
    check if the nfnetlink socket is gone in nfnetlink_has_listeners,
    but this is a very visited path for events, thus, it may reduce
    performance and it looks a bit hackish to check for the nfnetlink
    socket only to workaround this situation. As a result, I decided
    to follow the bigger path choice, which seems to look nicer to me.

    Cc: Alexey Dobriyan
    Reported-by: Alex Bligh
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

05 Feb, 2011

1 commit


01 Feb, 2011

1 commit

  • For the following rule:

    iptables -I PREROUTING -t raw -j CT --ctevents assured

    The event delivered looks like the following:

    [UPDATE] tcp 6 src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

    Note that the TCP protocol state is not included. For that reason
    the CT event filtering is not very useful for conntrackd.

    To resolve this issue, instead of conditionally setting the CT events
    bits based on the ctmask, we always set them and perform the filtering
    in the late stage, just before the delivery.

    Thus, the event delivered looks like the following:

    [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

16 Nov, 2010

1 commit


15 Nov, 2010

1 commit


03 Feb, 2010

2 commits

  • Add two masks for conntrack end expectation events to struct nf_conntrack_ecache
    and use them to filter events. Their default value is "all events" when the
    event sysctl is on and "no events" when it is off. A following patch will add
    specific initializations. Expectation events depend on the ecache struct of
    their master conntrack.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Split up the IPCT_STATUS event into an IPCT_REPLY event, which is generated
    when the IPS_SEEN_REPLY bit is set, and an IPCT_ASSURED event, which is
    generated when the IPS_ASSURED bit is set.

    In combination with a following patch to support selective event delivery,
    this can be used for "sparse" conntrack replication: start replicating the
    conntrack entry after it reached the ASSURED state and that way it's SYN-flood
    resistant.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

04 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces,
    in first line to ease grep games.

    struct something
    {

    becomes :

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Jun, 2009

2 commits

  • This patch improves ctnetlink event reliability if one broadcast
    listener has set the NETLINK_BROADCAST_ERROR socket option.

    The logic is the following: if an event delivery fails, we keep
    the undelivered events in the missed event cache. Once the next
    packet arrives, we add the new events (if any) to the missed
    events in the cache and we try a new delivery, and so on. Thus,
    if ctnetlink fails to deliver an event, we try to deliver them
    once we see a new packet. Therefore, we may lose state
    transitions but the userspace process gets in sync at some point.

    At worst case, if no events were delivered to userspace, we make
    sure that destroy events are successfully delivered. Basically,
    if ctnetlink fails to deliver the destroy event, we remove the
    conntrack entry from the hashes and we insert them in the dying
    list, which contains inactive entries. Then, the conntrack timer
    is added with an extra grace timeout of random32() % 15 seconds
    to trigger the event again (this grace timeout is tunable via
    /proc). The use of a limited random timeout value allows
    distributing the "destroy" resends, thus, avoiding accumulating
    lots "destroy" events at the same time. Event delivery may
    re-order but we can identify them by means of the tuple plus
    the conntrack ID.

    The maximum number of conntrack entries (active or inactive) is
    still handled by nf_conntrack_max. Thus, we may start dropping
    packets at some point if we accumulate a lot of inactive conntrack
    entries that did not successfully report the destroy event to
    userspace.

    During my stress tests consisting of setting a very small buffer
    of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
    flag, and generating lots of very small connections, I noticed
    very few destroy entries on the fly waiting to be resend.

    A simple way to test this patch consist of creating a lot of
    entries, set a very small Netlink buffer in conntrackd (+ a patch
    which is not in the git tree to set the BROADCAST_ERROR flag)
    and invoke `conntrack -F'.

    For expectations, no changes are introduced in this patch.
    Currently, event delivery is only done for new expectations (no
    events from expectation expiration, removal and confirmation).
    In that case, they need a per-expectation event cache to implement
    the same idea that is exposed in this patch.

    This patch can be useful to provide reliable flow-accouting. We
    still have to add a new conntrack extension to store the creation
    and destroy time.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     
  • This patch reworks the per-cpu event caching to use the conntrack
    extension infrastructure.

    The main drawback is that we consume more memory per conntrack
    if event delivery is enabled. This patch is required by the
    reliable event delivery that follows to this patch.

    BTW, this patch allows you to enable/disable event delivery via
    /proc/sys/net/netfilter/nf_conntrack_events in runtime, although
    you can still disable event caching as compilation option.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

03 Jun, 2009

3 commits

  • This patch removes the notify chain infrastructure and replace it
    by a simple function pointer. This issue has been mentioned in the
    mailing list several times: the use of the notify chain adds
    too much overhead for something that is only used by ctnetlink.

    This patch also changes nfnetlink_send(). It seems that gfp_any()
    returns GFP_KERNEL for user-context request, like those via
    ctnetlink, inside the RCU read-side section which is not valid.
    Using GFP_KERNEL is also evil since netlink may schedule(),
    this leads to "scheduling while atomic" bug reports.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch simplifies the conntrack event caching system by removing
    several events:

    * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted
    since the have no clients.
    * IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter
    days.
    * IPCT_REFRESH which is not of any use since we always include the
    timeout in the messages.

    After this patch, the existing events are:

    * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify
    addition and deletion of entries.
    * IPCT_STATUS, that notes that the status bits have changes,
    eg. IPS_SEEN_REPLY and IPS_ASSURED.
    * IPCT_PROTOINFO, that reports that internal protocol information has
    changed, eg. the TCP, DCCP and SCTP protocol state.
    * IPCT_HELPER, that a helper has been assigned or unassigned to this
    entry.
    * IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this
    covers the case when a mark is set to zero.
    * IPCT_NATSEQADJ, to report that there's updates in the NAT sequence
    adjustment.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch moves the event flags from linux/netfilter/nf_conntrack_common.h
    to net/netfilter/nf_conntrack_ecache.h. This flags are not of any use
    from userspace.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

18 Nov, 2008

1 commit

  • As for now, the creation and update of conntracks via ctnetlink do not
    propagate an event to userspace. This can result in inconsistent situations
    if several userspace processes modify the connection tracking table by means
    of ctnetlink at the same time. Specifically, using the conntrack command
    line tool and conntrackd at the same time can trigger unconsistencies.

    This patch also modifies the event cache infrastructure to pass the
    process PID and the ECHO flag to nfnetlink_send() to report back
    to userspace if the process that triggered the change needs so.
    Based on a suggestion from Patrick McHardy.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

12 Oct, 2008

1 commit

  • The dummy version of 'nf_conntrack_event_cache()' (used when the
    NF_CONNTRACK_EVENTS config option is not enabled) had not been updated
    when the calling convention changed.

    This was introduced by commit a71996fccce4b2086a26036aa3c915365ca36926
    ("netfilter: netns nf_conntrack: pass conntrack to
    nf_conntrack_event_cache() not skb")

    Tssk.

    Cc: Alexey Dobriyan
    Cc: Patrick McHardy
    Cc: David Miller
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Oct, 2008

1 commit


08 Oct, 2008

2 commits


11 Jul, 2007

1 commit


26 Apr, 2007

1 commit


03 Dec, 2006

1 commit