19 Feb, 2010

1 commit

  • As reported by Randy Dunlap, compilation
    of nf_defrag_ipv4 fails with:

    include/net/netfilter/nf_conntrack.h:94: error: field 'ct_general' has incomplete type
    include/net/netfilter/nf_conntrack.h:178: error: 'const struct sk_buff' has no member named 'nfct'
    include/net/netfilter/nf_conntrack.h:185: error: implicit declaration of function 'nf_conntrack_put'
    include/net/netfilter/nf_conntrack.h:294: error: 'const struct sk_buff' has no member named 'nfct'
    net/ipv4/netfilter/nf_defrag_ipv4.c:45: error: 'struct sk_buff' has no member named 'nfct'
    net/ipv4/netfilter/nf_defrag_ipv4.c:46: error: 'struct sk_buff' has no member named 'nfct'

    net/netfilter/nf_conntrack.h must not be included with NF_CONNTRACK=n, so
    add a few #ifdefs. Long term, the header file should be fixed to be usable
    even with NF_CONNTRACK=n.

    Tested-by: Randy Dunlap
    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

16 Feb, 2010

2 commits


12 Feb, 2010

1 commit


11 Feb, 2010

3 commits


04 Feb, 2010

1 commit

  • Add a new target for the raw table, which can be used to specify conntrack
    parameters for specific connections, for instance the conntrack helper.

    The target attaches a "template" connection tracking entry to the skb, which
    is used by the conntrack core when initializing a new conntrack.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

03 Feb, 2010

4 commits

  • Support initializing selected parameters of new conntrack entries from a
    "conntrack template", which is a specially marked conntrack entry attached
    to the skb.

    Currently the helper and the event delivery masks can be initialized this
    way.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Add two masks for conntrack and expectation events to struct nf_conntrack_ecache
    and use them to filter events. Their default value is "all events" when the
    event sysctl is on and "no events" when it is off. A following patch will add
    specific initializations. Expectation events depend on the ecache struct of
    their master conntrack.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Split up the IPCT_STATUS event into an IPCT_REPLY event, which is generated
    when the IPS_SEEN_REPLY bit is set, and an IPCT_ASSURED event, which is
    generated when the IPS_ASSURED bit is set.

    In combination with a following patch to support selective event delivery,
    this can be used for "sparse" conntrack replication: replication of a
    conntrack entry starts only after it has reached the ASSURED state, which
    makes it SYN-flood resistant.
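
A minimal sketch of how a per-entry event mask could give "sparse" replication. The enum names, bit values, and the helper ct_filter_events() are hypothetical and chosen for illustration; the kernel's real event enum and delivery path may differ.

```c
/* Hypothetical event bits; the kernel's real event enum may use
 * different names, order, and values. */
enum ct_event {
    EV_NEW     = 1u << 0,
    EV_DESTROY = 1u << 1,
    EV_REPLY   = 1u << 2,  /* IPS_SEEN_REPLY was set */
    EV_ASSURED = 1u << 3,  /* IPS_ASSURED was set */
};

/* Return the subset of pending events that the mask lets through. */
static unsigned int ct_filter_events(unsigned int pending, unsigned int mask)
{
    return pending & mask;
}

/* Mask for "sparse" replication: stay silent until the entry is assured. */
static const unsigned int sparse_mask = EV_ASSURED | EV_DESTROY;
```

With this mask, an entry created by a bare SYN produces no events at all; the listener first hears about the connection when it becomes assured.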

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Make sure not to assign a helper for a different network or transport
    layer protocol to a connection.

    Additionally, change expectation deletion by helper to compare the name
    directly: there might be multiple helper registrations using the same
    name, and currently one of them is chosen in an unpredictable manner and
    only its expectations are removed.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

17 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
    net: sh_eth alignment fix for sh7724 using NET_IP_ALIGN V2
    ixgbe: allow tx of pre-formatted vlan tagged packets
    ixgbe: Fix 82598 premature copper PHY link indicatation
    ixgbe: Fix tx_restart_queue/non_eop_desc statistics counters
    bcm63xx_enet: fix compilation failure after get_stats_count removal
    packet: dont call sleeping functions while holding rcu_read_lock()
    tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.
    ipvs: zero usvc and udest
    netfilter: fix crashes in bridge netfilter caused by fragment jumps
    ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery
    sky2: leave PCI config space writeable
    sky2: print Optima chip name
    x25: Update maintainer.
    ipvs: fix synchronization on connection close
    netfilter: xtables: document minimal required version
    drivers/net/bonding/: : use pr_fmt
    can: CAN_MCP251X should depend on HAS_DMA
    drivers/net/usb: Correct code taking the size of a pointer
    drivers/net/cpmac.c: Correct code taking the size of a pointer
    drivers/net/sfc: Correct code taking the size of a pointer
    ...

    Linus Torvalds
     

15 Dec, 2009

2 commits

  • Currently the same reassembly queue might be used for packets reassembled
    by conntrack in different positions in the stack (PREROUTING/LOCAL_OUT),
    as well as local delivery. This can cause "packet jumps" when the fragment
    completing a reassembled packet is queued from a different position in the
    stack than the previous ones.

    Add a "user" identifier to the reassembly queue key to separate the queues
    of each caller, similar to what we do for IPv4.
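
The idea of keying queues by caller can be sketched as follows. This is a simplified illustration, not the kernel's structure: the type names, the enum values, and frag_key_match() are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical caller identifiers; the kernel's real names may differ. */
enum frag_user {
    FRAG_USER_LOCAL_DELIVER,
    FRAG_USER_CT_PREROUTING,
    FRAG_USER_CT_LOCAL_OUT,
};

/* Simplified queue key: fragment ID, endpoints, and the calling user. */
struct frag_key {
    uint32_t id;
    uint8_t  saddr[16];
    uint8_t  daddr[16];
    enum frag_user user;
};

/* Two fragments share a queue only if every field, including user, matches. */
static bool frag_key_match(const struct frag_key *a, const struct frag_key *b)
{
    return a->id == b->id &&
           a->user == b->user &&
           memcmp(a->saddr, b->saddr, sizeof(a->saddr)) == 0 &&
           memcmp(a->daddr, b->daddr, sizeof(a->daddr)) == 0;
}
```

Because the user is part of the key, a fragment queued from PREROUTING can never complete a packet that local delivery started assembling.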

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
    m68k: rename global variable vmalloc_end to m68k_vmalloc_end
    percpu: add missing per_cpu_ptr_to_phys() definition for UP
    percpu: Fix kdump failure if booted with percpu_alloc=page
    percpu: make misc percpu symbols unique
    percpu: make percpu symbols in ia64 unique
    percpu: make percpu symbols in powerpc unique
    percpu: make percpu symbols in x86 unique
    percpu: make percpu symbols in xen unique
    percpu: make percpu symbols in cpufreq unique
    percpu: make percpu symbols in oprofile unique
    percpu: make percpu symbols in tracer unique
    percpu: make percpu symbols under kernel/ and mm/ unique
    percpu: remove some sparse warnings
    percpu: make alloc_percpu() handle array types
    vmalloc: fix use of non-existent percpu variable in put_cpu_var()
    this_cpu: Use this_cpu_xx in trace_functions_graph.c
    this_cpu: Use this_cpu_xx for ftrace
    this_cpu: Use this_cpu_xx in nmi handling
    this_cpu: Use this_cpu operations in RCU
    this_cpu: Use this_cpu ops for VM statistics
    ...

    Fix up trivial (famous last words) global per-cpu naming conflicts in
    arch/x86/kvm/svm.c
    mm/slab.c

    Linus Torvalds
     

06 Nov, 2009

2 commits

  • Conflicts:
    drivers/net/usb/cdc_ether.c

    All CDC ethernet devices of type USB_CLASS_COMM need to use
    '&mbm_info'.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Vitezslav Samel discovered that since 2.6.30.4+ active FTP does not work
    over NAT. The "cause" of the problem was a fix of unacknowledged data
    detection with NAT (commit a3a9f79e361e864f0e9d75ebe2a0cb43d17c4272).
    However, that fix actually uncovered a long-standing bug in TCP conntrack:
    when NAT was enabled, we simply updated the max of the right edge of
    the segments we have seen (td_end) by the offset NAT produced when
    changing the IP/port in the data. However, we did not update the other
    parameter (td_maxend) that is affected by the NAT offset. That value could
    therefore drift away from the correct one, which broke active FTP.

    The patch below fixes the issue by *not* updating the conntrack parameters
    from NAT, but instead taking into account the NAT offsets in conntrack in a
    consistent way. (Updating from NAT would be harder and more expensive because
    it would need to recalculate parameters we already calculated in conntrack.)

    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Jozsef Kadlecsik
     

04 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces
    on the first line to ease grep games.

    struct something
    {

    becomes:

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Oct, 2009

1 commit


11 Sep, 2009

1 commit


25 Aug, 2009

1 commit


18 Aug, 2009

1 commit

  • In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc
    for better SMP performance" the definition of struct gnet_stats_basic
    changed incompatibly, as copies of this struct are shipped to
    userland via netlink.

    Restoring the old behavior is not welcome, for performance reasons.

    The fix is to use a private structure for the kernel, and to
    teach gnet_stats_copy_basic() to convert from kernel to userland
    using the legacy structure (struct gnet_stats_basic).
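
A rough sketch of the conversion approach, assuming a private kernel layout that differs from the legacy one only in packing. The struct names and stats_to_user() are stand-ins invented here, and __attribute__((packed)) is a GCC/Clang extension.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the legacy layout that userspace expects. */
struct stats_basic_user {
    uint64_t bytes;
    uint32_t packets;
};

/* Hypothetical kernel-private layout; it can change without ABI impact. */
struct stats_basic_kernel {
    uint64_t bytes;
    uint32_t packets;
} __attribute__((packed));

/* Convert the private counters into the legacy layout before copying out. */
static void stats_to_user(struct stats_basic_user *dst,
                          const struct stats_basic_kernel *src)
{
    memset(dst, 0, sizeof(*dst));   /* zero padding so no stale bytes leak */
    dst->bytes = src->bytes;
    dst->packets = src->packets;
}
```

Keeping the conversion in one function means the kernel layout can be reordered for performance while the netlink payload stays fixed.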

    Based on a report and initial patch from Michael Spang.

    Reported-by: Michael Spang
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Jun, 2009

1 commit

  • When NAT helpers change the TCP packet size, the highest seen sequence
    number needs to be corrected. This is currently only done upwards; when
    the packet size is reduced, the sequence number is left unchanged. This
    causes TCP conntrack to falsely detect unacknowledged data and decrease
    the timeout.

    Fix by updating the highest seen sequence number in both directions after
    packet mangling.

    Tested-by: Krzysztof Piotr Oledzki
    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

13 Jun, 2009

3 commits

  • This patch improves ctnetlink event reliability if one broadcast
    listener has set the NETLINK_BROADCAST_ERROR socket option.

    The logic is the following: if an event delivery fails, we keep
    the undelivered events in the missed event cache. Once the next
    packet arrives, we add the new events (if any) to the missed
    events in the cache and we try a new delivery, and so on. Thus,
    if ctnetlink fails to deliver an event, we try to deliver it again
    once we see a new packet. Therefore, we may lose state
    transitions but the userspace process gets in sync at some point.
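
The retry logic described above can be sketched roughly like this. All names (struct event_cache, cache_deliver(), the deliver_* stubs) are hypothetical, and the real code works on a conntrack extension rather than a bare struct.

```c
#include <stdbool.h>

/* Hypothetical per-conntrack cache of events that could not be delivered. */
struct event_cache {
    unsigned int missed;
};

/* Hypothetical delivery hook: true when the listener accepted the events. */
typedef bool (*deliver_fn)(unsigned int events);

/* Merge new events with earlier misses and retry; keep them all on failure. */
static bool cache_deliver(struct event_cache *c, unsigned int events,
                          deliver_fn deliver)
{
    unsigned int pending = c->missed | events;

    if (pending == 0)
        return true;            /* nothing to report */
    if (deliver(pending)) {
        c->missed = 0;          /* listener is back in sync */
        return true;
    }
    c->missed = pending;        /* retry when the next packet arrives */
    return false;
}

/* Demo stubs standing in for a working and a congested listener. */
static unsigned int last_delivered;
static bool deliver_ok(unsigned int events) { last_delivered = events; return true; }
static bool deliver_fail(unsigned int events) { (void)events; return false; }
```

Since missed events are OR-ed into one bitmask, intermediate state transitions can be lost, but the listener still learns about every kind of change once a delivery succeeds.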

    In the worst case, if no events were delivered to userspace, we make
    sure that destroy events are successfully delivered. Basically,
    if ctnetlink fails to deliver the destroy event, we remove the
    conntrack entry from the hashes and insert it into the dying
    list, which contains inactive entries. Then, the conntrack timer
    is added with an extra grace timeout of random32() % 15 seconds
    to trigger the event again (this grace timeout is tunable via
    /proc). The use of a limited random timeout value spreads out
    the "destroy" resends, which avoids accumulating lots of
    "destroy" events at the same time. Events may be delivered out
    of order, but we can identify them by means of the tuple plus
    the conntrack ID.

    The maximum number of conntrack entries (active or inactive) is
    still handled by nf_conntrack_max. Thus, we may start dropping
    packets at some point if we accumulate a lot of inactive conntrack
    entries that did not successfully report the destroy event to
    userspace.

    During my stress tests, which consisted of setting a very small buffer
    of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
    flag, and generating lots of very small connections, I noticed
    very few destroy entries on the fly waiting to be resent.

    A simple way to test this patch consists of creating a lot of
    entries, setting a very small Netlink buffer in conntrackd (+ a patch
    which is not in the git tree to set the BROADCAST_ERROR flag)
    and invoking `conntrack -F'.

    For expectations, no changes are introduced in this patch.
    Currently, event delivery is only done for new expectations (no
    events from expectation expiration, removal and confirmation).
    In that case, they need a per-expectation event cache to implement
    the same idea that is exposed in this patch.

    This patch can be useful to provide reliable flow accounting. We
    still have to add a new conntrack extension to store the creation
    and destroy time.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     
  • This patch moves the helper destruction to a function that lives
    in nf_conntrack_helper.c. This new function is used in the patch
    to add ctnetlink reliable event delivery.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     
  • This patch reworks the per-cpu event caching to use the conntrack
    extension infrastructure.

    The main drawback is that we consume more memory per conntrack
    if event delivery is enabled. This patch is required by the
    reliable event delivery patch that follows.

    BTW, this patch allows you to enable/disable event delivery via
    /proc/sys/net/netfilter/nf_conntrack_events at runtime, although
    you can still disable event caching as a compile-time option.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

10 Jun, 2009

1 commit


08 Jun, 2009

1 commit

  • Current conntrack code kills the ICMP conntrack entry as soon as
    the first reply is received. This is incorrect, as we then see only
    the first ICMP echo reply out of several possible duplicates as
    ESTABLISHED, while the rest will be INVALID. Also this unnecessarily
    increases the conntrackd traffic on H-A firewalls.

    Make all the ICMP conntrack entries (including the replied ones)
    last for the default of nf_conntrack_icmp{,v6}_timeout seconds.

    Signed-off-by: Jan "Yenya" Kasprzak
    Signed-off-by: Patrick McHardy

    Jan Kasprzak
     

03 Jun, 2009

5 commits

  • This patch removes the notify chain infrastructure and replaces it
    with a simple function pointer. This issue has been mentioned on the
    mailing list several times: the use of the notify chain adds
    too much overhead for something that is only used by ctnetlink.

    This patch also changes nfnetlink_send(). It seems that gfp_any()
    returns GFP_KERNEL for user-context requests, like those via
    ctnetlink, inside the RCU read-side section, which is not valid.
    Using GFP_KERNEL is also evil since netlink may schedule(),
    which leads to "scheduling while atomic" bug reports.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch simplifies the conntrack event caching system by removing
    several events:

    * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO have been deleted
    since they have no clients.
    * IPCT_COUNTER_FILLING, which is a leftover of the 32-bit counter
    days.
    * IPCT_REFRESH, which is not of any use since we always include the
    timeout in the messages.

    After this patch, the existing events are:

    * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, which are used to identify
    addition and deletion of entries.
    * IPCT_STATUS, which notes that the status bits have changed,
    e.g. IPS_SEEN_REPLY and IPS_ASSURED.
    * IPCT_PROTOINFO, which reports that internal protocol information has
    changed, e.g. the TCP, DCCP and SCTP protocol state.
    * IPCT_HELPER, which reports that a helper has been assigned to or
    unassigned from this entry.
    * IPCT_MARK and IPCT_SECMARK, which report that the mark has changed; this
    covers the case when a mark is set to zero.
    * IPCT_NATSEQADJ, which reports that there are updates in the NAT sequence
    adjustment.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch moves the event flags from linux/netfilter/nf_conntrack_common.h
    to net/netfilter/nf_conntrack_ecache.h. These flags are not of any use
    from userspace.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • During module removal there are no possible event listeners,
    since ctnetlink must be removed first to allow removing
    nf_conntrack. This patch removes the event reporting for the
    module removal case, which is not of any use in the existing code.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch moves the internal tuple() macro definition to the
    header file as nf_ct_tuple().

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

17 Apr, 2009

1 commit

  • The removal of the SAME target accidentally removed one feature that is
    not available from the normal NAT targets so far: having multi-range
    mappings that use the same mapping for each connection from a single
    client. The current behaviour is to choose the address from the range
    based on source and destination IP, which breaks when communicating
    with sites having multiple addresses that require all connections to
    originate from the same IP address.

    Introduce an IP_NAT_RANGE_PERSISTENT option that controls whether the
    destination address is taken into account for selecting addresses.
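
As an illustration of the selection rule, the sketch below picks an address index from a range with and without the destination taken into account. The mixing function and nat_pick() are toy stand-ins written for this example; the kernel uses its own hash and range handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy mixing function for illustration; the kernel uses a different hash. */
static uint32_t mix(uint32_t a, uint32_t b)
{
    uint32_t h = a * 2654435761u;
    h ^= b + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
}

/* Pick an address index in [0, range_size) for a new connection.
 * range_size must be nonzero. With persistent set, the destination
 * is ignored, so one client always maps to the same address. */
static uint32_t nat_pick(uint32_t src, uint32_t dst,
                         uint32_t range_size, bool persistent)
{
    return mix(src, persistent ? 0 : dst) % range_size;
}
```

In persistent mode two connections from the same client to different servers get the same index, which is the behaviour sites with multiple addresses rely on.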

    http://bugzilla.kernel.org/show_bug.cgi?id=12954

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

06 Apr, 2009

1 commit

  • This patch fixes a regression (introduced by myself in commit 19abb7b:
    netfilter: ctnetlink: deliver events for conntracks changed from
    userspace) that results in an expectation re-insertion since
    __nf_ct_expect_check() may return 0 for expectation timer refreshing.

    This patch also removes an unnecessary refcount bump that
    was intended to avoid a possible race condition with event delivery
    and expectation timers (as said, it is not needed since we hold a
    reference to the object until we finish the expectation
    setup). This also merges nf_ct_expect_related_report() and
    nf_ct_expect_related(), which look basically the same.

    Reported-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

27 Mar, 2009

1 commit


26 Mar, 2009

4 commits

  • Useful for all protocols which do not add additional data, such
    as GRE or UDPlite.

    Signed-off-by: Holger Eitzenberger
    Signed-off-by: Patrick McHardy

    Holger Eitzenberger
     
  • Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP & TCP.

    This permits an easy conversion from call_rcu() based hash lists to a
    SLAB_DESTROY_BY_RCU one.

    Avoiding call_rcu() delay at nf_conn freeing time has numerous gains.

    - It doesn't fill RCU queues (up to 10000 elements per cpu).
    This reduces OOM possibility, if queued elements are not taken into account.
    This reduces latency problems when RCU queue size hits hilimit and triggers
    emergency mode.

    - It allows fast reuse of just freed elements, permitting better use of
    CPU cache.

    - We delete rcu_head from "struct nf_conn", shrinking size of this structure
    by 8 or 16 bytes.

    This patch only takes care of "struct nf_conn".
    call_rcu() is still used for less critical conntrack parts, that may
    be converted later if necessary.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     
  • This is necessary in order to have an upper bound for Netlink
    message calculation, which is not a problem at all, as there
    are no helpers with a longer name.

    Signed-off-by: Holger Eitzenberger
    Signed-off-by: Patrick McHardy

    Holger Eitzenberger
     
  • A single callback is added for the l3 proto helper. The two
    callbacks for the l4 protos are necessary because of the general
    structure of a ctnetlink event, which is in short:

    CTA_TUPLE_ORIG

    CTA_TUPLE_REPLY

    CTA_ID
    ...
    CTA_PROTOINFO

    CTA_TUPLE_MASTER

    Therefore the formula is

    size := sizeof(generic-nlas) + 3 * sizeof(tuple_nlas) + sizeof(protoinfo_nlas)

    Some of the NLAs are optional, e.g. CTA_TUPLE_MASTER, which is only
    set if it's an expected connection. But the number of optional NLAs is
    small enough to prevent netlink_trim() from reallocating if calculated
    properly.
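
The formula above translates directly into a small helper. The struct, its field names, and ctnetlink_size_bound() are hypothetical; in the real code the per-part sizes would come from the l3/l4 protocol callbacks.

```c
#include <stddef.h>

/* Hypothetical per-part sizes in bytes, as reported by protocol callbacks. */
struct nla_sizes {
    size_t generic;    /* CTA_ID and other fixed attributes */
    size_t tuple;      /* one CTA_TUPLE_* nest */
    size_t protoinfo;  /* CTA_PROTOINFO nest */
};

/* Upper bound: three tuples (ORIG, REPLY, MASTER) plus the rest. */
static size_t ctnetlink_size_bound(const struct nla_sizes *s)
{
    return s->generic + 3 * s->tuple + s->protoinfo;
}
```

Counting CTA_TUPLE_MASTER unconditionally overestimates slightly for connections without a master, which is the safe direction for a buffer bound.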

    Signed-off-by: Holger Eitzenberger
    Signed-off-by: Patrick McHardy

    Holger Eitzenberger