11 Jun, 2009

1 commit


09 Jun, 2009

1 commit


08 Jun, 2009

1 commit

  • The passive OS fingerprinting netfilter module allows passive detection
    of the remote OS and performing various netfilter actions based on that
    knowledge. This module compares some data (WS, MSS, TCP options and their
    order, TTL, DF and others) from packets with the SYN bit set against
    dynamically loaded OS fingerprints.
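
    Here is a minimal sketch of the comparison step. It assumes a simplified fingerprint record. The struct and function names (os_fp, syn_info, osf_lookup) are hypothetical, and exact matching on WS and MSS is a simplification. The real module's data layout, wildcard handling and TTL logic may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified fingerprint record. Real pf.os entries also
 * allow wildcards and windows expressed as multiples of the MSS. */
struct os_fp {
    const char *genre;
    uint16_t window;      /* expected TCP window size (WS) */
    uint16_t mss;         /* expected maximum segment size */
    uint8_t ttl;          /* initial TTL of the sending OS */
    bool df;              /* expected don't-fragment bit */
    const uint8_t *opts;  /* TCP option kinds, in wire order */
    size_t opt_count;
};

/* Values observed in one incoming SYN packet. */
struct syn_info {
    uint16_t window;
    uint16_t mss;
    uint8_t ttl;
    bool df;
    const uint8_t *opts;
    size_t opt_count;
};

static bool fp_matches(const struct os_fp *fp, const struct syn_info *syn)
{
    /* Routers decrement the TTL, so the observed value cannot exceed the initial one. */
    if (syn->ttl > fp->ttl)
        return false;
    if (syn->window != fp->window || syn->mss != fp->mss || syn->df != fp->df)
        return false;
    /* Both the set of options and their order must agree. */
    if (syn->opt_count != fp->opt_count)
        return false;
    return syn->opt_count == 0 ||
           memcmp(syn->opts, fp->opts, syn->opt_count) == 0;
}

/* Return the genre of the first matching fingerprint, or NULL. */
static const char *osf_lookup(const struct os_fp *db, size_t n,
                              const struct syn_info *syn)
{
    for (size_t i = 0; i < n; i++)
        if (fp_matches(&db[i], syn))
            return db[i].genre;
    return NULL;
}
```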

    Fingerprint matching rules can be downloaded from the OpenBSD source tree
    or found in the archive. They are loaded into the kernel through the
    netfilter netlink subsystem using a special utility found in the archive.

    The archive contains a library file (also attached), which was shipped
    with the iptables extensions some time ago (at least when ipt_osf existed
    in patch-o-matic).

    The following changes were made in this release:
    * added NLM_F_CREATE/NLM_F_EXCL checks
    * dropped the _rcu list-traversal helpers in the protected add/remove calls
    * dropped unneeded structures, debug prints, an obscure comment and a check

    Fingerprints can be downloaded from
    http://www.openbsd.org/cgi-bin/cvsweb/src/etc/pf.os
    or found in the archive.

    Example usage:
    The -d switch removes fingerprints.

    Please consider for inclusion.
    Thank you.

    Passive OS fingerprint homepage (archives, examples):
    http://www.ioremap.net/projects/osf

    Signed-off-by: Evgeniy Polyakov
    Signed-off-by: Patrick McHardy

    Evgeniy Polyakov
     

05 Jun, 2009

1 commit

  • Adds support for specifying a range of queues instead of a single queue
    id. Flows will be distributed across the given range.

    This is useful for multicore systems: instead of having a single
    application read packets from a queue, start multiple
    instances on queues x, x+1, ..., x+n. Each instance can process
    flows independently.

    Packets for the same connection are put into the same queue.
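
    One way to get both properties is to hash a direction-independent key and add the result, modulo the range size, to the first queue number. This spreads flows across queues x..x+n and keeps each connection on one queue. The sketch below assumes that approach. mix32() is a hypothetical stand-in for a real hash, and pick_queue() is not the target's actual code.

```c
#include <stdint.h>

/* Hypothetical 32-bit integer mixer standing in for a real hash such
 * as jhash; it is NOT the hash the kernel target uses. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Map a packet to a queue in [first_queue, first_queue + queues_total).
 * XOR-ing the addresses yields the same key for both directions, so all
 * packets of one connection land in the same queue. */
static uint16_t pick_queue(uint32_t saddr, uint32_t daddr, uint8_t proto,
                           uint16_t first_queue, uint16_t queues_total)
{
    uint32_t h = mix32((saddr ^ daddr) + (uint32_t)proto * 0x9e3779b9U);
    return (uint16_t)(first_queue + h % queues_total);
}
```

    This sketch keys on addresses and protocol only, which is coarser than per-connection hashing because all flows between two hosts share a queue. It does, however, trivially guarantee that both directions of a connection agree.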

    Signed-off-by: Holger Eitzenberger
    Signed-off-by: Florian Westphal
    Signed-off-by: Patrick McHardy

    Florian Westphal
     

04 Jun, 2009

1 commit


03 Jun, 2009

2 commits

  • This patch removes the notify chain infrastructure and replaces it
    with a simple function pointer. This issue has been mentioned on the
    mailing list several times: the notify chain adds too much overhead
    for something that is only used by ctnetlink.
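
    The difference can be illustrated with a single, possibly-NULL callback pointer in place of a chain. This is a userspace sketch with hypothetical names (ct_event_fn, ct_register_event_cb, ct_deliver_event). It only shows the shape of the idea, not the kernel's actual hook.

```c
#include <errno.h>
#include <stddef.h>

struct conntrack;  /* opaque in this sketch */

/* Hypothetical callback type; the real ctnetlink hook differs. */
typedef int (*ct_event_fn)(unsigned int events, const struct conntrack *ct);

/* One pointer replaces the whole chain: NULL means nobody listens. */
static ct_event_fn ct_event_cb;

static int ct_register_event_cb(ct_event_fn fn)
{
    if (ct_event_cb != NULL)
        return -EBUSY;  /* a single consumer (ctnetlink) is expected */
    ct_event_cb = fn;
    return 0;
}

static void ct_unregister_event_cb(void)
{
    ct_event_cb = NULL;
}

/* Fast path: one load and one indirect call instead of walking a chain. */
static int ct_deliver_event(unsigned int events, const struct conntrack *ct)
{
    /* Kernel code would read the pointer with rcu_dereference(). */
    ct_event_fn fn = ct_event_cb;
    return fn ? fn(events, ct) : 0;
}
```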

    This patch also changes nfnetlink_send(). It seems that gfp_any()
    returns GFP_KERNEL for user-context requests, such as those coming via
    ctnetlink, even inside the RCU read-side section, where it is not valid.
    Using GFP_KERNEL there is also evil because netlink may then call
    schedule(), which leads to "scheduling while atomic" bug reports.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch moves the event flags from linux/netfilter/nf_conntrack_common.h
    to net/netfilter/nf_conntrack_ecache.h. These flags are of no use
    to userspace.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

02 Jun, 2009

1 commit

  • The patch below adds support for TCP simultaneous open to conntrack. The
    unused LISTEN state is replaced by a new state (SYN_SENT2) denoting the
    second SYN, sent from the reply direction in the simultaneous-open case.
    The state table is updated, and the function tcp_in_window is modified to
    handle simultaneous open.

    The functionality can fairly easily be tested with socat. A sample tcpdump
    recording:

    23:21:34.244733 IP (tos 0x0, ttl 64, id 49224, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xe75f (correct), 3383710133:3383710133(0) win 5840
    23:21:34.244783 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.0.1.2020 > 192.168.0.254.2020: R, cksum 0x0253 (correct), 0:0(0) ack 3383710134 win 0
    23:21:36.038680 IP (tos 0x0, ttl 64, id 28092, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.1.2020 > 192.168.0.254.2020: S, cksum 0x704b (correct), 2634546729:2634546729(0) win 5840
    23:21:36.038777 IP (tos 0x0, ttl 64, id 49225, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xb179 (correct), 3383710133:3383710133(0) ack 2634546730 win 5840
    23:21:36.038847 IP (tos 0x0, ttl 64, id 28093, offset 0, flags [DF], proto TCP (6), length 52) 192.168.0.1.2020 > 192.168.0.254.2020: ., cksum 0xebad (correct), ack 3383710134 win 2920

    and the corresponding netlink events:

    [NEW] tcp 6 120 SYN_SENT src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 [UNREPLIED] src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
    [UPDATE] tcp 6 120 LISTEN src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
    [UPDATE] tcp 6 60 SYN_RECV src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
    [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020 [ASSURED]

    The RST packet was dropped in the raw table, thus it did not reach
    conntrack. nfnetlink_conntrack is unpatched so it shows the new SYN_SENT2
    state as the old unused LISTEN.
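
    The trace above can be replayed against a reduced transition function. This sketch covers only the transitions the trace exercises, plus the ordinary SYN/ACK path for contrast. The enum and function names are hypothetical. The real conntrack state table handles many more packet types and states.

```c
/* Reduced TCP conntrack states: only what the trace exercises. */
enum ct_state {
    ST_NONE,
    ST_SYN_SENT,
    ST_SYN_SENT2,   /* new: SYN seen from the reply direction */
    ST_SYN_RECV,
    ST_ESTABLISHED,
    ST_INVALID
};

enum pkt_kind { PKT_SYN, PKT_SYNACK, PKT_ACK };
enum pkt_dir { DIR_ORIGINAL, DIR_REPLY };

static enum ct_state next_state(enum ct_state s, enum pkt_dir d, enum pkt_kind p)
{
    switch (s) {
    case ST_NONE:
        return (d == DIR_ORIGINAL && p == PKT_SYN) ? ST_SYN_SENT : ST_INVALID;
    case ST_SYN_SENT:
        if (d == DIR_REPLY && p == PKT_SYN)
            return ST_SYN_SENT2;   /* simultaneous open */
        if (d == DIR_REPLY && p == PKT_SYNACK)
            return ST_SYN_RECV;    /* ordinary three-way handshake */
        return ST_INVALID;
    case ST_SYN_SENT2:
        /* The original side answers the crossing SYN with a SYN/ACK. */
        return (d == DIR_ORIGINAL && p == PKT_SYNACK) ? ST_SYN_RECV : ST_INVALID;
    case ST_SYN_RECV:
    case ST_ESTABLISHED:
        return p == PKT_ACK ? ST_ESTABLISHED : ST_INVALID;
    default:
        return ST_INVALID;
    }
}
```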

    With TCP simultaneous open support we satisfy REQ-2 in RFC 5382 ;-)

    An additional minor correction in this patch: to catch uninitialized
    reply directions, "td_maxwin == 0" is used instead of "td_end == 0",
    because the former can only be true in the uninitialized state, while
    td_end may accidentally be equal to zero in the middle of a connection.

    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Patrick McHardy

    Jozsef Kadlecsik
     

28 May, 2009

1 commit


27 May, 2009

1 commit


25 May, 2009

1 commit

  • Robert L Mathews discovered that some clients send evil TCP RST segments,
    which are accepted by netfilter conntrack but discarded by the
    destination. Thus the conntrack entry is destroyed but the destination
    retransmits data until timeout.

    The same technique, i.e. sending properly crafted RST segments, can easily
    be used to bypass connlimit/connbytes based restrictions (the sample
    script written by Robert can be found in the netfilter mailing list
    archives).

    The patch below adds a new flag and a new field to struct ip_ct_tcp_state
    so that RST segments can be checked more strictly and TCP conntrack can
    catch the invalid ones. An RST segment is accepted only if its sequence
    number is higher than or equal to the highest ACK we have seen from the
    other direction. (The last_ack field cannot be reused because it is used
    to catch resent packets.)
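
    The acceptance rule can be sketched with wrap-around-safe sequence comparison, in the spirit of the kernel's before()/after() helpers. The struct and field names here are hypothetical and are not the ones the patch adds. Accepting an RST when no ACK has been seen yet is also an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a is at or after b in 32-bit sequence space (mod 2^32). */
static bool seq_geq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Hypothetical per-direction state, not the patch's actual fields. */
struct dir_state {
    bool saw_ack;          /* any ACK seen from this direction yet? */
    uint32_t highest_ack;  /* highest ACK number seen from this direction */
};

static void note_ack(struct dir_state *s, uint32_t ack)
{
    if (!s->saw_ack || seq_geq(ack, s->highest_ack)) {
        s->highest_ack = ack;
        s->saw_ack = true;
    }
}

/* Accept an RST only if its sequence number is >= the highest ACK
 * seen from the other direction (assumed acceptable if none seen). */
static bool rst_acceptable(uint32_t rst_seq, const struct dir_state *other)
{
    if (!other->saw_ack)
        return true;
    return seq_geq(rst_seq, other->highest_ack);
}
```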

    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Patrick McHardy

    Jozsef Kadlecsik
     

06 May, 2009

1 commit


05 May, 2009

2 commits

  • This patch fixes a problem when you use 32 nodes in the cluster
    match:

    % iptables -I PREROUTING -t mangle -i eth0 -m cluster \
    --cluster-total-nodes 32 --cluster-local-node 32 \
    --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 0xffff
    iptables: Invalid argument. Run `dmesg' for more information.
    % dmesg | tail -1
    xt_cluster: this node mask cannot be higher than the total number of nodes

    The problem is related to this checking:

    if (info->node_mask >= (1 << info->total_nodes)) {
            printk(KERN_ERR "xt_cluster: this node mask cannot be "
                            "higher than the total number of nodes\n");
            return false;
    }

    Shifting a 32-bit int by 32 is undefined behavior in C, and on x86
    (1 << 32) evaluates to 1 in practice. Thus, the check fails.
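
    Here is a sketch of a check that stays well defined for 32 nodes. It assumes the same node_mask/total_nodes semantics as the snippet quoted earlier in this message. Widening the shifted constant to 64 bits avoids the out-of-range shift. node_mask_valid() is a hypothetical helper, not the committed fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical validity check: shifting a 64-bit 1 makes
 * total_nodes == 32 well defined (the result is 2^32). */
static bool node_mask_valid(uint32_t node_mask, unsigned int total_nodes)
{
    if (total_nodes == 0 || total_nodes > 32)
        return false;
    return (uint64_t)node_mask < (UINT64_C(1) << total_nodes);
}
```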

    BTW, I said this before but I insist: I have only tested the cluster
    match with 2 nodes, getting ~45% extra performance in an active-active setup.
    The maximum limit of 32 nodes is still completely arbitrary. I'd really
    appreciate it if people who have more nodes in their setups let me know.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     
  • Pointed out by Dave Miller:

    CHECK include/linux/netfilter (57 files)
    /home/davem/src/GIT/net-2.6/usr/include/linux/netfilter/xt_LED.h:6: found __[us]{8,16,32,64} type without #include

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

02 May, 2009

1 commit


29 Apr, 2009

1 commit

  • The x_tables are organized with a table structure and per-cpu copies
    of the counters and rules. On older kernels there was a reader/writer
    lock per table, which was a performance bottleneck. In 2.6.30-rc, this
    was converted to use RCU for the counters/rules, which solved the performance
    problems for do_table but made replacing rules much slower because of
    the necessary RCU grace period.

    This version uses a per-cpu set of spinlocks and counters to allow
    table processing to proceed without the cache thrashing of a global
    reader lock, and it keeps the same performance for table updates.
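
    The locking pattern can be sketched in userspace with C11 atomic_flag spinlocks. The packet path takes only its own CPU's lock, while a counter reader visits every CPU's lock in turn. NCPUS, the slot layout and the function names are assumptions for illustration. The kernel uses real per-cpu data and spinlock_t.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NCPUS 4  /* hypothetical fixed CPU count */

struct percpu_slot {
    atomic_flag lock;   /* one spinlock per CPU */
    uint64_t packets;
    uint64_t bytes;
};

static struct percpu_slot slots[NCPUS];

static void spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;  /* spin until the holder releases it */
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

static void slots_init(void)
{
    for (unsigned int i = 0; i < NCPUS; i++) {
        atomic_flag_clear(&slots[i].lock);
        slots[i].packets = slots[i].bytes = 0;
    }
}

/* Packet path: only the local CPU's lock is taken, so CPUs do not
 * bounce a shared lock cache line between each other. */
static void account_packet(unsigned int cpu, uint64_t len)
{
    struct percpu_slot *s = &slots[cpu % NCPUS];
    spin_lock(&s->lock);
    s->packets++;
    s->bytes += len;
    spin_unlock(&s->lock);
}

/* Counter read: visit each CPU's lock in turn and sum. */
static void sum_counters(uint64_t *packets, uint64_t *bytes)
{
    *packets = *bytes = 0;
    for (unsigned int i = 0; i < NCPUS; i++) {
        spin_lock(&slots[i].lock);
        *packets += slots[i].packets;
        *bytes += slots[i].bytes;
        spin_unlock(&slots[i].lock);
    }
}
```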

    Signed-off-by: Stephen Hemminger
    Acked-by: Linus Torvalds
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

24 Apr, 2009

1 commit


27 Mar, 2009

4 commits

  • David S. Miller
     
  • …el/git/tip/linux-2.6-tip

    * 'header-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    x86: headers cleanup - setup.h
    emu101k1.h: fix duplicate include of <linux/types.h>
    compiler-gcc4: conditionalize #error on __KERNEL__
    remove __KERNEL_STRICT_NAMES
    make netfilter use strict integer types
    make drm headers use strict integer types
    make MTD headers use strict integer types
    make most exported headers use strict integer types
    make exported headers use strict posix types
    unconditionally include asm/types.h from linux/types.h
    make linux/types.h as assembly safe
    Neither asm/types.h nor linux/types.h is required for arch/ia64/include/asm/fpu.h
    headers_check fix cleanup: linux/reiserfs_fs.h
    headers_check fix cleanup: linux/nubus.h
    headers_check fix cleanup: linux/coda_psdev.h
    headers_check fix: x86, setup.h
    headers_check fix: x86, prctl.h
    headers_check fix: linux/reinserfs_fs.h
    headers_check fix: linux/socket.h
    headers_check fix: linux/nubus.h
    ...

    Manually fix trivial conflicts in:
    include/linux/netfilter/xt_limit.h
    include/linux/netfilter/xt_statistic.h

    Linus Torvalds
     
  • Ingo Molnar
     
  • Netfilter traditionally uses BSD integer types in its
    interface headers. This changes it to use the Linux
    strict integer types, like everyone else.

    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: Arnd Bergmann
    Acked-by: David S. Miller
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Ingo Molnar

    Arnd Bergmann
     

26 Mar, 2009

1 commit


25 Mar, 2009

1 commit


23 Mar, 2009

1 commit


17 Mar, 2009

1 commit

  • This patch adds the iptables cluster match. This match can be used
    to deploy gateway and back-end load-sharing clusters. The cluster
    can be composed of at most 32 nodes (although I have only tested
    this with two nodes, so I cannot tell what the real scalability
    limit of this solution is in terms of cluster nodes).

    Assuming that all the nodes see all packets (see below for an
    example on how to do that if your switch does not allow this), the
    cluster match decides if this node has to handle a packet given:

    (jhash(source IP) % total_nodes) & node_mask
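
    Here is a sketch of that decision. It assumes node_mask is a bitmask in which bit n-1 stands for node n, so the expression above is read as testing the bit selected by the hash bucket. That reading is an assumption. toy_hash() is a hypothetical stand-in, not jhash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy seeded hash standing in for jhash(source IP); it is NOT jhash. */
static uint32_t toy_hash(uint32_t ip, uint32_t seed)
{
    uint32_t x = ip ^ seed;
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Assumes node_mask has bit (n - 1) set for each node n handled locally. */
static bool cluster_should_handle(uint32_t src_ip, uint32_t seed,
                                  unsigned int total_nodes, uint32_t node_mask)
{
    unsigned int bucket = toy_hash(src_ip, seed) % total_nodes;  /* 0..total_nodes-1 */
    return (node_mask >> bucket) & 1U;
}
```

    With two nodes using masks 0x1 and 0x2 and the same seed, every source address is claimed by exactly one of them.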

    For related connections, the master conntrack is used. The following
    is an example of its use to deploy a gateway cluster composed of two
    nodes (where this is node 1):

    iptables -I PREROUTING -t mangle -i eth1 -m cluster \
    --cluster-total-nodes 2 --cluster-local-node 1 \
    --cluster-proc-name eth1 -j MARK --set-mark 0xffff
    iptables -A PREROUTING -t mangle -i eth1 \
    -m mark ! --mark 0xffff -j DROP
    iptables -A PREROUTING -t mangle -i eth2 -m cluster \
    --cluster-total-nodes 2 --cluster-local-node 1 \
    --cluster-proc-name eth2 -j MARK --set-mark 0xffff
    iptables -A PREROUTING -t mangle -i eth2 \
    -m mark ! --mark 0xffff -j DROP

    The following commands make all nodes see the same packets:

    ip maddr add 01:00:5e:00:01:01 dev eth1
    ip maddr add 01:00:5e:00:01:02 dev eth2
    arptables -I OUTPUT -o eth1 --h-length 6 \
    -j mangle --mangle-mac-s 01:00:5e:00:01:01
    arptables -I INPUT -i eth1 --h-length 6 \
    --destination-mac 01:00:5e:00:01:01 \
    -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
    arptables -I OUTPUT -o eth2 --h-length 6 \
    -j mangle --mangle-mac-s 01:00:5e:00:01:02
    arptables -I INPUT -i eth2 --h-length 6 \
    --destination-mac 01:00:5e:00:01:02 \
    -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

    In the case of TCP connections, the pickup facility has to be disabled
    to avoid marking TCP ACK packets coming in the reply direction as
    valid.

    echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

    BTW, some final notes:

    * This match mangles the skbuff pkt_type if it detects
    PACKET_MULTICAST for a non-multicast address. This could instead be
    done in a PKTTYPE target dedicated to this sole purpose.
    * This match supersedes the CLUSTERIP target.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     

16 Mar, 2009

1 commit

  • Commit 784544739a25c30637397ace5489eeb6e15d7d49 (netfilter: iptables:
    lock free counters) broke a number of modules whose rule data referenced
    itself. A reallocation would not reestablish the correct references, so
    it is best to use a separate struct that does not fall under RCU.
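
    The failure mode can be shown in a few lines. Copying rule data byte for byte, as a reallocation does, carries along a pointer that still refers to the old copy. State kept in a separate allocation survives the move. The struct names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule data that points into itself. */
struct bad_rule {
    char buf[16];
    char *cursor;       /* points somewhere inside buf */
};

/* Hypothetical fix: mutable state lives in its own allocation,
 * referenced by pointer, so moving the rule blob does not break it. */
struct rule_state {
    int hits;
};

struct good_rule {
    struct rule_state *st;
};

/* A table reallocation copies rule blobs byte for byte, like this. */
static void *copy_blob(const void *src, size_t len)
{
    void *dst = malloc(len);
    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}
```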

    Signed-off-by: Jan Engelhardt
    Signed-off-by: Patrick McHardy

    Jan Engelhardt
     

24 Feb, 2009

1 commit


20 Feb, 2009

2 commits

  • Kernel module providing an implementation of the LED netfilter target. Each
    instance of the target appears as a led-trigger device, which can be
    associated with one or more LEDs in /sys/class/leds/.

    Signed-off-by: Adam Nielsen
    Acked-by: Richard Purdie
    Signed-off-by: Patrick McHardy

    Adam Nielsen
     
  • The reader/writer lock in ip_tables is acquired in the critical path of
    processing packets and is one of the reasons why just loading iptables can
    cause a 20% performance loss. The rwlock serves two functions:

    1) It prevents changes to table state (xt_replace) while the table is in use.
    This is now handled by using RCU on the xt_table. When a table is
    replaced, the new table(s) are put in place and the old table(s) are freed
    after an RCU grace period.

    2) It provides synchronization when accessing the counter values.
    This is now handled by swapping in new table_info entries for each CPU,
    then summing the old values and putting the result back onto one
    CPU. On a busy system this may cause sampling to occur at different
    times on each CPU, but no packet/byte counts are lost in the process.
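
    Point 2 can be sketched as follows, under simplifying assumptions: a fixed NCPUS, a single-threaded caller and hypothetical names. A comment marks where the kernel would have to wait for in-flight readers before touching the retired sets.

```c
#include <stdint.h>
#include <string.h>

#define NCPUS 4  /* hypothetical fixed CPU count */

struct counters {
    uint64_t pcnt;  /* packets */
    uint64_t bcnt;  /* bytes */
};

/* Each CPU increments only the set its pointer currently refers to. */
static struct counters *live[NCPUS];

/* Swap zeroed sets in for every CPU, sum the retired sets, and put the
 * total back onto CPU 0 so that no counts are lost. */
static struct counters snapshot_counters(struct counters fresh[NCPUS])
{
    struct counters *old[NCPUS];
    struct counters total = {0, 0};

    for (unsigned int cpu = 0; cpu < NCPUS; cpu++) {
        memset(&fresh[cpu], 0, sizeof(fresh[cpu]));
        old[cpu] = live[cpu];
        live[cpu] = &fresh[cpu];
    }
    /* The kernel would wait for in-flight readers (an RCU grace period)
     * here; this single-threaded sketch has no concurrent writers. */
    for (unsigned int cpu = 0; cpu < NCPUS; cpu++) {
        total.pcnt += old[cpu]->pcnt;
        total.bcnt += old[cpu]->bcnt;
    }
    live[0]->pcnt += total.pcnt;
    live[0]->bcnt += total.bcnt;
    return total;
}
```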

    Signed-off-by: Stephen Hemminger

    Successfully tested on my dual quad-core machine too, but iptables only (no IPv6 here).
    BTW, my new "tbench 8" result is 2450 MB/s (it was 2150 MB/s not so long ago).

    Acked-by: Eric Dumazet
    Signed-off-by: Patrick McHardy

    Stephen Hemminger
     

18 Feb, 2009

2 commits


30 Jan, 2009

1 commit


13 Jan, 2009

1 commit


16 Dec, 2008

1 commit

  • This patch fixes an inconsistency in nfnetlink_conntrack.h that
    I introduced myself. The problem is that CTA_NAT_SEQ_UNSPEC is
    missing from enum ctattr_natseq. This inconsistency may lead to
    problems in the message parsing in userspace (if the message
    contains the CTA_NAT_SEQ_* attributes, of course).

    This patch breaks backward compatibility; however, the only known
    client of this code is libnetfilter_conntrack, which indeed crashes
    because it assumes the existence of CTA_NAT_SEQ_UNSPEC to do
    the parsing.
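
    The missing 0 entry matters because netlink attribute types conventionally start at 1. Without an UNSPEC placeholder, every attribute shifts down by one. The enum below is the corrected layout as assumed from this description, and its member names should be checked against the actual header. natseq_type_valid() is a hypothetical parser check.

```c
#include <stdbool.h>

/* Assumed corrected layout: UNSPEC occupies type 0 so that the real
 * attributes are numbered from 1. Member names are assumptions to be
 * verified against nfnetlink_conntrack.h. */
enum ctattr_natseq {
    CTA_NAT_SEQ_UNSPEC,
    CTA_NAT_SEQ_CORRECTION_POS,
    CTA_NAT_SEQ_OFFSET_BEFORE,
    CTA_NAT_SEQ_OFFSET_AFTER,
    __CTA_NAT_SEQ_MAX
};
#define CTA_NAT_SEQ_MAX (__CTA_NAT_SEQ_MAX - 1)

/* Hypothetical parser check: valid payload types are 1..MAX, never 0. */
static bool natseq_type_valid(unsigned int type)
{
    return type > CTA_NAT_SEQ_UNSPEC && type <= CTA_NAT_SEQ_MAX;
}
```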

    The CTA_NAT_SEQ_* attributes were introduced in 2.6.25.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

25 Nov, 2008

1 commit


15 Oct, 2008

1 commit

  • This patch removes the module dependency between ctnetlink and
    nf_nat by means of an indirect call that is initialized when
    nf_nat is loaded. Now, nf_conntrack_netlink only requires
    nf_conntrack and nfnetlink.

    This patch puts nfnetlink_parse_nat_setup_hook into the
    nf_conntrack_core to avoid dependencies between ctnetlink,
    nf_conntrack_ipv4 and nf_conntrack_ipv6.

    This patch also introduces the function ctnetlink_change_nat,
    which is only invoked from the creation path. NAT handling
    cannot be invoked from the update path since this is not allowed.
    By introducing this function, we remove the useless NAT handling
    in the update path and avoid deadlock-prone code.

    This patch also adds the required EAGAIN logic for nfnetlink.
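
    The decoupling and the EAGAIN path can be sketched with a hook pointer that stays NULL until the NAT module registers it. The names and signature are hypothetical, and the real nfnetlink_parse_nat_setup_hook may take different arguments.

```c
#include <errno.h>
#include <stddef.h>

struct conn;  /* opaque in this sketch */

/* Hypothetical hook type; the real hook's signature may differ. */
typedef int (*nat_setup_fn)(struct conn *ct, const void *nat_attr);

/* Lives in the conntrack core and stays NULL until the NAT module sets
 * it at load time, so ctnetlink has no module dependency on NAT. */
static nat_setup_fn parse_nat_setup_hook;

static int change_nat(struct conn *ct, const void *nat_attr)
{
    nat_setup_fn fn = parse_nat_setup_hook;  /* kernel: rcu_dereference() */

    if (fn == NULL)
        return -EAGAIN;  /* caller can load the NAT module and replay */
    return fn(ct, nat_attr);
}
```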

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

08 Oct, 2008

4 commits