24 Nov, 2011

1 commit


01 Nov, 2011

1 commit


19 Oct, 2011

1 commit

  • To ease skb->truesize sanitization, its better to be able to localize
    all references to skb frags size.

    Define accessors : skb_frag_size() to fetch frag size, and
    skb_frag_size_{set|add|sub}() to manipulate it.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Aug, 2011

1 commit

  • A userspace listener may send (bogus) NF_STOLEN verdict, which causes skb leak.

    This problem was previously fixed via
    64507fdbc29c3a622180378210ecea8659b14e40 (netfilter:
    nf_queue: fix NF_STOLEN skb leak) but this had to be reverted because
    NF_STOLEN can also be returned by a netfilter hook when iterating the
    rules in nf_reinject.

    Reject userspace NF_STOLEN verdict, as suggested by Michal Miroslaw.

    This is complementary to commit fad54440438a7c231a6ae347738423cbabc936d9
    (netfilter: avoid double free in nf_reinject).

    Cc: Julian Anastasov
    Cc: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Patrick McHardy

    Florian Westphal
     

29 Jul, 2011

1 commit

  • ipq_build_packet_message() in net/ipv4/netfilter/ip_queue.c and
    net/ipv6/netfilter/ip6_queue.c contain a small potential mem leak as
    far as I can tell.

    We allocate memory for 'skb' with alloc_skb() annd then call
    nlh = NLMSG_PUT(skb, 0, 0, IPQM_PACKET, size - sizeof(*nlh));

    NLMSG_PUT is a macro
    NLMSG_PUT(skb, pid, seq, type, len) \
    NLMSG_NEW(skb, pid, seq, type, len, 0)

    that expands to NLMSG_NEW, which is also a macro which expands to:
    NLMSG_NEW(skb, pid, seq, type, len, flags) \
    ({ if (unlikely(skb_tailroom(skb) < (int)NLMSG_SPACE(len))) \
    goto nlmsg_failure; \
    __nlmsg_put(skb, pid, seq, type, len, flags); })

    If we take the true branch of the 'if' statement and 'goto
    nlmsg_failure', then we'll, at that point, return from
    ipq_build_packet_message() without having assigned 'skb' to anything
    and we'll leak the memory we allocated for it when it goes out of
    scope.

    Fix this by placing a 'kfree(skb)' at 'nlmsg_failure'.

    I admit that I do not know how likely this to actually happen or even
    if there's something that guarantees that it will never happen - I'm
    not that familiar with this code, but if that is so, I've not been
    able to spot it.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Patrick McHardy

    Jesper Juhl
     

16 Jun, 2011

1 commit

  • By default, when broadcast or multicast packet are sent from a local
    application, they are sent to the interface then looped by the kernel
    to other local applications, going throught netfilter hooks in the
    process.

    These looped packet have their MAC header removed from the skb by the
    kernel looping code. This confuse various netfilter's netlink queue,
    netlink log and the legacy ip_queue, because they try to extract a
    hardware address from these packets, but extracts a part of the IP
    header instead.

    This patch prevent NFQUEUE, NFLOG and ip_QUEUE to include a MAC header
    if there is none in the packet.

    Signed-off-by: Nicolas Cavallari
    Signed-off-by: Patrick McHardy

    Nicolas Cavallari
     

06 Jun, 2011

3 commits


12 May, 2011

1 commit


10 May, 2011

1 commit

  • The IPv6 header is not zeroed out in alloc_skb so we must initialize
    it properly unless we want to see IPv6 packets with random TOS fields
    floating around. The current implementation resets the flow label
    but this could be changed if deemed necessary.

    We stumbled upon this issue when trying to apply a mangle rule to
    the RST packet generated by the REJECT target module.

    Signed-off-by: Fernando Luis Vazquez Cao
    Signed-off-by: Pablo Neira Ayuso

    Fernando Luis Vazquez Cao
     

20 Apr, 2011

1 commit


18 Apr, 2011

2 commits


04 Apr, 2011

1 commit

  • We currently use a percpu spinlock to 'protect' rule bytes/packets
    counters, after various attempts to use RCU instead.

    Lately we added a seqlock so that get_counters() can run without
    blocking BH or 'writers'. But we really only need the seqcount in it.

    Spinlock itself is only locked by the current/owner cpu, so we can
    remove it completely.

    This cleanups api, using correct 'writer' vs 'reader' semantic.

    At replace time, the get_counters() call makes sure all cpus are done
    using the old table.

    Signed-off-by: Eric Dumazet
    Cc: Jan Engelhardt
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     

31 Mar, 2011

1 commit


20 Mar, 2011

1 commit

  • commit f3c5c1bfd4308 (make ip_tables reentrant) introduced a race in
    handling the stackptr restore, at the end of ipt_do_table()

    We should do it before the call to xt_info_rdunlock_bh(), or we allow
    cpu preemption and another cpu overwrites stackptr of original one.

    A second fix is to change the underflow test to check the origptr value
    instead of 0 to detect underflow, or else we allow a jump from different
    hooks.

    Signed-off-by: Eric Dumazet
    Cc: Jan Engelhardt
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     

16 Mar, 2011

1 commit


15 Mar, 2011

1 commit

  • Structures ip6t_replace, compat_ip6t_replace, and xt_get_revision are
    copied from userspace. Fields of these structs that are
    zero-terminated strings are not checked. When they are used as argument
    to a format string containing "%s" in request_module(), some sensitive
    information is leaked to userspace via argument of spawned modprobe
    process.

    The first bug was introduced before the git epoch; the second was
    introduced in 3bc3fe5e (v2.6.25-rc1); the third is introduced by
    6b7d31fc (v2.6.15-rc1). To trigger the bug one should have
    CAP_NET_ADMIN.

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Patrick McHardy

    Vasiliy Kulikov
     

13 Mar, 2011

4 commits


03 Mar, 2011

1 commit


20 Feb, 2011

1 commit


17 Feb, 2011

1 commit


21 Jan, 2011

1 commit

  • After commit ae90bdeaeac6b (netfilter: fix compilation when conntrack is
    disabled but tproxy is enabled) we have following warnings :

    net/ipv6/netfilter/nf_conntrack_reasm.c:520:16: warning: symbol
    'nf_ct_frag6_gather' was not declared. Should it be static?
    net/ipv6/netfilter/nf_conntrack_reasm.c:591:6: warning: symbol
    'nf_ct_frag6_output' was not declared. Should it be static?
    net/ipv6/netfilter/nf_conntrack_reasm.c:612:5: warning: symbol
    'nf_ct_frag6_init' was not declared. Should it be static?
    net/ipv6/netfilter/nf_conntrack_reasm.c:640:6: warning: symbol
    'nf_ct_frag6_cleanup' was not declared. Should it be static?

    Fix this including net/netfilter/ipv6/nf_defrag_ipv6.h

    Signed-off-by: Eric Dumazet
    CC: KOVACS Krisztian
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     

20 Jan, 2011

1 commit


13 Jan, 2011

3 commits

  • One iptables invocation with 135000 rules takes 35 seconds of cpu time
    on a recent server, using a 32bit distro and a 64bit kernel.

    We eventually trigger NMI/RCU watchdog.

    INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)

    COMPAT mode has quadratic behavior and consume 16 bytes of memory per
    rule.

    Switch the xt_compat algos to use an array instead of list, and use a
    binary search to locate an offset in the sorted array.

    This halves memory need (8 bytes per rule), and removes quadratic
    behavior [ O(N*N) -> O(N*log2(N)) ]

    Time of iptables goes from 35 s to 150 ms.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
     
  • Simon Horman
     
  • The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
    failed to update the #ifdef stanzas guarding the defragmentation related
    fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

    This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
    without connection tracking.

    Original report:
    http://marc.info/?l=linux-netdev&m=129010118516341&w=2

    Reported-by: Randy Dunlap
    Acked-by: Randy Dunlap
    Signed-off-by: KOVACS Krisztian
    Signed-off-by: Pablo Neira Ayuso

    KOVACS Krisztian
     

11 Jan, 2011

1 commit

  • Using "iptables -L" with a lot of rules have a too big BH latency.
    Jesper mentioned ~6 ms and worried of frame drops.

    Switch to a per_cpu seqlock scheme, so that taking a snapshot of
    counters doesnt need to block BH (for this cpu, but also other cpus).

    This adds two increments on seqlock sequence per ipt_do_table() call,
    its a reasonable cost for allowing "iptables -L" not block BH
    processing.

    Reported-by: Jesper Dangaard Brouer
    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    Acked-by: Stephen Hemminger
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
     

16 Dec, 2010

1 commit

  • The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
    failed to update the #ifdef stanzas guarding the defragmentation related
    fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

    This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
    without connection tracking.

    Original report:
    http://marc.info/?l=linux-netdev&m=129010118516341&w=2

    Reported-by: Randy Dunlap
    Signed-off-by: KOVACS Krisztian
    Acked-by: Randy Dunlap
    Signed-off-by: Patrick McHardy

    KOVACS Krisztian
     

13 Dec, 2010

1 commit


23 Nov, 2010

1 commit


16 Nov, 2010

1 commit


15 Nov, 2010

1 commit

  • I am observing consistent behavior even with bridges, so let's unlock
    this. xt_mac is already usable in FORWARD, too. Section 9 of
    http://ebtables.sourceforge.net/br_fw_ia/br_fw_ia.html#section9 says
    the MAC source address is changed, but my observation does not match
    that claim -- the MAC header is retained.

    Signed-off-by: Jan Engelhardt
    [Patrick; code inspection seems to confirm this]
    Signed-off-by: Patrick McHardy

    Jan Engelhardt
     

13 Nov, 2010

1 commit


12 Nov, 2010

1 commit

  • The type of FRAG6_CB(prev)->offset is int, skb->len is *unsigned* int,
    and offset is int.

    Without this patch, type conversion occurred to this expression, when
    (FRAG6_CB(prev)->offset + prev->len) is less than offset.

    Signed-off-by: Shan Wei
    Signed-off-by: Patrick McHardy

    Shan Wei
     

04 Nov, 2010

1 commit