28 Oct, 2005

2 commits

  • Linus Torvalds
     
  • This bug is responsible for causing the infamous "Treason uncloaked"
    messages that have been popping up everywhere since the printk was added.
    They have usually been blamed on foreign operating systems. However,
    some of those reports implicate Linux, since both systems are running
    Linux or the TCP connection is going across the loopback interface.

    In fact, there really is a bug in the Linux TCP header prediction code
    that's been there since at least 2.1.8. This bug was tracked down with
    help from Dale Blount.

    The effect of this bug ranges from harmless "Treason uncloaked"
    messages to hung/aborted TCP connections. The details of the bug
    and fix are as follows (a small user-space sketch follows below).

    When snd_wnd is updated, we only update pred_flags if
    tcp_fast_path_check succeeds. When it fails (for example,
    when our rcvbuf is used up), we will leave pred_flags with
    an out-of-date snd_wnd value.

    When the out-of-date pred_flags happens to match the next incoming
    packet, we will again hit the fast path and use the current snd_wnd,
    which will be wrong.

    In the case of the treason messages, it just happens that the snd_wnd
    cached in pred_flags is zero while tp->snd_wnd is non-zero. Therefore
    when a zero-window packet comes in we incorrectly conclude that the
    window is non-zero.

    In fact if the peer continues to send us zero-window pure ACKs we
    will continue making the same mistake. It's only when the peer
    transmits a zero-window packet with data attached that we get a
    chance to snap out of it. This is what triggers the treason
    message at the next retransmit timeout.

    Signed-off-by: Herbert Xu
    Signed-off-by: Arnaldo Carvalho de Melo

    Herbert Xu
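    A user-space sketch of the mechanism described above (this is not
    kernel code: the layout of the prediction word is simplified and the
    helper is a made-up stand-in for __tcp_fast_path_on()). It shows how a
    window value cached in pred_flags can go stale and let a zero-window
    ACK take the fast path while tp->snd_wnd still claims an open window:

        #include <stdio.h>
        #include <stdint.h>

        #define FLAG_ACK 0x00100000u

        /* Pack header length, the ACK flag and the *expected* window into
         * one prediction word (simplified layout). */
        static uint32_t pack_pred_flags(unsigned int hdr_words, uint32_t wnd)
        {
            return (hdr_words << 26) | FLAG_ACK | (wnd & 0xffff);
        }

        int main(void)
        {
            /* pred_flags was last refreshed while the peer's window was 0. */
            uint32_t pred_flags = pack_pred_flags(5, 0);

            /* snd_wnd has since been updated, but tcp_fast_path_check()
             * failed (e.g. rcvbuf used up), so pred_flags was left stale. */
            uint32_t snd_wnd = 32768;

            /* A zero-window pure ACK arrives: it matches the stale
             * pred_flags, so the fast path is taken and we keep believing
             * the (wrong) non-zero snd_wnd. */
            uint32_t incoming = pack_pred_flags(5, 0);

            if (incoming == pred_flags)
                printf("fast path hit, window assumed to be %u\n",
                       (unsigned int)snd_wnd);
            return 0;
        }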
     

26 Oct, 2005

5 commits

  • Fix setting of the broadcast address when the netmask is set via
    SIOCSIFNETMASK in Linux 2.6. The code wanted the old value of
    ifa->ifa_mask but used it after it had already been overwritten with
    the new value.

    Signed-off-by: David Engel
    Signed-off-by: Arnaldo Carvalho de Melo

    David Engel
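    A user-space sketch of the ordering issue (the struct and helper here
    are simplified stand-ins, not the actual devinet.c code): the default
    broadcast address has to be recognised against the old mask before
    ifa_mask is overwritten with the new one:

        #include <stdio.h>
        #include <stdint.h>
        #include <arpa/inet.h>

        struct fake_ifa {
            uint32_t ifa_address;    /* all fields in network byte order */
            uint32_t ifa_mask;
            uint32_t ifa_broadcast;
        };

        static void set_netmask(struct fake_ifa *ifa, uint32_t new_mask)
        {
            uint32_t old_mask = ifa->ifa_mask;   /* save BEFORE overwriting */

            ifa->ifa_mask = new_mask;

            /* Only rewrite the broadcast if it was the default derived from
             * the old mask; the bug was reading ifa->ifa_mask here after it
             * already held the new value. */
            if (ifa->ifa_broadcast == (ifa->ifa_address | ~old_mask))
                ifa->ifa_broadcast = ifa->ifa_address | ~new_mask;
        }

        int main(void)
        {
            struct fake_ifa ifa = {
                .ifa_address   = inet_addr("192.168.1.10"),
                .ifa_mask      = inet_addr("255.255.255.0"),
                .ifa_broadcast = inet_addr("192.168.1.255"),
            };
            struct in_addr b;

            set_netmask(&ifa, inet_addr("255.255.0.0"));
            b.s_addr = ifa.ifa_broadcast;
            printf("broadcast is now %s\n", inet_ntoa(b)); /* 192.168.255.255 */
            return 0;
        }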
     
  • skb_prev is assigned from skb, which cannot be NULL. This patch removes the
    unnecessary NULL check.

    Signed-off-by: Jayachandran C.
    Acked-by: James Morris
    Signed-off-by: Arnaldo Carvalho de Melo

    Jayachandran C
     
  • This patch kills a redundant rcu_dereference on fa->fa_info in fib_trie.c.
    As this dereference directly follows a list_for_each_entry_rcu line, we
    have already taken a read barrier with respect to getting an entry from
    the list.

    This read barrier guarantees that all values read out of fa are valid.
    In particular, the contents of the structure pointed to by fa->fa_info
    are initialised before fa->fa_info is actually set (see fn_trie_insert);
    the setting of fa->fa_info itself is further separated by a write
    barrier from the insertion of fa into the list.

    Therefore by taking a read barrier after obtaining fa from the list
    (which is given by list_for_each_entry_rcu), we can be sure that
    fa->fa_info contains a valid pointer, as well as the fact that the
    data pointed to by fa->fa_info is itself valid.

    Signed-off-by: Herbert Xu
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller
    Signed-off-by: Arnaldo Carvalho de Melo

    Herbert Xu
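    A simplified kernel-style sketch of the pattern (not a standalone
    program; the fib_trie context is abbreviated): the dereference done
    inside list_for_each_entry_rcu() already provides the ordering needed
    to read fa->fa_info directly:

        struct fib_alias *fa;

        rcu_read_lock();
        list_for_each_entry_rcu(fa, fa_head, fa_list) {
            /* fa was fetched through the iterator's own rcu_dereference(),
             * so everything published before fa was inserted, including
             * fa->fa_info and the fib_info it points to, is already
             * visible.  A second rcu_dereference(fa->fa_info) adds nothing. */
            struct fib_info *fi = fa->fa_info;

            if (fi == NULL)
                continue;
            /* ... compare fi->fib_priority, fi->fib_protocol, etc. ... */
        }
        rcu_read_unlock();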
     
  • It's fairly simple to resize the hash table, but currently you need to
    remove and reinsert the module. That's bad (we lose connection
    state). Harald has even offered to write a daemon which sets this
    based on load.

    Signed-off-by: Rusty Russell
    Signed-off-by: Harald Welte
    Signed-off-by: Arnaldo Carvalho de Melo

    Harald Welte
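    A sketch of the general technique (names and the default size are
    illustrative, not the literal ip_conntrack patch), using the 2.6-era
    module_param_call() so the value can be written at runtime through
    /sys/module/.../parameters/hashsize:

        #include <linux/kernel.h>
        #include <linux/module.h>
        #include <linux/moduleparam.h>

        static unsigned int hashsize = 16384;

        static int set_hashsize(const char *val, struct kernel_param *kp)
        {
            unsigned int new_size = simple_strtoul(val, NULL, 0);

            if (!new_size)
                return -EINVAL;

            /* Allocate the new table, take the conntrack lock, move every
             * existing entry to its new bucket, publish the new table and
             * free the old one -- connection state survives the resize.
             * (resize_hash_table() is a placeholder, not a real function.) */
            /* err = resize_hash_table(new_size); */

            hashsize = new_size;
            return 0;
        }

        module_param_call(hashsize, set_hashsize, param_get_uint,
                          &hashsize, 0600);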
     
  • In 'net', change the explicit use of for-loops and NR_CPUS into the
    general for_each_cpu() or for_each_online_cpu() constructs, as
    appropriate. This widens the scope of potential future optimizations
    of the general constructs, and also takes advantage of the existing
    optimizations of first_cpu() and next_cpu(), which is advantageous
    when the true CPU count is much smaller than NR_CPUS.

    Signed-off-by: John Hawkes
    Signed-off-by: David S. Miller
    Signed-off-by: Arnaldo Carvalho de Melo

    John Hawkes
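    The pattern being converted looks roughly like this (variable names
    are illustrative; in the 2.6.14-era kernel for_each_cpu() walked all
    possible CPUs):

        /* before: scans every slot up to NR_CPUS, even on small machines */
        for (cpu = 0; cpu < NR_CPUS; cpu++) {
            if (!cpu_possible(cpu))
                continue;
            total += per_cpu(counter, cpu);
        }

        /* after: visits only possible (or online) CPUs and benefits from
         * the optimized first_cpu()/next_cpu() implementations */
        for_each_cpu(cpu)
            total += per_cpu(counter, cpu);

        for_each_online_cpu(cpu)
            total += per_cpu(counter, cpu);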
     

23 Oct, 2005

1 commit

  • IPVS used the NFC_IPVS_PROPERTY flag in nfcache, but now that nfcache has
    been removed, the replacement flag 'ipvs_property' still needs to be
    copied. This patch should be included in 2.6.14.

    Further comments from Harald Welte:

    Sorry, seems like the bug was introduced by me.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Harald Welte
    Signed-off-by: Arnaldo Carvalho de Melo

    Julian Anastasov
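    The kind of change involved, sketched (the function name and placement
    are simplified; the real change sits in the skb copy/clone paths):

        static void copy_ipvs_bits(struct sk_buff *new,
                                   const struct sk_buff *old)
        {
        #ifdef CONFIG_IP_VS
            /* ipvs_property replaced the NFC_IPVS_PROPERTY bit that used
             * to live in nfcache, so it must be carried over on clone/copy
             * just as nfcache used to be */
            new->ipvs_property = old->ipvs_property;
        #endif
        }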
     

21 Oct, 2005

1 commit

  • It is legitimate to call tcp_fragment with len == skb->len since
    that is done for FIN packets and the FIN flag counts as one byte.
    So we should only check for the len > skb->len case.

    Signed-off-by: Herbert Xu
    Signed-off-by: Arnaldo Carvalho de Melo

    Herbert Xu
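    In code terms the check becomes (a sketch, simplified from
    tcp_fragment()):

        /* len == skb->len is legal: a FIN occupies one unit of sequence
         * space but carries no data, so only the strictly-greater case
         * is an error */
        if (unlikely(len > skb->len))
            return -EINVAL;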
     

14 Oct, 2005

2 commits


13 Oct, 2005

1 commit


11 Oct, 2005

11 commits


09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change the
    generated code (from gcc's point of view we replaced unsigned int with
    a typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
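    The shape of the change, with an illustrative prototype (not one of
    the specific functions touched by the patch):

        typedef unsigned int __nocast gfp_t;

        /* before: the annotation is spelled out at every use */
        void *example_alloc(size_t size, unsigned int __nocast gfp_flags);

        /* after: identical sparse warnings and generated code, but the
         * intent is visible in the type name */
        void *example_alloc(size_t size, gfp_t gfp_flags);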
     

06 Oct, 2005

1 commit


05 Oct, 2005

3 commits

  • From: Randy Dunlap

    Fix implicit nocast warnings in ip_vs code:
    net/ipv4/ipvs/ip_vs_app.c:631:54: warning: implicit cast to nocast type

    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Randy Dunlap
     
  • Signed-off-by: Horst H. von Brand
    Signed-off-by: David S. Miller

    Horst H. von Brand
     
  • The patch below introduces special thresholds to keep the root node of the
    trie large. This gives a flatter tree at the cost of a modest memory
    increase. Overall it seems to be a gain, and this was also proposed by one
    of the authors of the paper in a recent seminar.

    Main table after loading 123 k routes.

    Aver depth: 3.30
    Max depth: 9
    Root-node size 12 bits
    Total size: 4044 kB

    With the patch:
    Aver depth: 2.78
    Max depth: 8
    Root-node size 15 bits
    Total size: 4150 kB

    An increase of 8-10% in forwarding performance was seen under an rDoS attack.

    Signed-off-by: Robert Olsson
    Signed-off-by: David S. Miller

    Robert Olsson
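    A sketch of the idea (the threshold values are illustrative, not the
    numbers from the patch): the root node gets its own pair of resize
    thresholds so it is inflated more eagerly and halved more reluctantly
    than interior nodes, which keeps it large:

        static const int inflate_threshold      = 50;
        static const int halve_threshold        = 25;
        static const int inflate_threshold_root = 15;  /* inflate sooner */
        static const int halve_threshold_root   = 10;  /* halve later    */

        /* pick the thresholds used when resizing a node */
        static void pick_thresholds(int is_root, int *inflate_thr,
                                    int *halve_thr)
        {
            *inflate_thr = is_root ? inflate_threshold_root : inflate_threshold;
            *halve_thr   = is_root ? halve_threshold_root   : halve_threshold;
        }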
     

04 Oct, 2005

5 commits

  • It's not a good idea to be smurf'able by default.
    The few people who need this can turn it on.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
    introduces __in_dev_get_rcu() to cover the second case.

    1) RCU with refcnt should use in_dev_get().
    2) RCU without refcnt should use __in_dev_get_rcu().
    3) All others must hold RTNL and use __in_dev_get_rtnl().

    There is one exception in net/ipv4/route.c which is in fact a pre-existing
    race condition. I've marked it as such so that we remember to fix it.

    This patch is based on suggestions and prior work by Suzanne Wood and
    Paul McKenney.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
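    The three cases translate into usage patterns roughly like these
    (kernel-style sketch, context abbreviated):

        /* 1) RCU with refcnt: the reference may outlive the read side */
        struct in_device *in_dev = in_dev_get(dev);
        if (in_dev) {
            /* ... may be used (and even slept on) later ... */
            in_dev_put(in_dev);
        }

        /* 2) RCU without refcnt: must stay inside the read-side section */
        rcu_read_lock();
        in_dev = __in_dev_get_rcu(dev);
        if (in_dev) {
            /* ... use in_dev before rcu_read_unlock() ... */
        }
        rcu_read_unlock();

        /* 3) Everyone else must hold the RTNL */
        ASSERT_RTNL();
        in_dev = __in_dev_get_rtnl(dev);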
     
  • Meelis Roos wrote:
    > RK> My firewall setup relies on proxyarp working. However, with 2.6.14-rc3,
    > RK> it appears to be completely broken. The firewall is 212.18.232.186,
    >
    > Same here with some kernel between 14-rc2 and 14-rc3 - no response to
    > ARP on a proxyarp gateway. Sorry, no exact revision and no more debugging
    > yet since it's a production gateway.

    The breakage is caused by the change to use the CB area for flagging
    whether a packet has been queued due to proxy_delay. This area gets
    cleared every time arp_rcv gets called. Unfortunately packets delayed
    due to proxy_delay also go through arp_rcv when they are reprocessed.

    In fact, I can't think of a reason why delayed proxy packets should go
    through netfilter again at all. So the easiest solution is to bypass
    that and go straight to arp_process.

    This is essentially what would've happened before netfilter support
    was added to ARP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
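    A sketch of the approach (the helper name and its wiring are
    illustrative, not the literal patch): when a packet that was queued for
    proxy_delay is reprocessed, it bypasses the netfilter ARP hook and goes
    straight to arp_process(), so arp_rcv() cannot clear the skb CB area a
    second time:

        static void arp_redo_delayed(struct sk_buff *skb)
        {
            /* The packet already traversed NF_HOOK(NF_ARP, NF_ARP_IN, ...)
             * before it was queued; running arp_rcv() again would wipe the
             * CB flag marking it as a delayed proxy packet. */
            arp_process(skb);
        }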
     
  • Arnaldo and I agreed it could be applied now, because I have other
    pending patches depending on this one (thank you, Arnaldo).

    (The other important patch moves skc_refcnt into a separate cache line,
    so that SMP/NUMA performance doesn't suffer from cache-line ping-pong.)

    1) First, some performance data:
    --------------------------------

    tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established().

    The most time-critical code is:

    sk_for_each(sk, node, &head->chain) {
        if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
            goto hit; /* You sunk my battleship! */
    }

    The sk_for_each() does use prefetch() hints, but only the beginning of
    "struct sock" is prefetched.

    As INET_MATCH's first comparison uses inet_sk(__sk)->daddr, which is far
    away from the beginning of "struct sock", it has to bring a cold cache
    line into the CPU cache. Each iteration has to touch at least 2 cache
    lines.

    This can be problematic if some chains are very long.

    2) The goal
    -----------

    The idea I had is to change things so that INET_MATCH() can return
    FALSE in 99% of cases using only data already in the CPU cache,
    i.e. one cache line per iteration.

    3) Description of the patch
    ---------------------------

    Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
    filling a 32-bit hole on 64-bit platforms.

    struct sock_common {
            unsigned short          skc_family;
            volatile unsigned char  skc_state;
            unsigned char           skc_reuse;
            int                     skc_bound_dev_if;
            struct hlist_node       skc_node;
            struct hlist_node       skc_bind_node;
            atomic_t                skc_refcnt;
    +       unsigned int            skc_hash;
            struct proto            *skc_prot;
    };

    Store the full hash in this 32-bit field, not masked by (ehash_size - 1).
    Using this full hash as the first comparison done in INET_MATCH lets us
    immediately skip the element without touching a second cache line in
    case of a miss.

    Suppress the sk_hashent/tw_hashent fields, since skc_hash (aliased to
    sk_hash and tw_hash) already contains the slot number if we mask it with
    (ehash_size - 1).

    File include/net/inet_hashtables.h

    64-bit platforms:

    #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
            (((__sk)->sk_hash == (__hash)) &&                               \
             ((*((__u64 *)&(inet_sk(__sk)->daddr))) == (__cookie)) &&       \
             ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) &&        \
             (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    32-bit platforms:

    #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
            (((__sk)->sk_hash == (__hash)) &&                               \
             (inet_sk(__sk)->daddr == (__saddr)) &&                         \
             (inet_sk(__sk)->rcv_saddr == (__daddr)) &&                     \
             (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    - Adds a prefetch(head->chain.first) in
    __inet_lookup_established()/__tcp_v4_check_established(),
    __inet6_lookup_established()/__tcp_v6_check_established() and
    __dccp_v4_check_established() to bring the first element of the list
    into cache before the {read|write}_lock(&head->lock).

    Signed-off-by: Eric Dumazet
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • I've found the problem in general. It affects any 64-bit
    architecture. The problem occurs when you change the system time.

    Suppose that when you boot, your system clock is ahead by a day.
    This gets recorded in skb_tv_base. You then wind the clock back
    by a day. From that point onwards the offset will be negative, which
    essentially overflows the 32-bit variables it is stored in.

    In fact, why don't we just store the real time stamp in those 32-bit
    variables? After all, we're not going to overflow for quite a while
    yet.

    When we do overflow, we'll need a better solution of course.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
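    A user-space illustration of the arithmetic (not kernel code): a
    negative offset wraps when stored in an unsigned 32-bit field, while
    the absolute seconds value still fits for a long time:

        #include <stdio.h>
        #include <stdint.h>
        #include <time.h>

        int main(void)
        {
            time_t base = time(NULL) + 86400; /* clock was a day fast at boot */
            time_t now  = time(NULL);         /* ... and has been wound back  */

            uint32_t offset_sec = (uint32_t)(now - base); /* wraps to ~2^32  */
            uint32_t stamp_sec  = (uint32_t)now;          /* fine until 2106 */

            printf("offset stored as %u, absolute stamp %u\n",
                   (unsigned int)offset_sec, (unsigned int)stamp_sec);
            return 0;
        }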
     

30 Sep, 2005

2 commits

  • From: Alexey Kuznetsov

    Handle better the case where the sender sends full sized
    frames initially, then moves to a mode where it trickles
    out small amounts of data at a time.

    This known problem is even mentioned in the comments
    above tcp_grow_window() in tcp_input.c, specifically:

    ...
    * The scheme does not work when sender sends good segments opening
    * window and then starts to feed us spagetti. But it should work
    * in common situations. Otherwise, we have to rely on queue collapsing.
    ...

    When the sender gives full sized frames, the "struct sk_buff" overhead
    from each packet is small. So we'll advertise a larger window.
    If the sender moves to a mode where small segments are sent, this
    ratio becomes tilted to the other extreme and we start overrunning
    the socket buffer space.

    tcp_clamp_window() tries to address this, but its clamping of
    tp->window_clamp is a wee bit too aggressive for this particular case.

    Fix confirmed by Ion Badulescu.

    Signed-off-by: David S. Miller

    Alexey Kuznetsov
     
  • But retain the comment fix.

    Alexey Kuznetsov has explained the situation as follows:

    --------------------

    I think the fix is incorrect. Look, the RFC function init_cwnd(mss) is
    not continuous: f.e. for mss=1095 it needs an initial window of 1095*4,
    but for mss=1096 it is 1096*3. We do not know exactly what mss the
    sender used for its calculations. If we advertised 1096 (and calculated
    an initial window of 3*1096), the sender could limit its mss to some
    value < 1096 and then need a window of his_mss*4 > 3*1096 to send its
    initial burst.

    See?

    So, the honest function for initial rcv_wnd derived from
    tcp_init_cwnd() is:

        init_rcv_wnd(mss) =
            min { init_cwnd(mss1)*mss1 for mss1 <= mss }

    which works out to roughly:

        if (mss < 1096)
            return mss*4;
        if (mss < 1096*2)
            return 1096*4;
        return mss*2;

    (I just scribbled a graph on a piece of paper; it is difficult to see
    or to explain without it.)

    I selected it differently, giving more window than is strictly
    required. The initial receive window must be large enough to allow a
    sender following the RFC (or just setting initial cwnd to 2) to send
    its initial burst. But besides that it is arbitrary, so I decided to
    give slack space of one segment.

    Actually, the logic was:

    If mss is low/normal (below the ~1096-byte cutoff), advertise mss*4,
    i.e. one segment of slack over the largest initial burst; for larger
    msses, fall back to the smaller multiples given by the function above.

    David S. Miller
     

29 Sep, 2005

1 commit


27 Sep, 2005

1 commit

  • When you've enabled conntrack and NAT as modules (the standard case in all
    distributions), and you've also enabled the new conntrack netlink
    interface, loading ip_conntrack_netlink.ko will auto-load iptable_nat.ko.
    This causes a huge performance penalty, since every packet then iterates
    through the NAT code, even if you don't want it.

    This patch splits iptable_nat.ko into the NAT core (ip_nat.ko) and the
    iptables frontend (iptable_nat.ko). Therefore, ip_conntrack_netlink.ko will
    only pull in ip_nat.ko, but not the frontend. ip_nat.ko will "only" allocate
    some resources, but not affect runtime performance.

    This separation is also a nice step in anticipation of new packet filters
    (nf-hipac, ipset, pkttables) being able to use the NAT core.

    Signed-off-by: Harald Welte
    Signed-off-by: David S. Miller

    Harald Welte
     

25 Sep, 2005

2 commits


23 Sep, 2005

1 commit

  • This patch fixes a number of bugs. It cannot reasonably be split up into
    multiple fixes, since all the bugs interact with each other and affect the
    same function:

    Bug #1:
    The event cache code cannot be called while a lock is held. Therefore, the
    call to ip_conntrack_event_cache() within ip_ct_refresh_acct() needs to be
    moved outside of the locked section. This fixes a number of 2.6.14-rcX
    oops and deadlock reports.

    Bug #2:
    We used to call ct_add_counters() for unconfirmed connections without
    holding a lock. Since the add operations are not atomic, we could race
    with another CPU.

    Bug #3:
    ip_ct_refresh_acct() lost REFRESH events in some cases where a refresh
    (and the corresponding event) is desired but no accounting shall be
    performed. Both events and accounting implicitly depended on the skb
    parameter being non-null. We now re-introduce a non-accounting
    "ip_ct_refresh()" variant to explicitly state the desired behaviour.

    Signed-off-by: Harald Welte
    Signed-off-by: David S. Miller

    Harald Welte
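    A heavily simplified sketch of the bug #1 fix (the real function also
    handles the accounting and unconfirmed-connection cases): the event is
    recorded only after ip_conntrack_lock has been released:

        void ip_ct_refresh_acct(struct ip_conntrack *ct,
                                enum ip_conntrack_info ctinfo,
                                const struct sk_buff *skb,
                                unsigned long extra_jiffies)
        {
            int do_event = 0;

            write_lock_bh(&ip_conntrack_lock);
            /* ... update ct->timeout and the accounting counters ... */
            do_event = 1;
            write_unlock_bh(&ip_conntrack_lock);

            /* the event cache code must not run with the lock held */
            if (do_event)
                ip_conntrack_event_cache(IPCT_REFRESH, skb);
        }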