27 May, 2005

6 commits

  • "_s" suffix is certainly of Hungarian origin.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • [XFRM] Call dst_check() with appropriate cookie

    This fixes an infinite loop issue with IPv6 tunnel mode.

    Signed-off-by: Kazunori Miyazawa
    Signed-off-by: Hideaki YOSHIFUJI
    Signed-off-by: David S. Miller

    Hideaki YOSHIFUJI
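
    A tiny illustrative model of the point above - validating a cached route
    against the right generation cookie - using invented names rather than the
    kernel's dst_check() API; a sketch only, assuming a per-tree generation
    counter:

        #include <stddef.h>

        /* Invented stand-in for a cached route entry. */
        struct route_entry {
            unsigned int genid;     /* bumped whenever the routing tree changes */
            int obsolete;
        };

        /*
         * Return the entry if it is still valid for the caller's cookie,
         * or NULL if it must be looked up again.  Checking with the wrong
         * cookie (for example one that always "matches") lets a stale
         * entry be reused forever, which is how a lookup can loop.
         */
        static struct route_entry *check_cached(struct route_entry *e,
                                                unsigned int cookie)
        {
            if (e->obsolete || e->genid != cookie)
                return NULL;
            return e;
        }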
     
  • Here is a fixed-up version of the reorder feature of netem.
    It is the same as the earlier patch, with the bugfix from Julio merged in.
    It has the expected backwards-compatibility behaviour.

    Go ahead and merge this one; the TCP strangeness I was seeing was due
    to the reordering bug and the previous version of the TSO patch.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Netem works better if packets are just queued in the inner discipline
    rather than having a separate delayed queue. Change it to use dequeue/requeue
    to peek, like TBF does (a user-space sketch of the idea follows this entry).

    By doing this, potential qlen problems with the old method are avoided. The problems
    happened when the netem_run that moved packets from the separate delayed queue to the inner
    discipline failed (because the inner queue was full). This happened in dequeue, so the
    effective qlen of the netem would be decreased (because of the drop), but there was
    no way to keep the outer qdisc (the caller of netem dequeue) in sync.

    The problem window is still there, since this patch doesn't address the issue of
    requeue failing in netem_dequeue, but that shouldn't happen since the dequeue/requeue
    sequence should always work. The long-term correct fix is to implement qdisc->peek in all
    the qdiscs to allow for this (needed by several other qdiscs as well).

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
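
    The user-space sketch referenced above: a plain linked-list queue standing in
    for the inner qdisc, with dequeue/requeue used to "peek" at the head packet.
    All names and types here are illustrative, not the real qdisc code.

        #include <stddef.h>

        struct pkt {
            struct pkt *next;
            long send_time;             /* when netem wants to release it */
        };

        struct queue {
            struct pkt *head, *tail;
            int qlen;
        };

        static struct pkt *q_dequeue(struct queue *q)
        {
            struct pkt *p = q->head;

            if (p) {
                q->head = p->next;
                if (!q->head)
                    q->tail = NULL;
                q->qlen--;
            }
            return p;
        }

        static void q_requeue(struct queue *q, struct pkt *p)
        {
            p->next = q->head;          /* put it back at the head */
            q->head = p;
            if (!q->tail)
                q->tail = p;
            q->qlen++;
        }

        /*
         * "Peek" the way the patch does: dequeue the head and, if it is
         * not due yet, requeue it unchanged.  qlen stays consistent with
         * the outer qdisc because the packet never leaves for good.
         */
        static struct pkt *peek_next_due(struct queue *inner, long now)
        {
            struct pkt *p = q_dequeue(inner);

            if (p && p->send_time > now) {
                q_requeue(inner, p);
                return NULL;            /* nothing ready to send yet */
            }
            return p;
        }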
     
  • Handle duplication of packets in netem by re-inserting them at the top of the qdisc tree.
    This avoids problems with qlen accounting with nested qdiscs; a toy sketch
    follows this entry. The recursion requires no additional locking but will
    potentially increase stack depth.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
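
    The toy sketch referenced above: two levels of "qdiscs" that only count
    packets, showing why re-injecting the duplicate at the root keeps every
    level's qlen in step, at the cost of one extra level of recursion. Names
    are made up for illustration.

        /* Toy qdisc: each level just counts queued packets. */
        struct toy_qdisc {
            struct toy_qdisc *child;    /* NULL at the leaf (netem-like) */
            int qlen;
        };

        static struct toy_qdisc *toy_root;   /* top of the qdisc tree */

        /*
         * Enqueue a packet.  When the leaf decides to duplicate, it
         * re-enqueues the copy at the root rather than queuing it
         * locally, so the copy is counted at every level.  Duplication
         * is suppressed for the copy to stop the recursion.
         */
        static void toy_enqueue(struct toy_qdisc *q, int pkt, int may_duplicate)
        {
            q->qlen++;
            if (q->child) {
                toy_enqueue(q->child, pkt, may_duplicate);
                return;
            }
            if (may_duplicate)
                toy_enqueue(toy_root, pkt, 0);   /* re-insert copy at the top */
        }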
     

24 May, 2005

2 commits

  • Signed-off-by: Herbert Xu
    Acked-by: Hideaki YOSHIFUJI
    Signed-off-by: David S. Miller

    Herbert Xu
     
    When we are doing ucopy, we try to defer the ACK generation to
    cleanup_rbuf(). This works very well most of the time, but if the
    ucopy prequeue is large, this ACKing behavior kills performance.

    With TSO, it is possible to make the prequeue so large that by the
    time the ACK is sent and gets back to the sender, most of the window
    has emptied of data and performance suffers significantly.

    This behavior does help in some cases, so we should think about
    re-enabling this trick in the future, using some kind of limit in
    order to avoid the bug case.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 May, 2005

2 commits

  • Having frag_list members which hold wmem of an sk leads to nightmares
    with partially cloned frag skbs. The reason is that once you unleash
    an skb with a frag_list that has individual sk ownerships into the stack,
    you can never undo those ownerships safely, as they may have been cloned
    by things like netfilter. Since we have to undo them in order to make
    skb_linearize happy, this approach leads to a dead end.

    So let's go the other way and make this an invariant:

    For any skb on a frag_list, skb->sk must be NULL.

    That is, the socket ownership always belongs to the head skb.
    It turns out that the implementation is actually pretty simple.

    The above invariant is actually violated in the following patch
    for a short duration inside ip_fragment. This is OK because the
    offending frag_list member is either destroyed at the end of the
    slow path without being sent anywhere, or it is detached from
    the frag_list before being sent.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
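
    A simplified model of the invariant above, with made-up stand-ins for
    struct sock and struct sk_buff: when a buffer is chained onto another
    buffer's frag_list, its socket charge is folded into the head and its
    own sk pointer is cleared.

        #include <assert.h>
        #include <stddef.h>

        struct sock {
            int wmem_alloc;             /* bytes charged to this socket */
        };

        struct skb {
            struct sock *sk;            /* owning socket, if any */
            int truesize;
            struct skb *frag_list;      /* chain of fragment buffers */
            struct skb *next;
        };

        /*
         * Chain 'frag' onto head->frag_list while keeping the invariant:
         * only the head skb owns socket memory; every frag_list member
         * has sk == NULL, and its truesize is charged to the head's socket.
         */
        static void frag_list_add(struct skb *head, struct skb *frag)
        {
            if (frag->sk) {
                frag->sk->wmem_alloc -= frag->truesize;
                frag->sk = NULL;
            }
            if (head->sk)
                head->sk->wmem_alloc += frag->truesize;
            head->truesize += frag->truesize;

            frag->next = head->frag_list;
            head->frag_list = frag;

            assert(frag->sk == NULL);   /* the invariant from the changelog */
        }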
     
  • It looks like skb_cow_data() does not set
    the proper owner for a newly created skb.

    If we have several fragments for an skb and some of them
    are shared(?) or cloned (as in async IPsec), there
    might be a situation where we need to recreate an skb and
    thus use skb_copy() for it.
    The newly created skb has neither a destructor nor a socket
    associated with it, both of which must be copied from the old skb.
    As far as I can see, the current code sets the destructor and socket
    for the first skb only, and uses only the first skb's truesize
    to increment the sk_wmem_alloc value.

    If the above "analysis" is correct, then the attached patch fixes that
    (a simplified sketch of the idea follows this entry).

    Signed-off-by: Evgeniy Polyakov
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Evgeniy Polyakov
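
    The simplified sketch referenced above, with made-up types: every
    recreated copy - not just the first one - gets the old buffer's socket
    and a destructor, and charges its own truesize to the socket so the
    accounting balances when the destructor later runs.

        struct sock {
            int wmem_alloc;
        };

        struct skb {
            struct sock *sk;
            void (*destructor)(struct skb *);
            int truesize;
        };

        /* Placeholder destructor: give the charged memory back. */
        static void release_wmem(struct skb *s)
        {
            s->sk->wmem_alloc -= s->truesize;
        }

        /* Transfer ownership from the old fragment to its fresh copy. */
        static void copy_set_owner(struct skb *copy, const struct skb *old)
        {
            if (!old->sk)
                return;                 /* nothing to transfer */
            copy->sk = old->sk;
            copy->destructor = release_wmem;
            copy->sk->wmem_alloc += copy->truesize;
        }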
     

04 May, 2005

16 commits

  • * net/irda/irda_device.c::irda_setup_dma() made conditional on
    ISA_DMA_API (it uses the helpers in question, and irda is usable on
    platforms that don't have them at all - think of USB IRDA, for
    example).
    * irda drivers that depend on ISA DMA are marked as dependent on
    ISA_DMA_API.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Long-standing bug: the policy to repeat an action never worked.

    Signed-off-by: J Hadi Salim
    Signed-off-by: David S. Miller

    J Hadi Salim
     
  • I found a bug that stopped IPsec/IPv6 from working. About
    a month ago IPv6 started using rt6i_idev->dev on the cached socket dst
    entries. If the cached socket dst entry is an IPsec one, then rt6i_idev
    will be NULL.

    Since we want to look at the rt6i_idev of the original route in this
    case, the easiest fix is to store rt6i_idev in the IPsec dst entry just
    as we do for a number of other IPv6 route attributes. Unfortunately
    this means that we need some new code to handle the references to
    rt6i_idev. That's why this patch is bigger than it would otherwise be.

    I've also done the same thing for IPv4, since it is conceivable that
    once these idev attributes start getting used for accounting, we
    will probably need to dereference them for IPv4 IPsec entries too.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
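
    A hedged sketch of the reference handling described above, with invented
    names: the IPsec bundle entry keeps its own counted pointer to the
    per-device object taken from the original route, and drops it when the
    entry is destroyed, so users of the cached entry never see a NULL idev.

        /* Invented minimal types; only the refcount discipline matters. */
        struct idev {
            int refcnt;
        };

        struct route {
            struct idev *idev;          /* per-device data of the route */
        };

        struct bundle {                 /* stands in for the IPsec dst entry */
            struct route *original;
            struct idev *idev;          /* our own reference, not borrowed */
        };

        static void idev_hold(struct idev *i) { i->refcnt++; }
        static void idev_put(struct idev *i)  { i->refcnt--; /* free at zero */ }

        /* Copy the idev from the original route and take a reference. */
        static void bundle_init(struct bundle *b, struct route *orig)
        {
            b->original = orig;
            b->idev = orig->idev;
            idev_hold(b->idev);
        }

        static void bundle_destroy(struct bundle *b)
        {
            idev_put(b->idev);
        }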
     
  • Fix a qlen underrun when doing duplication with netem. If netem is used
    as a leaf discipline, then the parent needs to be tweaked when packets
    are duplicated.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Netem currently dumps packets into the queue when the timer expires. This
    patch makes it work by self-clocking (more like TBF). It fixes a bug
    when a delay of 0 is requested (only doing loss or duplication).

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Due to bugs in netem (fixed by later patches), it is possible for the qdisc
    qlen to go negative. If this happens, the CPU ends up spinning forever
    in qdisc_run(). So add a BUG_ON() to trap it.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Some network drivers call netif_stop_queue() when detecting loss of
    carrier. This leads to packets being queued up at the qdisc level for
    an unbounded period of time. In order to prevent this effect, the core
    networking stack will now cease to queue packets for any device that
    is operationally down (i.e. the queue is flushed and disabled).

    Signed-off-by: Tommy S. Christensen
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Tommy S. Christensen
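
    A toy model of the behaviour described above, with invented names: when
    the link goes operationally down the per-device queue is flushed and
    disabled, and later enqueue attempts are refused instead of holding
    packets for an unbounded time.

        #include <stdlib.h>

        struct pkt {
            struct pkt *next;
        };

        struct dev_queue {
            struct pkt *head;
            int qlen;
            int down;                   /* device is operationally down */
        };

        /* Flush and disable the queue when the carrier goes away. */
        static void link_down(struct dev_queue *q)
        {
            struct pkt *p;

            q->down = 1;
            while ((p = q->head) != NULL) {
                q->head = p->next;
                q->qlen--;
                free(p);
            }
        }

        /* Drop instead of queueing while the device is down. */
        static int dev_enqueue(struct dev_queue *q, struct pkt *p)
        {
            if (q->down) {
                free(p);
                return -1;              /* dropped */
            }
            p->next = q->head;
            q->head = p;
            q->qlen++;
            return 0;
        }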
     
  • If we free up a partially processed packet because its
    skb->len dropped to zero, we need to decrement qlen ourselves, because
    we are dropping out of the top-level loop that would otherwise do
    the decrement for us.

    Spotted by Herbert Xu.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The qlen should continue to decrement, even if we
    pop partially processed SKBs back onto the receive queue.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Let's recap the problem. The current asynchronous netlink kernel
    message processing is vulnerable to these attacks:

    1) Hit and run: Attacker sends one or more messages and then exits
    before they're processed. This may confuse/disable the next netlink
    user that gets the netlink address of the attacker since it may
    receive the responses to the attacker's messages.

    Proposed solutions:

    a) Synchronous processing.
    b) Stream mode socket.
    c) Restrict/prohibit binding.

    2) Starvation: Because various netlink rcv functions were written
    to not return until all messages have been processed on a socket,
    it is possible for these functions to execute for an arbitrarily
    long period of time. If this is successfully exploited it could
    also be used to hold rtnl forever.

    Proposed solutions:

    a) Synchronous processing.
    b) Stream mode socket.

    Firstly, let's cross off solution c). It only solves the first
    problem and it has user-visible impacts. In particular, it'll
    break user space applications that expect to bind or communicate
    with specific netlink addresses (pid's).

    So we're left with a choice of synchronous processing versus
    SOCK_STREAM for netlink.

    For the moment I'm sticking with the synchronous approach as
    suggested by Alexey since it's simpler and I'd rather spend
    my time working on other things.

    However, it does have a number of deficiencies compared to the
    stream mode solution:

    1) User-space to user-space netlink communication is still vulnerable.

    2) Inefficient use of resources. This is especially true for rtnetlink
    since the lock is shared with other users such as networking drivers.
    The latter could hold the rtnl while communicating with hardware, which
    causes the rtnetlink user to wait when it could be doing other things.

    3) It is still possible to DoS all netlink users by flooding the kernel
    netlink receive queue. The attacker simply fills the receive socket
    with a single netlink message that fills up the entire queue. The
    attacker then continues to call sendmsg with the same message in a loop.

    Point 3) can be countered by retransmissions in user-space code; however,
    it is pretty messy.

    In light of these problems (in particular, point 3), we should implement
    stream mode netlink at some point. In the meantime, here is a patch
    that implements synchronous processing (a rough user-space model follows
    this entry).

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
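
    The rough user-space model referenced above, showing the shape of
    "synchronous processing" under a mutex (an assumption for illustration,
    not the kernel code): the sender's own send path drains and handles the
    queued messages, so no unbounded backlog is handed to another context.

        #include <pthread.h>
        #include <stdlib.h>

        struct msg {
            struct msg *next;
            /* payload omitted */
        };

        static struct msg *queue_head;
        static struct msg **queue_tail = &queue_head;
        static pthread_mutex_t rcv_mutex = PTHREAD_MUTEX_INITIALIZER;

        static void handle(struct msg *m)
        {
            free(m);                    /* stand-in for real processing */
        }

        /*
         * Model of a synchronous sendmsg(): enqueue, then immediately
         * process everything queued, in the sender's context.  Because
         * enqueueing also needs the mutex, the drain loop always
         * terminates, and the work is paid for by the senders themselves.
         */
        static int send_and_process(struct msg *m)
        {
            pthread_mutex_lock(&rcv_mutex);
            m->next = NULL;
            *queue_tail = m;
            queue_tail = &m->next;

            while ((m = queue_head) != NULL) {
                queue_head = m->next;
                if (!queue_head)
                    queue_tail = &queue_head;
                handle(m);
            }
            pthread_mutex_unlock(&rcv_mutex);
            return 0;
        }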
     
  • Here is a little optimisation for the cb_lock used by netlink_dump.
    While fixing that race earlier, I noticed that the reference count
    held by cb_lock is completely useless. The reason is that in order
    to obtain the protection of the reference count, you have to take
    the cb_lock. But the only way to take the cb_lock is through
    dereferencing the socket.

    That is, you must already possess a reference count on the socket
    before you can take advantage of the reference count held by cb_lock.
    As a corollary, we can remove the reference count held by the cb_lock.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • htb_enqueue(): Free the skb and return NET_XMIT_DROP if a packet is
    destined for the direct_queue but the direct_queue is full. (Before
    this, it erroneously returned NET_XMIT_SUCCESS even though the packet
    was not enqueued.)

    Signed-off-by: Asim Shankar
    Signed-off-by: David S. Miller

    Asim Shankar
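
    A small sketch of the corrected behaviour with assumed names and an
    assumed bound on the direct queue: when the queue is full, the packet
    is freed and a drop status is returned instead of a success that never
    happened.

        #include <stdlib.h>

        #define DIRECT_QLEN_LIMIT 32    /* assumed bound, not HTB's value */

        enum xmit_status { XMIT_SUCCESS = 0, XMIT_DROP = 1 };

        struct pkt {
            struct pkt *next;
        };

        struct direct_queue {
            struct pkt *head, *tail;
            int qlen;
        };

        static enum xmit_status direct_enqueue(struct direct_queue *q,
                                               struct pkt *p)
        {
            if (q->qlen >= DIRECT_QLEN_LIMIT) {
                free(p);                /* free the packet we cannot queue */
                return XMIT_DROP;       /* and tell the caller it was dropped */
            }
            p->next = NULL;
            if (q->tail)
                q->tail->next = p;
            else
                q->head = p;
            q->tail = p;
            q->qlen++;
            return XMIT_SUCCESS;
        }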
     
  • Signed-off-by: Folkert van Heusden
    Signed-off-by: David S. Miller

    Folkert van Heusden
     
  • Signed-off-by: Folkert van Heusden
    Signed-off-by: David S. Miller

    Folkert van Heusden