Doug / smarc-fsl-linux-kernel | Embedian Git Server

18 Jun, 2008

7 commits

6d1a3fb56 netlink: genl: fix circular locking ... Browse Code »

genetlink has a circular locking dependency when dumping the registered
families:

- dump start:
genl_rcv() : take genl_mutex
genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump() : take nlk->cb_mutex
ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
second time

- dump continuance:
netlink_rcv() : call netlink_dump
netlink_dump : take nlk->cb_mutex
ctrl_dumpfamily() : take genl_mutex

Register genl_lock as callback mutex with netlink to fix this. This slightly
widens an already existing module unload race, the genl ops used during the
dump might go away when the module is unloaded. Thomas Graf is working on a
seperate fix for this.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2008-06-18 17:07:07 +0800
3a5be7d4b Revert "mac80211: Use skb_header_cloned() on TX path." ... Browse Code »

This reverts commit 608961a5eca8d3c6bd07172febc27b5559408c5d.

The problem is that the mac80211 stack not only needs to be able to
muck with the link-level headers, it also might need to mangle all of
the packet data if doing sw wireless encryption.

This fixes kernel bugzilla #10903. Thanks to Didier Raboud (for the
bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for
bringing this bisection analysis to my attention), and Ilpo (for
trying to analyze this purely from the TCP side).

In 2.6.27 we can take another stab at this, by using something like
skb_cow_data() when the TX path of mac80211 ends up with a non-NULL
tx->key. The ESP protocol code in the IPSEC stack can be used as a
model for implementation.

Signed-off-by: David S. Miller

David S. Miller
2008-06-18 16:19:51 +0800
3c73419c0 af_unix: fix 'poll for write'/ connected DGRAM sockets ... Browse Code »

The unix_dgram_sendmsg routine implements a (somewhat crude)
form of receiver-imposed flow control by comparing the length of the
receive queue of the 'peer socket' with the max_ack_backlog value
stored in the corresponding sock structure, either blocking
the thread which caused the send-routine to be called or returning
EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
sockets. The poll-implementation for these socket types is
datagram_poll from core/datagram.c. A socket is deemed to be writeable
by this routine when the memory presently consumed by datagrams
owned by it is less than the configured socket send buffer size. This
is always wrong for connected PF_UNIX non-stream sockets when the
abovementioned receive queue is currently considered to be full.
'poll' will then return, indicating that the socket is writeable, but
a subsequent write result in EAGAIN, effectively causing an
(usual) application to 'poll for writeability by repeated send request
with O_NONBLOCK set' until it has consumed its time quantum.

The change below uses a suitably modified variant of the datagram_poll
routines for both type of PF_UNIX sockets, which tests if the
recv-queue of the peer a socket is connected to is presently
considered to be 'full' as part of the 'is this socket
writeable'-checking code. The socket being polled is additionally
put onto the peer_wait wait queue associated with its peer, because the
unix_dgram_sendmsg routine does a wake up on this queue after a
datagram was received and the 'other wakeup call' is done implicitly
as part of skb destruction, meaning, a process blocked in poll
because of a full peer receive queue could otherwise sleep forever
if no datagram owned by its socket was already sitting on this queue.
Among this change is a small (inline) helper routine named
'unix_recvq_full', which consolidates the actual testing code (in three
different places) into a single location.

Signed-off-by: Rainer Weikusat
Signed-off-by: David S. Miller

Rainer Weikusat
2008-06-18 13:28:05 +0800
fe833fca2 xfrm: fix fragmentation for ipv4 xfrm tunnel ... Browse Code »

When generating the ip header for the transformed packet we just copy
the frag_off field of the ip header from the original packet to the ip
header of the new generated packet. If we receive a packet as a chain
of fragments, all but the last of the new generated packets have the
IP_MF flag set. We have to mask the frag_off field to only keep the
IP_DF flag from the original packet. This got lost with git commit
36cf9acf93e8561d9faec24849e57688a81eb9c5 ("[IPSEC]: Separate
inner/outer mode processing on output")

Signed-off-by: Steffen Klassert
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Steffen Klassert
2008-06-18 07:38:23 +0800
a56b8f815 netfilter: nf_conntrack_h323: fix module unload crash ... Browse Code »

The H.245 helper is not registered/unregistered, but assigned to
connections manually from the Q.931 helper. This means on unload
existing expectations and connections using the helper are not
cleaned up, leading to the following oops on module unload:

CPU 0 Unable to handle kernel paging request at virtual address c00a6828, epc == 802224dc, ra == 801d4e7c
Oops[#1]:
Cpu 0
$ 0 : 00000000 00000000 00000004 c00a67f0
$ 4 : 802a5ad0 81657e00 00000000 00000000
$ 8 : 00000008 801461c8 00000000 80570050
$12 : 819b0280 819b04b0 00000006 00000000
$16 : 802a5a60 80000000 80b46000 80321010
$20 : 00000000 00000004 802a5ad0 00000001
$24 : 00000000 802257a8
$28 : 802a4000 802a59e8 00000004 801d4e7c
Hi : 0000000b
Lo : 00506320
epc : 802224dc ip_conntrack_help+0x38/0x74 Tainted: P
ra : 801d4e7c nf_iterate+0xbc/0x130
Status: 1000f403 KERNEL EXL IE
Cause : 00800008
BadVA : c00a6828
PrId : 00019374
Modules linked in: ip_nat_pptp ip_conntrack_pptp ath_pktlog wlan_acl wlan_wep wlan_tkip wlan_ccmp wlan_xauth ath_pci ath_dev ath_dfs ath_rate_atheros wlan ath_hal ip_nat_tftp ip_conntrack_tftp ip_nat_ftp ip_conntrack_ftp pppoe ppp_async ppp_deflate ppp_mppe pppox ppp_generic slhc
Process swapper (pid: 0, threadinfo=802a4000, task=802a6000)
Stack : 801e7d98 00000004 802a5a60 80000000 801d4e7c 801d4e7c 802a5ad0 00000004
00000000 00000000 801e7d98 00000000 00000004 802a5ad0 00000000 00000010
801e7d98 80b46000 802a5a60 80320000 80000000 801d4f8c 802a5b00 00000002
80063834 00000000 80b46000 802a5a60 801e7d98 80000000 802ba854 00000000
81a02180 80b7e260 81a021b0 819b0000 819b0000 80570056 00000000 00000001
...
Call Trace:
[] ip_finish_output+0x0/0x23c
[] nf_iterate+0xbc/0x130
[] nf_iterate+0xbc/0x130
[] ip_finish_output+0x0/0x23c
[] ip_finish_output+0x0/0x23c
[] nf_hook_slow+0x9c/0x1a4

One way to fix this would be to split helper cleanup from the unregistration
function and invoke it for the H.245 helper, but since ctnetlink needs to be
able to find the helper for synchonization purposes, a better fix is to
register it normally and make sure its not assigned to connections during
helper lookup. The missing l3num initialization is enough for this, this
patch changes it to use AF_UNSPEC to make it more explicit though.

Reported-by: liannan
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2008-06-18 06:52:32 +0800
8a548868d netfilter: nf_conntrack_h323: fix memory leak in module initialization error path ... Browse Code »

Properly free h323_buffer when helper registration fails.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2008-06-18 06:52:07 +0800
68b80f113 netfilter: nf_nat: fix RCU races ... Browse Code »

Fix three ct_extend/NAT extension related races:

- When cleaning up the extension area and removing it from the bysource hash,
the nat->ct pointer must not be set to NULL since it may still be used in
a RCU read side

- When replacing a NAT extension area in the bysource hash, the nat->ct
pointer must be assigned before performing the replacement

- When reallocating extension storage in ct_extend, the old memory must
not be freed immediately since it may still be used by a RCU read side

Possibly fixes https://bugzilla.redhat.com/show_bug.cgi?id=449315
and/or http://bugzilla.kernel.org/show_bug.cgi?id=10875

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2008-06-18 06:51:47 +0800

17 Jun, 2008

11 commits

7e903c2ae atm: [br2864] fix routed vcmux support ... Browse Code »

From: Eric Kinzie
Signed-off-by: Chas Williams
Signed-off-by: David S. Miller

Eric Kinzie
2008-06-17 08:18:18 +0800
27141666b atm: [br2684] Fix oops due to skb->dev being NULL ... Browse Code »

It happens that if a packet arrives in a VC between the call to open it on
the hardware and the call to change the backend to br2684, br2684_regvcc
processes the packet and oopses dereferencing skb->dev because it is
NULL before the call to br2684_push().

Signed-off-by: Jorge Boncompte [DTI2]
Signed-off-by: Chas Williams

Jorge Boncompte [DTI2]
2008-06-17 08:15:33 +0800
a9d246dbb ipv4: Remove unused definitions in net/ipv4/tcp_ipv4.c. ... Browse Code »

1) Remove ICMP_MIN_LENGTH, as it is unused.

2) Remove unneeded tcp_v4_send_check() declaration.

Signed-off-by: Rami Rosen
Signed-off-by: David S. Miller

Rami Rosen
2008-06-17 08:07:16 +0800
68be802cd raw: Restore /proc/net/raw correct behavior ... Browse Code »

I just noticed "cat /proc/net/raw" was buggy, missing '\n' separators.

I believe this was introduced by commit 8cd850efa4948d57a2ed836911cfd1ab299e89c6
([RAW]: Cleanup IPv4 raw_seq_show.)

This trivial patch restores correct behavior, and applies to current
Linus tree (should also be applied to stable tree as well.)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2008-06-17 08:03:32 +0800
6de329e26 net: Fix test for VLAN TX checksum offload capability ... Browse Code »

Selected device feature bits can be propagated to VLAN devices, so we
can make use of TX checksum offload and TSO on VLAN-tagged packets.
However, if the physical device does not do VLAN tag insertion or
generic checksum offload then the test for TX checksum offload in
dev_queue_xmit() will see a protocol of htons(ETH_P_8021Q) and yield
false.

This splits the checksum offload test into two functions:

- can_checksum_protocol() tests a given protocol against a feature bitmask

- dev_can_checksum() first tests the skb protocol against the device
features; if that fails and the protocol is htons(ETH_P_8021Q) then
it tests the encapsulated protocol against the effective device
features for VLANs

Signed-off-by: Ben Hutchings
Acked-by: Patrick McHardy
Signed-off-by: David S. Miller

Ben Hutchings
2008-06-17 08:02:28 +0800
319fa2a24 sctp: Correclty set changeover_active for SFR-CACC ... Browse Code »

Right now, any time we set a primary transport we set
the changeover_active flag. As a result, we invoke SFR-CACC
even when there has been no changeover events.

Only set changeover_active, when there is a true changeover
event, i.e. we had a primary path and we are changing to
another transport.

Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller

Vlad Yasevich
2008-06-17 08:00:29 +0800
80896a358 sctp: Correctly cleanup procfs entries upon failure. ... Browse Code »

This patch remove the proc fs entry which has been created if fail to
set up proc fs entry for the SCTP protocol.

Signed-off-by: Wei Yongjun
Acked-by: Neil Horman
Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller

Wei Yongjun
2008-06-17 07:59:55 +0800
93653e044 tcp: Revert reset of deferred accept changes in 2.6.26 ... Browse Code »

Ingo's system is still seeing strange behavior, and he
reports that is goes away if the rest of the deferred
accept changes are reverted too.

Therefore this reverts e4c78840284f3f51b1896cf3936d60a6033c4d2c
("[TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack") and
539fae89bebd16ebeafd57a87169bc56eb530d76 ("[TCP]: TCP_DEFER_ACCEPT
updates - defer timeout conflicts with max_thresh").

Just like the other revert, these ideas can be revisited for
2.6.27

Signed-off-by: David S. Miller

David S. Miller
2008-06-17 07:57:40 +0800
2b4743bd6 ipv6 sit: Avoid extra need for compat layer in PRL management. ... Browse Code »

We've introduced extra need of compat layer for ip_tunnel_prl{}
for PRL (Potential Router List) management. Though compat_ioctl
is still missing in ipv4/ipv6, let's make the interface more
straight-forward and eliminate extra need for nasty compat layer
anyway since the interface is new for 2.6.26.

Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

YOSHIFUJI Hideaki
2008-06-17 07:48:20 +0800
47083fc07 pkt_sched: Change HTB_HYSTERESIS to a runtime parameter htb_hysteresis. ... Browse Code »

Add a htb_hysteresis parameter to htb_sch.ko and by sysfs magic make
it runtime adjustable via
/sys/module/sch_htb/parameters/htb_hysteresis mode 640.

Signed-off-by: Jesper Dangaard Brouer
Acked-by: Martin Devera
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2008-06-17 07:39:32 +0800
f9ffceddd pkt_sched: HTB scheduler, change default hysteresis mode to off. ... Browse Code »

The HTB hysteresis mode reduce the CPU load, but at the
cost of scheduling accuracy.

On ADSL links (512 kbit/s upstream), this inaccuracy introduce
significant jitter, enought to disturbe VoIP. For details see my
masters thesis (http://www.adsl-optimizer.dk/thesis/), chapter 7,
section 7.3.1, pp 69-70.

Signed-off-by: Jesper Dangaard Brouer
Acked-by: Martin Devera
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2008-06-17 07:38:33 +0800

15 Jun, 2008

1 commit

34a5d7130 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 Browse Code »

David S. Miller
2008-06-15 08:33:38 +0800

14 Jun, 2008

2 commits

995ad6c5a mac80211: add missing new line in debug print HT_DEBUG ... Browse Code »

This patch adds '\n' in debug printk (wme.c HT DEBUG)

Signed-off-by: Tomas Winkler
Signed-off-by: John W. Linville

Tomas Winkler
2008-06-14 04:14:53 +0800
5c5f9664d mac80211 : fix for iwconfig in ad-hoc mode ... Browse Code »

The patch checks interface status, if it is in IBSS_JOINED mode
show cell id it is associated with.

Signed-off-by: Abhijeet Kolekar
Signed-off-by: Zhu Yi
Signed-off-by: John W. Linville

Abhijeet Kolekar
2008-06-14 04:14:53 +0800

13 Jun, 2008

3 commits

51558576e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
tcp: Revert 'process defer accept as established' changes.
ipv6: Fix duplicate initialization of rawv6_prot.destroy
bnx2x: Updating the Maintainer
net: Eliminate flush_scheduled_work() calls while RTNL is held.
drivers/net/r6040.c: correct bad use of round_jiffies()
fec_mpc52xx: MPC52xx_MESSAGES_DEFAULT: 2nd NETIF_MSG_IFDOWN => IFUP
ipg: fix receivemode IPG_RM_RECEIVEMULTICAST{,HASH} in ipg_nic_set_multicast_list()
netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()
netfilter: Make nflog quiet when no one listen in userspace.
ipv6: Fail with appropriate error code when setting not-applicable sockopt.
ipv6: Check IPV6_MULTICAST_LOOP option value.
ipv6: Check the hop limit setting in ancillary data.
ipv6 route: Fix route lifetime in netlink message.
ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER).
dccp: Bug in initial acknowledgment number assignment
dccp ccid-3: X truncated due to type conversion
dccp ccid-3: TFRC reverse-lookup Bug-Fix
dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets
dccp: Fix sparse warnings
dccp ccid-3: Bug-Fix - Zero RTT is possible

Linus Torvalds
2008-06-13 22:34:47 +0800
ec0a19662 tcp: Revert 'process defer accept as established' changes. ... Browse Code »

This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.

Ilpo Järvinen first spotted the locking problems. The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.

Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock. The normal ordering is parent listening
socket --> child socket, but this code path would require the
reverse lock ordering.

Next is a problem noticed by Vitaliy Gusev, he noted:

----------------------------------------
>--- a/net/ipv4/tcp_timer.c
>+++ b/net/ipv4/tcp_timer.c
>@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> goto death;
> }
>
>+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
>+ tcp_send_active_reset(sk, GFP_ATOMIC);
>+ goto death;

Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------

Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:

----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------

So revert this thing for now.

Signed-off-by: David S. Miller

David S. Miller
2008-06-13 07:34:35 +0800
f23d60de7 ipv6: Fix duplicate initialization of rawv6_prot.destroy ... Browse Code »

In changeset 22dd485022f3d0b162ceb5e67d85de7c3806aa20
("raw: Raw socket leak.") code was added so that we
flush pending frames on raw sockets to avoid leaks.

The ipv4 part was fine, but the ipv6 part was not
done correctly. Unlike the ipv4 side, the ipv6 code
already has a .destroy method for rawv6_prot.

So now there were two assignments to this member, and
what the compiler does is use the last one, effectively
making the ipv6 parts of that changeset a NOP.

Fix this by removing the:

.destroy = inet6_destroy_sock,

line, and adding an inet6_destroy_sock() call to the
end of raw6_destroy().

Noticed by Al Viro.

Signed-off-by: David S. Miller
Acked-by: YOSHIFUJI Hideaki

David S. Miller
2008-06-13 07:34:34 +0800

12 Jun, 2008

9 commits

a40565738 Merge branch 'net-2.6-misc-20080611a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-fix Browse Code »

David S. Miller
2008-06-12 09:11:16 +0800
5cb960a80 Merge branch 'master' of git://eden-feed.erg.abdn.ac.uk/net-2.6 Browse Code »

David S. Miller
2008-06-12 08:53:04 +0800
ceeff7541 netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info() ... Browse Code »

When creation of a new conntrack entry in ctnetlink fails after having
set up the NAT mappings, the conntrack has an extension area allocated
that is not getting properly destroyed when freeing the conntrack again.
This means the NAT extension is still in the bysource hash, causing a
crash when walking over the hash chain the next time:

BUG: unable to handle kernel paging request at 00120fbd
IP: [] nf_nat_setup_info+0x221/0x58a
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP

Pid: 2795, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[] EFLAGS: 00010206 CPU: 1
EIP is at nf_nat_setup_info+0x221/0x58a
EAX: 00120fbd EBX: 00120fbd ECX: 00000001 EDX: 00000000
ESI: 0000019e EDI: e853bbb4 EBP: e853bbc8 ESP: e853bb78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process conntrackd (pid: 2795, ti=e853a000 task=f7de10f0 task.ti=e853a000)
Stack: 00000000 e853bc2c e85672ec 00000008 c0561084 63c1db4a 00000000 00000000
00000000 0002e109 61d2b1c3 00000000 00000000 00000000 01114e22 61d2b1c3
00000000 00000000 f7444674 e853bc04 00000008 c038e728 0000000a f7444674
Call Trace:
[] nla_parse+0x5c/0xb0
[] ctnetlink_change_status+0x190/0x1c6
[] ctnetlink_new_conntrack+0x189/0x61f
[] update_curr+0x3d/0x52
[] nfnetlink_rcv_msg+0xc1/0xd8
[] nfnetlink_rcv_msg+0x18/0xd8
[] nfnetlink_rcv_msg+0x0/0xd8
[] netlink_rcv_skb+0x2d/0x71
[] nfnetlink_rcv+0x19/0x24
[] netlink_unicast+0x1b3/0x216
...

Move invocation of the extension destructors to nf_conntrack_free()
to fix this problem.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10875

Reported-and-Tested-by: Krzysztof Piotr Oledzki
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2008-06-12 08:51:10 +0800
b66985b11 netfilter: Make nflog quiet when no one listen in userspace. ... Browse Code »

The message "nf_log_packet: can't log since no backend logging module loaded
in! Please either load one, or disable logging explicitly" was displayed for
each logged packet when no userspace application is listening to nflog events.
The message seems to warn for a problem with a kernel module missing but as
said before this is not the case. I thus propose to suppress the message (I
don't see any reason to flood the log because a user application has crashed.)

Signed-off-by: Eric Leblond
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Eric Leblond
2008-06-12 08:50:27 +0800
1717699cd ipv6: Fail with appropriate error code when setting not-applicable sockopt. ... Browse Code »

IPV6_MULTICAST_HOPS, for example, is not valid for stream sockets.
Since they are virtually unavailable for stream sockets,
we should return ENOPROTOOPT instead of EINVAL.

Signed-off-by: YOSHIFUJI Hideaki

YOSHIFUJI Hideaki
2008-06-12 08:19:09 +0800
28d448821 ipv6: Check IPV6_MULTICAST_LOOP option value. ... Browse Code »

Only 0 and 1 are valid for IPV6_MULTICAST_LOOP socket option,
and we should return an error of EINVAL otherwise, per RFC3493.

Based on patch from Shan Wei .

Signed-off-by: YOSHIFUJI Hideaki

YOSHIFUJI Hideaki
2008-06-12 08:19:09 +0800
e8766fc86 ipv6: Check the hop limit setting in ancillary data. ... Browse Code »

When specifing the outgoing hop limit as ancillary data for sendmsg(),
the kernel doesn't check the integer hop limit value as specified in
[RFC-3542] section 6.3.

Signed-off-by: Shan Wei
Signed-off-by: YOSHIFUJI Hideaki

Shan Wei
2008-06-12 08:19:08 +0800
36e3deae8 ipv6 route: Fix route lifetime in netlink message. ... Browse Code »

1) We may have route lifetime larger than INT_MAX.
In that case we had wired value in lifetime.
Use INT_MAX if lifetime does not fit in s32.

2) Lifetime is valid iif RTF_EXPIRES is set.

Signed-off-by: YOSHIFUJI Hideaki

YOSHIFUJI Hideaki
2008-06-12 08:19:08 +0800
20c61fbd8 ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER). ... Browse Code »

Signed-off-by: YOSHIFUJI Hideaki

YOSHIFUJI Hideaki
2008-06-12 08:19:08 +0800

11 Jun, 2008

7 commits

f7f866eed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
net: Fix routing tables with id > 255 for legacy software
sky2: Hold RTNL while calling dev_close()
s2io iomem annotations
atl1: fix suspend regression
qeth: start dev queue after tx drop error
qeth: Prepare-function to call s390dbf was wrong
qeth: reduce number of kernel messages
qeth: Use ccw_device_get_id().
qeth: layer 3 Oops in ip event handler
virtio: use callback on empty in virtio_net
virtio: virtio_net free transmit skbs in a timer
virtio: Fix typo in virtio_net_hdr comments
virtio_net: Fix skb->csum_start computation
ehea: set mac address fix
sfc: Recover from RX queue flush failure
add missing lance_* exports
ixgbe: fix typo
forcedeth: msi interrupts
ipsec: pfkey should ignore events when no listeners
pppoe: Unshare skb before anything else
...

Linus Torvalds
2008-06-11 23:39:51 +0800
be4c798a4 dccp: Bug in initial acknowledgment number assignment ... Browse Code »

Step 8.5 in RFC 4340 says for the newly cloned socket

Initialize S.GAR := S.ISS,

but what in fact the code (minisocks.c) does is

Initialize S.GAR := S.ISR,

which is wrong (typo?) -- fixed by the patch.

Signed-off-by: Gerrit Renker

Gerrit Renker
2008-06-11 18:19:10 +0800
7deb0f851 dccp ccid-3: X truncated due to type conversion ... Browse Code »

This fixes a bug in computing the inter-packet-interval t_ipi = s/X:

scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the
sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater
than 2^26 Bps (~537 Mbps).

Using full 64-bit division now.

Signed-off-by: Gerrit Renker

Gerrit Renker
2008-06-11 18:19:10 +0800
1e8a287c7 dccp ccid-3: TFRC reverse-lookup Bug-Fix ... Browse Code »

This fixes a bug in the reverse lookup of p: given a value f(p), instead of p,
the function returned the smallest tabulated value f(p).

The smallest tabulated value of

10^6 * f(p) = sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p)

for p=0.0001 is 8172.

Since this value is scaled by 10^6, the outcome of this bug is that a loss
of 8172/10^6 = 0.8172% was reported whenever the input was below the table
resolution of 0.01%.

This means that the value was over 80 times too high, resulting in large spikes
of the initial loss interval, thus unnecessarily reducing the throughput.

Also corrected the printk format (%u for u32).

Signed-off-by: Gerrit Renker

Gerrit Renker
2008-06-11 18:19:10 +0800
65907a433 dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets ... Browse Code »

This fixes an oversight from an earlier patch, ensuring that Ack Vectors
are not processed on request sockets.

The issue is that Ack Vectors must not be parsed on request sockets, since
the Ack Vector feature depends on the selection of the (TX) CCID. During the
initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:

"Using CCID-specific options and feature options during a negotiation
for the corresponding CCID feature is NOT RECOMMENDED [...]"

And it is not even possible: when the server receives the Request from the
client, the CCID and Ack vector features are undefined; when the Ack finalising
the 3-way hanshake arrives, the request socket has not been cloned yet into a
full socket. (This order is necessary, since otherwise the newly created socket
would have to be destroyed whenever an option error occurred - a malicious
hacker could simply send garbage options and exploit this.)

Signed-off-by: Gerrit Renker

Gerrit Renker
2008-06-11 18:19:09 +0800
1e2f0e5e8 dccp: Fix sparse warnings ... Browse Code »

This patch fixes the following sparse warnings:
* nested min(max()) expression:
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one

* Declaration of function prototypes in .c instead of .h file, resulting in
"should it be static?" warnings.

* Declared "struct dccpw" static (local to dccp_probe).

* Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
("Receivers SHOULD implement delayed acknowledgement timers ...").

* Used a different local variable name to avoid
net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
net/dccp/ackvec.c:238:33: originally declared here

* Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.

Signed-off-by: Gerrit Renker

Gerrit Renker
2008-06-11 18:19:09 +0800
3294f202d dccp ccid-3: Bug-Fix - Zero RTT is possible ... Browse Code »

In commit $(825de27d9e40b3117b29a79d412b7a4b78c5d815) (from 27th May, commit
message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter
computation was fixed to cope with RTTs < 4 microseconds.

Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed
a check against RTT < 4, but introduced a divide-by-zero bug.

All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which
ensures non-zero samples. However, a zero RTT is possible on initialisation,
when there is no RTT sample from the Request/Response exchange.

The fix is to use the fallback-RTT from RFC 4340, 3.4.

This is also better than just fixing update_win_count() since it allows other
parts of the code to always assume that the RTT is non-zero during the time
that the CCID is used.

Signed-off-by: Gerrit Renker

Gerrit Renker
2008-06-11 18:19:09 +0800