01 Oct, 2008
4 commits
-
Fix a xfrm_{state,policy}_walk leak if pfkey socket is closed while
dumping is on-going.Signed-off-by: Timo Teras
Signed-off-by: David S. Miller -
ip6_dst_blackhole_ops.kmem_cachep is not expected to be NULL (i.e. to
be initialized) when dst_alloc() is called from ip6_dst_blackhole().
Otherwise, it results in the following (xfrm_larval_drop is now set to
1 by default):[ 78.697642] Unable to handle kernel paging request for data at address 0x0000004c
[ 78.703449] Faulting instruction address: 0xc0097f54
[ 78.786896] Oops: Kernel access of bad area, sig: 11 [#1]
[ 78.792791] PowerMac
[ 78.798383] Modules linked in: btusb usbhid bluetooth b43 mac80211 cfg80211 ehci_hcd ohci_hcd sungem sungem_phy usbcore ssb
[ 78.804263] NIP: c0097f54 LR: c0334a28 CTR: c002d430
[ 78.809997] REGS: eef19ad0 TRAP: 0300 Not tainted (2.6.27-rc5)
[ 78.815743] MSR: 00001032 CR: 22242482 XER: 20000000
[ 78.821550] DAR: 0000004c, DSISR: 40000000
[ 78.827278] TASK = eef0df40[3035] 'mip6d' THREAD: eef18000
[ 78.827408] GPR00: 00001032 eef19b80 eef0df40 00000000 00008020 eef19c30 00000001 00000000
[ 78.833249] GPR08: eee5101c c05a5c10 ef9ad500 00000000 24242422 1005787c 00000000 1004f960
[ 78.839151] GPR16: 00000000 10024e90 10050040 48030018 0fe44150 00000000 00000000 eef19c30
[ 78.845046] GPR24: eef19e44 00000000 eef19bf8 efb37c14 eef19bf8 00008020 00009032 c0596064
[ 78.856671] NIP [c0097f54] kmem_cache_alloc+0x20/0x94
[ 78.862581] LR [c0334a28] dst_alloc+0x40/0xc4
[ 78.868451] Call Trace:
[ 78.874252] [eef19b80] [c03c1810] ip6_dst_lookup_tail+0x1c8/0x1dc (unreliable)
[ 78.880222] [eef19ba0] [c0334a28] dst_alloc+0x40/0xc4
[ 78.886164] [eef19bb0] [c03cd698] ip6_dst_blackhole+0x28/0x1cc
[ 78.892090] [eef19be0] [c03d9be8] rawv6_sendmsg+0x75c/0xc88
[ 78.897999] [eef19cb0] [c038bca4] inet_sendmsg+0x4c/0x78
[ 78.903907] [eef19cd0] [c03207c8] sock_sendmsg+0xac/0xe4
[ 78.909734] [eef19db0] [c03209e4] sys_sendmsg+0x1e4/0x2a0
[ 78.915540] [eef19f00] [c03220a8] sys_socketcall+0xfc/0x210
[ 78.921406] [eef19f40] [c0014b3c] ret_from_syscall+0x0/0x38
[ 78.927295] --- Exception: c01 at 0xfe2d730
[ 78.927297] LR = 0xfe2d71c
[ 78.939019] Instruction dump:
[ 78.944835] 91640018 9144001c 900a0000 4bffff44 9421ffe0 7c0802a6 bf810010 7c9d2378
[ 78.950694] 90010024 7fc000a6 57c0045e 7c000124 8383005c 2f9f0000 419e0050
[ 78.956464] ---[ end trace 05fa1ed7972487a1 ]---As commented by Benjamin Thery, the bug was introduced by
f2fc6a54585a1be6669613a31fbaba2ecbadcd36, while adding network
namespaces support to ipv6 routes.Signed-off-by: Arnaud Ebalard
Acked-by: Benjamin Thery
Signed-off-by: David S. Miller -
The following actions are possible:
tcp_v6_rcv
skb->dev = NULL;
tcp_v6_do_rcv
tcp_v6_hnd_req
tcp_check_req
req->rsk_ops->send_ack == tcp_v6_send_ackSo, skb->dev can be NULL in tcp_v6_send_ack. We must obtain namespace
from dst entry.Thanks to Vitaliy Gusev for initial problem finding
in IPv4 code.Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller -
Fix NULL dereference in tcp_4_send_ack().
As skb->dev is reset to NULL in tcp_v4_rcv() thus OOPS occurs:
BUG: unable to handle kernel NULL pointer dereference at 00000000000004d0
IP: [] tcp_v4_send_ack+0x203/0x250Stack: ffff810005dbb000 ffff810015c8acc0 e77b2c6e5f861600 a01610802e90cb6d
0a08010100000000 88afffff88afffff 0000000080762be8 0000000115c872e8
0004122000000000 0000000000000001 ffffffff80762b88 0000000000000020
Call Trace:
[] tcp_v4_reqsk_send_ack+0x20/0x22
[] tcp_check_req+0x108/0x14c
[] ? rt_intern_hash+0x322/0x33c
[] tcp_v4_do_rcv+0x399/0x4ec
[] ? skb_checksum+0x4f/0x272
[] ? __inet_lookup_listener+0x14a/0x15c
[] tcp_v4_rcv+0x6a1/0x701
[] ip_local_deliver_finish+0x157/0x24a
[] ip_local_deliver+0x72/0x7c
[] ip_rcv_finish+0x38d/0x3b2
[] ? scsi_io_completion+0x19d/0x39e
[] ip_rcv+0x2a2/0x2e5
[] netif_receive_skb+0x293/0x303
[] process_backlog+0x80/0xd0
[] ? __rcu_process_callbacks+0x125/0x1b4
[] net_rx_action+0xb9/0x17f
[] __do_softirq+0xa3/0x164
[] call_softirq+0x1c/0x28
[] do_softirq+0x34/0x72
[] local_bh_enable_ip+0x3f/0x50
[] _spin_unlock_bh+0x12/0x14
[] release_sock+0xb8/0xc1
[] inet_stream_connect+0x146/0x25c
[] ? autoremove_wake_function+0x0/0x38
[] sys_connect+0x68/0x8e
[] ? fd_install+0x5f/0x68
[] ? sock_map_fd+0x55/0x62
[] system_call_after_swapgs+0x7b/0x80Code: 41 10 11 d0 83 d0 00 4d 85 ed 89 45 c0 c7 45 c4 08 00 00 00 74 07 41 8b 45 04 89 45 c8 48 8b 43 20 8b 4d b8 48 8d 55 b0 48 89 de 8b 80 d0 04 00 00 48 8b b8 60 01 00 00 e8 20 ae fe ff 65 48
RIP [] tcp_v4_send_ack+0x203/0x250
RSP
CR2: 00000000000004d0Signed-off-by: Vitaliy Gusev
Signed-off-by: David S. Miller
30 Sep, 2008
3 commits
-
Since call to function sctp_sf_abort_violation() need paramter 'arg' with
'struct sctp_chunk' type, it will read the chunk type and chunk length from
the chunk_hdr member of chunk. But call to sctp_sf_violation_paramlen()
always with 'struct sctp_paramhdr' type's parameter, it will be passed to
sctp_sf_abort_violation(). This may cause kernel panic.sctp_sf_violation_paramlen()
|-- sctp_sf_abort_violation()
|-- sctp_make_abort_violation()This patch fixed this problem. This patch also fix two place which called
sctp_sf_violation_paramlen() with wrong paramter type.Signed-off-by: Wei Yongjun
Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller -
fb65a7c091529bfffb1262515252c0d0f6241c5c ("iucv: Fix bad merging.") fixed
a merge error, but in a wrong way. We now end up with the bug below.
This patch corrects the mismerge like it was intended.BUG: scheduling while atomic: swapper/1/0x00000000
Modules linked in:
CPU: 1 Not tainted 2.6.27-rc7-00094-gc0f4d6d #9
Process swapper (pid: 1, task: 000000003fe7d988, ksp: 000000003fe838c0)
0000000000000000 000000003fe839b8 0000000000000002 0000000000000000
000000003fe83a58 000000003fe839d0 000000003fe839d0 0000000000390de6
000000000058acd8 00000000000000d0 000000003fe7dcd8 0000000000000000
000000000000000c 000000000000000d 0000000000000000 000000003fe83a28
000000000039c5b8 0000000000015e5e 000000003fe839b8 000000003fe83a00
Call Trace:
([] show_trace+0xe6/0x134)
[] __schedule_bug+0xa2/0xa8
[] schedule+0x49c/0x910
[] schedule_timeout+0xc4/0x114
[] wait_for_common+0xe8/0x1b4
[] call_usermodehelper_exec+0xa6/0xec
[] kobject_uevent_env+0x418/0x438
[] bus_add_driver+0x1e4/0x298
[] driver_register+0x90/0x18c
[] netiucv_init+0x168/0x2c8
[] do_one_initcall+0x3e/0x17c
[] kernel_init+0x1ce/0x248
[] kernel_thread_starter+0x6/0xc
[] kernel_thread_starter+0x0/0xc
iucv: NETIUCV driver initialized
initcall netiucv_init+0x0/0x2c8 returned with preemption imbalanceSigned-off-by: Heiko Carstens
Signed-off-by: David S. Miller -
We're never supposed to shrink the headroom or tailroom. In fact,
shrinking the headroom is a fatal action.Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
25 Sep, 2008
10 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netfilter: ip6t_{hbh,dst}: Rejects not-strict mode on rule insertion
ath9k: disable MIB interrupts to fix interrupt storm
[Bluetooth] Fix USB disconnect handling of btusb driver
[Bluetooth] Fix wrong URB handling of btusb driver
[Bluetooth] Fix I/O errors on MacBooks with Broadcom chips -
The current code ignores rules for internal options in HBH/DST options
header in packet processing if 'Not strict' mode is specified (which is not
implemented). Clearly it is not expected by user.Kernel should reject HBH/DST rule insertion with 'Not strict' mode
in the first place.Signed-off-by: Yasuyuki Kozakai
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix put_data error handling
9p: use an IS_ERR test rather than a NULL test
9p: introduce missing kfree
9p-trans_fd: fix and clean up module init/exit paths
9p-trans_fd: don't do fs segment mangling in p9_fd_poll()
9p-trans_fd: clean up p9_conn_create()
9p-trans_fd: fix trans_fd::p9_conn_destroy()
9p: implement proper trans module refcounting and unregistration -
Abhishek Kulkarni pointed out an inconsistency in the way
errors are returned from p9_put_data. On deeper exploration it
seems the error handling for this path was completely wrong.
This patch adds checks for allocation problems and propagates
errors correctly.Signed-off-by: Eric Van Hensbergen
-
Error handling code following a kmalloc should free the allocated data.
The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)//
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,l;
position p1,p2;
expression *ptr != NULL;
@@(
if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
|
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
)
}
x->f = E
...>
(
return \(0\|\|ptr\);
|
return@p2 ...;
)@script:python@
p1 << r.p1;
p2 << r.p2;
@@print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
//Signed-off-by: Julia Lawall
Signed-off-by: Eric Van Hensbergen
Signed-off-by: Andrew Morton -
trans_fd leaked p9_mux_wq on module unload. Fix it. While at it,
collapse p9_mux_global_init() into p9_trans_fd_init(). It's easier to
follow this way and the global poll_tasks array is about to removed
anyway.Signed-off-by: Tejun Heo
Signed-off-by: Eric Van Hensbergen -
p9_fd_poll() is never called with user pointers and f_op->poll()
doesn't expect its arguments to be from userland. There's no need to
set kernel ds before calling f_op->poll() from p9_fd_poll(). Remove
it.Signed-off-by: Tejun Heo
Signed-off-by: Eric Van Hensbergen -
* Use kzalloc() to allocate p9_conn and remove 0/NULL initializations.
* Clean up error return paths.
Signed-off-by: Tejun Heo
Signed-off-by: Eric Van Hensbergen -
p9_conn_destroy() first kills all current requests by calling
p9_conn_cancel(), then waits for the request list to be cleared by
waiting on p9_conn->equeue. After that, polling is stopped and the
trans is destroyed. This sequence has a few problems.* Read and write works were never cancelled and the p9_conn can be
destroyed while the works are running as r/w works remove requests
from the list and dereference the p9_conn from them.* The list emptiness wait using p9_conn->equeue wouldn't trigger
because p9_conn_cancel() always clears all the lists and the only
way the wait can be triggered is to have another task to issue a
request between the slim window between p9_conn_cancel() and the
wait, which isn't safe under the current implementation with or
without the wait.This patch fixes the problem by first stopping poll, which can
schedule r/w works, first and cancle r/w works which guarantees that
r/w works are not and will not run from that point and then calling
p9_conn_cancel() and do the rest of destruction.Signed-off-by: Tejun Heo
Signed-off-by: Eric Van Hensbergen -
9p trans modules aren't refcounted nor were they unregistered
properly. Fix it.* Add 9p_trans_module->owner and reference the module on each trans
instance creation and put it on destruction.* Protect v9fs_trans_list with a spinlock. This isn't strictly
necessary as the list is manipulated only during module loading /
unloading but it's a good idea to make the API safe.* Unregister trans modules when the corresponding module is being
unloaded.* While at it, kill unnecessary EXPORT_SYMBOL on p9_trans_fd_init().
Signed-off-by: Tejun Heo
Signed-off-by: Eric Van Hensbergen
23 Sep, 2008
1 commit
-
The reasons for disabling paccept() are as follows:
* The API is more complex than needed. There is AFAICS no demonstrated
use case that the sigset argument of this syscall serves that couldn't
equally be served by the use of pselect/ppoll/epoll_pwait + traditional
accept(). Roland seems to concur with this opinion
(http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255). I
have (more than once) asked Ulrich to explain otherwise
(http://thread.gmane.org/gmane.linux.kernel/723952/focus=731018), but he
does not respond, so one is left to assume that he doesn't know of such
a case.* The use of a sigset argument is not consistent with other I/O APIs
that can block on a single file descriptor (e.g., read(), recv(),
connect()).* The behavior of paccept() when interrupted by a signal is IMO strange:
the kernel restarts the system call if SA_RESTART was set for the
handler. I think that it should not do this -- that it should behave
consistently with paccept()/ppoll()/epoll_pwait(), which never restart,
regardless of SA_RESTART. The reasoning here is that the very purpose
of paccept() is to wait for a connection or a signal, and that
restarting in the latter case is probably never useful. (Note: Roland
disagrees on this point, believing that rather paccept() should be
consistent with accept() in its behavior wrt EINTR
(http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255).)I believe that instead, a simpler API, consistent with Ulrich's other
recent additions, is preferable:accept4(int fd, struct sockaddr *sa, socklen_t *salen, ind flags);
(This simpler API was originally proposed by Ulrich:
http://thread.gmane.org/gmane.linux.network/92072)If this simpler API is added, then if we later decide that the sigset
argument really is required, then a suitable bit in 'flags' could be added
to indicate the presence of the sigset argument.At this point, I am hoping we either will get a counter-argument from
Ulrich about why we really do need paccept()'s sigset argument, or that he
will resubmit the original accept4() patch.Signed-off-by: Michael Kerrisk
Cc: David Miller
Cc: Davide Libenzi
Cc: Alan Cox
Cc: Ulrich Drepper
Cc: Jakub Jelinek
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Sep, 2008
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netdev: simple_tx_hash shouldn't hash inside fragments
21 Sep, 2008
1 commit
-
Currently simple_tx_hash is hashing inside of udp fragments. As a result
packets are getting getting sent to all queues when they shouldn't be.
This causes a serious performance regression which can be seen by sending
UDP frames larger than mtu on multiqueue devices. This change will make
it so that fragments are hashed only as IP datagrams w/o any protocol
information.Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller
20 Sep, 2008
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
e100: Use pci_pme_active to clear PME_Status and disable PME#
e1000: prevent corruption of EEPROM/NVM
forcedeth: call restore mac addr in nv_shutdown path
bnx2: Promote vector field in bnx2_irq structure from u16 to unsigned int
sctp: Fix oops when INIT-ACK indicates that peer doesn't support AUTH
sctp: do not enable peer features if we can't do them.
sctp: set the skb->ip_summed correctly when sending over loopback.
udp: Fix rcv socket locking
19 Sep, 2008
2 commits
-
If INIT-ACK is received with SupportedExtensions parameter which
indicates that the peer does not support AUTH, the packet will be
silently ignore, and sctp_process_init() do cleanup all of the
transports in the association.
When T1-Init timer is expires, OOPS happen while we try to choose
a different init transport.The solution is to only clean up the non-active transports, i.e
the ones that the peer added. However, that introduces a problem
with sctp_connectx(), because we don't mark the proper state for
the transports provided by the user. So, we'll simply mark
user-provided transports as ACTIVE. That will allow INIT
retransmissions to work properly in the sctp_connectx() context
and prevent the crash.Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller -
Do not enable peer features like addip and auth, if they
are administratively disabled localy. If the peer resports
that he supports something that we don't, neither end can
use it so enabling it is pointless. This solves a problem
when talking to a peer that has auth and addip enabled while
we do not. Found by Andrei Pelinescu-Onciul .Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller
18 Sep, 2008
1 commit
-
Loopback used to clobber the ip_summed filed which sctp then used
to figure out if it needed to do checksumming or not. Now that
loopback doesn't do that any more, sctp needs to set the ip_summed
field correctly.Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller
17 Sep, 2008
1 commit
-
this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(),
so that the device and driver names are inside the warning message.
This helps automated tools like kerneloops.org to collect the data
and do statistics, as well as making it more likely that humans
cut-n-paste the important message as part of a bugreport.Signed-off-by: Arjan van de Ven
Signed-off-by: Linus Torvalds
16 Sep, 2008
1 commit
-
The previous patch in response to the recursive locking on IPsec
reception is broken as it tries to drop the BH socket lock while in
user context.This patch fixes it by shrinking the section protected by the
socket lock to sock_queue_rcv_skb only. The only reason we added
the lock is for the accounting which happens in that function.Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
12 Sep, 2008
1 commit
-
To speed up the Simple Pairing connection setup, the support for the
default link policy has been enabled. This is in contrast to settings
the link policy on every connection setup. Using the default link policy
is the preferred way since there is no need to dynamically change it for
every connection.For backward compatibility reason and to support old userspace the
HCISETLINKPOL ioctl has been switched over to using hci_request() to
issue the HCI command for setting the default link policy instead of
just storing it in the HCI device structure.However the hci_request() can only be issued when the device is
brought up. If used on a device that is registered, but still down
it will timeout and fail. This is problematic since the command is
put on the TX queue and the Bluetooth core tries to submit it to
hardware that is not ready yet. The timeout for these requests is
10 seconds and this causes a significant regression when setting up
a new device.The userspace can perfectly handle a failure of the HCISETLINKPOL
ioctl and will re-submit it later, but the 10 seconds delay causes
a problem. So in case hci_request() is called on a device that is
still down, just fail it with ENETDOWN to indicate what happens.Signed-off-by: Marcel Holtmann
10 Sep, 2008
1 commit
-
This fixes kernel bugzilla 11469: "TUN with 1024 neighbours:
ip6_dst_lookup_tail NULL crash"dst->neighbour is not necessarily hooked up at this point
in the processing path, so blindly dereferencing it is
the wrong thing to do. This NULL check exists in other
similar paths and this case was just an oversight.Also fix the completely wrong and confusing indentation
here while we're at it.Based upon a patch by Evgeniy Polyakov.
Signed-off-by: Neil Horman
Signed-off-by: David S. Miller
09 Sep, 2008
8 commits
-
The commit commit 4c563f7669c10a12354b72b518c2287ffc6ebfb3 ("[XFRM]:
Speed up xfrm_policy and xfrm_state walking") inadvertently removed
larval states and socket policies from netlink dumps. This patch
restores them.Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
The Security Mode 4 of the Bluetooth 2.1 specification has strict
authentication and encryption requirements. It is the initiators job
to create a secure ACL link. However in case of malicious devices, the
acceptor has to make sure that the ACL is encrypted before allowing
any kind of L2CAP connection. The only exception here is the PSM 1 for
the service discovery protocol, because that is allowed to run on an
insecure ACL link.Previously it was enough to reject a L2CAP connection during the
connection setup phase, but with Bluetooth 2.1 it is forbidden to
do any L2CAP protocol exchange on an insecure link (except SDP).The new hci_conn_check_link_mode() function can be used to check the
integrity of an ACL link. This functions also takes care of the cases
where Security Mode 4 is disabled or one of the devices is based on
an older specification.Signed-off-by: Marcel Holtmann
-
With the introduction of Security Mode 4 and Simple Pairing from the
Bluetooth 2.1 specification it became mandatory that the initiator
requires authentication and encryption before any L2CAP channel can
be established. The only exception here is PSM 1 for the service
discovery protocol (SDP). It is meant to be used without any encryption
since it contains only public information. This is how Bluetooth 2.0
and before handle connections on PSM 1.For Bluetooth 2.1 devices the pairing procedure differentiates between
no bonding, general bonding and dedicated bonding. The L2CAP layer
wrongly uses always general bonding when creating new connections, but it
should not do this for SDP connections. In this case the authentication
requirement should be no bonding and the just-works model should be used,
but in case of non-SDP connection it is required to use general bonding.If the new connection requires man-in-the-middle (MITM) protection, it
also first wrongly creates an unauthenticated link key and then later on
requests an upgrade to an authenticated link key to provide full MITM
protection. With Simple Pairing the link key generation is an expensive
operation (compared to Bluetooth 2.0 and before) and doing this twice
during a connection setup causes a noticeable delay when establishing
a new connection. This should be avoided to not regress from the expected
Bluetooth 2.0 connection times. The authentication requirements are known
up-front and so enforce them.To fulfill these requirements the hci_connect() function has been extended
with an authentication requirement parameter that will be stored inside
the connection information and can be retrieved by userspace at any
time. This allows the correct IO capabilities exchange and results in
the expected behavior.Signed-off-by: Marcel Holtmann
-
The ACL config stage keeps holding a reference count on incoming
connections when requesting the extended features. This results in
keeping an ACL link up without any users. The problem here is that
the Bluetooth specification doesn't define an ownership of the ACL
link and thus it can happen that the implementation on the initiator
side doesn't care about disconnecting unused links. In this case the
acceptor needs to take care of this.Signed-off-by: Marcel Holtmann
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
bridge: don't allow setting hello time to zero
netns : fix kernel panic in timewait socket destruction
pkt_sched: Fix qdisc state in net_tx_action()
netfilter: nf_conntrack_irc: make sure string is terminated before calling simple_strtoul
netfilter: nf_conntrack_gre: nf_ct_gre_keymap_flush() fixlet
netfilter: nf_conntrack_gre: more locking around keymap list
netfilter: nf_conntrack_sip: de-static helper pointers -
Dushan Tcholich reports that on his system ksoftirqd can consume
between %6 to %10 of cpu time, and cause ~200 context switches per
second.He then correlated this with a report by bdupree@techfinesse.com:
http://marc.info/?l=linux-kernel&m=119613299024398&w=2
and the culprit cause seems to be starting the bridge interface.
In particular, when starting the bridge interface, his scripts
are specifying a hello timer interval of "0".The bridge hello time can't be safely set to values less than 1
second, otherwise it is possible to end up with a runaway timer.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
How to reproduce ?
- create a network namespace
- use tcp protocol and get timewait socket
- exit the network namespace
- after a moment (when the timewait socket is destroyed), the kernel
panics.# BUG: unable to handle kernel NULL pointer dereference at
0000000000000007
IP: [] inet_twdr_do_twkill_work+0x6e/0xb8
PGD 119985067 PUD 11c5c0067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
RIP: 0010:[] []
inet_twdr_do_twkill_work+0x6e/0xb8
RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000)
knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8800bff9e000, task
ffff88011ff76690)
Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a
0000000000000008
0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
Call Trace:
[] ? inet_twdr_hangman+0x0/0x9e
[] ? inet_twdr_hangman+0x27/0x9e
[] ? run_timer_softirq+0x12c/0x193
[] ? __do_softirq+0x5e/0xcd
[] ? call_softirq+0x1c/0x28
[] ? do_softirq+0x2c/0x68
[] ? smp_apic_timer_interrupt+0x8e/0xa9
[] ? apic_timer_interrupt+0x66/0x70
[] ? default_idle+0x27/0x3b
[] ? cpu_idle+0x5f/0x7dCode: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 8b 04 d0
48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
RIP [] inet_twdr_do_twkill_work+0x6e/0xb8
RSP
CR2: 0000000000000007This patch provides a function to purge all timewait sockets related
to a network namespace. The timewait sockets life cycle is not tied with
the network namespace, that means the timewait sockets stay alive while
the network namespace dies. The timewait sockets are for avoiding to
receive a duplicate packet from the network, if the network namespace is
freed, the network stack is removed, so no chance to receive any packets
from the outside world. Furthermore, having a pending destruction timer
on these sockets with a network namespace freed is not safe and will lead
to an oops if the timer callback which try to access data belonging to
the namespace like for example in:
inet_twdr_do_twkill_work
-> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);Purging the timewait sockets at the network namespace destruction will:
1) speed up memory freeing for the namespace
2) fix kernel panic on asynchronous timewait destructionSigned-off-by: Daniel Lezcano
Acked-by: Denis V. Lunev
Acked-by: Eric W. Biederman
Signed-off-by: David S. Miller
08 Sep, 2008
4 commits
-
net_tx_action() can skip __QDISC_STATE_SCHED bit clearing while qdisc
is neither ran nor rescheduled, which may cause endless loop in
dev_deactivate().Reported-by: Denys Fedoryshchenko
Tested-by: Denys Fedoryshchenko
Signed-off-by: Jarek Poplawski
Acked-by: Herbert Xu
Signed-off-by: David S. Miller -
Alexey Dobriyan points out:
1. simple_strtoul() silently accepts all characters for given base even
if result won't fit into unsigned long. This is amazing stupidity in
itself, but2. nf_conntrack_irc helper use simple_strtoul() for DCC request parsing.
Data first copied into 64KB buffer, so theoretically nothing prevents
reading past the end of it, since data comes from network given 1).This is not actually a problem currently since we're guaranteed to have
a 0 byte in skb_shared_info or in the buffer the data is copied to, but
to make this more robust, make sure the string is actually terminated.Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
It does "kfree(list_head)" which looks wrong because entity that was
allocated is definitely not list_head.However, this all works because list_head is first item in
struct nf_ct_gre_keymap.Signed-off-by: Alexey Dobriyan
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
gre_keymap_list should be protected in all places.
(unless I'm misreading something)Signed-off-by: Alexey Dobriyan
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller