Doug / smarc-fsl-linux-kernel | Embedian Git Server

03 Sep, 2012

1 commit

0b1a34c99 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) NLA_PUT* --> nla_put_* conversion got one case wrong in
nfnetlink_log, fix from Patrick McHardy.

2) Missed error return check in ipw2100 driver, from Julia Lawall.

3) PMTU updates in ipv4 were setting the expiry time incorrectly, fix
from Eric Dumazet.

4) SFC driver erroneously reversed src and dst when reporting filters
via ethtool.

5) Memory leak in CAN protocol and wrong setting of IRQF_SHARED in
sja1000 can platform driver, from Alexey Khoroshilov and Sven
Schmitt.

6) Fix multicast traffic scaling regression in ipv4_dst_destroy, only
take the lock when we really need to. From Eric Dumazet.

7) Fix non-root process spoofing in netlink, from Pablo Neira Ayuso.

8) CWND reduction in TCP is done incorrectly during non-SACK recovery,
fix from Yuchung Cheng.

9) Revert netpoll change, and fix what was actually a driver specific
problem. From Amerigo Wang. This should cure bootup hangs with
netconsole some people reported.

10) Fix xen-netfront invoking __skb_fill_page_desc() with a NULL page
pointer. From Ian Campbell.

11) SIP NAT fix for expectation creation, from Pablo Neira Ayuso.

12) __ip_rt_update_pmtu() needs RCU locking, from Eric Dumazet.

13) Fix usbnet deadlock on resume, can't use GFP_KERNEL in this
situation. From Oliver Neukum.

14) The davinci ethernet driver triggers an OOPS on removal because it
frees an MDIO object before unregistering it. Fix from Bin Liu.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
net: qmi_wwan: add several new Gobi devices
fddi: 64 bit bug in smt_add_para()
net: ethernet: fix kernel OOPS when remove davinci_mdio module
net/xfrm/xfrm_state.c: fix error return code
net: ipv6: fix error return code
net: qmi_wwan: new device: Foxconn/Novatel E396
usbnet: fix deadlock in resume
cs89x0 : packet reception not working
netfilter: nf_conntrack: fix racy timer handling with reliable events
bnx2x: Correct the ndo_poll_controller call
bnx2x: Move netif_napi_add to the open call
ipv4: must use rcu protection while calling fib_lookup
bnx2x: fix 57840_MF pci id
net: ipv4: ipmr_expire_timer causes crash when removing net namespace
e1000e: DoS while TSO enabled caused by link partner with small MSS
l2tp: avoid to use synchronize_rcu in tunnel free function
gianfar: fix default tx vlan offload feature flag
netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation
xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX
netfilter: nfnetlink_log: fix error return code in init path
...

Linus Torvalds
2012-09-03 02:28:00 +0800

01 Sep, 2012

3 commits

599901c3e net/xfrm/xfrm_state.c: fix error return code

Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>
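
As a rough illustration of the pattern this semantic match flags, here is a minimal user-space sketch. The functions `setup_buggy` and `setup_fixed` and the allocation they perform are hypothetical and are not taken from xfrm_state.c; they only show how an error path can leave with a stale `ret` of 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical setup routine: the error path forgets to set ret. */
int setup_buggy(int fail_alloc)
{
	int ret = 0;                    /* still 0 when the allocation fails */
	void *buf;

	assert(fail_alloc == 0 || fail_alloc == 1);
	buf = fail_alloc ? NULL : malloc(16);
	if (!buf)
		goto out;               /* BUG: returns 0, i.e. "success" */
	free(buf);
out:
	return ret;
}

/* Same routine with the return variable set before leaving on error. */
int setup_fixed(int fail_alloc)
{
	int ret;
	void *buf;

	assert(fail_alloc == 0 || fail_alloc == 1);
	buf = fail_alloc ? NULL : malloc(16);
	if (!buf) {
		ret = -ENOMEM;          /* error code set on the error path */
		goto out;
	}
	free(buf);
	ret = 0;
out:
	return ret;
}
```

With a simulated allocation failure, `setup_buggy(1)` reports success while `setup_fixed(1)` returns `-ENOMEM`.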

Signed-off-by: Julia Lawall
Signed-off-by: David S. Miller

Julia Lawall
2012-09-01 04:27:48 +0800
48f125ce1 net: ipv6: fix error return code

Initialize return variable before exiting on an error path.

The initial initialization of the return variable is also dropped, because
that value is never used.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall
Signed-off-by: David S. Miller

Julia Lawall
2012-09-01 04:27:48 +0800
0dcd5052c Merge branch 'master' of git://1984.lsi.us.es/nf

David S. Miller
2012-09-01 01:06:37 +0800

31 Aug, 2012

5 commits

5b423f6a4 netfilter: nf_conntrack: fix racy timer handling with reliable events

Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.

This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.

Tested-by: Oliver Smith
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2012-08-31 21:50:28 +0800
c5ae7d419 ipv4: must use rcu protection while calling fib_lookup

The following lockdep splat was reported by Pavel Roskin:

[ 1570.586223] ===============================
[ 1570.586225] [ INFO: suspicious RCU usage. ]
[ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
[ 1570.586229] -------------------------------
[ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
[ 1570.586233]
[ 1570.586233] other info that might help us debug this:
[ 1570.586233]
[ 1570.586236]
[ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
[ 1570.586238] 2 locks held by Chrome_IOThread/4467:
[ 1570.586240] #0: (slock-AF_INET){+.-...}, at: [] release_sock+0x2c/0xa0
[ 1570.586253] #1: (fnhe_lock){+.-...}, at: [] update_or_create_fnhe+0x2c/0x270
[ 1570.586260]
[ 1570.586260] stack backtrace:
[ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
[ 1570.586265] Call Trace:
[ 1570.586271] [] lockdep_rcu_suspicious+0xfd/0x130
[ 1570.586275] [] update_or_create_fnhe+0x15c/0x270
[ 1570.586278] [] __ip_rt_update_pmtu+0x73/0xb0
[ 1570.586282] [] ip_rt_update_pmtu+0x29/0x90
[ 1570.586285] [] inet_csk_update_pmtu+0x2c/0x80
[ 1570.586290] [] tcp_v4_mtu_reduced+0x2e/0xc0
[ 1570.586293] [] tcp_release_cb+0xa4/0xb0
[ 1570.586296] [] release_sock+0x55/0xa0
[ 1570.586300] [] tcp_sendmsg+0x4af/0xf50
[ 1570.586305] [] inet_sendmsg+0x120/0x230
[ 1570.586308] [] ? inet_sk_rebuild_header+0x40/0x40
[ 1570.586312] [] ? sock_update_classid+0xbd/0x3b0
[ 1570.586315] [] ? sock_update_classid+0x130/0x3b0
[ 1570.586320] [] do_sock_write+0xc5/0xe0
[ 1570.586323] [] sock_aio_write+0x53/0x80
[ 1570.586328] [] do_sync_write+0xa3/0xe0
[ 1570.586332] [] vfs_write+0x165/0x180
[ 1570.586335] [] sys_write+0x45/0x90
[ 1570.586340] [] system_call_fastpath+0x16/0x1b

Signed-off-by: Eric Dumazet
Reported-by: Pavel Roskin
Signed-off-by: David S. Miller

Eric Dumazet
2012-08-31 01:33:08 +0800
acbb219d5 net: ipv4: ipmr_expire_timer causes crash when removing net namespace

When tearing down a net namespace, ipv4 mr_table structures are freed
without first deactivating their timers. This can result in a crash in
run_timer_softirq.
This patch mimics the corresponding behaviour in ipv6.
Locking and synchronization seem to be adequate.
We are about to kfree mrt, so existing code should already make sure that
no other references to mrt are pending or can be created by incoming traffic.
The functions invoked here do not cause new references to mrt or other
race conditions to be created.
Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
Both ipmr_expire_process (whose completion we may have to wait in
del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
or other synchronizations when needed, and they both only modify mrt.

Tested in Linux 3.4.8.

Signed-off-by: Francesco Ruggeri
Signed-off-by: David S. Miller

Francesco Ruggeri
2012-08-31 00:51:32 +0800
99469c32f l2tp: avoid to use synchronize_rcu in tunnel free function

Avoid using synchronize_rcu in l2tp_tunnel_free because the context may be
atomic.

Signed-off-by: Dmitry Kozlov
Signed-off-by: David S. Miller

xeb@mail.ru
2012-08-31 00:31:03 +0800
3f509c689 netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation

We're hitting a bug while trying to reinsert an already existing
expectation:

kernel BUG at kernel/timer.c:895!
invalid opcode: 0000 [#1] SMP
[...]
Call Trace:

[] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
[] ? in4_pton+0x72/0x131
[] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
[] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
[] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
[] ? irq_exit+0x9a/0x9c
[] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]

We have to remove the RTP expectation if the RTCP expectation hits EBUSY
since we keep trying with other ports until we succeed.

Reported-by: Rafal Fitt
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2012-08-31 00:27:14 +0800

30 Aug, 2012

4 commits

6fc09f10f netfilter: nfnetlink_log: fix error return code in init path

Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall
Signed-off-by: Pablo Neira Ayuso

Julia Lawall
2012-08-30 09:29:58 +0800
ef6acf68c netfilter: ctnetlink: fix error return code in init path

Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall
Signed-off-by: Pablo Neira Ayuso

Julia Lawall
2012-08-30 09:28:22 +0800
0a54e939d ipvs: fix error return code

Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall
Acked-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Julia Lawall
2012-08-30 09:27:19 +0800
072a9c486 netpoll: revert 6bdb7fe3104 and fix be_poll() instead

Against -net.

In the patch "netpoll: re-enable irq in poll_napi()", I tried to
fix the following warning:

[100718.051041] ------------[ cut here ]------------
[100718.051048] WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0x7d/0xb0()
(Not tainted)
[100718.051049] Hardware name: ProLiant BL460c G7
...
[100718.051068] Call Trace:
[100718.051073] [] ? warn_slowpath_common+0x87/0xc0
[100718.051075] [] ? warn_slowpath_null+0x1a/0x20
[100718.051077] [] ? local_bh_enable_ip+0x7d/0xb0
[100718.051080] [] ? _spin_unlock_bh+0x1b/0x20
[100718.051085] [] ? be_process_mcc+0x74/0x230 [be2net]
[100718.051088] [] ? be_poll_tx_mcc+0x16c/0x290 [be2net]
[100718.051090] [] ? netpoll_poll_dev+0xd6/0x490
[100718.051095] [] ? bond_poll_controller+0x75/0x80 [bonding]
[100718.051097] [] ? netpoll_poll_dev+0x45/0x490
[100718.051100] [] ? ksize+0x19/0x80
[100718.051102] [] ? netpoll_send_skb_on_dev+0x157/0x240

by reenabling IRQ before calling ->poll, but it seems more
problems are introduced after that patch:

http://ozlabs.org/~akpm/stuff/IMG_20120824_122054.jpg
http://marc.info/?l=linux-netdev&m=134563282530588&w=2

So it is safe to fix the be2net driver code directly.

This patch reverts the offending commit and fixes be_poll() by
not disabling BH there. This is okay because be_poll()
can be called either by poll_napi(), which already disables
IRQs, or by net_rx_action(), which already disables BH.

Reported-by: Andrew Morton
Reported-by: Sylvain Munaut
Cc: Sylvain Munaut
Cc: Andrew Morton
Cc: David Miller
Cc: Sathya Perla
Cc: Subbu Seetharaman
Cc: Ajit Khaparde
Signed-off-by: Cong Wang
Tested-by: Sylvain Munaut
Signed-off-by: David S. Miller

Amerigo Wang
2012-08-30 03:03:23 +0800

26 Aug, 2012

1 commit

8497ae61d Merge branch 'for-3.6' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from J. Bruce Fields:
"Particular thanks to Michael Tokarev, Malahal Naineni, and Jamie
Heilman for their testing and debugging help."

* 'for-3.6' of git://linux-nfs.org/~bfields/linux:
svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
svcrpc: sends on closed socket should stop immediately
svcrpc: fix BUG() in svc_tcp_clear_pages
nfsd4: fix security flavor of NFSv4.0 callback

Linus Torvalds
2012-08-26 02:43:41 +0800

25 Aug, 2012

3 commits

d05cebb91 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

John W. Linville says:

====================
This batch of fixes is intended for 3.6...

Johannes Berg gives us a pair of iwlwifi fixes. One corrects some
improperly defined ifdefs that lead to crashes and BUG_ONs. The other
prevents attempts to read SRAM for devices that aren't actually started.

Julia Lawall provides an ipw2100 fix to properly set the return code
from a function call before testing it! :-)

Thomas Huehn corrects the improper use of a constant related to a power
setting in ath5k.

Thomas Pedersen offers a mac80211 fix to properly handle destination
addresses of unicast frames passing through a mesh gate.

Vladimir Zapolskiy provides a brcmsmac fix to properly mark the
interface state when the device goes down.
====================

Signed-off-by: David S. Miller

David S. Miller
2012-08-25 03:15:10 +0800
7c4a56fec tcp: fix cwnd reduction for non-sack recovery

The cwnd reduction in fast recovery is based on the number of packets
newly delivered per ACK. For non-sack connections every DUPACK
signifies a packet has been delivered, but the sender mistakenly
skips counting them for cwnd reduction.

The fix is to compute newly_acked_sacked after DUPACKs are accounted
in sacked_out for non-sack connections.
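
The ordering problem can be sketched with a toy model. Everything below is an assumption made for illustration: the struct `toy_tp`, the helper names, and the delivered-packet formula are simplified stand-ins and not the kernel's code. The sketch only shows why counting before the DUPACK is accounted yields zero newly delivered packets.

```c
#include <assert.h>

/* Toy subset of TCP sender state; the struct itself is hypothetical. */
struct toy_tp {
	int packets_out;  /* packets sent and not yet cumulatively ACKed */
	int sacked_out;   /* packets known (or, without SACK, inferred) delivered */
};

/* Packets newly delivered since the snapshot (prior_packets, prior_sacked). */
int newly_acked_sacked(int prior_packets, int prior_sacked,
		       const struct toy_tp *tp)
{
	return (prior_packets - prior_sacked) -
	       (tp->packets_out - tp->sacked_out);
}

/* Non-SACK emulation: a DUPACK implies one more packet left the network. */
void reno_account_dupack(struct toy_tp *tp)
{
	assert(tp->sacked_out < tp->packets_out);
	tp->sacked_out++;
}

/* Order described as buggy: count first, account the DUPACK afterwards. */
int dupack_count_then_account(struct toy_tp *tp)
{
	int prior_packets = tp->packets_out, prior_sacked = tp->sacked_out;
	int delivered = newly_acked_sacked(prior_packets, prior_sacked, tp);

	reno_account_dupack(tp);
	return delivered;
}

/* Order described as the fix: account the DUPACK, then count. */
int dupack_account_then_count(struct toy_tp *tp)
{
	int prior_packets = tp->packets_out, prior_sacked = tp->sacked_out;

	reno_account_dupack(tp);
	return newly_acked_sacked(prior_packets, prior_sacked, tp);
}
```

Starting from 10 packets in flight and no inferred deliveries, the first ordering reports 0 newly delivered packets for a DUPACK and the second reports 1.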

Signed-off-by: Yuchung Cheng
Acked-by: Nandita Dukkipati
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Yuchung Cheng
2012-08-25 01:48:58 +0800
20e1db19d netlink: fix possible spoofing from non-root processes

Non-root user-space processes can send Netlink messages to other
processes that are well-known for being subscribed to Netlink
asynchronous notifications. This allows an illegitimate non-root
process to send forged messages to Netlink subscribers.

The userspace process usually verifies the legitimate origin in
two ways:

a) Socket credentials. If UID != 0, then the message comes from
some illegitimate process and the message needs to be dropped.

b) Netlink portID. In general, portID == 0 means that the message
originates from the kernel. Thus, any message not coming from the
kernel is discarded.

However, ctnetlink sets the portID in event messages that have
been triggered by some user-space process, e.g. the conntrack utility.
So other processes subscribed to ctnetlink events, e.g. conntrackd,
know that the event was triggered by some user-space action.

Neither of the two ways to discard illegitimate messages coming
from non-root processes can help for ctnetlink.

This patch adds capability validation in case that dst_pid is set
in netlink_sendmsg(). This approach is aggressive since existing
applications using any Netlink bus to deliver messages between
two user-space processes will break. Note that the exception is
NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
userspace communication.

Still, anyone who wants their Netlink bus to allow netlink-to-netlink
userspace communication can set NL_NONROOT_SEND. However, by default,
I don't think it makes sense to allow NETLINK_ROUTE to be used for
communication between two processes that exchange arbitrary information
unrelated to link/neighbouring/routing. They should be using
NETLINK_USERSOCK instead for that.

Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Pablo Neira Ayuso
2012-08-25 01:36:09 +0800

24 Aug, 2012

3 commits

78df76a06 ipv4: take rt_uncached_lock only if needed

Multicast traffic allocates dst with DST_NOCACHE, but dst is
not inserted into rt_uncached_list.

This slows down multicast workloads on SMP because rt_uncached_lock is
contended.

Change the test before taking the lock to actually check the dst
was inserted into rt_uncached_list.
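
A minimal user-space sketch of the idea, assuming a hand-rolled circular list and a counter standing in for the spinlock. All `toy_*` names and the two `destroy_*` functions are hypothetical and are not the kernel's list or route code.

```c
#include <assert.h>

struct toy_list { struct toy_list *prev, *next; };

static int lock_acquisitions;   /* how often the "global lock" was taken */

void toy_list_init(struct toy_list *n) { n->prev = n->next = n; }
int toy_list_empty(const struct toy_list *n) { return n->next == n; }
int toy_lock_acquisitions(void) { return lock_acquisitions; }

void toy_list_add(struct toy_list *n, struct toy_list *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink under the lock; the counter stands in for spin_lock_bh(). */
static void toy_list_del_locked(struct toy_list *n)
{
	assert(n->prev && n->next);
	lock_acquisitions++;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_list_init(n);
}

/* Before: any entry flagged "nocache" takes the lock, listed or not. */
void destroy_flag_test(struct toy_list *n, int nocache)
{
	if (nocache)
		toy_list_del_locked(n);
}

/* After: take the lock only if the entry really is on the list. */
void destroy_list_test(struct toy_list *n)
{
	if (!toy_list_empty(n))
		toy_list_del_locked(n);
}
```

An entry that was never added to the list costs one lock acquisition with `destroy_flag_test` and none with `destroy_list_test`.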

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-08-24 23:47:48 +0800
e72615f6a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem

John W. Linville
2012-08-24 23:16:58 +0800
a0dfb2634 af_packet: match_fanout_group() can be static

cc: Eric Leblond
Signed-off-by: Fengguang Wu
Signed-off-by: David S. Miller

Fengguang Wu
2012-08-24 00:27:12 +0800

23 Aug, 2012

3 commits

c6b6eedc2 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

John W. Linville
2012-08-23 21:51:15 +0800
9b04f3500 ipv4: properly update pmtu

Sylvain Munaut reported the following:

- TCP connection get "stuck" with data in send queue when doing
"large" transfers ( like typing 'ps ax' on a ssh connection )
- Only happens on path where the PMTU is lower than the MTU of
the interface
- Is not present right after boot, it only appears 10-20min after
boot or so. (and that's inside the _same_ TCP connection, it works
fine at first and then in the same ssh session, it'll get stuck)
- Definitely seems related to fragments somehow since I see a router
sending ICMP message saying fragmentation is needed.
- Exact same setup works fine with kernel 3.5.1

The problem happens when the 10-minute (ip_rt_mtu_expires) expiration
period is over.

ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
but dst_set_expires() does nothing because dst.expires is already set.

It seems we want to set the expires field to a new value, regardless
of prior one.
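
The behaviour described above can be sketched with a toy model. This assumes a setter that only ever shortens an existing deadline; `toy_dst`, `toy_set_expires` and `toy_force_expires` are hypothetical names, and time is a plain tick counter passed in by the caller.

```c
#include <assert.h>

/* Toy stand-in for a dst entry: only the expiry stamp, in ticks. */
struct toy_dst {
	unsigned long expires;   /* 0 means "no expiry armed" */
};

/* "Only shorten" setter: stores the new deadline if none is set
 * or if the new deadline is earlier than the stored one. */
void toy_set_expires(struct toy_dst *dst, unsigned long now,
		     unsigned long timeout)
{
	unsigned long expires = now + timeout;

	assert(timeout > 0);
	if (expires == 0)
		expires = 1;
	if (dst->expires == 0 || (long)(expires - dst->expires) < 0)
		dst->expires = expires;
}

/* What the message asks for: overwrite regardless of the prior value. */
void toy_force_expires(struct toy_dst *dst, unsigned long now,
		       unsigned long timeout)
{
	unsigned long expires = now + timeout;

	assert(timeout > 0);
	dst->expires = expires ? expires : 1;
}
```

Once the first deadline has passed, `toy_set_expires` leaves the stale stamp in place because the new deadline is later than the old one, while `toy_force_expires` rearms it.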

With help from Julian Anastasov.

Reported-by: Sylvain Munaut
Signed-off-by: Eric Dumazet
CC: Julian Anastasov
Tested-by: Sylvain Munaut
Signed-off-by: David S. Miller

Eric Dumazet
2012-08-23 10:14:30 +0800
f753c4ec1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull ceph fixes from Sage Weil:
"Jim's fix closes a narrow race introduced with the msgr changes. One
fix resolves problems with debugfs initialization that Yan found when
multiple client instances are created (e.g., two clusters mounted, or
rbd + cephfs), another one fixes problems with mounting a nonexistent
server subdirectory, and the last one fixes a divide by zero error
from unsanitized ioctl input that Dan Carpenter found."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: avoid divide by zero in __validate_layout()
libceph: avoid truncation due to racing banners
ceph: tolerate (and warn on) extraneous dentry from mds
libceph: delay debugfs initialization until we learn global_id

Linus Torvalds
2012-08-23 00:58:05 +0800

22 Aug, 2012

6 commits

27f011243 mac80211: fix DS to MBSS address translation

The destination address of unicast frames forwarded through a mesh gate
was being replaced with the broadcast address. Instead leave the
original destination address as the mesh DA. If the nexthop address is
not in the mpath table it will be resolved. If that fails, the frame
will be forwarded to known mesh gates.

Reported-by: Cedric Voncken
Signed-off-by: Thomas Pedersen
Signed-off-by: Johannes Berg

Thomas Pedersen
2012-08-22 15:45:05 +0800
6d4221b53 libceph: avoid truncation due to racing banners

Because the Ceph client messenger uses a non-blocking connect, it is
possible for the sending of the client banner to race with the
arrival of the banner sent by the peer.

When ceph_sock_state_change() notices the connect has completed, it
schedules work to process the socket via con_work(). During this
time the peer is writing its banner, and arrival of the peer banner
races with con_work().

If con_work() calls try_read() before the peer banner arrives, there
is nothing for it to do, after which con_work() calls try_write() to
send the client's banner. In this case Ceph's protocol negotiation
can complete successfully.

The server-side messenger immediately sends its banner and addresses
after accepting a connect request, *before* actually attempting to
read or verify the banner from the client. As a result, it is
possible for the banner from the server to arrive before con_work()
calls try_read(). If that happens, try_read() will read the banner
and prepare protocol negotiation info via prepare_write_connect().
prepare_write_connect() calls con_out_kvec_reset(), which discards
the as-yet-unsent client banner. Next, con_work() calls
try_write(), which sends the protocol negotiation info rather than
the banner that the peer is expecting.

The result is that the peer sees an invalid banner, and the client
reports "negotiation failed".

Fix this by moving con_out_kvec_reset() out of
prepare_write_connect() to its callers at all locations except the
one where the banner might still need to be sent.

[elder@inktak.com: added note about server-side behavior]

Signed-off-by: Jim Schutt
Reviewed-by: Alex Elder

Jim Schutt
2012-08-22 06:55:27 +0800
e0e3cea46 af_netlink: force credentials passing [CVE-2012-3520]

Pablo Neira Ayuso discovered that avahi and
potentially NetworkManager accept spoofed Netlink messages because of a
kernel bug. The kernel passes all-zero SCM_CREDENTIALS ancillary data
to the receiver if the sender did not provide such data, instead of not
including any such data at all or including the correct data from the
peer (as it is the case with AF_UNIX).

This bug was introduced in commit 16e572626961
(af_unix: dont send SCM_CREDENTIALS by default)

This patch forces passing credentials for netlink, as
before the regression.

Another fix would be to not add SCM_CREDENTIALS in
netlink messages if not provided by the sender, but it
might break some programs.

With help from Florian Weimer & Petr Matousek

This issue is designated as CVE-2012-3520

Signed-off-by: Eric Dumazet
Cc: Petr Matousek
Cc: Florian Weimer
Cc: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Eric Dumazet
2012-08-22 05:53:01 +0800
a9915a1b5 ipv4: fix ip header ident selection in __ip_make_skb()

Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
memory in __ip_select_ident().

It turns out that __ip_make_skb() called ip_select_ident() before
properly initializing iph->daddr.

This is a bug uncovered by commit 1d861aa4b3fb (inet: Minimize use of
cached route inetpeer.)

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131

Reported-by: Christian Casteyde
Signed-off-by: Eric Dumazet
Cc: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2012-08-22 05:51:06 +0800
1a7b27c97 ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()

Since 0e734419923bd ("ipv4: Use inet_csk_route_child_sock() in DCCP and
TCP."), inet_csk_route_child_sock() is called instead of
inet_csk_route_req().

However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
ireq->opt is set to NULL, before calling inet_csk_route_child_sock().
Thus, inside inet_csk_route_child_sock() opt is always NULL and the
SRR-options are not respected anymore.
Packets sent by the server won't have the correct destination-IP.

This patch fixes it by accessing newinet->inet_opt instead of ireq->opt
inside inet_csk_route_child_sock().

Reported-by: Luca Boccassi
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2012-08-22 05:49:11 +0800
144d56e91 tcp: fix possible socket refcount problem

Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
added a bug leading to the following trace:

[ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.131726]
[ 2866.132188] =========================
[ 2866.132281] [ BUG: held lock freed! ]
[ 2866.132281] 3.6.0-rc1+ #622 Not tainted
[ 2866.132281] -------------------------
[ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
[ 2866.132281] (sk_lock-AF_INET-RPC){+.+...}, at: [] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] 4 locks held by kworker/0:1/652:
[ 2866.132281] #0: (rpciod){.+.+.+}, at: [] process_one_work+0x1de/0x47f
[ 2866.132281] #1: ((&task->u.tk_work)){+.+.+.}, at: [] process_one_work+0x1de/0x47f
[ 2866.132281] #2: (sk_lock-AF_INET-RPC){+.+...}, at: [] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] #3: (&icsk->icsk_retransmit_timer){+.-...}, at: [] run_timer_softirq+0x1ad/0x35f
[ 2866.132281]
[ 2866.132281] stack backtrace:
[ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
[ 2866.132281] Call Trace:
[ 2866.132281] [] debug_check_no_locks_freed+0x112/0x159
[ 2866.132281] [] ? __sk_free+0xfd/0x114
[ 2866.132281] [] kmem_cache_free+0x6b/0x13a
[ 2866.132281] [] __sk_free+0xfd/0x114
[ 2866.132281] [] sk_free+0x1c/0x1e
[ 2866.132281] [] tcp_write_timer+0x51/0x56
[ 2866.132281] [] run_timer_softirq+0x218/0x35f
[ 2866.132281] [] ? run_timer_softirq+0x1ad/0x35f
[ 2866.132281] [] ? rb_commit+0x58/0x85
[ 2866.132281] [] ? tcp_write_timer_handler+0x148/0x148
[ 2866.132281] [] __do_softirq+0xcb/0x1f9
[ 2866.132281] [] ? _raw_spin_unlock+0x29/0x2e
[ 2866.132281] [] call_softirq+0x1c/0x30
[ 2866.132281] [] do_softirq+0x4a/0xa6
[ 2866.132281] [] irq_exit+0x51/0xad
[ 2866.132281] [] do_IRQ+0x9d/0xb4
[ 2866.132281] [] common_interrupt+0x6f/0x6f
[ 2866.132281] [] ? sched_clock_cpu+0x58/0xd1
[ 2866.132281] [] ? _raw_spin_unlock_irqrestore+0x4c/0x56
[ 2866.132281] [] mod_timer+0x178/0x1a9
[ 2866.132281] [] sk_reset_timer+0x19/0x26
[ 2866.132281] [] tcp_rearm_rto+0x99/0xa4
[ 2866.132281] [] tcp_event_new_data_sent+0x6e/0x70
[ 2866.132281] [] tcp_write_xmit+0x7de/0x8e4
[ 2866.132281] [] ? __alloc_skb+0xa0/0x1a1
[ 2866.132281] [] __tcp_push_pending_frames+0x2e/0x8a
[ 2866.132281] [] tcp_sendmsg+0xb32/0xcc6
[ 2866.132281] [] inet_sendmsg+0xaa/0xd5
[ 2866.132281] [] ? inet_autobind+0x5f/0x5f
[ 2866.132281] [] ? trace_clock_local+0x9/0xb
[ 2866.132281] [] sock_sendmsg+0xa3/0xc4
[ 2866.132281] [] ? rb_reserve_next_event+0x26f/0x2d5
[ 2866.132281] [] ? native_sched_clock+0x29/0x6f
[ 2866.132281] [] ? sched_clock+0x9/0xd
[ 2866.132281] [] ? trace_clock_local+0x9/0xb
[ 2866.132281] [] kernel_sendmsg+0x37/0x43
[ 2866.132281] [] xs_send_kvec+0x77/0x80
[ 2866.132281] [] xs_sendpages+0x6f/0x1a0
[ 2866.132281] [] ? try_to_del_timer_sync+0x55/0x61
[ 2866.132281] [] xs_tcp_send_request+0x55/0xf1
[ 2866.132281] [] xprt_transmit+0x89/0x1db
[ 2866.132281] [] ? call_connect+0x3c/0x3c
[ 2866.132281] [] call_transmit+0x1c5/0x20e
[ 2866.132281] [] __rpc_execute+0x6f/0x225
[ 2866.132281] [] ? call_connect+0x3c/0x3c
[ 2866.132281] [] rpc_async_schedule+0x28/0x34
[ 2866.132281] [] process_one_work+0x24d/0x47f
[ 2866.132281] [] ? process_one_work+0x1de/0x47f
[ 2866.132281] [] ? __rpc_execute+0x225/0x225
[ 2866.132281] [] worker_thread+0x236/0x317
[ 2866.132281] [] ? process_scheduled_works+0x2f/0x2f
[ 2866.132281] [] kthread+0x9a/0xa2
[ 2866.132281] [] kernel_thread_helper+0x4/0x10
[ 2866.132281] [] ? retint_restore_args+0x13/0x13
[ 2866.132281] [] ? __init_kthread_worker+0x5a/0x5a
[ 2866.132281] [] ? gs_change+0x13/0x13
[ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.309689] =============================================================================
[ 2866.310254] BUG TCP (Not tainted): Object already free
[ 2866.310254] -----------------------------------------------------------------------------
[ 2866.310254]

The bug comes from the fact that the timer set in sk_reset_timer() can run
before we actually do the sock_hold(). The socket refcount reaches zero and
we free the socket too soon.

Either the timer handler must not reduce the socket refcount while the socket
is owned by the user, or we need to change the sk_reset_timer() implementation.

We should take a reference on the socket in case the TCP_WRITE_TIMER_DEFERRED
or TCP_DELACK_TIMER_DEFERRED bit is set in tsq_flags.

Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
was used instead of TCP_DELACK_TIMER_DEFERRED.

For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
even if not fired from a timer.

Reported-by: Fengguang Wu
Tested-by: Fengguang Wu
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-08-22 05:42:23 +0800

21 Aug, 2012

4 commits

d10f27a75 svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping

The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that it has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available. If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The result is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.
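
The loop can be reproduced in a small model. This is a sketch under assumptions: `toy_xprt`, `toy_has_wspace` and the two receive helpers are hypothetical, and the space check is assumed to compare the free send space against the reserved total plus one more reply estimate.

```c
#include <assert.h>
#include <errno.h>

struct toy_xprt {
	long wspace;     /* free send-buffer space */
	long reserved;   /* upper bound on replies already committed to */
	long max_mesg;   /* size estimate for one more reply */
};

/* The check the enqueue side is assumed to use. */
int toy_has_wspace(const struct toy_xprt *x)
{
	return x->wspace >= x->reserved + x->max_mesg;
}

/* Order described as buggy: add the estimate, then check, then back out. */
int toy_recv_reserve_then_check(struct toy_xprt *x)
{
	assert(x->max_mesg > 0);
	x->reserved += x->max_mesg;
	if (!toy_has_wspace(x)) {
		x->reserved -= x->max_mesg;
		return -EAGAIN;
	}
	return 0;
}

/* Check first, reserve only when there is room. */
int toy_recv_check_then_reserve(struct toy_xprt *x)
{
	assert(x->max_mesg > 0);
	if (!toy_has_wspace(x))
		return -EAGAIN;
	x->reserved += x->max_mesg;
	return 0;
}
```

With 150 units of send space and a 100-unit estimate, the enqueue-side check passes, the reserve-then-check receive fails with `-EAGAIN` and restores the old reservation, and the enqueue-side check passes again, so the thread is woken repeatedly without making progress.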

Cc: stable@vger.kernel.org
Reported-by: Michael Tokarev
Tested-by: Michael Tokarev
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2012-08-21 06:39:19 +0800
f06f00a24 svcrpc: sends on closed socket should stop immediately

svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately. Meanwhile other
threads could send further replies before the socket is really shut
down. This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

Cc: stable@vger.kernel.org
Reported-by: Malahal Naineni
Tested-by: Malahal Naineni
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2012-08-21 06:38:59 +0800
be1e44441 svcrpc: fix BUG() in svc_tcp_clear_pages

Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
sk_tcplen would lead us to expect n pages of data).

svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
This is OK, since both functions are serialized by XPT_BUSY. However,
that means the inconsistency must be repaired before dropping XPT_BUSY.

Therefore we should be ensuring that svc_tcp_save_pages repairs the
problem before exiting svc_tcp_recv_record on error.

Symptoms were a BUG() in svc_tcp_clear_pages.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2012-08-21 06:38:44 +0800
d1c338a50 libceph: delay debugfs initialization until we learn global_id

The debugfs directory includes the cluster fsid and our unique global_id.
We need to delay the initialization of the debug entry until we have
learned both the fsid and our global_id from the monitor or else the
second client can't create its debugfs entry and will fail (and multiple
client instances aren't properly reflected in debugfs).

Reported by: Yan, Zheng
Signed-off-by: Sage Weil
Reviewed-by: Yehuda Sadeh

Sage Weil
2012-08-21 01:03:15 +0800

20 Aug, 2012

7 commits

2dba62c30 netfilter: nfnetlink_log: fix NLA_PUT macro removal bug

Commit 1db20a52 (nfnetlink_log: Stop using NLA_PUT*().) incorrectly
converted a NLA_PUT_BE16 macro to nla_put_be32() in nfnetlink_log:

- NLA_PUT_BE16(inst->skb, NFULA_HWTYPE, htons(skb->dev->type));
+ if (nla_put_be32(inst->skb, NFULA_HWTYPE, htons(skb->dev->type))
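
To see why the width matters, here is a small userspace sketch. `put_be16` and `put_be32` are hypothetical stand-ins for the netlink helpers and only copy the value into a buffer; the real helpers also write an attribute header. The point is that the same `htons()` result yields a 2-byte payload in one case and a 4-byte payload in the other, so a reader expecting a 16-bit hardware type sees an attribute of the wrong size.

```c
#include <arpa/inet.h>   /* htons */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-bit attribute put: copies the value
 * as-is and returns the payload length. */
static size_t put_be16(uint8_t *buf, uint16_t v)
{
    memcpy(buf, &v, sizeof(v));
    return sizeof(v);
}

/* Hypothetical stand-in for a 32-bit attribute put. A 16-bit argument
 * is silently widened, so the compiler does not flag the mismatch. */
static size_t put_be32(uint8_t *buf, uint32_t v)
{
    memcpy(buf, &v, sizeof(v));
    return sizeof(v);
}
```

With a device type of 1, `put_be16(buf, htons(1))` stores the bytes 00 01, while `put_be32(buf, htons(1))` stores four bytes whose layout depends on host endianness.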

Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso

Patrick McHardy
2012-08-20 18:40:23 +0800
fae6ef87f net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child() ... Browse Code »

This commit removes the sk_rx_dst_set calls from
tcp_create_openreq_child(), because at that point the icsk_af_ops
field of ipv6_mapped TCP sockets has not been set to its proper final
value.

Instead, to make sure we get the right sk_rx_dst_set variant
appropriate for the address family of the new connection, we have
tcp_v{4,6}_syn_recv_sock() directly call the appropriate function
shortly after the call to tcp_create_openreq_child() returns.

This also moves inet6_sk_rx_dst_set() to avoid a forward declaration
with the new approach.

Signed-off-by: Neal Cardwell
Reported-by: Artem Savkov
Cc: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Neal Cardwell
2012-08-20 18:03:33 +0800
3de7a37b0 net/core/dev.c: fix kernel-doc warning ... Browse Code »

Fix kernel-doc warning:

Warning(net/core/dev.c:5745): No description found for parameter 'dev'

Signed-off-by: Randy Dunlap
Cc: "David S. Miller"
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller

Randy Dunlap
2012-08-20 18:00:55 +0800
9d7b0fc1e net: ipv6: fix oops in inet_putpeer() ... Browse Code »

Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
initialized, causing a false positive result from inetpeer_ptr_is_peer(),
which in turn causes a NULL pointer dereference in inet_putpeer().

Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
EIP: 0060:[] EFLAGS: 00010246 CPU: 0
EIP is at inet_putpeer+0xe/0x16
EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
[] xfrm6_dst_destroy+0x42/0xdb
[] dst_destroy+0x1d/0xa4
[] xfrm_bundle_flo_delete+0x2b/0x36
[] flow_cache_gc_task+0x85/0x9f
[] process_one_work+0x122/0x441
[] ? apic_timer_interrupt+0x31/0x38
[] ? flow_cache_new_hashrnd+0x2b/0x2b
[] worker_thread+0x113/0x3cc

Fix by adding an init_dst() callback to struct xfrm_policy_afinfo to
properly initialize the dst's peer pointer.
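
A rough userspace sketch of the failure mode and the fix follows. It assumes a tagged-pointer scheme in which a set low bit means "this field does not hold a peer", and it models the uninitialized field as zero; `BASE_BIT`, the structures, and the function names are hypothetical and do not mirror the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BASE_BIT 0x1UL   /* assumed tag: set when the field holds no peer */

struct peer { int refcnt; };
struct dst_sketch { unsigned long peer_field; };

/* Hypothetical per-family ops with an optional init hook. */
struct afinfo_sketch { void (*init_dst)(struct dst_sketch *dst); };

static bool field_is_peer(unsigned long v)
{
    return !(v & BASE_BIT);
}

/* The hook marks the field as "no peer attached". */
static void init_dst_v6(struct dst_sketch *dst)
{
    dst->peer_field = BASE_BIT;
}

static void dst_setup(const struct afinfo_sketch *af, struct dst_sketch *dst)
{
    dst->peer_field = 0;        /* models the field never being set up */
    if (af->init_dst)
        af->init_dst(dst);
}

/* True when the destroy path would pass a NULL peer to the put routine. */
static bool destroy_would_put_null(const struct dst_sketch *dst)
{
    return field_is_peer(dst->peer_field) &&
           (struct peer *)dst->peer_field == NULL;
}
```

Without the hook, the zeroed field passes the "is a peer" test and the destroy path would dereference NULL; with the hook, the test fails and the put is skipped.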

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2012-08-20 17:56:56 +0800
d92c7f8aa caif: Do not dereference NULL in chnl_recv_cb() ... Browse Code »

In net/caif/chnl_net.c::chnl_recv_cb() we call skb_header_pointer()
which may return NULL, but we do not check for a NULL pointer before
dereferencing it.
This patch adds such a NULL check, properly frees the allocated memory,
and returns an error (-EINVAL) on failure, which is much better than crashing.
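
The pattern can be sketched in userspace as follows. `header_pointer` is a hypothetical stand-in for skb_header_pointer() that returns NULL when the requested range lies outside the packet, and the assumption that the callback inspects the first byte for an IP version is made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct pkt { uint8_t *data; size_t len; };

/* Allocate a packet holding a copy of `src` (hypothetical helper). */
static struct pkt *pkt_new(const uint8_t *src, size_t len)
{
    struct pkt *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    p->data = malloc(len ? len : 1);
    if (!p->data) {
        free(p);
        return NULL;
    }
    if (len)
        memcpy(p->data, src, len);
    p->len = len;
    return p;
}

/* Copy `n` bytes at `off` into `scratch`; NULL if out of range. */
static const void *header_pointer(const struct pkt *p, size_t off,
                                  size_t n, void *scratch)
{
    if (off > p->len || n > p->len - off)
        return NULL;
    memcpy(scratch, p->data + off, n);
    return scratch;
}

/* On a missing header: free the packet and return -EINVAL.
 * On success the packet stays owned by the caller. */
static int recv_cb(struct pkt *p, int *version)
{
    uint8_t buf;
    const uint8_t *hdr = header_pointer(p, 0, 1, &buf);

    if (!hdr) {
        free(p->data);
        free(p);
        return -EINVAL;
    }
    *version = *hdr >> 4;
    return 0;
}
```

An empty packet therefore produces -EINVAL instead of a NULL dereference, and nothing is leaked on that path.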

Signed-off-by: Jesper Juhl
Acked-by: Sjur Brændeland
Signed-off-by: David S. Miller

Jesper Juhl
2012-08-20 17:47:49 +0800
6c71bec66 Merge git://1984.lsi.us.es/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
The following five patches contain fixes for 3.6-rc, they are:

* Two fixes for message parsing in the SIP conntrack helper, from
Patrick McHardy.

* One fix for the SIP helper introduced in the user-space cthelper
infrastructure, from Patrick McHardy.

* One fix for missing locking while modifying a conntrack entry
from the nfqueue integration code, from myself.

* One fix for possible access to an uninitialized timer in the
nf_conntrack expectation infrastructure, from myself.
====================

Signed-off-by: David S. Miller

David S. Miller
2012-08-20 17:44:29 +0800
c0de08d04 af_packet: don't emit packet on orig fanout group ... Browse Code »

If a packet is emitted on one socket in one group of fanout sockets,
it is transmitted again. It is thus read again on one of the sockets
of the fanout group. This results in a loop for software which
generates a packet when receiving one.
This retransmission is not the intended behavior: a fanout group
must behave like a single socket. The packet should not be
transmitted on a socket if it originates from a socket belonging
to the same fanout group.

This patch fixes the issue by changing the transmission check to
take fanout group info into account.
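
The intended rule can be expressed as a small predicate. This is a minimal sketch with hypothetical types and names, not the af_packet code: a looped-back copy of a transmitted packet is skipped for the originating socket and for any socket that shares its fanout group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fanout_group { int id; };

/* Hypothetical packet socket: `fanout` is NULL when it is in no group. */
struct tap_sock { struct fanout_group *fanout; };

/* True when a packet sent by `origin` must not be delivered to `listener`. */
static bool skip_loopback(const struct tap_sock *listener,
                          const struct tap_sock *origin)
{
    if (!origin)
        return false;               /* not locally generated by a socket */
    if (listener == origin)
        return true;                /* same socket */
    return listener->fanout != NULL &&
           listener->fanout == origin->fanout;   /* same fanout group */
}
```

Two sockets outside any group are still treated as distinct, so the group comparison only applies when the listener actually belongs to a group.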

Reported-by: Aleksandr Kotov
Signed-off-by: Eric Leblond
Signed-off-by: David S. Miller

Eric Leblond
2012-08-20 17:37:29 +0800