Eric Lee / smarc-fsl-linux-kernel

21 Feb, 2012

1 commit

8ebbfb495 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Assorted fixes, sat in -next for a week or so...

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ocfs2: deal with wraparounds of i_nlink in ocfs2_rename()
vfs: fix compat_sys_stat() handling of overflows in st_nlink
quota: Fix deadlock with suspend and quotas
vfs: Provide function to get superblock and wait for it to thaw
vfs: fix panic in __d_lookup() with high dentry hashtable counts
autofs4 - fix lockdep splat in autofs
vfs: fix d_inode_lookup() dentry ref leak

Linus Torvalds
2012-02-21 08:13:58 +0800

16 Feb, 2012

1 commit

33b5d30cd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/…

…wireless into for-davem

John W. Linville
2012-02-16 02:41:52 +0800

15 Feb, 2012

3 commits

58e05f357 netpoll: netpoll_poll_dev() should access dev->flags

Commit 5a698af53f (bond: service netpoll arp queue on master device)
tested the IFF_SLAVE flag against dev->priv_flags instead of dev->flags.

Signed-off-by: Eric Dumazet
Cc: WANG Cong
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Eric Dumazet
2012-02-15 04:24:26 +0800
f65bd5ec4 RxRPC: Fix kcalloc parameters swapped

The first parameter should be "number of elements" and the second parameter
should be "element size".
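The same (count, size) order is used by calloc(3) in user space. The sketch below is a hypothetical userspace stand-in modeled loosely on kcalloc() (the name `kcalloc_sketch` is made up and GFP flags are omitted); it shows the convention together with the usual multiplication-overflow guard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in modeled loosely on kcalloc(n, size, flags):
 * allocate a zeroed array of n elements of size bytes each, and return NULL
 * if n * size would overflow. GFP flags are omitted.
 */
static void *kcalloc_sketch(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;            /* n * size would not fit in size_t */
    return calloc(n, size);     /* calloc(3) uses the same (count, size) order */
}
```

For example, `int *arr = kcalloc_sketch(8, sizeof(*arr));` allocates eight zeroed ints. Because multiplication commutes, a swapped call requests the same number of bytes; the swap is wrong by convention and readability rather than by allocation size.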

Signed-off-by: Axel Lin
Acked-by: David Howells
Signed-off-by: David S. Miller

Axel Lin
2012-02-15 03:41:55 +0800
0af2a0d05 tcp: fix tcp_shifted_skb() adjustment of lost_cnt_hint for FACK

This commit ensures that lost_cnt_hint is correctly updated in
tcp_shifted_skb() for FACK TCP senders. The lost_cnt_hint adjustment
in tcp_sacktag_one() only applies to non-FACK senders, so FACK senders
need their own adjustment.

This applies the spirit of 1e5289e121372a3494402b1b131b41bfe1cf9b7f -
except now that the sequence range passed into tcp_sacktag_one() is
correct we need only have a special case adjustment for FACK.

Signed-off-by: Neal Cardwell
Signed-off-by: David S. Miller

Neal Cardwell
2012-02-15 03:38:57 +0800

14 Feb, 2012

1 commit

074b85175 vfs: fix panic in __d_lookup() with high dentry hashtable counts

When the number of dentry cache hash table entries gets too high
(2147483648 entries), as happens by default on a 16TB system, use of a
signed integer in the dcache_init() initialization loop prevents the
dentry_hashtable from getting initialized, causing a panic in
__d_lookup(). Fix this in dcache_init() and similar areas.
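A minimal illustration of the arithmetic, assuming the table has `1 << shift` buckets and a 32-bit `int` (the helper names are hypothetical, not the dcache code): 2147483648 is 2^31, one more than INT_MAX, so a loop bounded by a signed count can run zero times. Doing the index and bound in unsigned arithmetic avoids that.

```c
#include <assert.h>
#include <limits.h>

/* Bucket count for a table of 2^shift entries, computed in unsigned arithmetic. */
static unsigned long nr_buckets(unsigned int shift)
{
    return 1UL << shift;
}

/* Would the bucket count still be representable as a signed int? */
static int fits_in_int(unsigned int shift)
{
    return nr_buckets(shift) <= (unsigned long)INT_MAX;
}

/*
 * Initialization loop with an unsigned index and an unsigned bound
 * (1U << shift), so the comparison can never see a negative limit.
 * Returns how many buckets were visited; call with small shifts only.
 */
static unsigned long init_buckets(unsigned int shift)
{
    unsigned long visited = 0;

    for (unsigned int loop = 0; loop < (1U << shift); loop++)
        visited++;              /* stand-in for INIT_HLIST_BL_HEAD(table + loop) */
    return visited;
}
```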

Signed-off-by: Dimitri Sivanich
Acked-by: David S. Miller
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Dimitri Sivanich
2012-02-14 09:45:38 +0800

13 Feb, 2012

2 commits

daef52bab tcp: fix range tcp_shifted_skb() passes to tcp_sacktag_one()

Fix the newly-SACKed range to be the range of newly-shifted bytes.

Previously - since 832d11c5cd076abc0aa1eaf7be96c81d1a59ce41 -
tcp_shifted_skb() incorrectly called tcp_sacktag_one() with the start
and end sequence numbers of the skb it passes in set to the range just
beyond the range that is newly-SACKed.

This commit also removes a special-case adjustment to lost_cnt_hint in
tcp_shifted_skb() since the pre-existing adjustment of lost_cnt_hint
in tcp_sacktag_one() now handles this case properly, given that the
correct start sequence number is passed in.

Signed-off-by: Neal Cardwell
Signed-off-by: David S. Miller

Neal Cardwell
2012-02-13 14:00:22 +0800
cc9a672ee tcp: allow tcp_sacktag_one() to tag ranges not aligned with skbs

This commit allows callers of tcp_sacktag_one() to pass in sequence
ranges that do not align with skb boundaries, as tcp_shifted_skb()
needs to do in an upcoming fix in this patch series.

In fact, now tcp_sacktag_one() does not need to depend on an input skb
at all, which makes its semantics and dependencies more clear.

Signed-off-by: Neal Cardwell
Signed-off-by: David S. Miller

Neal Cardwell
2012-02-13 14:00:21 +0800

11 Feb, 2012

6 commits

8df54d622 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Quoth David:

1) GRO MAC header comparisons were ethernet specific, breaking other
link types. This required a multi-faceted fix to cure the originally
noted case (Infiniband), because IPoIB was lying about its actual
hard header length. Thanks to Eric Dumazet, Roland Dreier, and
others.

2) Fix build failure when INET_UDP_DIAG is built in and ipv6 is modular.
From Anisse Astier.

3) Off-by-one and other bug fixes in netprio_cgroup from Neil Horman.

4) ipv4 TCP reset generation needs to respect any network interface
binding from the socket, otherwise route lookups might give a
different result than all the other segments received. From Shawn
Lu.

5) Fix unintended regression in ipv4 proxy ARP responses, from Thomas
Graf.

6) Fix SKB under-allocation bug in sh_eth, from Yoshihiro Shimoda.

7) Revert skge PCI mapping changes that are causing crashes for some
folks, from Stephen Hemminger.

8) IPV4 route lookups fill in the wildcarded fields of the given flow
lookup key passed in, which is fine most of the time as this is
exactly what the callers want. However, there are a few cases that
want to retain the original flow key values afterwards, so handle
those cases properly. Fix from Julian Anastasov.

9) IGB/IXGBE VF lookup bug fixes from Greg Rose.

10) Properly null-terminate filename passed to ethtool flash device
method, from Ben Hutchings.

11) S3 resume fix in via-velocity from David Lv.

12) Fix double SKB free during xmit failure in CAIF, from Dmitry
Tarnyagin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabled
ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.
netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m
netprio_cgroup: don't allocate prio table when a device is registered
netprio_cgroup: fix an off-by-one bug
bna: fix error handling of bnad_get_flash_partition_by_offset()
isdn: type bug in isdn_net_header()
net: Make qdisc_skb_cb upper size bound explicit.
ixgbe: ethtool: stats user buffer overrun
ixgbe: dcb: up2tc mapping lost on disable/enable CEE DCB state
ixgbe: do not update real num queues when netdev is going away
ixgbe: Fix broken dependency on MAX_SKB_FRAGS being related to page size
ixgbe: Fix case of Tx Hang in PF with 32 VFs
ixgbe: fix vf lookup
igb: fix vf lookup
e1000: add dropped DMA receive enable back in for WoL
gro: more generic L2 header check
IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses
zd1211rw: firmware needs duration_id set to zero for non-pspoll frames
net: enable TC35815 for MIPS again
...

Linus Torvalds
2012-02-11 06:18:46 +0800
70620c46a net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabled

Commit 653241 (net: RFC3069, private VLAN proxy arp support) changed
the behavior of proxy ARP to send ARP replies back out on the interface
the request came in on, even if the private VLAN feature is disabled.

Previously we checked rt->dst.dev != skb->dev in both scenarios: when
proxy ARP is enabled for the netdevice, and when individual proxy
neighbour entries have been added.

This patch adds the check back for the pneigh_lookup() scenario.

Signed-off-by: Thomas Graf
Acked-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Thomas Graf
2012-02-11 04:13:36 +0800
5dc7883f2 ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.

This patch fixes a bug introduced by commit ac8a4810 (ipv4: Save
nexthop address of LSRR/SSRR option to IPCB.). In that patch, we saved
the nexthop of SRR in ip_option->nexthop and deferred updating iph->daddr
until we got to ip_forward_options(), but we need to update it before
ip_rt_get_source(); otherwise we may get the wrong source address.

Signed-off-by: Li Wei
Signed-off-by: David S. Miller

Li Wei
2012-02-11 04:12:12 +0800
2b73bc65e netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m

When the netprio_cgroup module is not loaded, net_prio_subsys_id
is -1, and so sock_update_prioidx() accesses cgroup_subsys array
with negative index subsys[-1].

Make the code resemble the cls_cgroup code, which does not have this bug.

Origionally-authored-by: Li Zefan
Signed-off-by: Li Zefan
Signed-off-by: Neil Horman
CC: "David S. Miller"
Signed-off-by: David S. Miller

Neil Horman
2012-02-11 04:08:57 +0800
f5c38208d netprio_cgroup: don't allocate prio table when a device is registered

Delay the allocation until a priority is set through the cgroup;
this makes skb_update_priority() faster when no priority is set.

This also eliminates an off-by-one bug similar to the one fixed
in the previous patch.

Origionally-authored-by: Li Zefan
Signed-off-by: Li Zefan
Signed-off-by: Neil Horman
CC: "David S. Miller"
Signed-off-by: David S. Miller

Neil Horman
2012-02-11 04:08:57 +0800
a87dfe14a netprio_cgroup: fix an off-by-one bug

# mount -t cgroup xxx /mnt
# mkdir /mnt/tmp
# cat /mnt/tmp/net_prio.ifpriomap
lo 0
eth0 0
virbr0 0
# echo 'lo 999' > /mnt/tmp/net_prio.ifpriomap
# cat /mnt/tmp/net_prio.ifpriomap
lo 999
eth0 0
virbr0 4101267344

We got weird output because we read past the end of the array.
We may even crash the kernel.

Origionally-authored-by: Li Zefan
Signed-off-by: Li Zefan
Signed-off-by: Neil Horman
CC: "David S. Miller"
Signed-off-by: David S. Miller

Neil Horman
2012-02-11 04:08:56 +0800

10 Feb, 2012

2 commits

b57e6b560 mac80211: Fix a rwlock bad magic bug

read_lock(&tpt_trig->trig.leddev_list_lock) is reached via the path
ieee80211_open -> ieee80211_do_open -> ieee80211_mod_tpt_led_trig
-> ieee80211_start_tpt_led_trig -> tpt_trig_timer before the lock has
been initialized.
The initialization of this read/write lock happens via the path
ieee80211_led_init -> led_trigger_register, but we call
ieee80211_led_init after ieee80211_if_add, where we
register netdev_ops.
So we access leddev_list_lock before initializing it, which causes the
following bug on Chrome laptops with AR928X cards when running this
script:

while true
do
    sudo modprobe -v ath9k
    sleep 3
    sudo modprobe -r ath9k
    sleep 3
done

BUG: rwlock bad magic on CPU#1, wpa_supplicant/358, f5b9eccc
Pid: 358, comm: wpa_supplicant Not tainted 3.0.13 #1
Call Trace:

rwlock_bug+0x3d/0x47
do_raw_read_lock+0x19/0x29
_raw_read_lock+0xd/0xf
tpt_trig_timer+0xc3/0x145 [mac80211]
ieee80211_mod_tpt_led_trig+0x152/0x174 [mac80211]
ieee80211_do_open+0x11e/0x42e [mac80211]
? ieee80211_check_concurrent_iface+0x26/0x13c [mac80211]
ieee80211_open+0x48/0x4c [mac80211]
__dev_open+0x82/0xab
__dev_change_flags+0x9c/0x113
dev_change_flags+0x18/0x44
devinet_ioctl+0x243/0x51a
inet_ioctl+0x93/0xac
sock_ioctl+0x1c6/0x1ea
? might_fault+0x20/0x20
do_vfs_ioctl+0x46e/0x4a2
? fget_light+0x2f/0x70
? sys_recvmsg+0x3e/0x48
sys_ioctl+0x46/0x69
sysenter_do_call+0x12/0x2

Cc: Gary Morain
Cc: Paul Stewart
Cc: Abhijit Pradhan
Cc: Vasanthakumar Thiagarajan
Cc: Rajkumar Manoharan
Acked-by: Johannes Berg
Tested-by: Mohammed Shafi Shajakhan
Signed-off-by: Mohammed Shafi Shajakhan
Signed-off-by: John W. Linville

Mohammed Shafi Shajakhan
2012-02-10 04:16:04 +0800
16bda13d9 net: Make qdisc_skb_cb upper size bound explicit.

Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside
of other data structures.

This is intended to be used by IPoIB so that it can remember
addressing information stored at hard_header_ops->create() time that
it can fetch when the packet gets to the transmit routine.
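An explicit bound lets an encapsulating structure be checked at compile time against the size of skb->cb[] (48 bytes in kernels of this era). The sketch below uses C11 `_Static_assert` in place of the kernel's BUILD_BUG_ON(); both struct layouts and the helper names are simplified, hypothetical stand-ins rather than the kernel definitions, assuming a 20-byte IPoIB link-layer address (INFINIBAND_ALEN).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKB_CB_SIZE 48      /* size of skb->cb[] assumed here */
#define HWADDR_LEN  20      /* IPoIB link-layer address length (INFINIBAND_ALEN) */

/* Hypothetical, simplified stand-in for struct qdisc_skb_cb. */
struct qdisc_cb_sketch {
    unsigned int pkt_len;
    unsigned char data[24];          /* explicit upper bound for qdisc-private data */
};

/* Hypothetical IPoIB-style cb layout: the qdisc area comes first. */
struct ipoib_cb_sketch {
    struct qdisc_cb_sketch qdisc_cb;
    unsigned char hwaddr[HWADDR_LEN];
};

/* Compile-time check in the spirit of BUILD_BUG_ON(). */
_Static_assert(sizeof(struct ipoib_cb_sketch) <= SKB_CB_SIZE,
               "IPoIB cb must fit in skb->cb[]");

/* Stash the address at hard_header_ops->create() time... */
static void stash_hwaddr(unsigned char *cb, const unsigned char *addr)
{
    memcpy(cb + offsetof(struct ipoib_cb_sketch, hwaddr), addr, HWADDR_LEN);
}

/* ...and fetch it again in the transmit routine. */
static void fetch_hwaddr(const unsigned char *cb, unsigned char *out)
{
    memcpy(out, cb + offsetof(struct ipoib_cb_sketch, hwaddr), HWADDR_LEN);
}
```

Accessing the field through offsetof() and memcpy() keeps the sketch free of alignment assumptions about the raw cb buffer.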

Signed-off-by: David S. Miller

David S. Miller
2012-02-10 02:50:34 +0800

09 Feb, 2012

1 commit

5ca3b72c5 gro: more generic L2 header check

Shlomo Pongratz reported that the GRO L2 header check was suited for
Ethernet only, and failed on IB/IPoIB traffic.

He provided a patch faking a zeroed header to let GRO aggregate frames.

Roland Dreier, Herbert Xu, and others suggested we change the GRO L2
header check to be more generic, i.e. not assuming the L2 header is
14 bytes, but taking hard_header_len into account.

__napi_gro_receive() has special handling for the common case (Ethernet)
to avoid a memcmp() call and use an inline optimized function instead.
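A userspace sketch of the idea, assuming a 14-byte Ethernet header as the fast path: compare `hard_header_len` bytes of the two link-layer headers, using a word-wise comparison when the length is 14 and plain memcmp() otherwise. The function names are hypothetical; the kernel's inline Ethernet helper differs in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN 14

/* Word-wise comparison of two 14-byte Ethernet headers; 0 means equal. */
static unsigned long eth_hdr_diff(const void *a, const void *b)
{
    uint32_t a32[3], b32[3];
    uint16_t a16, b16;

    /* memcpy avoids unaligned loads; compilers lower it to plain moves. */
    memcpy(a32, a, 12);
    memcpy(b32, b, 12);
    memcpy(&a16, (const char *)a + 12, 2);
    memcpy(&b16, (const char *)b + 12, 2);
    return (a32[0] ^ b32[0]) | (a32[1] ^ b32[1]) | (a32[2] ^ b32[2]) |
           (unsigned long)(a16 ^ b16);
}

/* Generic L2 check: nonzero if the two headers of hard_header_len bytes differ. */
static int l2_headers_differ(const void *a, const void *b,
                             unsigned int hard_header_len)
{
    if (hard_header_len == ETH_HLEN)
        return eth_hdr_diff(a, b) != 0;           /* common-case fast path */
    return memcmp(a, b, hard_header_len) != 0;    /* any other link type */
}
```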

Signed-off-by: Eric Dumazet
Reported-by: Shlomo Pongratz
Cc: Roland Dreier
Cc: Or Gerlitz
Cc: Herbert Xu
Tested-by: Sean Hefty
Signed-off-by: David S. Miller

Eric Dumazet
2012-02-09 07:26:54 +0800

08 Feb, 2012

1 commit

6d25886ee net: Fix build regression when INET_UDP_DIAG=y and IPV6=m

Tested-by: Anisse Astier

Signed-off-by: David S. Miller

Anisse Astier
2012-02-08 02:35:28 +0800

05 Feb, 2012

2 commits

e2446eaab tcp_v4_send_reset: binding oif to iif in no sock case

Bind the outgoing interface of an IPv4 TCP RST packet to the incoming
interface when there is no socket associated with it.
When sk is not NULL, use sk->sk_bound_dev_if instead
(suggested by Eric Dumazet).

This has a few benefits:
1. tcp_v6_send_reset() already does this.
2. It helps TCP connections with SO_BINDTODEVICE set. When the
connection is lost, we are still able to send out the RST on the
same interface.
3. Since we are sending a reply, it is most likely to succeed
if the iif is used.
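The selection rule reduces to a single conditional expression. A sketch with hypothetical, trimmed-down structs (not the kernel's struct sock or struct sk_buff), assuming the incoming interface index is recorded on the received packet:

```c
#include <assert.h>
#include <stddef.h>

struct sock_stub { int sk_bound_dev_if; };  /* hypothetical: bound ifindex, 0 if unbound */
struct skb_stub  { int iif; };              /* hypothetical: ifindex the segment arrived on */

/*
 * Output interface for a RST: the socket's binding when a socket exists,
 * otherwise the interface the offending segment came in on.
 */
static int rst_oif(const struct sock_stub *sk, const struct skb_stub *skb)
{
    return sk ? sk->sk_bound_dev_if : skb->iif;
}
```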

Signed-off-by: Shawn Lu
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Shawn Lu
2012-02-05 07:20:05 +0800
5962b35c1 netprio_cgroup: Fix obo in get_prioidx

It was recently pointed out to me that the get_prioidx function sets a bit in
the prioidx map before checking whether the index being set is out of
bounds. This patch corrects that, avoiding the possibility of us writing beyond
the end of the array.
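The fix is an ordering change: find a free index, check it against the bound, and only then set the bit. A hypothetical userspace sketch (the map size and all names are assumptions; the kernel operates on its own prioidx bitmap with its bitops helpers):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define PRIOIDX_SZ    128                           /* hypothetical index count */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS     ((PRIOIDX_SZ + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long prioidx_map[MAP_WORDS];

static bool test_bit_sketch(unsigned int nr)
{
    return prioidx_map[nr / BITS_PER_WORD] & (1UL << (nr % BITS_PER_WORD));
}

/*
 * Allocate the lowest free index, or return -1 when the map is full.
 * The bound is checked BEFORE the bit is set, so a full map is never
 * written past its end.
 */
static int get_prioidx_sketch(void)
{
    unsigned int idx = 0;

    while (idx < PRIOIDX_SZ && test_bit_sketch(idx))
        idx++;
    if (idx >= PRIOIDX_SZ)
        return -1;                                              /* check first... */
    prioidx_map[idx / BITS_PER_WORD] |= 1UL << (idx % BITS_PER_WORD); /* ...then set */
    return (int)idx;
}
```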

Signed-off-by: Neil Horman
Reported-by: Stanislaw Gruszka
CC: Stanislaw Gruszka
CC: "David S. Miller"
Signed-off-by: David S. Miller

Neil Horman
2012-02-05 05:30:24 +0800

04 Feb, 2012

1 commit

157ca9eae Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/…

…wireless into for-davem

John W. Linville
2012-02-04 03:14:07 +0800

03 Feb, 2012

5 commits

6c073a7ee Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fix safety of rbd_put_client()
rbd: fix a memory leak in rbd_get_client()
ceph: create a new session lock to avoid lock inversion
ceph: fix length validation in parse_reply_info()
ceph: initialize client debugfs outside of monc->mutex
ceph: change "ceph.layout" xattr to be "ceph.file.layout"

Linus Torvalds
2012-02-03 07:47:33 +0800
ab434b60a ceph: initialize client debugfs outside of monc->mutex

Initializing debugfs under monc->mutex introduces a lock dependency for
sb->s_type->i_mutex_key, which (combined with several other dependencies)
leads to an annoying lockdep warning. There's no particular reason to do
the debugfs setup under this lock, so move it out.

It used to be the case that our first monmap could come from the OSD; that
is no longer the case with recent servers, so we will reliably set up the
client entry during the initial authentication.

We don't have to worry about racing with debugfs teardown by
ceph_debugfs_client_cleanup() because ceph_destroy_client() calls
ceph_msgr_flush() first, which will wait for the message dispatch work
to complete (and the debugfs init to complete).

Fixes: #1940
Signed-off-by: Sage Weil

Sage Weil
2012-02-03 04:49:01 +0800
ba7605745 caif: Bugfix double kfree_skb upon xmit failure

The SKB is freed twice upon a send error: the network stack consumes
the SKB even when it returns an error code.
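The rule at stake is buffer ownership: once a buffer is handed to a transmit function that always consumes it, the caller must not free it again, even on error. A hypothetical userspace sketch (the buffer type, error code, and free counter are assumptions for illustration; the counter makes a double free visible):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int frees;                 /* counts releases so a double free would show up */

struct buf { int len; };          /* hypothetical stand-in for struct sk_buff */

static void buf_free(struct buf *b)
{
    frees++;
    free(b);
}

/* Transmit that consumes its buffer on success AND on failure. */
static int xmit_consumes(struct buf *b, int fail)
{
    buf_free(b);                  /* the callee owns b from here on */
    return fail ? -ENOBUFS : 0;
}

/* Correct caller: propagate the error, but never free b a second time. */
static int send_one(int fail)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return -ENOMEM;
    b->len = 64;
    return xmit_consumes(b, fail);    /* no buf_free(b) on the error path */
}
```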

Signed-off-by: Sjur Brændeland
Signed-off-by: David S. Miller

Dmitry Tarnyagin
2012-02-03 03:35:12 +0800
b01377a42 caif: Bugfix list_del_rcu race in cfmuxl_ctrlcmd.

Always use cfmuxl_remove_uplayer when removing an up-layer.
cfmuxl_ctrlcmd() can be called independently and in parallel with
cfmuxl_remove_uplayer(). The race between them could cause list_del_rcu
to be called on a node which had already been taken out of the list.
That led to a (rare) crash on accessing the poisoned node->prev inside
list_del_rcu.

This fix ensures that deletions are done while holding the same lock.

Reported-by: Dmitry Tarnyagin
Signed-off-by: Sjur Brændeland
Signed-off-by: David S. Miller

sjur.brandeland@stericsson.com
2012-02-03 03:35:12 +0800
c43b874d5 tcp: properly initialize tcp memory limits

Commit 4acb4190 tried to fix the use of an uninitialized value
introduced by commit 3dc43e3, but it would make the
per-socket memory limits too small.

This patch fixes this and also removes the redundant code
introduced in 4acb4190.

Signed-off-by: Jason Wang
Acked-by: Glauber Costa
Signed-off-by: David S. Miller

Jason Wang
2012-02-03 03:34:41 +0800

02 Feb, 2012

3 commits

07ae2dfcf mac80211: timeout a single frame in the rx reorder buffer

The current code checks for stored_mpdu_num > 1, causing
the reorder_timer to be triggered indefinitely while a single
buffered frame is never timed out (until the next packet is received).

Signed-off-by: Eliad Peller
Acked-by: Johannes Berg
Signed-off-by: John W. Linville

Eliad Peller
2012-02-02 04:26:00 +0800
786f52811 ethtool: Null-terminate filename passed to ethtool_ops::flash_device

The parameters for ETHTOOL_FLASHDEV include a filename, which ought to
be null-terminated. Currently the only driver that implements
ethtool_ops::flash_device attempts to add a null terminator if
necessary, but does it wrongly. Do it in the ethtool core instead.
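Terminating in the core means every driver sees a proper C string regardless of what user space passed in. A sketch with a hypothetical struct shaped like struct ethtool_flash, assuming a 128-byte filename field as with ETHTOOL_FLASH_MAX_FILENAME in the ethtool UAPI header:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLASH_MAX_FILENAME 128     /* assumed to mirror ETHTOOL_FLASH_MAX_FILENAME */

struct flash_req_sketch {          /* hypothetical mirror of struct ethtool_flash */
    uint32_t cmd;
    uint32_t region;
    char data[FLASH_MAX_FILENAME];
};

/* Core-side fix-up: force termination before any driver reads data[]. */
static void flash_req_terminate(struct flash_req_sketch *req)
{
    req->data[FLASH_MAX_FILENAME - 1] = '\0';
}
```

Even if user space fills the whole field with non-NUL bytes, strlen() on `data` is then bounded at 127.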

Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller

Ben Hutchings
2012-02-02 03:47:17 +0800
efcdbf24f net: Disambiguate kernel message

Some of our machines were reporting:

TCP: too many of orphaned sockets

even when the number of orphaned sockets was well below the
limit.

We print a different message depending on whether we're out
of TCP memory or there are too many orphaned sockets.

Also move the check out of line and clean up the messages
that were printed.

Signed-off-by: Arun Sharma
Suggested-by: Mohan Srinivasan
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller
Cc: Glauber Costa
Cc: Ingo Molnar
Cc: Joe Perches
Signed-off-by: David S. Miller

Arun Sharma
2012-02-02 03:41:50 +0800

31 Jan, 2012

5 commits

a14a8d931 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

1) Setting link attributes can modify the size of the attributes that
would be reported on a subsequent getlink netlink operation,
therefore min_ifinfo_dump_size needs to be adjusted. From Stefan
Gula.

2) Resegmentation of TSO frames while trimming can violate invariants
expected by callers, namely that the number of segments can only stay
the same or decrease, never increase. If MSS changes, however, we
can trim data but then end up with more segments. Fix this by only
segmenting to the MSS already recorded in the SKB. That's the
simplest fix for now and if we want to get more fancy in the future
that's a more involved change.

This probably explains some retransmit counter inaccuracies.

From Neal Cardwell.

3) Fix too-many-wakeups in POLL with AF_UNIX sockets, from Eric Dumazet.

4) Fix CAIF crashes wrt. namespace handling. From Eric Dumazet and
Eric W. Biederman.

5) TCP port selection fixes from Flavio Leitner.

6) More socket memory cgroup build fixes in certain randconfig
situations. From Glauber Costa.

7) Fix TCP memory sysctl regression reported by Ingo Molnar, also from
Glauber Costa.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
af_unix: fix EPOLLET regression for stream sockets
tcp: fix tcp_trim_head() to adjust segment count with skb MSS
net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL
net caif: Register properly as a pernet subsystem.
netns: Fail conspicously if someone uses net_generic at an inappropriate time.
net: explicitly add jump_label.h header to sock.h
net: RTNETLINK adjusting values of min_ifinfo_dump_size
ipv6: Fix ip_gre lockless xmits.
xen-netfront: correct MAX_TX_TARGET calculation.
netns: fix net_alloc_generic()
tcp: bind() optimize port allocation
tcp: bind() fix autoselection to share ports
l2tp: l2tp_ip - fix possible oops on packet receive
iwlwifi: fix PCI-E transport "inta" race
mac80211: set bss_conf.idle when vif is connected
mac80211: update oper_channel on ibss join

Linus Torvalds
2012-01-31 02:53:20 +0800
6f01fd6e6 af_unix: fix EPOLLET regression for stream sockets

Commit 0884d7aa24 (AF_UNIX: Fix poll blocking problem when reading from
a stream socket) introduced a regression for epoll() in Edge Triggered
mode (EPOLLET).

The appropriate fix is to use skb_peek()/skb_unlink() instead of
skb_dequeue(), and to call skb_unlink() only when the skb is fully consumed.

This removes the need to requeue a partial skb at the head of
sk_receive_queue, and the extra sk->sk_data_ready() calls that caused
the regression.

This is safe because once an skb is given to sk_receive_queue, it is not
modified by a writer, and readers are serialized by the u->readlock mutex.

This also reduces the number of spinlock acquisitions for small reads or
MSG_PEEK users, so it should improve overall performance.
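The peek-then-unlink pattern can be sketched with a simple singly linked queue: a partial read advances an offset inside the head buffer and leaves it queued, and only a read that consumes the rest of the buffer unlinks it. The types and names are hypothetical, and locking (u->readlock in the kernel) is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct chunk {                    /* hypothetical stand-in for a queued skb */
    struct chunk *next;
    const char *data;
    size_t len;
    size_t consumed;              /* bytes already read from this chunk */
};

struct queue { struct chunk *head; };

/* Read up to n bytes from a stream-style queue into out. */
static size_t stream_read(struct queue *q, char *out, size_t n)
{
    size_t copied = 0;

    while (copied < n && q->head) {
        struct chunk *c = q->head;                 /* peek: do not dequeue yet */
        size_t avail = c->len - c->consumed;
        size_t take = avail < n - copied ? avail : n - copied;

        memcpy(out + copied, c->data + c->consumed, take);
        c->consumed += take;
        copied += take;
        if (c->consumed == c->len)
            q->head = c->next;                     /* unlink only when fully consumed */
    }
    return copied;
}
```

A 3-byte read from a queued "hello" leaves that chunk at the head with two bytes remaining, so nothing has to be requeued.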

Reported-by: Nick Mathewson
Signed-off-by: Eric Dumazet
Cc: Alexey Moiseytsev
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-31 01:45:07 +0800
5b35e1e6e tcp: fix tcp_trim_head() to adjust segment count with skb MSS

This commit fixes tcp_trim_head() to recalculate the number of
segments in the skb with the skb's existing MSS, so trimming the head
causes the skb segment count to be monotonically non-increasing - it
should stay the same or go down, but not increase.

Previously tcp_trim_head() used the current MSS of the connection. But
if there was a decrease in MSS between original transmission and ACK
(e.g. due to PMTUD), this could cause tcp_trim_head() to
counter-intuitively increase the segment count when trimming bytes off
the head of an skb. This violated assumptions in tcp_tso_acked() that
tcp_trim_head() only decreases the packet count, so that packets_acked
in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to
pass u32 pkts_acked values as large as 0xffffffff to
ca_ops->pkts_acked().

As an aside, if tcp_trim_head() had really wanted the skb to reflect
the current MSS, it should have called tcp_set_skb_tso_segs()
unconditionally, since a decrease in MSS would mean that a
single-packet skb should now be sliced into multiple segments.
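The arithmetic can be checked by hand, assuming the segment count is the payload length divided by the MSS, rounded up. In a hypothetical example, a 2000-byte skb sent with MSS 1000 has 2 segments. Trimming 500 acknowledged bytes leaves 1500 bytes: still 2 segments at the skb's own MSS of 1000, but 3 at a PMTUD-reduced MSS of 600, so an unsigned 2 - 3 wraps to 0xffffffff. The helper names below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Segments needed for len bytes at a given MSS (ceiling division). */
static uint32_t tso_segs(uint32_t len, uint32_t mss)
{
    return (len + mss - 1) / mss;
}

/*
 * Packets acked when trimming `acked` bytes off the head of an skb,
 * recounting the remainder with `mss_for_recount`.
 */
static uint32_t packets_acked(uint32_t len, uint32_t skb_mss,
                              uint32_t acked, uint32_t mss_for_recount)
{
    uint32_t before = tso_segs(len, skb_mss);
    uint32_t after = tso_segs(len - acked, mss_for_recount);

    return before - after;        /* wraps if `after` exceeds `before` */
}
```

Recounting with the skb's own MSS (the fixed behavior) keeps the result non-negative; recounting with a smaller current MSS reproduces the underflow described above.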

Signed-off-by: Neal Cardwell
Acked-by: Nandita Dukkipati
Acked-by: Ilpo Järvinen
Signed-off-by: David S. Miller

Neal Cardwell
2012-01-31 01:42:58 +0800
4acb41903 net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL

sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c
in commit 3dc43e3e4d0b52197d3205214fe8f162f9e0c334, since it
became a per-ns value.

That code, however, will never run when CONFIG_SYSCTL is
disabled, leading to bogus values on those fields - causing hung
TCP sockets.

This patch fixes it by keeping initialization code in
tcp_init(). It will be overwritten by the first net namespace
init if CONFIG_SYSCTL is compiled in, and does the right thing if
it is compiled out.

It is also named properly as tcp_init_mem(), to properly signal
its non-sysctl side effect on TCP limits.

Reported-by: Ingo Molnar
Signed-off-by: Glauber Costa
Cc: David S. Miller
Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com
[ renamed the function, tidied up the changelog a bit ]
Signed-off-by: Ingo Molnar
Signed-off-by: David S. Miller

Glauber Costa
2012-01-31 01:41:06 +0800
f94f72ee6 Merge tag 'nfs-for-3.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

NFS client bugfixes for Linux 3.3 (pull 3)

* tag 'nfs-for-3.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Fix machine creds in generic_create_cred and generic_match

Linus Torvalds
2012-01-31 00:47:49 +0800

28 Jan, 2012

2 commits

8a8ee9aff net caif: Register properly as a pernet subsystem.

caif is a subsystem and as such it needs to register with
register_pernet_subsys instead of register_pernet_device.

Among other problems, using register_pernet_device was resulting in
net_generic being called before the caif_net structure was allocated,
which has been causing net_generic to fail either with BUG_ONs or by
returning NULL pointers.

An uglier problem that could be caused is packets in flight while the
subsystem is shutting down.

To remove confusion, also remove the cruft caused by inappropriately
trying to fix this bug.

With the aid of the previous patch I have tested this patch and
confirmed that using register_pernet_subsys makes the failure go away as
it should.

Signed-off-by: Eric W. Biederman
Acked-by: Sjur Brændeland
Tested-by: Sasha Levin
Signed-off-by: David S. Miller

Eric W. Biederman
2012-01-28 10:06:03 +0800
cc0d7b91d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

David S. Miller
2012-01-28 09:40:18 +0800

27 Jan, 2012

3 commits

f18da1456 net: RTNETLINK adjusting values of min_ifinfo_dump_size

Setting link parameters on a netdevice changes the value
of if_nlmsg_size(), therefore it is necessary to recalculate
min_ifinfo_dump_size.

Signed-off-by: Stefan Gula
Signed-off-by: David S. Miller

Stefan Gula
2012-01-27 05:35:57 +0800
f2b3ee9e4 ipv6: Fix ip_gre lockless xmits.

Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK. Sit and
ipip set this unconditionally in ops->setup, but gre enables it
conditionally after parameter passing in ops->newlink. This is
not called during tunnel setup as below, however, so GRE tunnels are
still taking the lock.

modprobe ip_gre
ip tunnel add test0 mode gre remote 10.5.1.1 dev lo
ip link set test0 up
ip addr add 10.6.0.1 dev test0
# cat /sys/class/net/test0/features
# $DIR/test_tunnel_xmit 10 10.5.2.1
ip route add 10.5.2.0/24 dev test0
ip tunnel del test0

The newlink callback is only called in rtnl_newlink, and only if
the device is new, as it calls register_netdevice internally. GRE
tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL,
which calls ipgre_tunnel_locate, which calls register_netdev.
rtnl_newlink is called at 'ip link set', but skips ops->newlink,
and the device is up with locking still enabled. The equivalent
ipip tunnel works fine, by the way (just substitute 'mode ipip' for
'mode gre').

On kernels before /sys/class/net/*/features was removed [1],
the first commented-out line returns 0x6000 with mode gre,
which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip,
it reports 0x7000. This test cannot be used on recent kernels where
the sysfs file is removed (and ETHTOOL_GFEATURES does not currently
work for tunnel devices, because they lack dev->ethtool_ops).

The second commented out line calls a simple transmission test [2]
that sends on 24 cores at maximum rate. Results of a single run:

ipip: 19,372,306
gre before patch: 4,839,753
gre after patch: 19,133,873

This patch replicates the condition check in ipgre_newlink to
ipgre_tunnel_locate. It works for me, both with oseq on and off.
This is the first time I looked at rtnetlink and iproute2 code,
though, so someone more knowledgeable should probably check the
patch. Thanks.

The tail of both functions is now identical, by the way. To avoid
code duplication, I'll be happy to rework this and merge the two.

[1] http://patchwork.ozlabs.org/patch/104610/
[2] http://kernel.googlecode.com/files/xmit_udp_parallel.c

Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Willem de Bruijn
2012-01-27 05:34:08 +0800
073862ba5 netns: fix net_alloc_generic()

When a new net namespace is created, we should attach to it a "struct
net_generic" with enough slots (even empty), or we can hit the following
BUG_ON() :

[ 200.752016] kernel BUG at include/net/netns/generic.h:40!
...
[ 200.752016] ? get_cfcnfg+0x3a/0x180
[ 200.752016] ? lockdep_rtnl_is_held+0x10/0x20
[ 200.752016] caif_device_notify+0x2e/0x530
[ 200.752016] notifier_call_chain+0x67/0x110
[ 200.752016] raw_notifier_call_chain+0x11/0x20
[ 200.752016] call_netdevice_notifiers+0x32/0x60
[ 200.752016] register_netdevice+0x196/0x300
[ 200.752016] register_netdev+0x19/0x30
[ 200.752016] loopback_net_init+0x4a/0xa0
[ 200.752016] ops_init+0x42/0x180
[ 200.752016] setup_net+0x6b/0x100
[ 200.752016] copy_net_ns+0x86/0x110
[ 200.752016] create_new_namespaces+0xd9/0x190

net_alloc_generic() should take into account the maximum index into the
ptr array, as a subsystem might use net_generic() at any time.

This also reduces the number of reallocations in net_assign_generic().
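A hypothetical sketch of the sizing rule: allocate the per-namespace pointer array with as many slots as the highest id any subsystem has registered, so a lookup by id stays in bounds even when the slot is still empty. The names and the global counter are assumptions modeled on the commit text, not the kernel's code; the assert() in the lookup plays the role of the BUG_ON() seen in the trace.

```c
#include <assert.h>
#include <stdlib.h>

static unsigned int max_gen_ptrs = 1;   /* hypothetical: highest registered id (ids start at 1) */

struct gen_sketch {
    unsigned int len;
    void *ptr[];                        /* slot id - 1 holds subsystem id's data */
};

/* Size the array for every id registered so far, even if slots stay empty. */
static struct gen_sketch *alloc_generic_sketch(void)
{
    struct gen_sketch *ng = calloc(1, sizeof(*ng) + max_gen_ptrs * sizeof(void *));

    if (ng)
        ng->len = max_gen_ptrs;
    return ng;
}

/* Lookup that is in bounds for any registered id; an empty slot yields NULL. */
static void *generic_sketch(const struct gen_sketch *ng, unsigned int id)
{
    assert(id >= 1 && id <= ng->len);   /* stand-in for the BUG_ON() */
    return ng->ptr[id - 1];
}
```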

Reported-by: Sasha Levin
Tested-by: Sasha Levin
Signed-off-by: Eric Dumazet
Cc: Sjur Brændeland
Cc: Eric W. Biederman
Cc: Pavel Emelyanov
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-27 02:36:19 +0800

26 Jan, 2012

1 commit

fddb7b576 tcp: bind() optimize port allocation

Port autoselection finds a port and then drops the lock;
right after that, it looks up the hash bucket again and locks it.

Fix it to go directly to the bucket it already found.

Signed-off-by: Flavio Leitner
Signed-off-by: Marcelo Ricardo Leitner
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Flavio Leitner
2012-01-26 10:50:43 +0800