Eric Lee / smarc-fsl-linux-kernel

21 Jul, 2013

1 commit

3be542d46 Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:
- Fix a regression against NFSv4 FreeBSD servers when creating a new
file
- Fix another regression in rpc_client_register()

* tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix a regression against the FreeBSD server
SUNRPC: Fix another issue with rpc_client_register()

Linus Torvalds
2013-07-21 01:48:24 +0800

19 Jul, 2013

5 commits

ecb2cf1a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"A couple interesting SKB fragment handling fixes, plus the usual small
bits here and there:

1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
Tim Gardner.

2) Get rid of a stupid reimplementation on "%*phC" in our sysfs MAC
address printing helper.

3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
device can't do checksumming offloads then it shouldn't say it can
do SG either. From Haiyang Zhang.

4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.

5) Don't leak DMA mappings on mapping failures, from Neil Horman.

6) We need to reset the transport header of SKBs in ipv4 before we
attempt to perform early socket demux, just like ipv6 does. From
Eric Dumazet.

7) Add missing locking on vxlan device removal, from Stephen
Hemminger.

8) xen-netfront has to make two passes over an SKB to prepare it for
transfer. One pass calculates the number of slots needed, the
second massages the SKB and fills the slots. Unfortunately, the
first pass doesn't calculate the number of slots properly so we
can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
work out so well. Fix from Jan Beulich with help and discussion
with several others.

9) Fix a similar problem in tun and macvtap, which have to split up
scatter-gather elements at PAGE_SIZE boundaries. Don't do
zerocopy if it would result in a > MAX_SKB_FRAGS skb. Fixes from
Jason Wang.

10) On receive, once we've decoded the VLAN state completely, clear
skb->vlan_tci. Otherwise demuxed tunnels underneath can trigger
the VLAN code again, corrupting the packet. Fix from Eric
Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
vlan: fix a race in egress prio management
vlan: mask vlan prio bits
macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
pkt_sched: sch_qfq: remove a source of high packet delay/jitter
xen-netfront: pull on receive skb may need to happen earlier
vxlan: add necessary locking on device removal
hyperv: Fix the NETIF_F_SG flag setting in netvsc
net: Fix sysfs_format_mac() code duplication.
be2net: Fix to avoid hardware workaround when not needed
macvtap: do not assume 802.1Q when send vlan packets
macvtap: fix the missing ret value of TUNSETQUEUE
ipv4: set transport header earlier
mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
bgmac: add dependency to phylib
net/irda: fixed style issues in irlan_eth
ethtool: fixed trailing statements in ethtool
ndisc: bool initializations should use true and false
atl1e: unmap partially mapped skb on dma error and free skb

Linus Torvalds
2013-07-19 11:08:47 +0800
3e3aac497 vlan: fix a race in egress prio management ... Browse Code »

egress_priority_map[] hash table updates are protected by rtnl,
and we never remove elements until device is dismantled.

We have to make sure that before inserting an new element in hash table,
all its fields are committed to memory or else another cpu could
find corrupt values and crash.

Signed-off-by: Eric Dumazet
Cc: Patrick McHardy
Signed-off-by: David S. Miller

Eric Dumazet
2013-07-19 04:07:13 +0800
d4b812dea vlan: mask vlan prio bits ... Browse Code »

In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e
("vlan: don't deliver frames for unknown vlans to protocols")
Florian made sure we set pkt_type to PACKET_OTHERHOST
if the vlan id is set and we could find a vlan device for this
particular id.

But we also have a problem if prio bits are set.

Steinar reported an issue on a router receiving IPv6 frames with a
vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
because skb->vlan_tci is set.

Forwarded frame is completely corrupted : We can see (8100:4000)
being inserted in the middle of IPv6 source address :

16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651
0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213
0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233

It seems we are not really ready to properly cope with this right now.

We can probably do better in future kernels :
vlan_get_ingress_priority() should be a netdev property instead of
a per vlan_dev one.

For stable kernels, lets clear vlan_tci to fix the bugs.

Reported-by: Steinar H. Gunderson
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2013-07-19 04:05:23 +0800
87f40dd6c pkt_sched: sch_qfq: remove a source of high packet delay/jitter ... Browse Code »

QFQ+ inherits from QFQ a design choice that may cause a high packet
delay/jitter and a severe short-term unfairness. As QFQ, QFQ+ uses a
special quantity, the system virtual time, to track the service
provided by the ideal system it approximates. When a packet is
dequeued, this quantity must be incremented by the size of the packet,
divided by the sum of the weights of the aggregates waiting to be
served. Tracking this sum correctly is a non-trivial task, because, to
preserve tight service guarantees, the decrement of this sum must be
delayed in a special way [1]: this sum can be decremented only after
that its value would decrease also in the ideal system approximated by
QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous'
weight sum, increased and decreased immediately as the weight of an
aggregate changes, and as an aggregate is created or destroyed (which,
in its turn, happens as a consequence of some class being
created/destroyed/changed). However, to avoid the problems caused to
service guarantees by these immediate decreases, QFQ+ increments the
system virtual time using the maximum value allowed for the weight
sum, 2^10, in place of the dynamic, instantaneous value. The
instantaneous value of the weight sum is used only to check whether a
request of weight increase or a class creation can be satisfied.

Unfortunately, the problems caused by this choice are worse than the
temporary degradation of the service guarantees that may occur, when a
class is changed or destroyed, if the instantaneous value of the
weight sum was used to update the system virtual time. In fact, the
fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is
equal to the ratio between the weight of the aggregate and the sum of
the weights of the competing aggregates. The packet delay guaranteed
to the aggregate is instead inversely proportional to the guaranteed
bandwidth. By using the maximum possible value, and not the actual
value of the weight sum, QFQ+ provides each aggregate with the worst
possible service guarantees, and not with service guarantees related
to the actual set of competing aggregates. To see the consequences of
this fact, consider the following simple example.

Suppose that only the following aggregates are backlogged, i.e., that
only the classes in the following aggregates have packets to transmit:
one aggregate with weight 10, say A, and ten aggregates with weight 1,
say B1, B2, ..., B10. In particular, suppose that these aggregates are
always backlogged. Given the weight distribution, the smoothest and
fairest service order would be:
A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ...

QFQ+ would provide exactly this optimal service if it used the actual
value for the weight sum instead of the maximum possible value, i.e.,
11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it
serves aggregates as follows (easy to prove and to reproduce
experimentally):
A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ...

By replacing 10 with N in the above example, and by increasing N, one
can increase at will the maximum packet delay and the jitter
experienced by the classes in aggregate A.

This patch addresses this issue by just using the above
'instantaneous' value of the weight sum, instead of the maximum
possible value, when updating the system virtual time. After the
instantaneous weight sum is decreased, QFQ+ may deviate from the ideal
service for a time interval in the order of the time to serve one
maximum-size packet for each backlogged class. The worst-case extent
of the deviation exhibited by QFQ+ during this time interval [1] is
basically the same as of the deviation described above (but, without
this patch, QFQ+ suffers from such a deviation all the time). Finally,
this patch modifies the comment to the function qfq_slot_insert, to
make it coherent with the fact that the weight sum used by QFQ+ can
now be lower than the maximum possible value.

[1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix",
Proceedings of AAA-IDEA'05, June 2005.

Signed-off-by: Paolo Valente
Signed-off-by: David S. Miller

Paolo Valente
2013-07-19 04:02:00 +0800
3f334c208 Merge branch 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux ... Browse Code »

Pull phase two of __cpuinit removal from Paul Gortmaker:
"With the __cpuinit infrastructure removed earlier, this group of
commits only removes the function/data tagging that was done with the
various (now no-op) __cpuinit related prefixes.

Now that the dust has settled with yesterday's v3.11-rc1, there
hopefully shouldn't be any new users leaking back in tree, but I think
we can leave the harmless no-op stubs there for a release as a
courtesy to those who still have out of tree stuff and weren't paying
attention.

Although the commits are against the recent tag to allow for minor
context refreshes for things like yesterday's v3.11-rc1~ slab content,
the patches have been largely unchanged for weeks, aside from such
trivial updates.

For detail junkies, the largely boring and mostly irrelevant history
of the patches can be viewed at:

http://git.kernel.org/cgit/linux/kernel/git/paulg/cpuinit-delete.git

If nothing else, I guess it does at least demonstrate the level of
involvement required to shepherd such a treewide change to completion.

This is the same repository of patches that has been applied to the
end of the daily linux-next branches for the past several weeks"

* 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (28 commits)
block: delete __cpuinit usage from all block files
drivers: delete __cpuinit usage from all remaining drivers files
kernel: delete __cpuinit usage from all core kernel files
rcu: delete __cpuinit usage from all rcu files
net: delete __cpuinit usage from all net files
acpi: delete __cpuinit usage from all acpi files
hwmon: delete __cpuinit usage from all hwmon files
cpufreq: delete __cpuinit usage from all cpufreq files
clocksource+irqchip: delete __cpuinit usage from all related files
x86: delete __cpuinit usage from all x86 files
score: delete __cpuinit usage from all score files
xtensa: delete __cpuinit usage from all xtensa files
openrisc: delete __cpuinit usage from all openrisc files
m32r: delete __cpuinit usage from all m32r files
hexagon: delete __cpuinit usage from all hexagon files
frv: delete __cpuinit usage from all frv files
cris: delete __cpuinit usage from all cris files
metag: delete __cpuinit usage from all metag files
tile: delete __cpuinit usage from all tile files
sh: delete __cpuinit usage from all sh files
...

Linus Torvalds
2013-07-19 01:50:26 +0800

18 Jul, 2013

1 commit

61f98b0fc Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd bugfixes from Bruce Fields:
"Just three minor bugfixes"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
svcrdma: underflow issue in decode_write_list()
nfsd4: fix minorversion support interface
lockd: protect nlm_blocked access in nlmsvc_retry_blocked

Linus Torvalds
2013-07-18 04:43:55 +0800

17 Jul, 2013

5 commits

ae8e9c5a1 net: Fix sysfs_format_mac() code duplication. ... Browse Code »

It's just a duplicate implementation of "%*phC". Thanks to Joe
Perches for showing that we had exactly this support in the
lib/vsprintf.c code already.

Signed-off-by: David S. Miller

David S. Miller
2013-07-17 08:09:22 +0800
21d1196a3 ipv4: set transport header earlier ... Browse Code »

commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
performance regression for non GRO traffic, basically disabling
IP early demux.

IPv6 stack resets transport header in ip6_rcv() before calling
IP early demux in ip6_rcv_finish(), while IPv4 does this only in
ip_local_deliver_finish(), _after_ IP early demux.

GRO traffic happened to enable IP early demux because transport header
is also set in inet_gro_receive()

Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the
same : transport_header should be set in ip_rcv() instead of
ip_local_deliver_finish()

ip_local_deliver_finish() can also use skb_network_header_len() which is
faster than ip_hdrlen()

Signed-off-by: Eric Dumazet
Cc: Neal Cardwell
Cc: Tom Herbert
Signed-off-by: David S. Miller

Eric Dumazet
2013-07-17 03:59:28 +0800
b6a82dd23 net/irda: fixed style issues in irlan_eth ... Browse Code »

Applied fixes suggested by checkpatch.pl

Signed-off-by: Dragos Foianu
Signed-off-by: David S. Miller

Dragos Foianu
2013-07-17 03:16:03 +0800
8a73125c3 ethtool: fixed trailing statements in ethtool ... Browse Code »

Applied fixes suggested by checkpatch.pl.

Signed-off-by: Dragos Foianu
Signed-off-by: David S. Miller

Dragos Foianu
2013-07-17 03:14:51 +0800
f2f79cca1 ndisc: bool initializations should use true and false ... Browse Code »

Signed-off-by: Daniel Baluta
Signed-off-by: David S. Miller

Daniel Baluta
2013-07-17 03:13:31 +0800

15 Jul, 2013

4 commits

b2781e102 svcrdma: underflow issue in decode_write_list() ... Browse Code »

My static checker marks everything from ntohl() as untrusted and it
complains we could have an underflow problem doing:

return (u32 *)&ary->wc_array[nchunks];

Also on 32 bit systems the upper bound check could overflow.

Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter
Signed-off-by: J. Bruce Fields

Dan Carpenter
2013-07-15 23:46:23 +0800
1540c5d3c SUNRPC: Fix another issue with rpc_client_register() ... Browse Code »

Fix the error pathway if rpcauth_create() fails.

Signed-off-by: Trond Myklebust

Trond Myklebust
2013-07-15 22:15:54 +0800
013dbb325 net: delete __cpuinit usage from all net files ... Browse Code »

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the net/* uses of the __cpuinit macros
from all C files.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: "David S. Miller"
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker

Paul Gortmaker
2013-07-15 07:36:58 +0800
41d9884c4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs stuff from Al Viro:
"O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
making simple_lookup() usable for filesystems with non-NULL s_d_op,
which allows us to get rid of quite a bit of ugliness"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
sunrpc: now we can just set ->s_d_op
cgroup: we can use simple_lookup() now
efivarfs: we can use simple_lookup() now
make simple_lookup() usable for filesystems that set ->s_d_op
configfs: don't open-code d_alloc_name()
__rpc_lookup_create_exclusive: pass string instead of qstr
rpc_create_*_dir: don't bother with qstr
llist: llist_add() can use llist_add_batch()
llist: fix/simplify llist_add() and llist_add_batch()
fput: turn "list_head delayed_fput_list" into llist_head
fs/file_table.c:fput(): add comment
Safer ABI for O_TMPFILE

Linus Torvalds
2013-07-15 02:42:26 +0800

14 Jul, 2013

4 commits

dae3794fd sunrpc: now we can just set ->s_d_op ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-07-14 21:55:39 +0800
d3db90b0a __rpc_lookup_create_exclusive: pass string instead of qstr ... Browse Code »

... and use d_hash_and_lookup() instead of open-coding it, for fsck sake...

Signed-off-by: Al Viro

Al Viro
2013-07-14 21:09:57 +0800
a95e691f9 rpc_create_*_dir: don't bother with qstr ... Browse Code »

just pass the name

Signed-off-by: Al Viro

Al Viro
2013-07-14 21:02:28 +0800
be9c6d916 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"Just a bunch of small fixes and tidy ups:

1) Finish the "busy_poll" renames, from Eliezer Tamir.

2) Fix RCU stalls in IFB driver, from Ding Tianhong.

3) Linearize buffers properly in tun/macvtap zerocopy code.

4) Don't crash on rmmod in vxlan, from Pravin B Shelar.

5) Spinlock used before init in alx driver, from Maarten Lankhorst.

6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
Kravkov.

7) Dummy and ifb driver load failure paths can oops, fixes from Tan
Xiaojun and Ding Tianhong.

8) Correct MTU calculations in IP tunnels, from Alexander Duyck.

9) Account all TCP retransmits in SNMP stats properly, from Yuchung
Cheng.

10) atl1e and via-rhine do not handle DMA mapping failures properly,
from Neil Horman.

11) Various equal-cost multipath route fixes in ipv6 from Hannes
Frederic Sowa"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
ipv6: only static routes qualify for equal cost multipathing
via-rhine: fix dma mapping errors
atl1e: fix dma mapping warnings
tcp: account all retransmit failures
usb/net/r815x: fix cast to restricted __le32
usb/net/r8152: fix integer overflow in expression
net: access page->private by using page_private
net: strict_strtoul is obsolete, use kstrtoul instead
drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
net/usb: add relative mii functions for r815x
net/tipc: use %*phC to dump small buffers in hex form
qlcnic: Adding Maintainers.
gre: Fix MTU sizing check for gretap tunnels
pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
pkt_sched: sch_qfq: improve efficiency of make_eligible
gso: Update tunnel segmentation to support Tx checksum offload
inet: fix spacing in assignment
ifb: fix oops when loading the ifb failed
...

Linus Torvalds
2013-07-14 08:42:22 +0800

13 Jul, 2013

4 commits

307f2fb95 ipv6: only static routes qualify for equal cost multipathing ... Browse Code »

Static routes in this case are non-expiring routes which did not get
configured by autoconf or by icmpv6 redirects.

To make sure we actually get an ecmp route while searching for the first
one in this fib6_node's leafs, also make sure it matches the ecmp route
assumptions.

v2:
a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF
already ensures that this route, even if added again without
RTF_EXPIRES (in case of a RA announcement with infinite timeout),
does not cause the rt6i_nsiblings logic to go wrong if a later RA
updates the expiration time later.

v3:
a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so,
because an pmtu event could update the RTF_EXPIRES flag and we would
not count this route, if another route joins this set. We now filter
only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that
don't get changed after rt6_info construction.

Cc: Nicolas Dichtel
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-07-13 07:29:54 +0800
24ab6bec8 tcp: account all retransmit failures ... Browse Code »

Change snmp RETRANSFAILS stat to include timeout retransmit failures
in addition to other loss recoveries.

Signed-off-by: Yuchung Cheng
Acked-by: Neal Cardwell
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Yuchung Cheng
2013-07-13 07:15:56 +0800
40dadff26 net: access page->private by using page_private ... Browse Code »

Signed-off-by: Sunghan Suh
Signed-off-by: David S. Miller

Sunghan Suh
2013-07-13 07:10:34 +0800
92338dc2f net: strict_strtoul is obsolete, use kstrtoul instead ... Browse Code »

patch found using checkpatch.pl

Signed-off-by: Cosmin Stanescu
Signed-off-by: David S. Miller

“Cosmin
2013-07-13 07:09:14 +0800

12 Jul, 2013

11 commits

d77e41e12 net/tipc: use %*phC to dump small buffers in hex form ... Browse Code »

Instead of passing each byte by stack let's use nice specifier for that.

Signed-off-by: Andy Shevchenko
Signed-off-by: David S. Miller

Andy Shevchenko
2013-07-12 08:03:36 +0800
8c91e162e gre: Fix MTU sizing check for gretap tunnels ... Browse Code »

This change fixes an MTU sizing issue seen with gretap tunnels when non-gso
packets are sent from the interface.

In my case I was able to reproduce the issue by simply sending a ping of
1421 bytes with the gretap interface created on a device with a standard
1500 mtu.

This fix is based on the fact that the tunnel mtu is already adjusted by
dev->hard_header_len so it would make sense that any packets being compared
against that mtu should also be adjusted by hard_header_len and the tunnel
header instead of just the tunnel header.

Signed-off-by: Alexander Duyck
Reported-by: Cong Wang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Alexander Duyck
2013-07-12 07:12:03 +0800
88d4f419a pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts ... Browse Code »

This patch removes the forward declaration of qfq_update_agg_ts, by moving
the definition of the function above its first call. This patch also
removes a useless forward declaration of qfq_schedule_agg.

Reported-by: David S. Miller
Signed-off-by: Paolo Valente
Signed-off-by: David S. Miller

Paolo Valente
2013-07-12 04:01:07 +0800
87f1369d6 pkt_sched: sch_qfq: improve efficiency of make_eligible ... Browse Code »

In make_eligible, a mask is used to decide which groups must become eligible:
the i-th group becomes eligible only if the i-th bit of the mask (from the
right) is set. The mask is computed by left-shifting a 1 by a given number of
places, and decrementing the result. The shift is performed on a ULL to avoid
problems in case the number of places to shift is higher than 31. On a 32-bit
machine, this is more costly than working on an UL. This patch replaces such a
costly operation with two cheaper branches.

The trick is based on the following fact: in case of a shift of at least 32
places, the resulting mask has at least the 32 less significant bits set,
whereas the total number of groups is lower than 32. As a consequence, in this
case it is enough to just set the 32 less significant bits of the mask with a
cheaper ~0UL. In the other case, the shift can be safely performed on a UL.

Reported-by: David S. Miller
Reported-by: David Laight
Signed-off-by: Paolo Valente
Signed-off-by: David S. Miller

Paolo Valente
2013-07-12 04:01:07 +0800
cdbaa0bb2 gso: Update tunnel segmentation to support Tx checksum offload ... Browse Code »

This change makes it so that the GRE and VXLAN tunnels can make use of Tx
checksum offload support provided by some drivers via the hw_enc_features.
Without this fix enabling GSO means sacrificing Tx checksum offload and
this actually leads to a performance regression as shown below:

Utilization
Send
Throughput local GSO
10^6bits/s % S state
6276.51 8.39 enabled
7123.52 8.42 disabled

To resolve this it was necessary to address two items. First
netif_skb_features needed to be updated so that it would correctly handle
the Trans Ether Bridging protocol without impacting the need to check for
Q-in-Q tagging. To do this it was necessary to update harmonize_features
so that it used skb_network_protocol instead of just using the outer
protocol.

Second it was necessary to update the GRE and UDP tunnel segmentation
offloads so that they would reset the encapsulation bit and inner header
offsets after the offload was complete.

As a result of this change I have seen the following results on a interface
with Tx checksum enabled for encapsulated frames:

Utilization
Send
Throughput local GSO
10^6bits/s % S state
7123.52 8.42 disabled
8321.75 5.43 enabled

v2: Instead of replacing refrence to skb->protocol with
skb_network_protocol just replace the protocol reference in
harmonize_features to allow for double VLAN tag checks.

Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2013-07-12 03:18:49 +0800
1466b77a7 Merge tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull second set of NFS client updates from Trond Myklebust:
"This mainly contains some small readdir optimisations that had
dependencies on Al Viro's readdir rewrite. There is also a fix for a
nasty deadlock which surfaced earlier in this merge window.

Highlights include:
- Fix an_rpc pipefs regression that causes a deadlock on mount
- Readdir optimisations by Scott Mayhew and Jeff Layton
- clean up the rpc_pipefs dentry operation setup"

* tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Fix a deadlock in rpc_client_register()
rpc_pipe: rpc_dir_inode_operations can be static
NFS: Allow nfs_updatepage to extend a write under additional circumstances
NFS: Make nfs_readdir revalidate less often
NFS: Make nfs_attribute_cache_expired() non-static
rpc_pipe: set dentry operations at d_alloc time
nfs: set verifier on existing dentries in nfs_prime_dcache

Linus Torvalds
2013-07-12 03:11:35 +0800
3b8ccd447 inet: fix spacing in assignment ... Browse Code »

Found using checkpatch.pl

Signed-off-by: Camelia Groza
Signed-off-by: David S. Miller

Camelia Groza
2013-07-12 03:02:39 +0800
afc154e97 ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF ... Browse Code »

This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52
("ipv6: rt6_check_neigh should successfully verify neigh if no NUD
information are available").

Since the removal of rt->n in rt6_info we can end up with a dst ==
NULL in rt6_check_neigh. In case the kernel is not compiled with
CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown
NUD state but we must not avoid doing round robin selection on routes
with the same target. So introduce and pass down a boolean ``do_rr'' to
indicate when we should update rt->rr_ptr. As soon as no route is valid
we do backtracking and do a lookup on a higher level in the fib trie.

v2:
a) Improved rt6_check_neigh logic (no need to create neighbour there)
and documented return values.

v3:
a) Introduce enum rt6_nud_state to get rid of the magic numbers
(thanks to David Miller).
b) Update and shorten commit message a bit to actualy reflect
the source.

Reported-by: Pierre Emeriaud
Cc: YOSHIFUJI Hideaki
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-07-12 02:51:10 +0800
110ecd69a 9p: fix off by one causing access violations and memory corruption ... Browse Code »

p9_release_pages() would attempt to dereference one value past the end of
pages[]. This would cause the following crashes:

[ 6293.171817] BUG: unable to handle kernel paging request at ffff8807c96f3000
[ 6293.174146] IP: [] p9_release_pages+0x3b/0x60
[ 6293.176447] PGD 79c5067 PUD 82c1e3067 PMD 82c197067 PTE 80000007c96f3060
[ 6293.180060] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 6293.180060] Modules linked in:
[ 6293.180060] CPU: 62 PID: 174043 Comm: modprobe Tainted: G W 3.10.0-next-20130710-sasha #3954
[ 6293.180060] task: ffff8807b803b000 ti: ffff880787dde000 task.ti: ffff880787dde000
[ 6293.180060] RIP: 0010:[] [] p9_release_pages+0x3b/0x60
[ 6293.214316] RSP: 0000:ffff880787ddfc28 EFLAGS: 00010202
[ 6293.214316] RAX: 0000000000000001 RBX: ffff8807c96f2ff8 RCX: 0000000000000000
[ 6293.222017] RDX: ffff8807b803b000 RSI: 0000000000000001 RDI: ffffea001c7e3d40
[ 6293.222017] RBP: ffff880787ddfc48 R08: 0000000000000000 R09: 0000000000000000
[ 6293.222017] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[ 6293.222017] R13: 0000000000000001 R14: ffff8807cc50c070 R15: ffff8807cc50c070
[ 6293.222017] FS: 00007f572641d700(0000) GS:ffff8807f3600000(0000) knlGS:0000000000000000
[ 6293.256784] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6293.256784] CR2: ffff8807c96f3000 CR3: 00000007c8e81000 CR4: 00000000000006e0
[ 6293.256784] Stack:
[ 6293.256784] ffff880787ddfcc8 ffff880787ddfcc8 0000000000000000 ffff880787ddfcc8
[ 6293.256784] ffff880787ddfd48 ffffffff84128be8 ffff880700000002 0000000000000001
[ 6293.256784] ffff8807b803b000 ffff880787ddfce0 0000100000000000 0000000000000000
[ 6293.256784] Call Trace:
[ 6293.256784] [] p9_virtio_zc_request+0x598/0x630
[ 6293.256784] [] ? wake_up_bit+0x40/0x40
[ 6293.256784] [] p9_client_zc_rpc+0x111/0x3a0
[ 6293.256784] [] ? sched_clock_cpu+0x108/0x120
[ 6293.256784] [] p9_client_read+0xe1/0x2c0
[ 6293.256784] [] v9fs_file_read+0x90/0xc0
[ 6293.256784] [] vfs_read+0xc3/0x130
[ 6293.256784] [] ? trace_hardirqs_on+0xd/0x10
[ 6293.256784] [] SyS_read+0x62/0xa0
[ 6293.256784] [] tracesys+0xdd/0xe2
[ 6293.256784] Code: 66 90 48 89 fb 41 89 f5 48 8b 3f 48 85 ff 74 29 85 f6 74 25 45 31 e4 66 0f 1f 84 00 00 00 00 00 e8 eb 14 12 fd 41 ff c4 49 63 c4 8b 3c c3 48 85 ff 74 05 45 39 e5 75 e7 48 83 c4 08 5b 41 5c
[ 6293.256784] RIP [] p9_release_pages+0x3b/0x60
[ 6293.256784] RSP
[ 6293.256784] CR2: ffff8807c96f3000
[ 6293.256784] ---[ end trace 50822ee72cd360fc ]---

Signed-off-by: Sasha Levin
Signed-off-by: David S. Miller

Sasha Levin
2013-07-12 02:36:02 +0800
19d2f8e0f Merge tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/l… ... Browse Code »

…inux/kernel/git/ericvh/v9fs

Pull second round of 9p patches from Eric Van Hensbergen:
"Several of these patches were rebased in order to correct style
issues. Only stylistic changes were made versus the patches which
were in linux-next for two weeks. The rebases have been in linux-next
for 3 days and have passed my regressions.

The bulk of these are RDMA fixes and improvements. There's also some
additions on the extended attributes front to support some additional
namespaces and a new option for TCP to force allocation of mount
requests from a priviledged port"

* tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
9P: Add cancelled() to the transport functions.
9P/RDMA: count posted buffers without a pending request
9P/RDMA: Improve error handling in rdma_request
9P/RDMA: Do not free req->rc in error handling in rdma_request()
9P/RDMA: Use a semaphore to protect the RQ
9P/RDMA: Protect against duplicate replies
9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
9pnet: refactor struct p9_fcall alloc code
9P/RDMA: rdma_request() needs not allocate req->rc
9P: Fix fcall allocation for rdma
fs/9p: xattr: add trusted and security namespaces
net/9p: add privport option to 9p tcp transport

Linus Torvalds
2013-07-12 01:21:23 +0800
0ff08ba5d Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd changes from Bruce Fields:
"Changes this time include:

- 4.1 enabled on the server by default: the last 4.1-specific issues
I know of are fixed, so we're not going to find the rest of the
bugs without more exposure.
- Experimental support for NFSv4.2 MAC Labeling (to allow running
selinux over NFS), from Dave Quigley.
- Fixes for some delicate cache/upcall races that could cause rare
server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
debugging persistence.
- Fixes for some bugs found at the recent NFS bakeathon, mostly v4
and v4.1-specific, but also a generic bug handling fragmented rpc
calls"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
nfsd4: support minorversion 1 by default
nfsd4: allow destroy_session over destroyed session
svcrpc: fix failures to handle -1 uid's
sunrpc: Don't schedule an upcall on a replaced cache entry.
net/sunrpc: xpt_auth_cache should be ignored when expired.
sunrpc/cache: ensure items removed from cache do not have pending upcalls.
sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
sunrpc/cache: remove races with queuing an upcall.
nfsd4: return delegation immediately if lease fails
nfsd4: do not throw away 4.1 lock state on last unlock
nfsd4: delegation-based open reclaims should bypass permissions
svcrpc: don't error out on small tcp fragment
svcrpc: fix handling of too-short rpc's
nfsd4: minor read_buf cleanup
nfsd4: fix decoding of compounds across page boundaries
nfsd4: clean up nfs4_open_delegation
NFSD: Don't give out read delegations on creates
nfsd4: allow client to send no cb_sec flavors
nfsd4: fail attempts to request gss on the backchannel
nfsd4: implement minimal SP4_MACH_CRED
...

Linus Torvalds
2013-07-12 01:17:13 +0800

11 Jul, 2013

5 commits

1eb4f7582 ipv6: in case of link failure remove route directly instead of letting it expire ... Browse Code »

We could end up expiring a route which is part of an ecmp route set. Doing
so would invalidate the rt->rt6i_nsiblings calculations and could provoke
the following panic:

[ 80.144667] ------------[ cut here ]------------
[ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733!
[ 80.145172] invalid opcode: 0000 [#1] SMP
[ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
+snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk
[ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118
[ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000
[ 80.145172] RIP: 0010:[] [] fib6_add+0x75d/0x830
[ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202
[ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480
[ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738
[ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001
[ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680
[ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90
[ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
[ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0
[ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 80.145172] Stack:
[ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280
[ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa
[ 80.145172] 0000000000000000 0000000000000000 0000000000000000 ffff88011350f680
[ 80.145172] Call Trace:
[ 80.145172] [] ? rt6_bind_peer+0x4b/0x90
[ 80.145172] [] __ip6_ins_rt+0x45/0x70
[ 80.145172] [] ip6_ins_rt+0x35/0x40
[ 80.145172] [] ip6_pol_route.isra.44+0x3a4/0x4b0
[ 80.145172] [] ip6_pol_route_output+0x2a/0x30
[ 80.145172] [] fib6_rule_action+0xd7/0x210
[ 80.145172] [] ? ip6_pol_route_input+0x30/0x30
[ 80.145172] [] fib_rules_lookup+0xc6/0x140
[ 80.145172] [] fib6_rule_lookup+0x44/0x80
[ 80.145172] [] ? ip6_pol_route_input+0x30/0x30
[ 80.145172] [] ip6_route_output+0x73/0xb0
[ 80.145172] [] ip6_dst_lookup_tail+0x2c3/0x2e0
[ 80.145172] [] ? list_del+0x11/0x40
[ 80.145172] [] ? remove_wait_queue+0x3c/0x50
[ 80.145172] [] ip6_dst_lookup_flow+0x3d/0xa0
[ 80.145172] [] rawv6_sendmsg+0x267/0xc20
[ 80.145172] [] inet_sendmsg+0x63/0xb0
[ 80.145172] [] ? selinux_socket_sendmsg+0x23/0x30
[ 80.145172] [] sock_sendmsg+0xa6/0xd0
[ 80.145172] [] SYSC_sendto+0x128/0x180
[ 80.145172] [] ? update_curr+0xec/0x170
[ 80.145172] [] ? kvm_clock_get_cycles+0x9/0x10
[ 80.145172] [] ? __getnstimeofday+0x3e/0xd0
[ 80.145172] [] SyS_sendto+0xe/0x10
[ 80.145172] [] system_call_fastpath+0x16/0x1b
[ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53
[ 80.145172] RIP [] fib6_add+0x75d/0x830
[ 80.145172] RSP
[ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]---
[ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt

Cc: Nicolas Dichtel
Cc: YOSHIFUJI Hideaki
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-07-11 10:45:39 +0800
64b0dc517 net: rename busy poll socket op and globals ... Browse Code »

Rename LL_SO to BUSY_POLL_SO
Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
Fix up users of these variables.
Fix documentation for sysctl.

a patch for the socket.7 man page will follow separately,
because of limitations of my mail setup.

Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller

Eliezer Tamir
2013-07-11 08:08:27 +0800
8b80cda53 net: rename ll methods to busy-poll ... Browse Code »

Rename ndo_ll_poll to ndo_busy_poll.
Rename sk_mark_ll to sk_mark_napi_id.
Rename skb_mark_ll to skb_mark_napi_id.
Correct all useres of these functions.
Update comments and defines in include/net/busy_poll.h

Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller

Eliezer Tamir
2013-07-11 08:08:27 +0800
076bb0c82 net: rename include/net/ll_poll.h to include/net/busy_poll.h ... Browse Code »

Rename the file and correct all the places where it is included.

Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller

Eliezer Tamir
2013-07-11 08:08:27 +0800
eeee24526 SUNRPC: Fix a deadlock in rpc_client_register() ... Browse Code »

Commit 384816051ca9125cd54750e59c780c2a2655fa4f (SUNRPC: fix races on
PipeFS MOUNT notifications) introduces a regression when we call
rpc_setup_pipedir() with RPCSEC_GSS as the auth flavour.

By calling rpcauth_create() while holding the sn->pipefs_sb_lock, we
end up deadlocking in gss_pipes_dentries_create_net().
Fix is to register the client and release the mutex before calling
rpcauth_create().

Reported-by: Weston Andros Adamson
Tested-by: Weston Andros Adamson
Cc: Stanislav Kinsbursky
Cc: # : 3848160: SUNRPC: fix races on PipeFS MOUNT
Cc: # : e73f4cc: SUNRPC: split client creation
Signed-off-by: Trond Myklebust

Trond Myklebust
2013-07-11 03:58:55 +0800