03 Aug, 2010

2 commits

  • This reverts commit 15e83ed78864d0625e87a85f09b297c0919a4797.

    As explained by Johannes Berg, the optimization made here is
    invalid. Or, at best, incomplete.

    Not only destructor invocation, but conntrack entry releasing
    must be executed outside of hw IRQ context.

    So just checking "skb->destructor" is insufficient.

    Signed-off-by: David S. Miller

  • Commit ab95bfe01f9872459c8678572ccadbf646badad0 replaces bridge and macvlan
    hooks in __netif_receive_skb(), so dev.c doesn't need to include their headers.

    Signed-off-by: Changli Gao
    Signed-off-by: David S. Miller


01 Aug, 2010

1 commit

    If a user misconfigures ingress and causes a redirection loop, don't
    overwhelm the log. This is also an error case, so mark the branch
    unlikely(). Found by inspection, luckily not in a real system.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller


26 Jul, 2010

1 commit

    With conntrack zones and probably with different network
    namespaces, the netfilter logic needs to be re-calculated
    on packet receive. If the netfilter logic is not reset,
    it will not be recalculated properly. This patch adds
    the nf_reset logic to dev_forward_skb.

    Signed-off-by: Ben Greear
    Signed-off-by: David S. Miller


25 Jul, 2010

2 commits

    Move frags[] to the end of struct skb_shared_info, and make
    pskb_expand_head() copy only the used part of it instead of the whole
    array.

    This should avoid kmemcheck warnings and speed up pskb_expand_head()
    as well, avoiding a lot of cache misses.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

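The saving can be sketched in plain userspace C; `shinfo_sketch` and MAX_FRAGS below are illustrative stand-ins, not the real skb_shared_info:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in, NOT the real skb_shared_info: with frags[]
 * placed last, a copy only needs the leading fields plus the nr_frags
 * entries actually in use, rather than the full fixed-size array. */
#define MAX_FRAGS 18

struct shinfo_sketch {
    unsigned int nr_frags;
    /* ...other fields would sit here, ahead of the array... */
    int frags[MAX_FRAGS];
};

/* Bytes a pskb_expand_head()-style copy would actually need. */
static size_t used_copy_size(const struct shinfo_sketch *s)
{
    return offsetof(struct shinfo_sketch, frags)
           + s->nr_frags * sizeof(s->frags[0]);
}
```

For typical packets nr_frags is far below the maximum, so the partial copy touches far fewer cache lines than copying the whole array.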
  • Add addr_assign_type to struct net_device and expose it via sysfs.
    This new attribute has the purpose of giving user-space the ability to
    distinguish between different assignment types of MAC addresses.

    For example, user-space can treat NICs with randomly generated MAC
    addresses differently than NICs that have permanent (locally assigned)
    MAC addresses.
    For the former, udev could write a persistent net rule by matching the
    device path instead of the MAC address.
    There's also the case of devices that 'steal' MAC addresses from slave
    devices, in which case it is also beneficial for user-space to be aware
    of the fact.

    This patch also introduces a helper function to assist adoption of
    drivers that generate MAC addresses randomly.

    Signed-off-by: Stefan Assmann
    Signed-off-by: David S. Miller

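The contract of such a helper can be sketched as follows; this is a hedged userspace approximation (using rand() rather than the kernel's RNG), showing only the two address bits any random-MAC helper must fix up:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace approximation of a random-MAC helper: whatever the random
 * source, the result must not be multicast (bit 0 of the first octet
 * clear) and must be marked locally administered (bit 1 set), so that
 * user-space can tell it apart from a permanent, burned-in address. */
static void random_mac_sketch(uint8_t addr[6])
{
    for (int i = 0; i < 6; i++)
        addr[i] = (uint8_t)rand();
    addr[0] &= 0xfe;   /* clear the multicast bit */
    addr[0] |= 0x02;   /* set the locally-administered bit */
}
```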

23 Jul, 2010

2 commits

    csum_start should only be adjusted if ip_summed == CHECKSUM_PARTIAL.

    Signed-off-by: David S. Miller

  • Make pskb_expand_head() check ip_summed to make sure csum_start is really
    csum_start and not csum before adjusting it.

    This fixes a bug I encountered using a Sun Quad-Fast Ethernet card and VLANs.
    On my configuration, the sunhme driver produces skbs with differing amounts
    of headroom on receive depending on the packet size. See line 2030 of
    drivers/net/sunhme.c; packets smaller than RX_COPY_THRESHOLD have 52 bytes
    of headroom but packets larger than that cutoff have only 20 bytes.

    When these packets reach the VLAN driver, vlan_check_reorder_header()
    calls skb_cow(), which, if the packet has less than NET_SKB_PAD (== 32) bytes
    of headroom, uses pskb_expand_head() to make more.

    Then, pskb_expand_head() needs to adjust a lot of offsets into the skb,
    including csum_start. Since csum_start is a union with csum, if the packet
    has a valid csum value this will corrupt it, which was the effect I observed.
    The sunhme hardware computes receive checksums, so the skbs would be created
    by the driver with ip_summed == CHECKSUM_COMPLETE and a valid csum field, and
    then pskb_expand_head() would corrupt the csum field, leading to an "hw csum
    error" message later on, for example in icmp_rcv() for pings larger than the
    sunhme RX_COPY_THRESHOLD.

    On the basis of the comment at the beginning of include/linux/skbuff.h,
    I believe that the csum_start skb field is only meaningful if ip_summed
    is CHECKSUM_PARTIAL, so this patch makes pskb_expand_head() adjust it
    only in that case to avoid corrupting a valid csum value.

    Please see my more in-depth discussion of tracking down this bug for
    more details if you like:

    http://puellavulnerata.livejournal.com/112186.html
    http://puellavulnerata.livejournal.com/112567.html
    http://puellavulnerata.livejournal.com/112891.html
    http://puellavulnerata.livejournal.com/113096.html
    http://puellavulnerata.livejournal.com/113591.html

    I am not subscribed to this list, so please CC me on replies.

    Signed-off-by: Andrea Shepard
    Signed-off-by: David S. Miller

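The essence of the guard can be sketched in plain C; `mini_skb` below is a hypothetical stand-in for sk_buff, reproducing only the union that makes the unconditional adjustment dangerous:

```c
#include <assert.h>

/* Hypothetical stand-in, NOT the real sk_buff: csum and csum_start
 * share storage, so unconditionally shifting csum_start corrupts a
 * valid checksum. The guard below is the essence of the patch. */
enum { CHECKSUM_NONE, CHECKSUM_PARTIAL, CHECKSUM_COMPLETE };

struct mini_skb {
    int ip_summed;
    union {
        unsigned int csum;        /* valid when ip_summed == CHECKSUM_COMPLETE */
        unsigned int csum_start;  /* valid when ip_summed == CHECKSUM_PARTIAL  */
    };
};

/* Adjust the offset only when the union really holds csum_start. */
static void expand_head_adjust(struct mini_skb *skb, unsigned int delta)
{
    if (skb->ip_summed == CHECKSUM_PARTIAL)
        skb->csum_start += delta;
}
```

Without the ip_summed check, the same `+= delta` would silently rewrite a CHECKSUM_COMPLETE packet's csum, which is exactly the "hw csum error" observed above.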

20 Jul, 2010

2 commits

    Use the modern this_cpu_xxx() API, saving a few bytes on x86.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Since struct netdev_queue tx_bytes/tx_packets/tx_dropped are already
    protected by _xmit_lock, it's easy to convert these fields to u64
    instead of unsigned long.
    This completes 64-bit stats for devices using them (vlan, macvlan, ...).

    Strictly, we could avoid the locking in dev_txq_stats_fold() on 64-bit
    arches, but it's a slow path and we prefer to keep it simple.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller


19 Jul, 2010

1 commit

  • This patch adds a new networking option to allow hardware time stamps
    from PHY devices. When enabled, likely candidates among incoming and
    outgoing network packets are offered to the PHY driver for possible
    time stamping. When accepted by the PHY driver, incoming packets are
    deferred for later delivery by the driver.

    The patch also adds phylib driver methods for the SIOCSHWTSTAMP ioctl
    and callbacks for transmit and receive time stamping. Drivers may
    optionally implement these functions.

    Signed-off-by: Richard Cochran
    Signed-off-by: David S. Miller


15 Jul, 2010

3 commits

    Fix a problem in reading the tx_queue recorded in a socket. In
    dev_pick_tx, the TX queue is read by doing a check with
    sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get.
    The problem is that there is no mutual exclusion across these
    calls in the socket, so it is possible that the queue in the
    sock can be invalidated after sk_tx_queue_recorded is called, so
    that sk_tx_queue_get returns -1, which sets 65535 in queue_index
    and thus dev_pick_tx returns 65536, which is a bogus queue and
    can cause a crash in dev_queue_xmit.

    We fix this by only calling sk_tx_queue_get which does the proper
    checks. The interface is that sk_tx_queue_get returns the TX queue
    if the sock argument is non-NULL and TX queue is recorded, else it
    returns -1. sk_tx_queue_recorded is no longer used so it can be
    completely removed.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

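A minimal sketch of the failure mode and the fixed interface, with a plain int standing in for the socket's recorded queue (the real code operates on struct sock):

```c
#include <assert.h>
#include <stdint.h>

/* Hedged sketch: sk_tx_queue_get() can find the recorded queue
 * invalidated and must return -1; storing that -1 straight into a
 * 16-bit queue_index yields 65535, a bogus queue. The fix is one call
 * that does all the checks, with the caller handling -1 explicitly. */
static int sk_tx_queue_get_sketch(const int *sk_tx_queue_mapping)
{
    if (sk_tx_queue_mapping && *sk_tx_queue_mapping >= 0)
        return *sk_tx_queue_mapping;
    return -1;   /* no valid recorded queue */
}

static uint16_t pick_tx_queue(const int *mapping, uint16_t fallback)
{
    int q = sk_tx_queue_get_sketch(mapping);
    return q >= 0 ? (uint16_t)q : fallback;   /* never store -1 */
}
```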
    When configuring DMVPN (GRE + openNHRP) with a GRE remote
    address, a kernel Oops is observed. The observed Oops is caused
    by a NULL header_ops pointer (neigh->dev->header_ops) in
    neigh_update_hhs() when
    void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *)
    = neigh->dev->header_ops->cache_update;

    is executed. The dev associated with the NULL header_ops is
    the GRE interface. This patch guards against the
    possibility that header_ops is NULL.

    This Oops was first observed in kernel version 2.6.26.8.

    Signed-off-by: Doug Kehn
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

  • commit fc6055a5ba31e2 (net: Introduce skb_orphan_try()) added early
    orphaning of skbs.

    This unfortunately added a performance regression in skb_tx_hash() in
    case of stacked devices (bonding, vlans, ...)

    Since skb->sk is now NULL, we cannot access sk->sk_hash anymore to
    spread tx packets to multiple NIC queues on multiqueue devices.

    skb_tx_hash() in this case only uses skb->protocol, the same value
    for all flows.

    skb_orphan_try() can copy sk->sk_hash into skb->rxhash and skb_tx_hash()
    can use this saved sk_hash value to compute its internal hash value.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

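The idea can be sketched with minimal stand-in structures (these are not the kernel's sk_buff and sock, just enough fields to show the hand-off):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel structures. */
struct mini_sock { uint32_t sk_hash; };
struct mini_skb  { struct mini_sock *sk; uint32_t rxhash; };

/* Sketch of the fix: before orphaning drops the socket reference,
 * save sk->sk_hash into skb->rxhash so the TX hash can still spread
 * flows across queues after skb->sk becomes NULL. */
static void orphan_try_sketch(struct mini_skb *skb)
{
    if (skb->sk) {
        skb->rxhash = skb->sk->sk_hash;  /* preserve before detaching */
        skb->sk = NULL;
    }
}

/* With rxhash preserved, flows still spread across nqueues. */
static uint32_t tx_hash_sketch(const struct mini_skb *skb, uint32_t nqueues)
{
    return skb->rxhash % nqueues;
}
```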

10 Jul, 2010

2 commits

  • Document that dev_get_stats() returns the same stats pointer it was
    given. Remove const qualification from the returned pointer since the
    caller may do what it likes with that structure.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

  • In commit be1f3c2c027cc5ad735df6a45a542ed1db7ec48b "net: Enable 64-bit
    net device statistics on 32-bit architectures" I redefined struct
    net_device_stats so that it could be used in a union with struct
    rtnl_link_stats64, avoiding the need for explicit copying or
    conversion between the two. However, this is unsafe because there is
    no locking required and no lock consistently held around calls to
    dev_get_stats() and use of the statistics structure it returns.

    In commit 28172739f0a276eb8d6ca917b3974c2edb036da3 "net: fix 64 bit
    counters on 32 bit arches" Eric Dumazet dealt with that problem by
    requiring callers of dev_get_stats() to provide storage for the
    result. This means that the net_device::stats64 field and the padding
    in struct net_device_stats are now redundant, so remove them.

    Update the comment on net_device_ops::ndo_get_stats64 to reflect its
    new usage.

    Change dev_txq_stats_fold() to use struct rtnl_link_stats64, since
    that is what all its callers are really using and it is no longer
    going to be compatible with struct net_device_stats.

    Eric Dumazet suggested the separate function for the structure
    conversion.

    Signed-off-by: Ben Hutchings
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller


03 Jul, 2010

1 commit

    Reducing real_num_tx_queues needs to flush the qdisc, otherwise
    skbs with queue_mappings greater than real_num_tx_queues can
    be sent to the underlying driver.

    The flow for this is,

    dev_queue_xmit()
    dev_pick_tx()
    skb_tx_hash() => hash using real_num_tx_queues
    skb_set_queue_mapping()
    ...
    qdisc_enqueue_root() => enqueue skb on txq from hash
    ...
    dev->real_num_tx_queues -= n
    ...
    sch_direct_xmit()
    dev_hard_start_xmit()
    ndo_start_xmit(skb,dev) => skb queue set with old hash

    skbs are enqueued on the qdisc with skb->queue_mapping set in the
    range 0 <= queue_mapping < real_num_tx_queues. When the driver
    decreases real_num_tx_queues, skbs may be dequeued from the
    qdisc with a queue_mapping greater than the new real_num_tx_queues.

    This fixes a case in ixgbe where this was occurring with DCB
    and FCoE. Because the driver is using queue_mapping to map
    skbs to tx descriptor rings we can potentially map skbs to
    rings that no longer exist.

    Signed-off-by: John Fastabend
    Tested-by: Ross Brattain
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

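The stale-mapping invariant in the flow above can be sketched in a few lines; this is an illustrative model, not the kernel's code path:

```c
#include <assert.h>

/* Hedged sketch: a queue_mapping hashed against the old
 * real_num_tx_queues can index a TX ring that no longer exists once
 * the driver shrinks the count; hence the qdisc must be flushed when
 * real_num_tx_queues is reduced. */
static unsigned int tx_hash(unsigned int flow_hash,
                            unsigned int real_num_tx_queues)
{
    return flow_hash % real_num_tx_queues;
}

/* The invariant every dequeued skb must satisfy before it reaches
 * ndo_start_xmit(). */
static int mapping_valid(unsigned int queue_mapping,
                         unsigned int real_num_tx_queues)
{
    return queue_mapping < real_num_tx_queues;
}
```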

01 Jul, 2010

3 commits

  • Many NICs use an indirection table to map an RX flow hash value to one
    of an arbitrary number of queues (not necessarily a power of 2). It
    can be useful to remove some queues from this indirection table so
    that they are only used for flows that are specifically filtered
    there. It may also be useful to weight the mapping to account for
    user processes with the same CPU-affinity as the RX interrupts.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

  • ethtool_op_set_flags() does not check for unsupported flags, and has
    no way of doing so. This means it is not suitable for use as a
    default implementation of ethtool_ops::set_flags.

    Add a 'supported' parameter specifying the flags that the driver and
    hardware support, validate the requested flags against this, and
    change all current callers to pass this parameter.

    Change some other trivial implementations of ethtool_ops::set_flags to
    call ethtool_op_set_flags().

    Signed-off-by: Ben Hutchings
    Reviewed-by: Stanislaw Gruszka
    Acked-by: Jeff Garzik
    Signed-off-by: David S. Miller

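The new contract can be sketched as below; -1 stands in for the error return and the device structures are simplified away:

```c
#include <assert.h>
#include <stdint.h>

/* Hedged sketch of the validated set_flags contract: the caller
 * supplies the mask of flags the driver and hardware actually support,
 * and any request outside that mask is rejected before anything is
 * applied. */
static int set_flags_sketch(uint32_t *dev_flags,
                            uint32_t requested,
                            uint32_t supported)
{
    if (requested & ~supported)
        return -1;          /* an unsupported flag was requested */
    *dev_flags = requested; /* all bits validated; apply atomically */
    return 0;
}
```

The key property is that a rejected request leaves the device flags untouched, which an unvalidated default implementation cannot guarantee.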
    This is only noticed by people that are not doing everything correctly
    in the first place.

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: David S. Miller


29 Jun, 2010

3 commits

  • struct ethtool_rxnfc was originally defined in 2.6.27 for the
    ETHTOOL_{G,S}RXFH command with only the cmd, flow_type and data
    fields. It was then extended in 2.6.30 to support various additional
    commands. These commands should have been defined to use a new
    structure, but it is too late to change that now.

    Since user-space may still be using the old structure definition
    for the ETHTOOL_{G,S}RXFH commands, and since they do not need the
    additional fields, only copy the originally defined fields to and
    from user-space.

    Signed-off-by: Ben Hutchings
    Cc: stable@kernel.org
    Signed-off-by: David S. Miller

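The compatibility trick amounts to copying only a prefix of the struct. A sketch, where `rxnfc_sketch` is an illustrative layout and not the real ethtool_rxnfc:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout, NOT the real ethtool_rxnfc: for the old
 * ETHTOOL_{G,S}RXFH commands only the originally defined prefix
 * (ending at 'data') may be copied to/from user-space, since old
 * binaries allocated only that much. */
struct rxnfc_sketch {
    uint32_t cmd;
    uint32_t flow_type;
    uint64_t data;
    uint32_t rule_cnt;   /* added later; must NOT be copied for old cmds */
};

/* Bytes safe to copy for the original commands: the prefix through
 * the last originally defined field. */
static size_t rxfh_copy_size(void)
{
    return offsetof(struct rxnfc_sketch, data) + sizeof(uint64_t);
}
```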
  • On a 32-bit machine, info.rule_cnt >= 0x40000000 leads to integer
    overflow and the buffer may be smaller than needed. Since
    ETHTOOL_GRXCLSRLALL is unprivileged, this can presumably be used for at
    least denial of service.

    Signed-off-by: Ben Hutchings
    Cc: stable@kernel.org
    Signed-off-by: David S. Miller

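The class of bug fixed here is a classic count-times-size wrap. A hedged sketch of the guard (illustrative, not the exact ethtool code), using 32-bit arithmetic to model the 32-bit machine:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the overflow: a user-supplied element count multiplied by
 * the element size wraps in 32-bit arithmetic, so the allocation ends
 * up smaller than the data later written into it. The check rejects
 * any count whose product would wrap. */
static int checked_bytes32(uint32_t count, uint32_t elem, uint32_t *out)
{
    if (elem && count > UINT32_MAX / elem)
        return -1;            /* 32-bit multiply would wrap: reject */
    *out = count * elem;
    return 0;
}
```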
    Use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id()).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller


26 Jun, 2010

2 commits

  • Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    Remove "pktgen: " from formats
    Convert printks to pr_<level>
    Added func_enter() for debugging
    Moved version to end of string at module_init
    Coalesced long formats

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    gcc is currently unable to optimize the switch statement in
    sk_run_filter() because its case labels are not dense. This patch
    replaces the OR'd labels with ordered, sequential case labels. The
    sk_chk_filter() function is modified to patch/replace the original
    opcodes with an ordered but equivalent form, so gcc can now transform
    the switch statement in sk_run_filter() into a jump table of O(1)
    complexity.

    Before this patch, gcc generated a sequence of conditional branches
    (O(n)), 567 bytes of .text segment size (arch x86_64):

    7ff: 8b 06 mov (%rsi),%eax
    801: 66 83 f8 35 cmp $0x35,%ax
    805: 0f 84 d0 02 00 00 je adb
    80b: 0f 87 07 01 00 00 ja 918
    811: 66 83 f8 15 cmp $0x15,%ax
    815: 0f 84 c5 02 00 00 je ae0
    81b: 77 73 ja 890
    81d: 66 83 f8 04 cmp $0x4,%ax
    821: 0f 84 17 02 00 00 je a3e
    827: 77 29 ja 852
    829: 66 83 f8 01 cmp $0x1,%ax
    [...]

    With the modification, the compiler translates the switch statement
    into the following jump table fragment:

    7ff: 66 83 3e 2c cmpw $0x2c,(%rsi)
    803: 0f 87 1f 02 00 00 ja a28
    809: 0f b7 06 movzwl (%rsi),%eax
    80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
    813: 44 89 e3 mov %r12d,%ebx
    816: e9 43 03 00 00 jmpq b5e
    81b: 41 89 dc mov %ebx,%r12d
    81e: e9 3b 03 00 00 jmpq b5e

    Furthermore, I reordered the instructions to reduce cache line misses
    by placing the most common instructions at the start.

    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

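The transformation relies on a general property of C switch statements, sketched below with a toy opcode set (none of this is sk_run_filter() itself): when case labels form a dense, ordered range, compilers can emit a single indirect jump through a table instead of a chain of compares.

```c
#include <assert.h>

/* Dense, ordered opcodes 0..3: a compiler can dispatch on these with
 * one bounds check plus one indirect jump (a jump table), rather than
 * O(n) conditional branches over scattered label values. */
enum op { OP_ADD = 0, OP_SUB = 1, OP_MUL = 2, OP_AND = 3 };

static int run_op(enum op code, int a, int b)
{
    switch (code) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    case OP_AND: return a & b;
    }
    return 0;   /* unreachable for valid opcodes */
}
```

Renumbering scattered opcode constants into such a consecutive range, and rewriting the stored programs to match, is the same trade this patch makes in sk_chk_filter().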

24 Jun, 2010

1 commit

    netif_needs_gso() is checked twice in the TX path: once before
    submitting the skb to the qdisc, and once after it is dequeued
    from the qdisc just before calling ndo_start_xmit(). This opens
    a window for a user to change the gso/tso or tx checksum settings,
    which can cause netif_needs_gso() to be true in one check and false
    in the other.

    Specifically, changing the TX checksum setting may cause the
    warning in skb_gso_segment() to be triggered if the checksum is
    calculated earlier.

    This consolidates the netif_needs_gso() calls so that the stack
    only checks whether gso is needed in dev_hard_start_xmit().

    Signed-off-by: John Fastabend
    Cc: Herbert Xu
    Signed-off-by: Jeff Kirsher
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller


17 Jun, 2010

1 commit