03 Aug, 2010
2 commits
-
This reverts commit 15e83ed78864d0625e87a85f09b297c0919a4797.
As explained by Johannes Berg, the optimization made here is
invalid. Or, at best, incomplete.Not only destructor invocation, but conntract entry releasing
must be executed outside of hw IRQ context.So just checking "skb->destructor" is insufficient.
Signed-off-by: David S. Miller
-
Commit ab95bfe01f9872459c8678572ccadbf646badad0 replaces bridge and macvlan
hooks in __netif_receive_skb(), so dev.c doesn't need to include their headers.Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
01 Aug, 2010
1 commit
-
If user misconfigures ingress and causes a redirection loop, don't
overwhelm the log. This is also a error case so make it unlikely.
Found by inspection, luckily not in real system.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
28 Jul, 2010
1 commit
-
Conflicts:
drivers/net/bnx2x_main.cMerge bnx2x bug fixes in by hand... :-/
Signed-off-by: David S. Miller
27 Jul, 2010
1 commit
-
[ Fix unused local variable build warnings. -DaveM ]
Signed-off-by: Changli Gao
Acked-by: Neil Horman
Signed-off-by: David S. Miller
26 Jul, 2010
1 commit
-
With conn-track zones and probably with different network
namespaces, the netfilter logic needs to be re-calculated
on packet receive. If the netfilter logic is not reset,
it will not be recalculated properly. This patch adds
the nf_reset logic to dev_forward_skb.Signed-off-by: Ben Greear
Signed-off-by: David S. Miller
25 Jul, 2010
2 commits
-
Move frags[] at the end of struct skb_shared_info, and make
pskb_expand_head() copy only the used part of it instead of whole array.This should avoid kmemcheck warnings and speedup pskb_expand_head() as
well, avoiding a lot of cache misses.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Add addr_assign_type to struct net_device and expose it via sysfs.
This new attribute has the purpose of giving user-space the ability to
distinguish between different assignment types of MAC addresses.For example user-space can treat NICs with randomly generated MAC
addresses differently than NICs that have permanent (locally assigned)
MAC addresses.
For the former udev could write a persistent net rule by matching the
device path instead of the MAC address.
There's also the case of devices that 'steal' MAC addresses from slave
devices. In which it is also be beneficial for user-space to be aware
of the fact.This patch also introduces a helper function to assist adoption of
drivers that generate MAC addresses randomly.Signed-off-by: Stefan Assmann
Signed-off-by: David S. Miller
24 Jul, 2010
1 commit
-
Signed-off-by: Andy Shevchenko
Signed-off-by: David S. Miller
23 Jul, 2010
2 commits
-
It should only be adjusted if ip_summed == CHECKSUM_PARTIAL.
Signed-off-by: David S. Miller
-
Make pskb_expand_head() check ip_summed to make sure csum_start is really
csum_start and not csum before adjusting it.This fixes a bug I encountered using a Sun Quad-Fast Ethernet card and VLANs.
On my configuration, the sunhme driver produces skbs with differing amounts
of headroom on receive depending on the packet size. See line 2030 of
drivers/net/sunhme.c; packets smaller than RX_COPY_THRESHOLD have 52 bytes
of headroom but packets larger than that cutoff have only 20 bytes.When these packets reach the VLAN driver, vlan_check_reorder_header()
calls skb_cow(), which, if the packet has less than NET_SKB_PAD (== 32) bytes
of headroom, uses pskb_expand_head() to make more.Then, pskb_expand_head() needs to adjust a lot of offsets into the skb,
including csum_start. Since csum_start is a union with csum, if the packet
has a valid csum value this will corrupt it, which was the effect I observed.
The sunhme hardware computes receive checksums, so the skbs would be created
by the driver with ip_summed == CHECKSUM_COMPLETE and a valid csum field, and
then pskb_expand_head() would corrupt the csum field, leading to an "hw csum
error" message later on, for example in icmp_rcv() for pings larger than the
sunhme RX_COPY_THRESHOLD.On the basis of the comment at the beginning of include/linux/skbuff.h,
I believe that the csum_start skb field is only meaningful if ip_csummed is
CSUM_PARTIAL, so this patch makes pskb_expand_head() adjust it only in that
case to avoid corrupting a valid csum value.Please see my more in-depth disucssion of tracking down this bug for
more details if you like:http://puellavulnerata.livejournal.com/112186.html
http://puellavulnerata.livejournal.com/112567.html
http://puellavulnerata.livejournal.com/112891.html
http://puellavulnerata.livejournal.com/113096.html
http://puellavulnerata.livejournal.com/113591.htmlI am not subscribed to this list, so please CC me on replies.
Signed-off-by: Andrea Shepard
Signed-off-by: David S. Miller
21 Jul, 2010
3 commits
-
Conflicts:
drivers/vhost/net.c
net/bridge/br_device.cFix merge conflict in drivers/vhost/net.c with guidance from
Stephen Rothwell.Revert the effects of net-2.6 commit 573201f36fd9c7c6d5218cdcd9948cee700b277d
since net-next-2.6 has fixes that make bridge netpoll work properly thus
we don't need it disabled.Signed-off-by: David S. Miller
-
Patch to add -EAGAIN error to dropwatch netlink message handling code.
-EAGAIN will be returned anytime userspace attempts to transition the state of
the drop monitor service to a state that its already in. That allows user space
to detect this condition, so it doesn't wait for a success ACK that will never
arrive. Tested successfully by meSigned-off-by: Neil Horman
Signed-off-by: David S. Miller -
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
20 Jul, 2010
2 commits
-
Use modern this_cpu_xxx() api, saving few bytes on x86
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Since struct netdev_queue tx_bytes/tx_packets/tx_dropped are already
protected by _xmit_lock, its easy to convert these fields to u64 instead
of unsigned long.
This completes 64bit stats for devices using them (vlan, macvlan, ...)Strictly, we could avoid the locking in dev_txq_stats_fold() on 64bit
arches, but its slow path and we prefer keep it simple.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
19 Jul, 2010
1 commit
-
This patch adds a new networking option to allow hardware time stamps
from PHY devices. When enabled, likely candidates among incoming and
outgoing network packets are offered to the PHY driver for possible
time stamping. When accepted by the PHY driver, incoming packets are
deferred for later delivery by the driver.The patch also adds phylib driver methods for the SIOCSHWTSTAMP ioctl
and callbacks for transmit and receive time stamping. Drivers may
optionally implement these functions.Signed-off-by: Richard Cochran
Signed-off-by: David S. Miller
15 Jul, 2010
3 commits
-
Fix problem in reading the tx_queue recorded in a socket. In
dev_pick_tx, the TX queue is read by doing a check with
sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get.
The problem is that there is not mutual exclusion across these
calls in the socket so it it is possible that the queue in the
sock can be invalidated after sk_tx_queue_recorded is called so
that sk_tx_queue get returns -1, which sets 65535 in queue_index
and thus dev_pick_tx returns 65536 which is a bogus queue and
can cause crash in dev_queue_xmit.We fix this by only calling sk_tx_queue_get which does the proper
checks. The interface is that sk_tx_queue_get returns the TX queue
if the sock argument is non-NULL and TX queue is recorded, else it
returns -1. sk_tx_queue_recorded is no longer used so it can be
completely removed.Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller -
When configuring DMVPN (GRE + openNHRP) and a GRE remote
address is configured a kernel Oops is observed. The
obserseved Oops is caused by a NULL header_ops pointer
(neigh->dev->header_ops) in neigh_update_hhs() whenvoid (*update)(struct hh_cache*, const struct net_device*, const unsigned char *)
= neigh->dev->header_ops->cache_update;is executed. The dev associated with the NULL header_ops is
the GRE interface. This patch guards against the
possibility that header_ops is NULL.This Oops was first observed in kernel version 2.6.26.8.
Signed-off-by: Doug Kehn
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
commit fc6055a5ba31e2 (net: Introduce skb_orphan_try()) added early
orphaning of skbs.This unfortunately added a performance regression in skb_tx_hash() in
case of stacked devices (bonding, vlans, ...)Since skb->sk is now NULL, we cannot access sk->sk_hash anymore to
spread tx packets to multiple NIC queues on multiqueue devices.skb_tx_hash() in this case only uses skb->protocol, same value for all
flows.skb_orphan_try() can copy sk->sk_hash into skb->rxhash and skb_tx_hash()
can use this saved sk_hash value to compute its internal hash value.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
13 Jul, 2010
3 commits
-
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller -
Avoid two extra instructions in sock_free(), to reload
skb->truesize and skb->skSigned-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
CodingStyle cleanups
EXPORT_SYMBOL should immediately follow the symbol declaration.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Jul, 2010
2 commits
-
Document that dev_get_stats() returns the same stats pointer it was
given. Remove const qualification from the returned pointer since the
caller may do what it likes with that structure.Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller -
In commit be1f3c2c027cc5ad735df6a45a542ed1db7ec48b "net: Enable 64-bit
net device statistics on 32-bit architectures" I redefined struct
net_device_stats so that it could be used in a union with struct
rtnl_link_stats64, avoiding the need for explicit copying or
conversion between the two. However, this is unsafe because there is
no locking required and no lock consistently held around calls to
dev_get_stats() and use of the statistics structure it returns.In commit 28172739f0a276eb8d6ca917b3974c2edb036da3 "net: fix 64 bit
counters on 32 bit arches" Eric Dumazet dealt with that problem by
requiring callers of dev_get_stats() to provide storage for the
result. This means that the net_device::stats64 field and the padding
in struct net_device_stats are now redundant, so remove them.Update the comment on net_device_ops::ndo_get_stats64 to reflect its
new usage.Change dev_txq_stats_fold() to use struct rtnl_link_stats64, since
that is what all its callers are really using and it is no longer
going to be compatible with struct net_device_stats.Eric Dumazet suggested the separate function for the structure
conversion.Signed-off-by: Ben Hutchings
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
08 Jul, 2010
2 commits
-
There is a small possibility that a reader gets incorrect values on 32
bit arches. SNMP applications could catch incorrect counters when a
32bit high part is changed by another stats consumer/provider.One way to solve this is to add a rtnl_link_stats64 param to all
ndo_get_stats64() methods, and also add such a parameter to
dev_get_stats().Rule is that we are not allowed to use dev->stats64 as a temporary
storage for 64bit stats, but a caller provided area (usually on stack)Old drivers (only providing get_stats() method) need no changes.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
05 Jul, 2010
1 commit
-
Reduces an x86 defconfig text and data ~2k.
text is smaller, data is larger.$ size vmlinux*
text data bss dec hex filename
7198862 720112 1366288 9285262 8dae8e vmlinux
7205273 716016 1366288 9287577 8db799 vmlinux.device_hUses %pV and struct va_format
Format arguments are verified before printkSigned-off-by: Joe Perches
Acked-by: Greg Kroah-Hartman
Signed-off-by: David S. Miller
03 Jul, 2010
1 commit
-
Reducing real_num_queues needs to flush the qdisc otherwise
skbs with queue_mappings greater then real_num_tx_queues can
be sent to the underlying driver.The flow for this is,
dev_queue_xmit()
dev_pick_tx()
skb_tx_hash() => hash using real_num_tx_queues
skb_set_queue_mapping()
...
qdisc_enqueue_root() => enqueue skb on txq from hash
...
dev->real_num_tx_queues -= n
...
sch_direct_xmit()
dev_hard_start_xmit()
ndo_start_xmit(skb,dev) => skb queue set with old hashskbs are enqueued on the qdisc with skb->queue_mapping set
0 < queue_mappings < real_num_tx_queues. When the driver
decreases real_num_tx_queues skb's may be dequeued from the
qdisc with a queue_mapping greater then real_num_tx_queues.This fixes a case in ixgbe where this was occurring with DCB
and FCoE. Because the driver is using queue_mapping to map
skbs to tx descriptor rings we can potentially map skbs to
rings that no longer exist.Signed-off-by: John Fastabend
Tested-by: Ross Brattain
Signed-off-by: Jeff Kirsher
Signed-off-by: David S. Miller
01 Jul, 2010
3 commits
-
Many NICs use an indirection table to map an RX flow hash value to one
of an arbitrary number of queues (not necessarily a power of 2). It
can be useful to remove some queues from this indirection table so
that they are only used for flows that are specifically filtered
there. It may also be useful to weight the mapping to account for
user processes with the same CPU-affinity as the RX interrupts.Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller -
ethtool_op_set_flags() does not check for unsupported flags, and has
no way of doing so. This means it is not suitable for use as a
default implementation of ethtool_ops::set_flags.Add a 'supported' parameter specifying the flags that the driver and
hardware support, validate the requested flags against this, and
change all current callers to pass this parameter.Change some other trivial implementations of ethtool_ops::set_flags to
call ethtool_op_set_flags().Signed-off-by: Ben Hutchings
Reviewed-by: Stanislaw Gruszka
Acked-by: Jeff Garzik
Signed-off-by: David S. Miller -
This is only noticed by people that are not doing everything correct in
the first place.Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: David S. Miller
29 Jun, 2010
3 commits
-
struct ethtool_rxnfc was originally defined in 2.6.27 for the
ETHTOOL_{G,S}RXFH command with only the cmd, flow_type and data
fields. It was then extended in 2.6.30 to support various additional
commands. These commands should have been defined to use a new
structure, but it is too late to change that now.Since user-space may still be using the old structure definition
for the ETHTOOL_{G,S}RXFH commands, and since they do not need the
additional fields, only copy the originally defined fields to and
from user-space.Signed-off-by: Ben Hutchings
Cc: stable@kernel.org
Signed-off-by: David S. Miller -
On a 32-bit machine, info.rule_cnt >= 0x40000000 leads to integer
overflow and the buffer may be smaller than needed. Since
ETHTOOL_GRXCLSRLALL is unprivileged, this can presumably be used for at
least denial of service.Signed-off-by: Ben Hutchings
Cc: stable@kernel.org
Signed-off-by: David S. Miller -
use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id())
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
26 Jun, 2010
2 commits
-
Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Remove "pktgen: " from formats
Convert printks to pr_
Added func_enter() for debugging
Moved version to end of string at module_init
Coalesced long formatsSigned-off-by: Joe Perches
Signed-off-by: David S. Miller -
Gcc is currenlty not in the ability to optimize the switch statement in
sk_run_filter() because of dense case labels. This patch replace the
OR'd labels with ordered sequenced case labels. The sk_chk_filter()
function is modified to patch/replace the original OPCODES in a
ordered but equivalent form. gcc is now in the ability to transform the
switch statement in sk_run_filter into a jump table of complexity O(1).Until this patch gcc generates a sequence of conditional branches (O(n) of 567
byte .text segment size (arch x86_64):7ff: 8b 06 mov (%rsi),%eax
801: 66 83 f8 35 cmp $0x35,%ax
805: 0f 84 d0 02 00 00 je adb
80b: 0f 87 07 01 00 00 ja 918
811: 66 83 f8 15 cmp $0x15,%ax
815: 0f 84 c5 02 00 00 je ae0
81b: 77 73 ja 890
81d: 66 83 f8 04 cmp $0x4,%ax
821: 0f 84 17 02 00 00 je a3e
827: 77 29 ja 852
829: 66 83 f8 01 cmp $0x1,%ax
[...]With the modification the compiler translate the switch statement into
the following jump table fragment:7ff: 66 83 3e 2c cmpw $0x2c,(%rsi)
803: 0f 87 1f 02 00 00 ja a28
809: 0f b7 06 movzwl (%rsi),%eax
80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
813: 44 89 e3 mov %r12d,%ebx
816: e9 43 03 00 00 jmpq b5e
81b: 41 89 dc mov %ebx,%r12d
81e: e9 3b 03 00 00 jmpq b5eFurthermore, I reordered the instructions to reduce cache line misses by
order the most common instruction to the start.Signed-off-by: Hagen Paul Pfeifer
Signed-off-by: David S. Miller
25 Jun, 2010
1 commit
-
Remove rtnl_unlock() which had no corresponding rtnl_lock().
Signed-off-by: Andrew Morton
Acked-by: Herbert Xu
Signed-off-by: David S. Miller
24 Jun, 2010
1 commit
-
netif_needs_gso() is checked twice in the TX path once,
before submitting the skb to the qdisc and once after
it is dequeued from the qdisc just before calling
ndo_hard_start(). This opens a window for a user to
change the gso/tso or tx checksum settings that can
cause netif_needs_gso to be true in one check and false
in the other.Specifically, changing TX checksum setting may cause
the warning in skb_gso_segment() to be triggered if
the checksum is calculated earlier.This consolidates the netif_needs_gso() calls so that
the stack only checks if gso is needed in
dev_hard_start_xmit().Signed-off-by: John Fastabend
Cc: Herbert Xu
Signed-off-by: Jeff Kirsher
Acked-by: Herbert Xu
Signed-off-by: David S. Miller
17 Jun, 2010
1 commit
-
AF_UNIX references this, and can be built as a module,
so...Signed-off-by: David S. Miller