21 May, 2011

1 commit

  • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     

17 May, 2011

1 commit


08 May, 2011

3 commits


30 Apr, 2011

1 commit

  • This makes sure that when a driver calls another driver's ethtool
    get/set_settings() callback, the data passed to it is clean. This
    guarantees that speed_hi will be zeroed correctly if the called
    callback doesn't explicitly set it: we are sure we don't get a
    corrupted speed from the underlying driver. We also take care of
    setting the cmd field appropriately (ETHTOOL_GSET/SSET).

    This applies to dev_ethtool_get_settings(), which now makes sure it
    sets up that ethtool command parameter correctly before passing it to
    drivers. This also means that whoever calls dev_ethtool_get_settings()
    does not have to clean the ethtool command parameter. This function
    also becomes an exported symbol instead of an inline.

    All drivers visible to make allyesconfig under x86_64 have been
    updated.
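
    A minimal sketch of the resulting wrapper behavior (body modeled on
    the description above; treat the exact details as an assumption):

        #include <linux/ethtool.h>
        #include <linux/netdevice.h>

        int dev_ethtool_get_settings(struct net_device *dev,
                                     struct ethtool_cmd *cmd)
        {
                if (!dev->ethtool_ops || !dev->ethtool_ops->get_settings)
                        return -EOPNOTSUPP;

                /* hand the driver a clean command: speed_hi starts zeroed
                 * and the cmd field is set for it */
                memset(cmd, 0, sizeof(*cmd));
                cmd->cmd = ETHTOOL_GSET;
                return dev->ethtool_ops->get_settings(dev, cmd);
        }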

    Signed-off-by: David Decotigny
    Signed-off-by: David S. Miller

    David Decotigny
     

10 Feb, 2011

1 commit

  • commit a512b92 adds a sysfs entry for the net device group, but
    tun had already been using a 'group' sysfs entry before that
    commit, so after this commit the kernel warns like this:
    sysfs: cannot create duplicate filename '/devices/virtual/net/vnet0/group'

    Since tun has used this name for years, renaming tun's sysfs entry
    might break existing userspace, so renaming the sysfs entry for the
    net device group is the better choice.
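
    The fix might then look like registering the net device attribute
    under a non-colliding name; a sketch (the handler names here are
    hypothetical stubs):

        static ssize_t show_group(struct device *dev,
                                  struct device_attribute *attr, char *buf);
        static ssize_t store_group(struct device *dev,
                                   struct device_attribute *attr,
                                   const char *buf, size_t len);

        /* expose the group as "netdev_group" so it no longer collides
         * with tun's long-standing "group" file */
        static DEVICE_ATTR(netdev_group, S_IRUGO | S_IWUSR,
                           show_group, store_group);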

    Signed-off-by: Xiaotian Feng
    Signed-off-by: David S. Miller

    Xiaotian Feng
     

25 Jan, 2011

2 commits

  • The group of a network device can be queried or changed from userspace
    using sysfs.

    For example, with sysfs mounted at /sys, one can change the group
    that interface lo belongs to:
    echo 1 > /sys/class/net/lo/group
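
    On the kernel side, the store path can be a thin wrapper; a sketch,
    assuming the rtnl-locked helper pattern of net-sysfs.c and the
    dev_set_group() setter added alongside this patch:

        /* called with rtnl held by the generic sysfs store path */
        static int change_group(struct net_device *dev, unsigned long new_group)
        {
                dev_set_group(dev, (int)new_group);
                return 0;
        }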

    Signed-off-by: Vlad Dogaru
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Vlad Dogaru
     
  • Quoting Ben Hutchings: we presumably won't be defining features that
    can only be enabled on 64-bit architectures.

    Occurrences found by `grep -r` on net/, drivers/net, include/

    [ Move features and vlan_features next to each other in
    struct netdev, as per Eric Dumazet's suggestion -DaveM ]

    Signed-off-by: Michał Mirosław
    Signed-off-by: David S. Miller

    Michał Mirosław
     

17 Dec, 2010

1 commit


02 Dec, 2010

1 commit

  • Allocate qdisc memory according to NUMA properties of cpus included in
    xps map.

    To be effective, the qdisc should be (re)set up after changes
    of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

    I added a numa_node field in struct netdev_queue, containing the
    NUMA node if all CPUs included in xps_cpus share the same node,
    else -1.
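
    A minimal sketch of that node computation (hypothetical helper; the
    result would be stored in the new numa_node field):

        #include <linux/cpumask.h>
        #include <linux/topology.h>

        /* NUMA node shared by all CPUs in the queue's xps mask, else -1 */
        static int xps_map_node(const struct cpumask *mask)
        {
                int cpu, node = -1;

                for_each_cpu(cpu, mask) {
                        if (node == -1)
                                node = cpu_to_node(cpu);
                        else if (node != cpu_to_node(cpu))
                                return -1;      /* CPUs span nodes */
                }
                return node;
        }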

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Nov, 2010

2 commits


29 Nov, 2010

1 commit

  • This patch adds an XPS_CONFIG option to enable and disable XPS,
    done in the same manner as RPS_CONFIG. This also fixes a build
    failure in XPS code when SMP is not enabled.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

25 Nov, 2010

1 commit

  • This patch implements transmit packet steering (XPS) for multiqueue
    devices. XPS selects a transmit queue during packet transmission based
    on configuration. This is done by mapping the CPU transmitting the
    packet to a queue. This is the transmit side analogue to RPS-- where
    RPS is selecting a CPU based on receive queue, XPS selects a queue
    based on the CPU (previously there was an XPS patch from Eric
    Dumazet, but that might more appropriately be called transmit completion
    steering).

    Each transmit queue can be associated with a number of CPUs which will
    use the queue to send packets. This is configured as a CPU mask on a
    per queue basis in:

    /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

    The mappings are stored per device in an inverted data structure that
    maps CPUs to queues. In the netdevice structure this is an array of
    num_possible_cpus() structures, where each structure holds an array
    of queue indexes for queues which that CPU can use.
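
    A sketch of the inverted lookup (struct layout modeled on the
    description; the hash-based spreading is an assumption):

        #include <linux/types.h>

        /* per-CPU entry of the inverted map: queues this CPU may use */
        struct xps_map {
                unsigned int len;
                u16 queues[];
        };

        /* pick one of this CPU's queues, keeping a given flow stable */
        static u16 xps_pick_queue(const struct xps_map *map, u32 flow_hash)
        {
                if (map->len == 1)
                        return map->queues[0];
                /* scale the 32-bit hash into [0, len) without a divide */
                return map->queues[((u64)flow_hash * map->len) >> 32];
        }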

    The benefits of XPS are improved locality in the per queue data
    structures. Also, transmit completions are more likely to be done
    nearer to the sending thread, so this should promote locality back
    to the socket on free (e.g. UDP). The benefits of XPS are dependent on
    cache hierarchy, application load, and other factors. XPS would
    nominally be configured so that a queue would only be shared by CPUs
    which are sharing a cache; the degenerate configuration would be that
    each CPU has its own queue.

    Below are some benchmark results which show the potential benefit of
    this patch. The netperf test has 500 instances of netperf TCP_RR test
    with 1 byte req. and resp.

    bnx2x on 16 core AMD
    XPS (16 queues, 1 TX queue per CPU) 1234K at 100% CPU
    No XPS (16 queues) 996K at 100% CPU

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

18 Nov, 2010

1 commit

  • netif_set_real_num_rx_queues() can decrement and increment
    the number of rx queues. For example ixgbe does this as
    features and offloads are toggled. Presumably this could
    also happen across down/up on most devices if the available
    resources changed (e.g. a CPU went offline).

    The kobject needs to be zeroed in this case so that the
    state is not preserved across kobject_put()/kobject_init_and_add().

    This resolves the following error report.

    ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
    kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
    Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
    Call Trace:
    [] kobject_init+0x3a/0x83
    [] kobject_init_and_add+0x23/0x57
    [] ? mark_lock+0x21/0x267
    [] net_rx_queue_update_kobjects+0x63/0xc6
    [] netif_set_real_num_rx_queues+0x5f/0x78
    [] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
    [] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
    [] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]
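
    The fix amounts to clearing the kobject before it is re-added; a
    sketch (ktype and naming details are placeholders):

        /* re-adding a queue kobject: wipe state left over from the
         * previous kobject_put() before initializing it again */
        memset(kobj, 0, sizeof(*kobj));
        error = kobject_init_and_add(kobj, &rx_queue_ktype,
                                     parent, "rx-%u", index);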

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     

16 Nov, 2010

1 commit

  • This patch moves RX queue allocation to alloc_netdev_mq and freeing of
    the queues to free_netdev (symmetric to TX queue allocation). Each
    kobject RX queue takes a reference to the queue's device so that the
    device can't be freed before all the kobjects have been released-- this
    obviates the need for reference counts specific to RX queues.
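
    A sketch of that refcount symmetry (field and helper names assumed
    from net-sysfs.c conventions):

        /* kobject release: drop the device reference taken at creation,
         * so the netdevice outlives every RX queue kobject */
        static void rx_queue_release(struct kobject *kobj)
        {
                struct netdev_rx_queue *queue =
                        container_of(kobj, struct netdev_rx_queue, kobj);

                dev_put(queue->dev);
        }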

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

26 Oct, 2010

1 commit

  • Add __rcu annotations to:
    (struct netdev_rx_queue)->rps_map
    (struct netdev_rx_queue)->rps_flow_table
    struct rps_sock_flow_table *rps_sock_flow_table;

    And use appropriate rcu primitives.
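
    With the annotations in place, accesses follow the usual RCU
    pattern; a minimal sketch with hypothetical helpers:

        /* reader side: must run under rcu_read_lock() */
        static struct rps_map *rxq_get_map(struct netdev_rx_queue *rxqueue)
        {
                return rcu_dereference(rxqueue->rps_map);
        }

        /* writer side: rcu_assign_pointer() orders the publication */
        static void rxq_set_map(struct netdev_rx_queue *rxqueue,
                                struct rps_map *new_map)
        {
                rcu_assign_pointer(rxqueue->rps_map, new_map);
        }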

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Oct, 2010

1 commit

  • The rx->count reference is used to track reference counts to the
    number of rx-queue kobjects created for the device. This patch
    eliminates initialization of the counter in netif_alloc_rx_queues
    and instead increments the counter each time a kobject is created.
    This is now symmetric with the decrement that is done when an object is
    released.

    Signed-off-by: Tom Herbert
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     

28 Sep, 2010

1 commit

  • For RPS, we create a kobject for each RX queue based on the number of
    queues passed to alloc_netdev_mq(). However, drivers generally do not
    determine the numbers of hardware queues to use until much later, so
    this usually represents the maximum number the driver may use and not
    the actual number in use.

    For TX queues, drivers can update the actual number using
    netif_set_real_num_tx_queues(). Add a corresponding function for RX
    queues, netif_set_real_num_rx_queues().
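
    Driver usage mirrors the TX-side call; a sketch (the queue count and
    error label are hypothetical driver details):

        /* tell the stack how many RX queues the hardware will really use */
        err = netif_set_real_num_rx_queues(netdev, hw_rx_queues);
        if (err)
                goto err_queues;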

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     

02 Sep, 2010

1 commit


17 Aug, 2010

1 commit

  • Enable using network namespaces with wireless devices even when
    sysfs is enabled, using the same infrastructure that was built for
    netdevs.

    Signed-off-by: Johannes Berg
    Acked-by: "Eric W. Biederman"
    Signed-off-by: John W. Linville

    Johannes Berg
     

25 Jul, 2010

1 commit

  • Add addr_assign_type to struct net_device and expose it via sysfs.
    This new attribute has the purpose of giving user-space the ability to
    distinguish between different assignment types of MAC addresses.

    For example user-space can treat NICs with randomly generated MAC
    addresses differently than NICs that have permanent (locally assigned)
    MAC addresses.
    For the former udev could write a persistent net rule by matching the
    device path instead of the MAC address.
    There's also the case of devices that 'steal' MAC addresses from
    slave devices, in which case it is also beneficial for user-space
    to be aware of the fact.

    This patch also introduces a helper function to assist adoption of
    drivers that generate MAC addresses randomly.
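
    A sketch of such a helper, modeled on the description (the helper
    name here is illustrative, not necessarily the one in the patch):

        #include <linux/etherdevice.h>
        #include <linux/netdevice.h>

        /* generate a random MAC and record how it was assigned */
        static void netdev_hw_addr_random(struct net_device *dev, u8 *hwaddr)
        {
                random_ether_addr(hwaddr);  /* random, locally administered */
                dev->addr_assign_type |= NET_ADDR_RANDOM;
        }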

    Signed-off-by: Stefan Assmann
    Signed-off-by: David S. Miller

    Stefan Assmann
     

13 Jul, 2010

1 commit


08 Jul, 2010

1 commit

  • There is a small possibility that a reader gets incorrect values on 32
    bit arches. SNMP applications could catch incorrect counters when a
    32bit high part is changed by another stats consumer/provider.

    One way to solve this is to add a rtnl_link_stats64 param to all
    ndo_get_stats64() methods, and also add such a parameter to
    dev_get_stats().

    The rule is that we are not allowed to use dev->stats64 as temporary
    storage for 64-bit stats; we must use a caller-provided area (usually
    on the stack).

    Old drivers (only providing get_stats() method) need no changes.
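
    A caller of the new interface then looks like this sketch:

        /* the caller provides the 64-bit snapshot area, typically on stack */
        struct rtnl_link_stats64 temp;
        const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);

        pr_info("%s: rx_bytes=%llu\n", dev->name,
                (unsigned long long)stats->rx_bytes);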

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Jun, 2010

1 commit

  • Use struct rtnl_link_stats64 as the statistics structure.

    On 32-bit architectures, insert 32 bits of padding after/before each
    field of struct net_device_stats to make its layout compatible with
    struct rtnl_link_stats64. Add an anonymous union in net_device; move
    stats into the union and add struct rtnl_link_stats64 stats64.

    Add net_device_ops::ndo_get_stats64, implementations of which will
    return a pointer to struct rtnl_link_stats64. Drivers that implement
    this operation must not update the structure asynchronously.

    Change dev_get_stats() to call ndo_get_stats64 if available, and to
    return a pointer to struct rtnl_link_stats64. Change callers of
    dev_get_stats() accordingly.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     

22 May, 2010

3 commits

  • This reverts commit aaf8cdc34ddba08122f02217d9d684e2f9f5d575.

    Drivers like the ipw2100 call device_create_group when they
    are initialized and device_remove_group when they are shutdown.
    Moving them between namespaces deletes their sysfs groups early.

    In particular the following call chain results.
    netdev_unregister_kobject -> device_del -> kobject_del -> sysfs_remove_dir
    With sysfs_remove_dir recursively deleting all of its subdirectories,
    and nothing adding them back.

    Ouch!

    Therefore we need to call something that ultimately calls sysfs_mv_dir,
    as that sysfs function can move sysfs directories between namespaces
    without deleting their subdirectories or their contents, allowing
    us to avoid placing extra boilerplate into every driver that does
    something interesting with sysfs.

    Currently the function that provides that capability is device_rename.
    That is, the code works without nasty side effects as originally written.

    So remove the misguided fix for moving devices between namespaces. The
    bug in the kobject layer that inspired it has now been recognized and
    fixed.

    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • I had a couple of stupid bugs in:
    netns: Teach network device kobjects which namespace they are in.

    - I duplicated the Kconfig for the NET_NS
    - The build was broken when sysfs was not compiled in

    The sysfs breakage is because after I moved the sysfs operations
    to the kobject layer to make things cleaner, I forgot to move the
    ifdefs. Oops.

    I'm not quite certain how I introduced a second NET_NS Kconfig,
    but it was probably a 3-way merge somewhere along the way that
    did not notice that the NET_NS Kconfig option had moved and thought
    that was a bug. It probably slipped in because the sysfs patches
    used to be the first patches in my network namespace patches.
    Some things just don't go like you would expect.

    Neither of these bugs actually affect anything in the common case
    but they should be fixed.

    Thanks to Serge for noticing they were present.

    Reported-by: Serge E. Hallyn
    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller

    Eric W. Biederman
     
  • The problem. Network devices show up in sysfs and with the network
    namespace active multiple devices with the same name can show up in
    the same directory, ouch!

    To avoid that problem and allow existing applications in network namespaces
    to see the same interface that is currently presented in sysfs, this
    patch enables the tagging directory support in sysfs.

    By using the network namespace pointers as tags to separate out
    the sysfs directory entries, we ensure that we don't have conflicts
    in the directories and applications only see a limited set of
    the network devices.

    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

20 Apr, 2010

1 commit


17 Apr, 2010

1 commit

  • This patch implements receive flow steering (RFS). RFS steers
    received packets for layer 3 and 4 processing to the CPU where
    the application for the corresponding flow is running. RFS is an
    extension of Receive Packet Steering (RPS).

    The basic idea of RFS is that when an application calls recvmsg
    (or sendmsg) the application's running CPU is stored in a hash
    table that is indexed by the connection's rxhash which is stored in
    the socket structure. The rxhash is passed in skb's received on
    the connection from netif_receive_skb. For each received packet,
    the associated rxhash is used to look up the CPU in the hash table,
    if a valid CPU is set then the packet is steered to that CPU using
    the RPS mechanisms.

    The convolution of the simple approach is that it would potentially
    allow OOO packets. If threads are thrashing around CPUs or multiple
    threads are trying to read from the same sockets, a quickly changing
    CPU value in the hash table could cause rampant OOO packets--
    we consider this a non-starter.

    To avoid OOO packets, this solution implements two types of hash
    tables: rps_sock_flow_table and rps_dev_flow_table.

    rps_sock_flow_table is a global hash table. Each entry is just a CPU
    number and it is populated in recvmsg and sendmsg as described above.
    This table contains the "desired" CPUs for flows.

    rps_dev_flow_table is specific to each device queue. Each entry
    contains a CPU and a tail queue counter. The CPU is the "current"
    CPU for a matching flow. The tail queue counter holds the value
    of a tail queue counter for the associated CPU's backlog queue at
    the time of last enqueue for a flow matching the entry.

    Each backlog queue has a queue head counter which is incremented
    on dequeue, and so a queue tail counter is computed as queue head
    count + queue length. When a packet is enqueued on a backlog queue,
    the current value of the queue tail counter is saved in the hash
    entry of the rps_dev_flow_table.

    And now the trick: when selecting the CPU for RPS (get_rps_cpu)
    the rps_sock_flow table and the rps_dev_flow table for the RX queue
    are consulted. When the desired CPU for the flow (found in the
    rps_sock_flow table) does not match the current CPU (found in the
    rps_dev_flow table), the current CPU is changed to the desired CPU
    if one of the following is true:

    - The current CPU is unset (equal to RPS_NO_CPU)
    - The current CPU is offline
    - The current CPU's queue head counter >= queue tail counter in the
    rps_dev_flow table. This checks if the queue tail has advanced
    beyond the last packet that was enqueued using this table entry.
    This guarantees that all packets queued using this entry have been
    dequeued, thus preserving in order delivery.
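
    A sketch of this decision rule as a single helper (names are
    hypothetical; the counter comparison uses wraparound-safe signed
    arithmetic):

        /* return the CPU to steer to; switch to the desired CPU only
         * when doing so cannot reorder packets of the flow */
        static u16 rfs_next_cpu(u16 desired, u16 cur,
                                unsigned int queue_head,
                                unsigned int last_qtail)
        {
                if (cur == RPS_NO_CPU ||                 /* unset */
                    !cpu_online(cur) ||                  /* offline */
                    (int)(queue_head - last_qtail) >= 0) /* backlog drained */
                        return desired;
                return cur;
        }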

    Making each queue have its own rps_dev_flow table has two advantages:
    1) the tail queue counters will be written on each receive, so
    keeping the table local to the interrupting CPU is good for locality.
    2) this allows lockless access to the table-- the CPU number and queue
    tail counter need to be accessed together under mutual exclusion
    (from netif_receive_skb); we assume that this is only called from
    device napi_poll, which is non-reentrant.

    This patch implements RFS for TCP and connected UDP sockets.
    It should be usable for other flow oriented protocols.

    There are two configuration parameters for RFS. The
    "rps_flow_entries" kernel init parameter sets the number of
    entries in the rps_sock_flow_table, the per rxqueue sysfs entry
    "rps_flow_cnt" contains the number of entries in the rps_dev_flow
    table for the rxqueue. Both are rounded up to a power of two.

    The obvious benefit of RFS (over just RPS) is that it achieves
    CPU locality between the receive processing for a flow and the
    applications processing; this can result in increased performance
    (higher pps, lower latency).

    The benefits of RFS are dependent on cache hierarchy, application
    load, and other factors. On simple benchmarks, we don't necessarily
    see improvement and sometimes see degradation. However, for more
    complex benchmarks and for applications where cache pressure is
    much higher this technique seems to perform very well.

    Below are some benchmark results which show the potential benefit of
    this patch. The netperf test has 500 instances of the netperf TCP_RR
    test with 1 byte req. and resp. The RPC test is a request/response
    test similar in structure to the netperf RR test, with 100 threads
    on each host, but doing more work in userspace than netperf.

    e1000e on 8 core Intel
    No RFS or RPS 104K tps at 30% CPU
    No RFS (best RPS config): 290K tps at 63% CPU
    RFS 303K tps at 61% CPU

    RPC test     tps   CPU%   50/90/99% usec latency   Latency StdDev
    No RFS/RPS   103K  48%    757/900/3185             4472.35
    RPS only     174K  73%    415/993/2468             491.66
    RFS          223K  73%    379/651/1382             315.61

    Signed-off-by: Tom Herbert
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     

12 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • include cleanup: update gfp.h and slab.h includes to prepare for
    breaking implicit slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints
    out an error message indicating which .h file needs to be added to
    the file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, while for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as a bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

29 Mar, 2010

1 commit


23 Mar, 2010

1 commit


17 Mar, 2010

1 commit

  • This patch implements software receive side packet steering (RPS). RPS
    distributes the load of received packet processing across multiple CPUs.

    Problem statement: Protocol processing done in the NAPI context for received
    packets is serialized per device queue and becomes a bottleneck under high
    packet load. This substantially limits pps that can be achieved on a single
    queue NIC and provides no scaling with multiple cores.

    This solution queues packets early on in the receive path on the backlog queues
    of other CPUs. This allows protocol processing (e.g. IP and TCP) to be
    performed on packets in parallel. For each device (or each receive queue in
    a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
    process packets. A CPU is selected on a per packet basis by hashing contents
    of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index
    into the CPU mask. The IPI mechanism is used to raise networking receive
    softirqs between CPUs. This effectively emulates in software what a multi-queue
    NIC can provide, but is generic requiring no device support.
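
    A minimal sketch of the per-packet selection (map layout modeled on
    the description; the hash-scaling step is an assumption):

        #include <linux/types.h>

        /* CPUs allowed to process packets for one device/queue */
        struct rps_map {
                unsigned int len;
                u16 cpus[];
        };

        /* index the CPU list with the (HW or software) flow hash */
        static int rps_pick_cpu(const struct rps_map *map, u32 rxhash)
        {
                if (!map || !map->len)
                        return -1;              /* RPS not configured */
                return map->cpus[((u64)rxhash * map->len) >> 32];
        }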

    Many devices now provide a hash over the 4-tuple on a per packet basis
    (e.g. the Toeplitz hash). This patch allows drivers to set the HW reported hash
    in an skb field, and that value in turn is used to index into the RPS maps.
    Using the HW generated hash can avoid cache misses on the packet when
    steering it to a remote CPU.

    The CPU mask is set on a per device and per queue basis in the sysfs variable
    /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. This is a set of canonical
    bit maps for receive queues in the device (numbered by <n>). If a device
    does not support multi-queue, a single variable is used for the device (rx-0).

    Generally, we have found this technique increases pps capabilities of a single
    queue device with good CPU utilization. Optimal settings for the CPU mask
    seem to depend on architectures and cache hierarchy. Below are some results
    running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
    Results show cumulative transaction rate and system CPU utilization.

    e1000e on 8 core Intel
    Without RPS: 108K tps at 33% CPU
    With RPS: 311K tps at 64% CPU

    forcedeth on 16 core AMD
    Without RPS: 156K tps at 15% CPU
    With RPS: 404K tps at 49% CPU

    bnx2x on 16 core AMD
    Without RPS 567K tps at 61% CPU (4 HW RX queues)
    Without RPS 738K tps at 96% CPU (8 HW RX queues)
    With RPS: 854K tps at 76% CPU (4 HW RX queues)

    Caveats:
    - The benefits of this patch are dependent on architecture and cache hierarchy.
    Tuning the masks to get best performance is probably necessary.
    - This patch adds overhead in the path for processing a single packet. In
    a lightly loaded server this overhead may eliminate the advantages of
    increased parallelism, and possibly cause some relative performance degradation.
    We have found that masks that are cache aware (share same caches with
    the interrupting CPU) mitigate much of this.
    - The RPS masks can be changed dynamically; however, whenever the mask is
    changed this introduces the possibility of generating out of order packets.
    It's probably best not to change the masks too frequently.

    Signed-off-by: Tom Herbert

    include/linux/netdevice.h | 32 ++++-
    include/linux/skbuff.h | 3 +
    net/core/dev.c | 335 +++++++++++++++++++++++++++++++++++++--------
    net/core/net-sysfs.c | 225 ++++++++++++++++++++++++++++++-
    net/core/skbuff.c | 2 +
    5 files changed, 538 insertions(+), 59 deletions(-)
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     

20 Feb, 2010

1 commit


26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

31 Oct, 2009

1 commit


28 Oct, 2009

1 commit

  • commit d519e17e2d01a0ee9abe083019532061b4438065
    (net: export device speed and duplex via sysfs)
    made the wrong assumption that netdev->ethtool_ops was always set.

    This makes it possible to crash the kernel and leave rtnl in a locked state.

    modprobe dummy
    ip link set dummy0 up
    (udev runs and crashes)
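
    The shape of the fix is a guard before the dereference; a sketch
    (hypothetical helper name):

        /* dummy-style devices register no ethtool_ops: check before use */
        static int netdev_get_speed(struct net_device *dev, u32 *speed)
        {
                struct ethtool_cmd cmd = { .cmd = ETHTOOL_GSET };

                if (!dev->ethtool_ops || !dev->ethtool_ops->get_settings)
                        return -EINVAL;
                if (dev->ethtool_ops->get_settings(dev, &cmd))
                        return -EINVAL;
                *speed = cmd.speed;     /* speed_hi handling elided */
                return 0;
        }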

    Signed-off-by: Eric Dumazet
    Acked-by: Andy Gospodarek
    Signed-off-by: David S. Miller

    Eric Dumazet