Eric Lee / smarc-fsl-linux-kernel

13 Jul, 2010

1 commit

7ba429100 inet, inet6: make tcp_sendmsg() and tcp_sendpage() through inet_sendmsg() and inet_sendpage()

A new boolean flag, no_autobind, is added to struct proto to avoid the autobind
calls when the protocol is TCP. sock_rps_record_flow() is then called in
TCP's sendmsg() and sendpage() paths.

Signed-off-by: Changli Gao
----
 include/net/inet_common.h |  4 ++++
 include/net/sock.h        |  1 +
 include/net/tcp.h         |  8 ++++----
 net/ipv4/af_inet.c        | 15 +++++++++------
 net/ipv4/tcp.c            | 11 +++++------
 net/ipv4/tcp_ipv4.c       |  3 +++
 net/ipv6/af_inet6.c       |  8 ++++----
 net/ipv6/tcp_ipv6.c       |  3 +++
 8 files changed, 33 insertions(+), 20 deletions(-)
Signed-off-by: David S. Miller

Changli Gao
2010-07-13 11:21:46 +0800

01 Jul, 2010

1 commit

4ce3c183f snmp: 64bit ipstats_mib for all arches

/proc/net/snmp and /proc/net/netstat expose SNMP counters.

Width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in kernel.

This means user program parsing these files must already be prepared to
deal with 64bit values, regardless of user program being 32 or 64 bit.

This patch introduces 64bit snmp values for IPSTAT mib, where some
counters can wrap pretty fast if they are 32bit wide.

# netstat -s|egrep "InOctets|OutOctets"
InOctets: 244068329096
OutOctets: 244069348848
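
As a rough illustration of why 32-bit octet counters are a problem, the sketch below shows what a 32-bit counter would report for the InOctets value above. The helper names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Value a 32-bit counter would show after counting `total` bytes:
 * everything above bit 31 is lost each time the counter wraps. */
static uint32_t as_32bit_counter(uint64_t total)
{
    return (uint32_t)total;
}

/* Number of times a 32-bit counter wrapped on the way to `total`. */
static uint64_t wraps_32bit(uint64_t total)
{
    return total >> 32;
}
```

A reader of the 32-bit value alone cannot tell how many wraps occurred, which is why the counter width matters for fast-moving statistics.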

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-07-01 04:31:19 +0800

26 Jun, 2010

1 commit

1823e4c80 snmp: add align parameter to snmp_mib_init()

In preparation for 64bit snmp counters for some mibs,
add an 'align' parameter to snmp_mib_init(), instead
of assuming mibs only contain 'unsigned long' fields.

Callers can use __alignof__(type) to provide correct
alignment.
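
A minimal userspace sketch of the idea, assuming a hypothetical mib_alloc() that stands in for the per-CPU allocation: the caller passes __alignof__(type) (a GCC/Clang extension) instead of the allocator assuming 'unsigned long' alignment. The struct layouts are invented for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical MIB layouts: native-word counters vs. 64-bit counters. */
struct long_mib { unsigned long counters[4]; };
struct u64_mib  { uint64_t counters[4]; };

/* Hypothetical allocator: alignment is supplied by the caller. */
static void *mib_alloc(size_t size, size_t align)
{
    /* C11 aligned_alloc() expects size to be a multiple of align. */
    size_t rounded = (size + align - 1) / align * align;
    return aligned_alloc(align, rounded);
}

/* True when p satisfies the requested alignment. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

A caller would write mib_alloc(sizeof(struct u64_mib), __alignof__(struct u64_mib)), so the same allocator serves both layouts.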

Signed-off-by: Eric Dumazet
CC: Herbert Xu
CC: Arnaldo Carvalho de Melo
CC: Hideaki YOSHIFUJI
CC: Vlad Yasevich
Signed-off-by: David S. Miller

Eric Dumazet
2010-06-26 12:33:17 +0800

24 Jun, 2010

1 commit

7b2ff18ee net - IP_NODEFRAG option for IPv4 socket

This patch implements the IP_NODEFRAG option for IPv4 sockets.
The reason is that there is no other way to send out a packet with a
user-customized header for the reassembly part.

Signed-off-by: Jiri Olsa
Signed-off-by: David S. Miller

Jiri Olsa
2010-06-24 04:16:38 +0800

11 Jun, 2010

1 commit

d8d1f30b9 net-next: remove useless union keyword

Remove the useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Changli Gao
2010-06-11 14:31:35 +0800

16 May, 2010

1 commit

e3826f1e9 net: reserve ports for applications using fixed port numbers

(Dropped the infiniband part, because Tetsuo modified the related code,
I will send a separate patch for it once this is accepted.)

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.
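
The behavior described above can be sketched as follows. This is a simplified model, not the kernel code: the bitmap, the function names, and the linear scan (a real allocator would also skip ports in use and would not scan linearly) are assumptions made for illustration.

```c
#include <limits.h>
#include <stdint.h>

#define PORT_COUNT 65536
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bitmap: one bit per port, set when the port is reserved. */
static unsigned long reserved_ports[PORT_COUNT / (sizeof(unsigned long) * CHAR_BIT)];

static void reserve_port(unsigned int port)
{
    reserved_ports[port / BITS_PER_WORD] |= 1UL << (port % BITS_PER_WORD);
}

static int port_is_reserved(unsigned int port)
{
    return (int)((reserved_ports[port / BITS_PER_WORD] >> (port % BITS_PER_WORD)) & 1UL);
}

/* Automatic assignment (the "port 0" case): first non-reserved port
 * in [low, high], or -1 if every port in the range is reserved. */
static int auto_assign_port(unsigned int low, unsigned int high)
{
    for (unsigned int port = low; port <= high && port < PORT_COUNT; port++)
        if (!port_is_reserved(port))
            return (int)port;
    return -1;
}

/* Explicit allocation is unchanged: the reservation is not consulted. */
static int explicit_bind_port(unsigned int port)
{
    return port < PORT_COUNT ? (int)port : -1;
}
```

The key design point is that only the automatic path consults the reservation, so an application that binds its fixed port explicitly still succeeds.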

Signed-off-by: Octavian Purdila
Signed-off-by: WANG Cong
Cc: Neil Horman
Cc: Eric Dumazet
Cc: Eric W. Biederman
Signed-off-by: David S. Miller

Amerigo Wang
2010-05-16 14:28:40 +0800

28 Apr, 2010

1 commit

c58dc01ba net: Make RFS socket operations not be inet specific.

Idea from Eric Dumazet.

As for placement inside of struct sock, I tried to choose a place
that otherwise has a 32-bit hole on 64-bit systems.

Signed-off-by: David S. Miller
Acked-by: Eric Dumazet

David S. Miller
2010-04-28 06:11:48 +0800

21 Apr, 2010

2 commits

0eae88f31 net: Fix various endianness glitches

Sparse can help us find endianness bugs, but we need to make some
cleanups to be able to more easily spot real bugs.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-04-21 10:06:52 +0800
aa3951451 net: sk_sleep() helper

Define a new function to return the waitqueue of a "struct sock".

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	return sk->sk_sleep;
}

Change all read occurrences of sk_sleep by a call to this function.

Needed for a future RCU conversion. sk_sleep won't be a field directly
available.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-04-21 07:37:13 +0800

17 Apr, 2010

1 commit

fec5e652e rfs: Receive Flow Steering

This patch implements receive flow steering (RFS). RFS steers
received packets for layer 3 and 4 processing to the CPU where
the application for the corresponding flow is running. RFS is an
extension of Receive Packet Steering (RPS).

The basic idea of RFS is that when an application calls recvmsg
(or sendmsg) the application's running CPU is stored in a hash
table that is indexed by the connection's rxhash which is stored in
the socket structure. The rxhash is passed in skb's received on
the connection from netif_receive_skb. For each received packet,
the associated rxhash is used to look up the CPU in the hash table,
if a valid CPU is set then the packet is steered to that CPU using
the RPS mechanisms.
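
The basic idea can be sketched in a few lines of userspace C. The table size, the NO_CPU marker, and the function names are assumptions for illustration; only the "hash table of CPU numbers indexed by rxhash" structure comes from the description above.

```c
#include <stdint.h>

#define NO_CPU       0xffffu   /* hypothetical "no valid CPU" marker */
#define FLOW_ENTRIES 256u      /* assumed table size, a power of two */

/* One CPU number per hash bucket. */
static uint16_t sock_flow_table[FLOW_ENTRIES];

static void flow_table_init(void)
{
    for (unsigned int i = 0; i < FLOW_ENTRIES; i++)
        sock_flow_table[i] = NO_CPU;
}

/* recvmsg/sendmsg path: remember the CPU the application is running on. */
static void record_flow(uint32_t rxhash, uint16_t cpu)
{
    sock_flow_table[rxhash & (FLOW_ENTRIES - 1)] = cpu;
}

/* Receive path: look up the CPU recorded for this packet's rxhash. */
static uint16_t desired_cpu(uint32_t rxhash)
{
    return sock_flow_table[rxhash & (FLOW_ENTRIES - 1)];
}
```

Because the table is indexed by a masked hash, two flows can share an entry; the sketch simply lets the most recent writer win.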

The complication with this simple approach is that it could
allow out-of-order (OOO) packets. If threads are thrashing around CPUs or
multiple threads are trying to read from the same sockets, a quickly changing
CPU value in the hash table could cause rampant OOO packets;
we consider this a non-starter.

To avoid OOO packets, this solution implements two types of hash
tables: rps_sock_flow_table and rps_dev_flow_table.

rps_sock_flow_table is a global hash table. Each entry is just a CPU
number and it is populated in recvmsg and sendmsg as described above.
This table contains the "desired" CPUs for flows.

rps_dev_flow_table is specific to each device queue. Each entry
contains a CPU and a tail queue counter. The CPU is the "current"
CPU for a matching flow. The tail queue counter holds the value
of a tail queue counter for the associated CPU's backlog queue at
the time of last enqueue for a flow matching the entry.

Each backlog queue has a queue head counter which is incremented
on dequeue, and so a queue tail counter is computed as queue head
count + queue length. When a packet is enqueued on a backlog queue,
the current value of the queue tail counter is saved in the hash
entry of the rps_dev_flow_table.
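
The counter arithmetic can be modeled as below. The struct and function names are hypothetical, and taking the tail value right after the enqueue is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical model of a per-CPU backlog queue. */
struct backlog_queue {
    uint32_t head;   /* queue head counter: incremented on every dequeue */
    uint32_t qlen;   /* number of packets currently queued */
};

/* Queue tail counter = queue head count + queue length. */
static uint32_t queue_tail(const struct backlog_queue *q)
{
    return q->head + q->qlen;
}

/* Enqueue one packet and return the tail value to save in the flow entry. */
static uint32_t enqueue(struct backlog_queue *q)
{
    q->qlen++;
    return queue_tail(q);
}

/* Dequeue one packet, advancing the head counter. */
static void dequeue(struct backlog_queue *q)
{
    if (q->qlen == 0)
        return;
    q->qlen--;
    q->head++;
}
```

With this convention, a saved tail value of t means the packet has left the queue once the head counter reaches t.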

And now the trick: when selecting the CPU for RPS (get_rps_cpu)
the rps_sock_flow table and the rps_dev_flow table for the RX queue
are consulted. When the desired CPU for the flow (found in the
rps_sock_flow table) does not match the current CPU (found in the
rps_dev_flow table), the current CPU is changed to the desired CPU
if one of the following is true:

- The current CPU is unset (equal to RPS_NO_CPU)
- The current CPU is offline
- The current CPU's queue head counter >= queue tail counter in the
  rps_dev_flow table. This checks whether the queue head has advanced
  beyond the last packet that was enqueued using this table entry,
  which guarantees that all packets queued using this entry have been
  dequeued, thus preserving in-order delivery.
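
The three conditions can be sketched as a single decision function. This is an illustrative model rather than the get_rps_cpu code: the struct, the per-CPU arrays passed as parameters, the extra guard against an unset desired CPU, and the wrap-safe signed comparison are all assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_CPU 0xffffu   /* stands in for RPS_NO_CPU */

/* Hypothetical rps_dev_flow-style entry. */
struct dev_flow {
    uint16_t cpu;          /* "current" CPU for the flow */
    uint32_t last_qtail;   /* tail counter saved at the last enqueue */
};

/* Return the CPU to steer to, switching to `desired` only when safe.
 * cpu_online[] and queue_head[] are hypothetical per-CPU arrays. */
static uint16_t select_cpu(struct dev_flow *flow, uint16_t desired,
                           const bool *cpu_online, const uint32_t *queue_head)
{
    uint16_t cur = flow->cpu;

    if (desired != NO_CPU && desired != cur) {
        bool can_switch =
            cur == NO_CPU ||            /* current CPU is unset */
            !cpu_online[cur] ||         /* current CPU is offline */
            /* head has reached the saved tail: nothing from this flow is
             * still queued; the signed difference tolerates counter wrap */
            (int32_t)(queue_head[cur] - flow->last_qtail) >= 0;

        if (can_switch)
            flow->cpu = cur = desired;
    }
    return cur;
}
```

The flow keeps using its current CPU while packets from it are still queued there, which is what preserves ordering.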

Making each queue have its own rps_dev_flow table has two advantages:
1) the tail queue counters will be written on each receive, so
keeping the table local to the interrupting CPU is good for locality. 2)
this allows lockless access to the table. The CPU number and queue
tail counter need to be accessed together under mutual exclusion
from netif_receive_skb; we assume that this is only called from
device napi_poll, which is non-reentrant.

This patch implements RFS for TCP and connected UDP sockets.
It should be usable for other flow oriented protocols.

There are two configuration parameters for RFS. The
"rps_flow_entries" kernel init parameter sets the number of
entries in the rps_sock_flow_table, the per rxqueue sysfs entry
"rps_flow_cnt" contains the number of entries in the rps_dev_flow
table for the rxqueue. Both are rounded to power of two.
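
A power-of-two table size lets the hash be reduced to an index with a single mask. The sketch below assumes the rounding is upward (the text above does not say which direction) and uses hypothetical helper names; it is valid for inputs up to 2^31.

```c
#include <stdint.h>

/* Round n up to the next power of two (n <= 2^31). */
static uint32_t round_up_pow2(uint32_t n)
{
    if (n <= 1)
        return 1;
    n--;                 /* so exact powers of two map to themselves */
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;        /* all bits below the top set bit are now set */
    return n + 1;
}

/* With a power-of-two size, indexing is a mask instead of a modulo. */
static uint32_t flow_index(uint32_t rxhash, uint32_t table_size)
{
    return rxhash & (table_size - 1);
}
```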

The obvious benefit of RFS (over just RPS) is that it achieves
CPU locality between the receive processing for a flow and the
applications processing; this can result in increased performance
(higher pps, lower latency).

The benefits of RFS are dependent on cache hierarchy, application
load, and other factors. On simple benchmarks, we don't necessarily
see improvement and sometimes see degradation. However, for more
complex benchmarks and for applications where cache pressure is
much higher this technique seems to perform very well.

Below are some benchmark results which show the potential benefit of
this patch. The netperf test has 500 instances of netperf TCP_RR
test with 1 byte req. and resp. The RPC test is a request/response
test similar in structure to the netperf RR test with 100 threads on
each host, but does more work in userspace than netperf.

e1000e on 8 core Intel
   No RFS or RPS              104K tps at 30% CPU
   No RFS (best RPS config):  290K tps at 63% CPU
   RFS                        303K tps at 61% CPU

RPC test       tps    CPU%   50/90/99% usec latency   Latency StdDev
   No RFS/RPS  103K   48%    757/900/3185             4472.35
   RPS only:   174K   73%    415/993/2468             491.66
   RFS         223K   73%    379/651/1382             315.61

Signed-off-by: Tom Herbert
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Tom Herbert
2010-04-17 07:01:27 +0800

13 Apr, 2010

1 commit

b6c6712a4 net: sk_dst_cache RCUification

With latest CONFIG_PROVE_RCU stuff, I felt more comfortable to make this
work.

sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock)

This rwlock is readlocked for a very small amount of time, and dst
entries are already freed after RCU grace period. This calls for RCU
again :)

This patch converts sk_dst_lock to a spinlock, and uses RCU for readers.

__sk_dst_get() is supposed to be called with rcu_read_lock() or if
socket locked by user, so use appropriate rcu_dereference_check()
condition (rcu_read_lock_held() || sock_owned_by_user(sk))

This patch avoids two atomic ops per tx packet on UDP connected sockets,
for example, and permits sk_dst_lock to be much less dirtied.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-04-13 16:41:33 +0800

12 Apr, 2010

1 commit

871039f02 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/stmmac/stmmac_main.c
drivers/net/wireless/wl12xx/wl1271_cmd.c
drivers/net/wireless/wl12xx/wl1271_main.c
drivers/net/wireless/wl12xx/wl1271_spi.c
net/core/ethtool.c
net/mac80211/scan.c

David S. Miller
2010-04-12 05:53:53 +0800

07 Apr, 2010

1 commit

4a35ecf8b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/bonding/bond_main.c
drivers/net/via-velocity.c
drivers/net/wireless/iwlwifi/iwl-agn.c

David S. Miller
2010-04-07 14:53:30 +0800

06 Apr, 2010

1 commit

cb4361c1d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
smc91c92_cs: fix the problem of "Unable to find hardware address"
r8169: clean up my printk uglyness
net: Hook up cxgb4 to Kconfig and Makefile
cxgb4: Add main driver file and driver Makefile
cxgb4: Add remaining driver headers and L2T management
cxgb4: Add packet queues and packet DMA code
cxgb4: Add HW and FW support code
cxgb4: Add register, message, and FW definitions
netlabel: Fix several rcu_dereference() calls used without RCU read locks
bonding: fix potential deadlock in bond_uninit()
net: check the length of the socket address passed to connect(2)
stmmac: add documentation for the driver.
stmmac: fix kconfig for crc32 build error
be2net: fix bug in vlan rx path for big endian architecture
be2net: fix flashing on big endian architectures
be2net: fix a bug in flashing the redboot section
bonding: bond_xmit_roundrobin() fix
drivers/net: Add missing unlock
net: gianfar - align BD ring size console messages
net: gianfar - initialize per-queue statistics
...

Linus Torvalds
2010-04-06 23:34:06 +0800

02 Apr, 2010

1 commit

6503d9616 net: check the length of the socket address passed to connect(2)

Check the length of the socket address passed to connect(2). If the
length is invalid, -EINVAL will be returned.
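
A minimal sketch of such a check, written as a standalone userspace function. The helper name is hypothetical, and the minimum length used here (enough to hold sa_family) is an assumption; each protocol touched by the patch may apply its own bound.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: reject an address that is too short to even
 * contain the address family field. */
static int check_connect_addr_len(const struct sockaddr *addr, size_t addr_len)
{
    if (addr == NULL || addr_len < sizeof(addr->sa_family))
        return -EINVAL;
    return 0;
}
```

Validating the length before touching any field avoids reading past the buffer the caller actually supplied.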

Signed-off-by: Changli Gao
----
 net/bluetooth/l2cap.c          | 3 ++-
 net/bluetooth/rfcomm/sock.c    | 3 ++-
 net/bluetooth/sco.c            | 3 ++-
 net/can/bcm.c                  | 3 +++
 net/ieee802154/af_ieee802154.c | 3 +++
 net/ipv4/af_inet.c             | 5 +++++
 net/netlink/af_netlink.c       | 3 +++
 7 files changed, 20 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller

Changli Gao
2010-04-02 08:26:01 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

22 Mar, 2010

1 commit

ec733b15a net: snmp mib cleanup

There is no point in aligning or padding mibs to cache lines; they are
per-cpu allocated with an 8-byte alignment anyway.
This wastes space for no gain. This patch removes __SNMP_MIB_ALIGN__

Since SNMP mibs contain "unsigned long" fields only, we can relax the
allocation alignment from "unsigned long long" to "unsigned long"

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-03-22 09:34:16 +0800

17 Feb, 2010

1 commit

7d720c3e4 percpu: add __percpu sparse annotations to net

Add __percpu sparse annotations to net.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.

The macro and type tricks around snmp stats make things a bit
interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
snmp_mib_*() users which used to cast the argument to (void **) are
updated to cast it to (void __percpu **).

Signed-off-by: Tejun Heo
Acked-by: David S. Miller
Cc: Patrick McHardy
Cc: Arnaldo Carvalho de Melo
Cc: Vlad Yasevich
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller

Tejun Heo
2010-02-17 15:05:38 +0800

06 Nov, 2009

3 commits

c84b3268d net: check kern before calling security subsystem

Before calling capable(CAP_NET_RAW) check if this operation is on behalf
of the kernel or on behalf of userspace. Do not do the security check if
it is on behalf of the kernel.
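
The resulting check can be sketched as below for the raw-socket case. The function is a hypothetical userspace model: has_cap_net_raw stands in for the result of capable(CAP_NET_RAW), and returning -EPERM on denial is an assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical model of the creation-time permission check.
 * `kern` is true when the socket is created on behalf of the kernel. */
static int check_create(int type, bool kern, bool has_cap_net_raw)
{
    /* The capability is consulted only for userspace requests. */
    if (type == SOCK_RAW && !kern && !has_cap_net_raw)
        return -EPERM;
    return 0;
}
```

Testing kern first means the security subsystem is never asked about sockets the kernel creates for itself.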

Signed-off-by: Eric Paris
Acked-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller

Eric Paris
2009-11-06 14:18:18 +0800
3f378b684 net: pass kern to net_proto_family create function

The generic __sock_create function has a kern argument which allows the
security system to make decisions based on if a socket is being created by
the kernel or by userspace. This patch passes that flag to the
net_proto_family specific create function, so it can do the same thing.

Signed-off-by: Eric Paris
Acked-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller

Eric Paris
2009-11-06 14:18:14 +0800
13f18aa05 net: drop capability from protocol definitions

struct can_proto had a capability field which wasn't ever used. It is
dropped entirely.

struct inet_protosw had a capability field which can be more clearly
expressed in the code by just checking if sock->type == SOCK_RAW.

Signed-off-by: Eric Paris
Acked-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller

Eric Paris
2009-11-06 13:40:17 +0800

29 Oct, 2009

1 commit

38bfd8f5b net,socket: introduce DECLARE_SOCKADDR helper to catch overflow at build time

proto_ops->getname implies copying protocol specific data
into a storage unit (particularly into __kernel_sockaddr_storage).
So when we implement new protocol support we should keep such
a detail in mind (which is easy to forget about).

Let's introduce a DECLARE_SOCKADDR helper which checks at build time
that the storage unit is not overflowed.

Eventually inet_getname is switched to use DECLARE_SOCKADDR
(to show example of usage).
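
A userspace analogue of the idea, using C11 _Static_assert. CHECKED_SOCKADDR and the two helper functions are hypothetical names invented for this sketch; this is not the kernel macro, only an illustration of a size check that fails the build instead of overflowing at run time.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: declare `dst` as a typed view of generic address
 * storage, refusing to compile if `type` would not fit in it. */
#define CHECKED_SOCKADDR(type, dst, src)                                 \
    _Static_assert(sizeof(type) <= sizeof(struct sockaddr_storage),      \
                   "address type does not fit in sockaddr_storage");     \
    type *dst = (type *)(src)

/* Fill the storage with an IPv4 address carrying the given port. */
static void store_port(struct sockaddr_storage *ss, unsigned short port)
{
    CHECKED_SOCKADDR(struct sockaddr_in, sin, ss);

    memset(ss, 0, sizeof(*ss));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
}

/* Read the port back through the same checked view. */
static unsigned short stored_port(struct sockaddr_storage *ss)
{
    CHECKED_SOCKADDR(struct sockaddr_in, sin, ss);

    return ntohs(sin->sin_port);
}
```

If a new protocol's address structure ever grew beyond the storage size, every use of the macro with that type would stop compiling.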

Signed-off-by: Cyrill Gorcunov
Signed-off-by: David S. Miller

Cyrill Gorcunov
2009-10-29 18:00:06 +0800

19 Oct, 2009

1 commit

c720c7e83 inet: rename some inet_sock fields

In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfer fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-10-19 09:52:53 +0800

07 Oct, 2009

1 commit

ec1b4cf74 net: mark net_proto_ops as const

All usages of structure net_proto_ops should be declared const.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2009-10-07 16:10:46 +0800

02 Oct, 2009

1 commit

914a9ab38 net: Use sk_mark for routing lookup in more places

This patch against v2.6.31 adds support for route lookup using sk_mark in some
more places. The benefits from this patch are the following.
First, SO_MARK option now has effect on UDP sockets too.
Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing
lookup correctly if TCP sockets with SO_MARK were used.

Signed-off-by: Atis Elsts
Acked-by: Eric Dumazet

Atis Elsts
2009-10-02 06:16:49 +0800

15 Sep, 2009

1 commit

32613090a net: constify struct net_protocol

Remove long removed "inet_protocol_base" declaration.

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2009-09-15 08:03:01 +0800

29 Aug, 2009

1 commit

3d1427f87 ipv4: af_inet.c cleanups

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-08-29 14:45:21 +0800

13 Jul, 2009

1 commit

d7ca4cc01 udpv4: Handle large incoming UDP/IPv4 packets and support software UFO.

- validate and forward GSO UDP/IPv4 packets from untrusted sources.
- do software UFO if the outgoing device doesn't support UFO.

Signed-off-by: Sridhar Samudrala
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Sridhar Samudrala
2009-07-13 05:29:21 +0800

04 Jun, 2009

1 commit

2307f866f ipv4: remove ip_mc_drop_socket() declaration from af_inet.c.

ip_mc_drop_socket() method is declared in linux/igmp.h, which
is included anyhow in af_inet.c. So there is no need for this declaration.
This patch removes it from af_inet.c.

Signed-off-by: Rami Rosen
Signed-off-by: David S. Miller

Rami Rosen
2009-06-04 12:43:26 +0800

02 Jun, 2009

1 commit

f771bef98 ipv4: New multicast-all socket option

After some discussion offline with Christoph Lameter and David Stevens
regarding multicast behaviour in Linux, I'm submitting a slightly
modified patch from the one Christoph submitted earlier.

This patch provides a new socket option IP_MULTICAST_ALL.

In this case, default behaviour is _unchanged_ from the current
Linux standard. The socket option is set by default to provide
original behaviour. Sockets wishing to receive data only from
multicast groups they join explicitly will need to clear this
socket option.

Signed-off-by: Nivedita Singhvi
Signed-off-by: Christoph Lameter
Acked-by: David Stevens
Signed-off-by: David S. Miller

Nivedita Singhvi
2009-06-02 15:45:24 +0800

27 May, 2009

2 commits

1075f3f65 ipv4: Use 32-bit loads for ID and length in GRO

This patch optimises the IPv4 GRO code by using 32-bit loads
(instead of 16-bit ones) on the ID and length checks in the receive
function.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2009-05-27 18:26:02 +0800
a5b1cf288 gro: Avoid unnecessary comparison after skb_gro_header

For the overwhelming majority of cases, skb_gro_header's return
value cannot be NULL. Yet we must check it because of its current
form. This patch splits it up into multiple functions in order
to avoid this.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2009-05-27 18:26:01 +0800

17 Apr, 2009

1 commit

573636cba [PATCH] net: remove superfluous call to synchronize_net()

inet_register_protosw() function is responsible for adding a new
inet protocol into a global table (inetsw[]) that is used with RCU rules.

As soon as the store of the pointer is done, other cpus might see
this new protocol in inetsw[], so we have to make sure new protocol
is ready for use. All pending memory updates should thus be committed
to memory before setting the pointer.
This is correctly done using rcu_assign_pointer()

synchronize_net() is typically used at unregister time, after
unsetting the pointer, to make sure no other cpu is still using
the object we want to dismantle. Using it at register time
is only adding an artificial delay that could hide a real bug,
and this bug could popup if/when synchronize_rcu() can proceed
faster than now.

This saves about 13 ms on boot time on a HZ=1000 8 cpus machine ;)
(4 calls to inet_register_protosw(), and about 3200 us per call)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-04-17 19:52:48 +0800

28 Mar, 2009

1 commit

6e15cf048 Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2

Conflicts:
arch/parisc/kernel/irq.c
arch/x86/include/asm/fixmap_64.h
arch/x86/include/asm/setup.h
kernel/irq/handle.c

Semantic merge:
arch/x86/include/asm/fixmap.h

Signed-off-by: Ingo Molnar

Ingo Molnar
2009-03-28 00:28:43 +0800

10 Mar, 2009

1 commit

7546dd97d net: convert usage of packet_type to read_mostly

Protocols that use packet_type can place it in the __read_mostly section
for better locality. Eliminate any unnecessary initializations to NULL.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2009-03-10 20:22:43 +0800

20 Feb, 2009

1 commit

313e458f8 alloc_percpu: add align argument to __alloc_percpu.

This prepares for a real __alloc_percpu, by adding an alignment argument.
Only one place uses __alloc_percpu directly, and that's for a string.

tj: af_inet also uses __alloc_percpu(), update it.

Signed-off-by: Rusty Russell
Cc: Christoph Lameter
Cc: Jens Axboe

Rusty Russell
2009-02-20 15:29:08 +0800

09 Feb, 2009

1 commit

a5ad24be7 gro: Optimise IPv4 packet reception

As this function can be called more than half a million times for
10GbE, it's important to optimise it as much as we can.

This patch does some obvious changes to use 2-byte and 4-byte
operations instead of byte-oriented ones where possible. Bit
ops are also used to replace logical ops to reduce branching.
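
The "bit ops instead of logical ops" technique can be illustrated as follows. The header struct is a hypothetical subset chosen for the example and is not the layout used by the patch.

```c
#include <stdint.h>

/* Hypothetical subset of header fields compared per packet. */
struct hdr {
    uint32_t saddr;
    uint32_t daddr;
    uint8_t  ttl;
    uint8_t  tos;
};

/* Logical version: short-circuit || may compile to several branches. */
static int differs_logical(const struct hdr *a, const struct hdr *b)
{
    return a->saddr != b->saddr || a->daddr != b->daddr ||
           a->ttl != b->ttl || a->tos != b->tos;
}

/* Bitwise version: XOR each pair and OR the results together.
 * The result is nonzero exactly when at least one field differs. */
static uint32_t differs_bitwise(const struct hdr *a, const struct hdr *b)
{
    return (a->saddr ^ b->saddr) | (a->daddr ^ b->daddr) |
           (uint32_t)(a->ttl ^ b->ttl) | (uint32_t)(a->tos ^ b->tos);
}
```

The bitwise form always evaluates every comparison, trading a few extra ALU operations for a single final test, which tends to pay off when the fields usually match.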

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2009-02-09 12:22:19 +0800

02 Feb, 2009

1 commit

f15fbcd7d ipv4: Delete redundant sk_family assignment

sk_alloc now sets sk_family so this is redundant. In fact it caught
my eye because sock_init_data already uses sk_family so this is too
late anyway.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2009-02-02 14:24:43 +0800

01 Feb, 2009

1 commit

09640e636 net: replace uses of __constant_{endian}

Base versions handle constant folding now.

Signed-off-by: Harvey Harrison
Signed-off-by: David S. Miller

Harvey Harrison
2009-02-01 16:45:17 +0800

30 Jan, 2009

1 commit

86911732d gro: Avoid copying headers of unmerged packets

Unfortunately simplicity isn't always the best. The fraginfo
interface turned out to be suboptimal. The problem was quite
obvious. For every packet, we have to copy the headers from
the frags structure into skb->head, even though for 99% of the
packets this part is immediately thrown away after the merge.

LRO didn't have this problem because it directly read the headers
from the frags structure.

This patch attempts to address this by creating an interface
that allows GRO to access the headers in the first frag without
having to copy it. Because all drivers that use frags place the
headers in the first frag this optimisation should be enough.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2009-01-30 08:33:03 +0800