09 Dec, 2009

1 commit

  • The first patch changes __inet_hash_nolisten() and __inet6_hash()
    to take a timewait parameter so the timewait socket can be unhashed
    from ehash at the same time the new socket is inserted into the hash.

    This makes sure the timewait socket won't be found by a concurrent
    writer in __inet_check_established().
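
    The shape of the change is roughly the sketch below (the parameter
    name and helper calls are assumptions, not the exact kernel code):
    the insert path takes the timewait socket and removes it from ehash
    inside the same bucket-lock section that inserts the new socket.

    static void __inet_hash_nolisten(struct sock *sk, struct inet_timewait_sock *tw)
    {
            struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
            struct hlist_nulls_head *list = &inet_ehash_bucket(hashinfo, sk->sk_hash)->chain;
            spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);

            spin_lock(lock);
            __sk_nulls_add_node_rcu(sk, list);      /* insert the new socket */
            if (tw)
                    inet_twsk_unhash(tw);           /* hypothetical helper: unlink tw
                                                       from ehash under the same lock */
            spin_unlock(lock);
    }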

    Reported-by: kapil dakhane
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Nov, 2009

1 commit

  • The first "node" is supposed to be the cursor used in the for_each.

    The second "node" is ment literally and should not be macro expanded:
    it's the name of the hlist_node field from the inet_bind_bucket.

    This currently works because when inet_bind_bucket_for_each is called,
    its argument is still "node".
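
    Concretely, the macro in question looks roughly like this (a sketch;
    the real definition may differ in detail):

    /* before: the cursor parameter is also called "node", so the field
     * name only survives because every caller happens to pass "node" */
    #define inet_bind_bucket_for_each(tb, node, head) \
            hlist_for_each_entry(tb, node, head, node)

    /* after: rename the cursor so the trailing "node" always names the
     * hlist_node field of struct inet_bind_bucket */
    #define inet_bind_bucket_for_each(tb, pos, head) \
            hlist_for_each_entry(tb, pos, head, node)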

    Signed-off-by: Lucian Adrian Grijincu
    Signed-off-by: David S. Miller

    Lucian Adrian Grijincu
     

19 Oct, 2009

1 commit

  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    The goal is to move the fields used at lookup time into the first
    read-mostly cache line (inside struct sock_common) and to move sk_refcnt
    to a separate cache line (only written by the rx path).

    This patch adds an inet_ prefix to the daddr, rcv_saddr, dport, num,
    saddr, sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.
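
    The macro trick being referred to is the one already used for sk_refcnt;
    a hedged sketch of what the follow-up could look like (the skc_* field
    names for the inet fields are assumptions):

    /* existing pattern in struct sock */
    #define sk_refcnt       __sk_common.skc_refcnt

    /* a future patch can then alias the renamed inet fields the same way */
    #define inet_daddr      sk.__sk_common.skc_daddr
    #define inet_dport      sk.__sk_common.skc_dport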

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Oct, 2009

1 commit


03 Jun, 2009

1 commit

  • Define three accessors to get/set the dst attached to an skb:

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This last one should replace occurrences of:
    dst_release(skb->dst);
    skb->dst = NULL;

    Delete the skb->dst field.
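
    A minimal sketch of the accessors (the name of the private member that
    replaces skb->dst is an assumption here):

    static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
    {
            return skb->_skb_dst;
    }

    static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
    {
            skb->_skb_dst = dst;
    }

    static inline void skb_dst_drop(struct sk_buff *skb)
    {
            if (skb->_skb_dst)
                    dst_release(skb->_skb_dst);
            skb->_skb_dst = NULL;
    }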

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Feb, 2009

1 commit


31 Jan, 2009

1 commit


28 Jan, 2009

1 commit

  • In commit 9db66bdcc83749affe61c61eb8ff3cf08f42afec (net: convert
    TCP/DCCP ehash rwlocks to spinlocks), I forgot to change one
    occurrence of rwlock_t to spinlock_t

    I believe sizeof(raw_spinlock_t) might be > 0 on !CONFIG_SMP if
    CONFIG_DEBUG_SPINLOCK is set, while sizeof(raw_rwlock_t) should be 0
    in this case.

    Fortunately, CONFIG_DEBUG_SPINLOCK adds fields to both spinlock_t and
    rwlock_t, but as this might change in the future (for example, being
    able to debug spinlocks but not rwlocks), it is better to be safe.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Jan, 2009

1 commit

  • With a simple extension to the binding mechanism, which allows binding
    more than 64k sockets (or a smaller number, depending on sysctl
    parameters), we have to traverse the whole bind hash table to find an
    empty bucket. And while that is not a problem for, say, 32k connections,
    bind() completion time grows exponentially (since after each successful
    binding we have to traverse one more bucket to find an empty one), even
    if we start each time from a random offset inside the hash table.

    So, when the hash table is full and we want to add another socket, we
    have to traverse the whole table no matter what; effectively this is
    the worst-case performance, and it is constant.

    The attached picture shows bind() time depending on the number of
    already-bound sockets.

    The green area corresponds to the usual bind-to-port-zero process, which
    turns on kernel port selection as described above. The red area is the
    bind process when the number of reuse-bound sockets is not limited to
    64k (or by the sysctl parameters); it shows the same exponential growth
    (hidden by the green area) before the number of ports reaches the
    sysctl limit.

    At this point the bind hash table has exactly one reuse-enabled socket
    per bucket, but they may have different addresses. The kernel actually
    selects the first port to try at random, so at the beginning bind will
    take roughly constant time, but over time the number of ports to check
    after the random start increases. That growth is exponential, but
    because of the random selection above, not every port selection will
    necessarily take longer than the previous one. So we have to consider
    the area below the curve in the graph (if you could zoom in, you would
    find many different times placed there), so one area can hide another.

    The blue area corresponds to the port selection optimization.

    The design approach is rather simple: the hash table now maintains an
    (imprecise and racily updated) count of currently bound sockets, and
    when that count becomes greater than a predefined value (I use the
    maximum port range defined by the sysctls), we stop traversing the
    whole bind hash table and just stop at the first matching bucket after
    the random start. The limit above roughly corresponds to the case when
    the bind hash table is full and we have turned on the mechanism that
    allows binding more reuse-enabled sockets, so it does not change the
    behaviour of other sockets.
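
    A rough sketch of the early exit added to the automatic port selection
    (the bsockets counter and the surrounding variable names are
    approximations, not the exact kernel code):

    if (tb->fastreuse > 0 && sk->sk_reuse && sk->sk_state != TCP_LISTEN) {
            /* bucket only holds reuse-enabled sockets: a usable candidate */
            if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
                    /* more bound sockets than ports in the range: the table
                     * is effectively full, so take this bucket instead of
                     * walking the whole bhash looking for an empty one */
                    snum = rover;
                    goto have_snum;
            }
    }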

    Signed-off-by: Evgeniy Polyakov
    Tested-by: Denys Fedoryschenko
    Signed-off-by: David S. Miller

    Evgeniy Polyakov
     

24 Nov, 2008

1 commit

  • This is the last step needed to perform full RCU lookups
    in __inet_lookup(): after the established/timewait tables, we
    add RCU lookups to the listening hash table.

    The only trick here is that a socket of a given type (TCP ipv4,
    TCP ipv6, ...) can now move between two different tables
    (established and listening) during an RCU grace period, so we
    must use different 'nulls' end-of-chain values for the two tables.

    We define a large value:

    #define LISTENING_NULLS_BASE (1U << 29)

    This way, slots in the listening table are guaranteed to have different
    end-of-chain values from slots in the established table. A reader can
    thus still detect that it finished its lookup in the right chain.
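
    The lookup side then follows the usual nulls pattern, roughly like
    this (a simplified sketch, not the exact kernel function; match() is
    a placeholder for the address/port comparison):

    rcu_read_lock();
    begin:
            sk_nulls_for_each_rcu(sk, node, &ilb->head) {
                    if (match(sk))
                            goto found;
            }
            /* the nulls value identifies the chain we ended on; if it is
             * not ours, the socket we followed moved to another table
             * during the walk, so restart the lookup */
            if (get_nulls_value(node) != slot + LISTENING_NULLS_BASE)
                    goto begin;
            sk = NULL;
    found:
            rcu_read_unlock();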

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Nov, 2008

1 commit

  • We can avoid some useless instructions if !CONFIG_NET_NS

    Because of RCU, we use INET_MATCH or INET_TW_MATCH twice for the found
    socket, so that's six fewer instructions per incoming TCP packet.

    Yet another tbench speedup :)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Nov, 2008

1 commit

  • Now TCP & DCCP use RCU lookups, we can convert ehash rwlocks to spinlocks.

    /proc/net/tcp and other seq_file 'readers' can safely be converted to 'writers'.

    This should speed up writers, since spin_lock()/spin_unlock()
    uses only one atomic operation instead of the two used by
    write_lock()/write_unlock().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Nov, 2008

1 commit

  • This patch prepares RCU migration of listening_hash table for
    TCP/DCCP protocols.

    The listening_hash table being small (32 slots per protocol), we add
    a spinlock for each slot instead of a single rwlock for the whole table.

    This should reduce hold times for readers and improve writer concurrency.
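
    The per-slot layout is roughly the following (a sketch; the exact
    structure may differ):

    struct inet_listen_hashbucket {
            spinlock_t              lock;
            struct hlist_head       head;
    };

    struct inet_hashinfo {
            /* ... */
            struct inet_listen_hashbucket   listening_hash[INET_LHTABLE_SIZE];
    };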

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Nov, 2008

1 commit

  • RCU was added to UDP lookups using a fast infrastructure:
    - the sockets' kmem_cache uses SLAB_DESTROY_BY_RCU and doesn't pay the
    price of call_rcu() at freeing time.
    - hlist_nulls allows lookups to use few memory barriers.

    This patch uses the same infrastructure for TCP/DCCP established
    and timewait sockets.

    Thanks to SLAB_DESTROY_BY_RCU, there is no slowdown for applications
    using short-lived TCP connections. A follow-up patch converting
    rwlocks to spinlocks will even speed up this case.

    __inet_lookup_established() is pretty fast now that we don't have to
    dirty a contended cache line (read_lock/read_unlock).

    Only the established and timewait hash tables are converted to RCU
    (the bind and listen tables still use traditional locking).
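
    With SLAB_DESTROY_BY_RCU, a freed socket's memory can be reused for a
    new socket before a grace period has elapsed, so a lockless reader has
    to take its reference conditionally and then re-validate the keys,
    roughly like this (a sketch; keys_match() is a placeholder):

    /* inside the RCU lookup loop */
    if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
            goto begin;             /* socket is being freed: restart    */
    if (unlikely(!keys_match(sk))) {
            sock_put(sk);           /* slab object was recycled for a    */
            goto begin;             /* different flow: drop ref, restart */
    }
    /* sk is now safely pinned and verified */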

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Nov, 2008

1 commit


08 Oct, 2008

2 commits


17 Jun, 2008

3 commits

  • There are many possible ways to add this "salt", so I made this the
    last patch in the series, to make it easy to change if required.

    Currently I propose to use the struct net pointer itself as this
    salt, but since this pointer is most often cache-line aligned, shift
    it right to eliminate the bits that are most often zero.

    After this, simply add this mix to the prepared hash functions.

    For the CONFIG_NET_NS=n case this salt is 0 and the hash functions
    are unchanged.
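
    A sketch of the salt and of how it is mixed into one of the hash
    functions (the names follow what the series appears to introduce, but
    treat them as assumptions):

    static inline u32 net_hash_mix(struct net *net)
    {
    #ifdef CONFIG_NET_NS
            /* struct net is cache-line aligned: shift out the zero bits */
            return (u32)(((unsigned long)net) >> L1_CACHE_SHIFT);
    #else
            return 0;       /* single namespace: hashes are unchanged */
    #endif
    }

    static inline int inet_bhashfn(struct net *net, __u16 lport, int bhash_size)
    {
            return (lport + net_hash_mix(net)) & (bhash_size - 1);
    }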

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Sockets listening on the same port in many namespaces produce long
    chains in the listening_hash tables, so prepare inet_lhashfn to
    take struct net into account.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Binding to the same port in many namespaces may create overly long
    chains in the bhash tables, so prepare the hash function to take
    struct net into account.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

18 Apr, 2008

2 commits

  • This removes ~200 bytes of bloat when ipv6 and dccp are 'y'.

    Besides, this will ease compilation issues for patches
    I'm working on to make inet hash tables more scalable
    wrt net namespaces.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • As I can see from the code, two places (tcp_v6_syn_recv_sock and
    dccp_v6_request_recv_sock) that call this one already run with
    BHs disabled, so it's safe to call __inet_inherit_port there.

    Besides (in case I missed something in the code review), the call trace
    tcp_v6_syn_recv_sock
      `- tcp_v4_syn_recv_sock
          `- __inet_inherit_port
    and the similar one for DCCP are valid, but assume BHs to be disabled.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

26 Mar, 2008

1 commit


23 Mar, 2008

1 commit

  • Inspired by commit ab1e0a13 ([SOCK] proto: Add hashinfo member to
    struct proto) from Arnaldo, I did a similar thing for the UDP/UDP-Lite
    IPv4 and IPv6 protocols.

    The result is not that exciting, but it removes some levels of
    indirection in udpxxx_get_port and saves some space in code and text.

    The first step is to put the existing hashinfo and the new udp_hash
    into a union in struct proto and give this union a name, since the
    future initialization of tcpxxx_prot, dccp_vx_protinfo and
    udpxxx_protinfo would otherwise cause a gcc warning about the
    inability to initialize an anonymous member this way.
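
    The resulting member looks roughly like this (the union's name, "h",
    is an assumption):

    struct proto {
            /* ... */
            union {
                    struct inet_hashinfo    *hashinfo;
                    struct hlist_head       *udp_hash;
            } h;
    };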

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

05 Feb, 2008

1 commit


03 Feb, 2008

1 commit

  • This way we can remove the TCP- and DCCP-specific versions of:

    sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
    sk->sk_prot->hash: inet_hash is used directly; only v6 needs a
    specific version to deal with mapped sockets
    sk->sk_prot->unhash: both v4 and v6 use inet_unhash directly

    struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
    that inet_csk_get_port can find the per family routine.

    Now only the lookup routines receive a struct inet_hashinfo as a parameter.

    With this we further reuse code, reducing the difference among INET transport
    protocols.

    Eventually work has to be done on UDP and SCTP to make them share this
    infrastructure and, as a bonus, get inet_diag interfaces so that iproute
    can be used with these protocols.
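
    For instance, the TCP proto ops can then point straight at the generic
    helpers (a sketch; the full initializer is elided):

    struct proto tcp_prot = {
            .name           = "TCP",
            /* ... */
            .get_port       = inet_csk_get_port,
            .hash           = inet_hash,
            .unhash         = inet_unhash,
            /* ... */
    };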

    net-2.6/net/ipv4/inet_hashtables.c:
    struct proto | +8
    struct inet_connection_sock_af_ops | +8
    2 structs changed
    __inet_hash_nolisten | +18
    __inet_hash | -210
    inet_put_port | +8
    inet_bind_bucket_create | +1
    __inet_hash_connect | -8
    5 functions changed, 27 bytes added, 218 bytes removed, diff: -191

    net-2.6/net/core/sock.c:
    proto_seq_show | +3
    1 function changed, 3 bytes added, diff: +3

    net-2.6/net/ipv4/inet_connection_sock.c:
    inet_csk_get_port | +15
    1 function changed, 15 bytes added, diff: +15

    net-2.6/net/ipv4/tcp.c:
    tcp_set_state | -7
    1 function changed, 7 bytes removed, diff: -7

    net-2.6/net/ipv4/tcp_ipv4.c:
    tcp_v4_get_port | -31
    tcp_v4_hash | -48
    tcp_v4_destroy_sock | -7
    tcp_v4_syn_recv_sock | -2
    tcp_unhash | -179
    5 functions changed, 267 bytes removed, diff: -267

    net-2.6/net/ipv6/inet6_hashtables.c:
    __inet6_hash | +8
    1 function changed, 8 bytes added, diff: +8

    net-2.6/net/ipv4/inet_hashtables.c:
    inet_unhash | +190
    inet_hash | +242
    2 functions changed, 432 bytes added, diff: +432

    vmlinux:
    16 functions changed, 485 bytes added, 492 bytes removed, diff: -7

    /home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
    tcp_v6_get_port | -31
    tcp_v6_hash | -7
    tcp_v6_syn_recv_sock | -9
    3 functions changed, 47 bytes removed, diff: -47

    /home/acme/git/net-2.6/net/dccp/proto.c:
    dccp_destroy_sock | -7
    dccp_unhash | -179
    dccp_hash | -49
    dccp_set_state | -7
    dccp_done | +1
    5 functions changed, 1 bytes added, 242 bytes removed, diff: -241

    /home/acme/git/net-2.6/net/dccp/ipv4.c:
    dccp_v4_get_port | -31
    dccp_v4_request_recv_sock | -2
    2 functions changed, 33 bytes removed, diff: -33

    /home/acme/git/net-2.6/net/dccp/ipv6.c:
    dccp_v6_get_port | -31
    dccp_v6_hash | -7
    dccp_v6_request_recv_sock | +5
    3 functions changed, 5 bytes added, 38 bytes removed, diff: -33

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

01 Feb, 2008

3 commits

  • Add a net argument to inet_lookup and propagate it further
    into the lookup calls. Plus tune __inet_check_established.

    The dccp and inet_diag code, which use those lookup functions,
    pass init_net into them.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This tags the inet_bind_bucket struct with a net pointer,
    initializes it during creation and filters on it during lookup.

    A better hash function that takes the net into account is to
    be done in the future; currently all bind buckets for a given
    port will be in one hash chain.
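
    The lookup-time filter is then roughly the following (a sketch; the
    field name for the stored net pointer is an assumption):

    /* when walking a bhash chain looking for a port's bucket */
    inet_bind_bucket_for_each(tb, node, &head->chain)
            if (tb->ib_net == net && tb->port == snum)
                    goto tb_found;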

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • These two functions are the same except for what they call as
    "check_established" and "hash" for a socket.

    This saves half a kilobyte for ipv4 and ipv6.

    add/remove: 1/0 grow/shrink: 1/4 up/down: 582/-1128 (-546)
    function old new delta
    __inet_hash_connect - 577 +577
    arp_ignore 108 113 +5
    static.hint 8 4 -4
    rt_worker_func 376 372 -4
    inet6_hash_connect 584 25 -559
    inet_hash_connect 586 25 -561
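
    The shared helper's shape is roughly the following (the argument list
    is an approximation), and the per-family entry points become thin
    wrappers around it:

    int __inet_hash_connect(struct inet_timewait_death_row *death_row,
                            struct sock *sk, u32 port_offset,
                            int (*check_established)(struct inet_timewait_death_row *,
                                                     struct sock *, __u16,
                                                     struct inet_timewait_sock **),
                            void (*hash)(struct sock *sk));

    int inet_hash_connect(struct inet_timewait_death_row *death_row,
                          struct sock *sk)
    {
            return __inet_hash_connect(death_row, sk, inet_sk_port_offset(sk),
                                       __inet_check_established,
                                       __inet_hash_nolisten);
    }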

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

29 Jan, 2008

3 commits

  • 1) Cleanups (all functions are prefixed by sock_prot_inuse)

    sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot, 1)
    sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot, -1)
    sock_prot_inuse() -> sock_prot_inuse_get()

    New functions :

    sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.

    2) If CONFIG_PROC_FS=n, we can drop the 'inuse' member from "struct
    proto", since nothing reads the inuse value in that case.

    This saves 1372 bytes on i386/SMP and some cpu cycles.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This is -700 bytes from the net/ipv4/built-in.o

    add/remove: 1/0 grow/shrink: 1/3 up/down: 340/-1040 (-700)
    function old new delta
    __inet_lookup_established - 339 +339
    tcp_sacktag_write_queue 2254 2255 +1
    tcp_v4_err 1304 973 -331
    tcp_v4_rcv 2089 1744 -345
    tcp_v4_do_rcv 826 462 -364

    The export is for the dccp module (which uses it via e.g. inet_lookup).

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This one is used in quite a few places in the networking code and
    seems too big to be inlined.

    After the patch, net/ipv4/built-in.o loses ~650 bytes:
    add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653)
    function old new delta
    __inet_hash_nolisten - 282 +282
    __inet_hash - 179 +179
    tcp_sacktag_write_queue 2255 2254 -1
    __inet_lookup_listener 284 274 -10
    tcp_v4_syn_recv_sock 755 493 -262
    tcp_v4_hash 389 35 -354
    inet_hash_connect 1086 599 -487

    This version addresses the issue pointed out by Eric: while inline,
    this function was optimized by gcc with respect to the
    'listen_possible' argument.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

26 Nov, 2007

1 commit

  • The inet_ehash_locks_alloc() looks like this:

    #ifdef CONFIG_NUMA
            if (size > PAGE_SIZE)
                    x = vmalloc(...);
            else
    #endif
                    x = kmalloc(...);

    Unlike it, the inet_ehash_locks_free() looks like this:

    #ifdef CONFIG_NUMA
            if (size > PAGE_SIZE)
                    vfree(x);
            else
    #else
            kfree(x);
    #endif

    The error is obvious: if NUMA is on and the size is less than
    PAGE_SIZE, we leak the pointer (the kfree is inside the #else branch).

    The compiler doesn't warn us because after the kfree(x) there's an
    "x = NULL" assignment (which the dangling else binds to), so here's
    another (minor?) bug: we don't set x to NULL under certain
    circumstances.

    Boring explanation, I know... Patch explains it better.
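
    Presumably the fix just folds the kfree() back into the same
    conditional, along these lines (a sketch, not the actual patch):

    #ifdef CONFIG_NUMA
            if (size > PAGE_SIZE)
                    vfree(x);
            else
    #endif
                    kfree(x);
            x = NULL;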

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Herbert Xu

    Pavel Emelyanov
     

11 Nov, 2007

1 commit


07 Nov, 2007

1 commit

  • As done two years ago on the IP route cache table (commit
    22c047ccbc68fa8f3fa57f0e8f906479a062c426), we can avoid using one
    lock per hash bucket for the huge TCP/DCCP hash tables.

    On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for
    little performance difference. (We hit a different cache line for the
    rwlock, but the bucket cache line then has a better sharing factor
    among cpus, since we dirty it less often.) For netstat or ss commands
    that want a full scan of the hash table, we perform fewer memory accesses.

    Using a 'small' table of hashed rwlocks should be more than enough to
    provide correct SMP concurrency between different buckets, without
    using too much memory. Sizing of this table depends on
    num_possible_cpus() and various CONFIG settings.

    This patch provides some locking abstraction that may ease future
    work using a different model for the TCP/DCCP tables.
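
    The abstraction boils down to an accessor of roughly this shape
    (field names are assumed):

    static inline rwlock_t *inet_ehash_lockp(struct inet_hashinfo *hashinfo,
                                             unsigned int hash)
    {
            /* many buckets share one lock from a small hashed array */
            return &hashinfo->ehash_locks[hash & hashinfo->ehash_locks_mask];
    }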

    Signed-off-by: Eric Dumazet
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Dumazet
     

26 Oct, 2007

1 commit

  • UDP currently uses skb->dev->ifindex, which may provide the wrong
    information when the socket is bound to a specific interface.
    This patch makes inet_iif() accessible to UDP and makes UDP use it.

    The scenario we are trying to fix is a client running on the
    same system as the server, with both client and server bound to
    a non-loopback device.
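
    inet_iif() reads the interface index from the input route attached to
    the skb rather than from skb->dev, roughly (a sketch for the
    skb->dst era):

    static inline int inet_iif(const struct sk_buff *skb)
    {
            return ((struct rtable *)skb->dst)->rt_iif;
    }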

    Signed-off-by: Vlad Yasevich
    Acked-by: David L Stevens
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

11 Oct, 2007

1 commit

  • Fix a bunch of sparse warnings, mostly about 0 used as a NULL
    pointer and shadowed variable declarations.
    One notable case was that the hash size should have been unsigned.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

09 Feb, 2007

1 commit

  • The ehash table layout is currently this:

    The first half of this table is used by sockets not in TIME_WAIT state.
    The second half of it is used by sockets in TIME_WAIT state.

    This is not optimal because, for a given hash or socket, the two chain
    heads are located in separate cache lines.
    Moreover the locks of the second half are never used.

    If, instead of this halving, we use two list heads in inet_ehash_bucket
    rather than only one, we can probably avoid one cache miss and reduce
    RAM usage, particularly if sizeof(rwlock_t) is big (various
    CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still
    halve the table, but we keep related chains together to speed up
    lookups and socket state changes.
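
    The bucket then carries both chain heads side by side, roughly
    (a sketch; field names may differ):

    struct inet_ehash_bucket {
            rwlock_t          lock;
            struct hlist_head chain;    /* established sockets           */
            struct hlist_head twchain;  /* TIME_WAIT sockets, same hash  */
    };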

    In this patch I did not try to align struct inet_ehash_bucket, but a
    future patch could try to make this structure have a convenient size
    (a power of two or a multiple of L1_CACHE_SIZE).
    I guess the rwlock will just vanish as soon as RCU is plugged into
    ehash :), so maybe we don't need to scratch our heads over aligning
    the bucket...

    Note: in case sizeof(struct inet_ehash_bucket) is not a power of two,
    we could probably change alloc_large_system_hash() (in case it uses
    __get_free_pages()) to free the unused space. It currently allocates a
    big zone, but the last quarter of it could be freed. Again, this
    should be a temporary 'problem'.

    Patch tested on IPv4 TCP only, but should be OK for IPv6 and DCCP.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Dec, 2006

1 commit

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
            quilt add $file
            sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
            mv /tmp/$$ $file
            quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

29 Sep, 2006

1 commit

  • inet_lookup() annotated along with helper functions (__inet_lookup(),
    __inet_lookup_established(), inet_lookup_established(),
    inet_lookup_listener(), __inet_lookup_listener() and inet_ehashfn())

    Signed-off-by: Al Viro
    Signed-off-by: David S. Miller

    Al Viro