02 Jun, 2012

1 commit

  • Another problem during a SYNFLOOD/DDoS attack is the inetpeer cache
    getting larger and larger, using lots of memory and CPU time.

    tcp_v4_send_synack()
    ->inet_csk_route_req()
    ->ip_route_output_flow()
    ->rt_set_nexthop()
    ->rt_init_metrics()
    ->inet_getpeer( create = true)

    This is a side effect of commit a4daad6b09230 (net: Pre-COW metrics for
    TCP) added in 2.6.39

    Possible solution:

    Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS.
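
    A minimal sketch of that idea (hypothetical helper, not the actual
    patch): mask the pre-COW metrics hint out of the flow flags used for
    the SYNACK route lookup, so rt_init_metrics() never has a reason to
    call inet_getpeer() with create = true for each (possibly spoofed)
    SYN source.

    #include <net/flow.h>
    #include <net/inet_sock.h>

    /* Hypothetical helper: flow flags for routing a SYNACK, with the
     * pre-COW metrics hint removed. */
    static inline __u8 synack_flowi_flags(const struct sock *sk)
    {
            return inet_sk_flowi_flags(sk) & ~FLOWI_FLAG_PRECOW_METRICS;
    }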

    Before patch :

    # grep peer /proc/slabinfo
    inet_peer_cache 4175430 4175430 192 42 2 : tunables 0 0 0 : slabdata 99415 99415 0

    Samples: 41K of event 'cycles', Event count (approx.): 30716565122
    + 20,24% ksoftirqd/0 [kernel.kallsyms] [k] inet_getpeer
    + 8,19% ksoftirqd/0 [kernel.kallsyms] [k] peer_avl_rebalance.isra.1
    + 4,81% ksoftirqd/0 [kernel.kallsyms] [k] sha_transform
    + 3,64% ksoftirqd/0 [kernel.kallsyms] [k] fib_table_lookup
    + 2,36% ksoftirqd/0 [ixgbe] [k] ixgbe_poll
    + 2,16% ksoftirqd/0 [kernel.kallsyms] [k] __ip_route_output_key
    + 2,11% ksoftirqd/0 [kernel.kallsyms] [k] kernel_map_pages
    + 2,11% ksoftirqd/0 [kernel.kallsyms] [k] ip_route_input_common
    + 2,01% ksoftirqd/0 [kernel.kallsyms] [k] __inet_lookup_established
    + 1,83% ksoftirqd/0 [kernel.kallsyms] [k] md5_transform
    + 1,75% ksoftirqd/0 [kernel.kallsyms] [k] check_leaf.isra.9
    + 1,49% ksoftirqd/0 [kernel.kallsyms] [k] ipt_do_table
    + 1,46% ksoftirqd/0 [kernel.kallsyms] [k] hrtimer_interrupt
    + 1,45% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc
    + 1,29% ksoftirqd/0 [kernel.kallsyms] [k] inet_csk_search_req
    + 1,29% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb
    + 1,16% ksoftirqd/0 [kernel.kallsyms] [k] copy_user_generic_string
    + 1,15% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free
    + 1,02% ksoftirqd/0 [kernel.kallsyms] [k] tcp_make_synack
    + 0,93% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh
    + 0,87% ksoftirqd/0 [kernel.kallsyms] [k] __call_rcu
    + 0,84% ksoftirqd/0 [kernel.kallsyms] [k] rt_garbage_collect
    + 0,84% ksoftirqd/0 [kernel.kallsyms] [k] fib_rules_lookup

    Signed-off-by: Eric Dumazet
    Cc: Hans Schillstrom
    Cc: Jesper Dangaard Brouer
    Cc: Neal Cardwell
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Apr, 2012

1 commit


16 Apr, 2012

1 commit


15 Apr, 2012

3 commits

  • We must try harder to get unique (addr, port) pairs when doing port
    autoselection for sockets with the SO_REUSEADDR option set.

    We achieve this by adding a relaxation parameter to
    inet_csk_bind_conflict(). When the 'relax' parameter is off, we
    return a conflict whenever the currently searched (addr, port) pair
    is not unique.

    This tries to address the problems reported in patch:
    8d238b25b1ec22a73b1c2206f111df2faaff8285
    Revert "tcp: bind() fix when many ports are bound"
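
    A simplified sketch of the relaxation idea (stand-in types, not the
    kernel's inet_csk_bind_conflict()): with relax off, any socket
    already bound to the same (addr, port) counts as a conflict, so the
    autoselection loop keeps looking for a truly unique pair.

    #include <stdbool.h>

    /* Toy bucket entry standing in for a bound socket. */
    struct bound_sock {
            unsigned int addr;        /* bound address */
            bool reuse;               /* SO_REUSEADDR set */
            bool listening;
            struct bound_sock *next;
    };

    static bool bind_conflict(const struct bound_sock *bucket,
                              unsigned int addr, bool reuse, bool relax)
    {
            const struct bound_sock *other;

            for (other = bucket; other; other = other->next) {
                    if (other->addr != addr)
                            continue;
                    if (!relax)
                            return true;  /* demand a unique (addr, port) pair */
                    if (!reuse || !other->reuse || other->listening)
                            return true;  /* classic SO_REUSEADDR rules */
            }
            return false;
    }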

    Tests were run that create and bind (with port 0) many sockets
    on 100 IPs. The results are, on average:

    * 60000 sockets, 600 ports / IP:
        * 0.210 s, 620 (IP, port) duplicates without patch
        * 0.219 s, no duplicates with patch
    * 100000 sockets, 1000 ports / IP:
        * 0.371 s, 1720 duplicates without patch
        * 0.373 s, no duplicates with patch
    * 200000 sockets, 2000 ports / IP:
        * 0.766 s, 6900 duplicates without patch
        * 0.768 s, no duplicates with patch
    * 500000 sockets, 5000 ports / IP:
        * 2.227 s, 41500 duplicates without patch
        * 2.284 s, no duplicates with patch

    Signed-off-by: Alex Copot
    Signed-off-by: Daniel Baluta
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alex Copot
     
  • There are two struct request_sock_ops providers, tcp and dccp.

    inet_csk_reqsk_queue_prune() can avoid testing syn_ack_timeout for
    NULL if both providers make it non-NULL.

    Signed-off-by: Eric Dumazet
    Cc: Gerrit Renker
    Cc: dccp@vger.kernel.org
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Updates some comments to track RFC6298

    Signed-off-by: Eric Dumazet
    Cc: H.K. Jerry Chu
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     

26 Jan, 2012

2 commits

  • Port autoselection finds a port and then drops the lock; right after
    that, it looks the hash bucket up again and locks it.

    Fix it to go there directly instead.

    Signed-off-by: Flavio Leitner
    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Flavio Leitner
     
  • The current code checks for conflicts when the application
    requests a specific port. If there is no conflict, the request
    is granted.

    On the other hand, the port autoselection done by the kernel
    fails when all ports are bound, even when there is a port with
    no conflict available.

    The fix changes port autoselection to check for a conflict and
    use the port if there is none.
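
    A rough sketch of the change (hypothetical callbacks, not
    inet_csk_get_port()): instead of insisting on a port whose bucket is
    empty, autoselection can fall back to the first bound port whose
    existing users do not conflict with the new socket.

    #include <stdbool.h>

    static int pick_port(int low, int high,
                         bool (*bucket_empty)(int port),
                         bool (*bucket_conflicts)(int port))
    {
            int port, fallback = 0;

            for (port = low; port <= high; port++) {
                    if (bucket_empty(port))
                            return port;      /* preferred, as before */
                    if (!fallback && !bucket_conflicts(port))
                            fallback = port;  /* new: bound, but no conflict */
            }
            return fallback;                  /* 0: nothing usable at all */
    }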

    Signed-off-by: Flavio Leitner
    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Flavio Leitner
     

12 Dec, 2011

1 commit


09 Nov, 2011

1 commit


24 May, 2011

1 commit

  • All static seqlocks should be initialized with the lockdep-friendly
    __SEQLOCK_UNLOCKED() macro.

    Remove the legacy SEQLOCK_UNLOCKED() macro.
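
    For reference, the lockdep-friendly form this leaves as the only way
    to declare a static seqlock:

    #include <linux/seqlock.h>

    /* example declaration; 'example_seqlock' is just an illustrative name */
    static seqlock_t example_seqlock = __SEQLOCK_UNLOCKED(example_seqlock);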

    Signed-off-by: Eric Dumazet
    Cc: David Miller
    Link: http://lkml.kernel.org/r/%3C1306238888.3026.31.camel%40edumazet-laptop%3E
    Signed-off-by: Thomas Gleixner

    Eric Dumazet
     

19 May, 2011

1 commit


09 May, 2011

1 commit

  • This is just like inet_csk_route_req() except that it operates after
    we've created the new child socket.

    In this way we can use the new socket's cork flow for proper route
    key storage.

    This will be used by DCCP and TCP child socket creation handling.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Apr, 2011

2 commits

  • Now that output route lookups update the flow with
    destination address selection, we can fetch it from
    fl4->daddr instead of rt->rt_dst

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We lack proper synchronization when manipulating inet->opt ip_options.

    The problem is that ip_make_skb() calls ip_setup_cork(), and
    ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options)
    without any protection against another thread manipulating inet->opt.

    Another thread can change the inet->opt pointer and free the old one
    under us.

    Use RCU to protect inet->opt (changed to inet->inet_opt).

    Instead of handling atomic refcounts, just copy the ip_options when
    necessary, to avoid cache line dirtying.

    We can't insert an rcu_head in struct ip_options since it is included
    in skb->cb[], so this patch is large because I had to introduce a new
    ip_options_rcu structure.
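
    A compact sketch of the resulting pattern (toy structures, not the
    actual ip_options code): the shared payload cannot carry an rcu_head
    itself, so it is wrapped in a small RCU-managed container; readers
    take a private copy under rcu_read_lock() instead of a refcount.

    #include <linux/kernel.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/types.h>

    struct opt_payload {                  /* stand-in for struct ip_options */
            int len;
            unsigned char data[40];
    };

    struct opt_rcu {                      /* stand-in for ip_options_rcu */
            struct rcu_head rcu;
            struct opt_payload opt;
    };

    static struct opt_rcu __rcu *current_opt;

    static void opt_free_rcu(struct rcu_head *head)
    {
            kfree(container_of(head, struct opt_rcu, rcu));
    }

    /* Writer: callers serialize updates (e.g. with the socket lock). */
    static void set_opt(struct opt_rcu *newopt)
    {
            struct opt_rcu *old;

            old = rcu_dereference_protected(current_opt, 1);
            rcu_assign_pointer(current_opt, newopt);
            if (old)
                    call_rcu(&old->rcu, opt_free_rcu);
    }

    /* Reader: copy the options instead of taking a reference. */
    static bool copy_opt(struct opt_payload *dst)
    {
            struct opt_rcu *cur;
            bool found = false;

            rcu_read_lock();
            cur = rcu_dereference(current_opt);
            if (cur) {
                    *dst = cur->opt;
                    found = true;
            }
            rcu_read_unlock();
            return found;
    }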

    Signed-off-by: Eric Dumazet
    Cc: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 Apr, 2011

1 commit


14 Apr, 2011

1 commit

  • This reverts commit c191a836a908d1dd6b40c503741f91b914de3348.

    It causes known regressions for programs that expect to be able to use
    SO_REUSEADDR to shut down a socket, then successfully rebind another
    socket to the same ID.

    Programs such as haproxy and amavisd expect this to work.

    This should fix kernel bugzilla 32832.

    Signed-off-by: David S. Miller

    David S. Miller
     

31 Mar, 2011

1 commit


13 Mar, 2011

4 commits


03 Mar, 2011

1 commit


02 Mar, 2011

2 commits


12 Jan, 2011

1 commit

  • inet_csk_bind_conflict() logic currently disallows a bind() if
    it finds a friend socket (a socket bound on the same address/port)
    satisfying a set of conditions:

    1) the current (to-be-bound) socket doesn't have sk_reuse set
    OR
    2) the other socket doesn't have sk_reuse set
    OR
    3) the other socket is in LISTEN state

    We should add the CLOSE state to condition 3), in order to avoid two
    REUSEADDR sockets in CLOSE state with the same local address/port,
    since this can deny further operations.
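
    The shape of the resulting check, on toy types rather than the real
    struct sock, purely as an illustration:

    #include <stdbool.h>

    enum toy_state { TOY_LISTEN, TOY_ESTABLISHED, TOY_CLOSE };

    struct toy_sock {
            bool reuse;              /* SO_REUSEADDR */
            enum toy_state state;
    };

    /* conflict if either side lacks SO_REUSEADDR, or the other socket is
     * listening, or -- the new part -- still bound while in CLOSE state */
    static bool reuse_conflict(const struct toy_sock *sk,
                               const struct toy_sock *other)
    {
            return !sk->reuse || !other->reuse ||
                   other->state == TOY_LISTEN || other->state == TOY_CLOSE;
    }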

    Note: a prior patch tried to address the problem in a different (and
    buggy) way (commit fda48a0d7a8412ced, "tcp: bind() fix when many
    ports are bound").

    Reported-by: Gaspar Chilingarov
    Reported-by: Daniel Baluta
    Tested-by: Daniel Baluta
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Dec, 2010

1 commit

  • Followup of commit b178bb3dfc30 (net: reorder struct sock fields).

    Optimize the INET input path a bit further by:

    1) moving sk_refcnt close to sk_lock.

    This reduces the number of dirtied cache lines by one on 64-bit
    arches (with a 64-byte cache line size).

    2) moving inet_daddr & inet_rcv_saddr to the beginning of sk

    (same cache line as hash / family / bound_dev_if / nulls_node)

    This reduces the number of cache lines accessed in lookups by one,
    and doesn't increase the size of inet and timewait socks.
    inet and tw sockets now share the same place-holder for these fields.

    Before patch :

    offsetof(struct sock, sk_refcnt) = 0x10
    offsetof(struct sock, sk_lock) = 0x40
    offsetof(struct sock, sk_receive_queue) = 0x60
    offsetof(struct inet_sock, inet_daddr) = 0x270
    offsetof(struct inet_sock, inet_rcv_saddr) = 0x274

    After patch :

    offsetof(struct sock, sk_refcnt) = 0x44
    offsetof(struct sock, sk_lock) = 0x48
    offsetof(struct sock, sk_receive_queue) = 0x68
    offsetof(struct inet_sock, inet_daddr) = 0x0
    offsetof(struct inet_sock, inet_rcv_saddr) = 0x4

    compute_score() (udp or tcp) now uses a single cache line per ignored
    item, instead of two.
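
    Offsets like the ones above can be inspected with offsetof(); a toy
    version of the measurement (simplified struct, not the real
    struct sock) looks like this:

    #include <stddef.h>
    #include <stdio.h>

    struct toy_sock {
            unsigned int hash;          /* lookup keys grouped together */
            unsigned int daddr;
            unsigned int rcv_saddr;
            int bound_dev_if;
            int refcnt;                 /* kept near the lock it is dirtied with */
            int lock;
    };

    int main(void)
    {
            printf("offsetof(daddr)     = %zu\n",
                   offsetof(struct toy_sock, daddr));
            printf("offsetof(rcv_saddr) = %zu\n",
                   offsetof(struct toy_sock, rcv_saddr));
            printf("offsetof(refcnt)    = %zu\n",
                   offsetof(struct toy_sock, refcnt));
            return 0;
    }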

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Nov, 2010

1 commit


13 Jul, 2010

1 commit


11 Jun, 2010

1 commit


16 May, 2010

1 commit

  • (Dropped the infiniband part, because Tetsuo modified the related
    code; I will send a separate patch for it once this is accepted.)

    This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
    allows users to reserve ports for third-party applications.

    The reserved ports will not be used by automatic port assignments
    (e.g. when calling connect() or bind() with port number 0). Explicit
    port allocation behavior is unchanged.
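
    For illustration, reserving ports from a small C helper (the
    comma-and-range syntax used here is an assumption to double-check
    against the ip-sysctl documentation):

    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/proc/sys/net/ipv4/ip_local_reserved_ports", "w");

            if (!f) {
                    perror("ip_local_reserved_ports");
                    return 1;
            }
            /* keep 8080 and 50000-50100 away from automatic assignment */
            fputs("8080,50000-50100\n", f);
            return fclose(f) ? 1 : 0;
    }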

    Signed-off-by: Octavian Purdila
    Signed-off-by: WANG Cong
    Cc: Neil Horman
    Cc: Eric Dumazet
    Cc: Eric W. Biederman
    Signed-off-by: David S. Miller

    Amerigo Wang
     

03 May, 2010

1 commit


29 Apr, 2010

1 commit

  • This reverts two commits:

    fda48a0d7a8412cedacda46a9c0bf8ef9cd13559
    tcp: bind() fix when many ports are bound

    and a follow-on fix for it:

    6443bb1fc2050ca2b6585a3fa77f7833b55329ed
    ipv6: Fix inet6_csk_bind_conflict()

    It causes problems with binding listening sockets when time-wait
    sockets from a previous instance still are alive.

    It's too late to keep fiddling with this so late in the -rc
    series, and we'll deal with it in net-next-2.6 instead.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Apr, 2010

1 commit


23 Apr, 2010

1 commit

  • Port autoselection done by the kernel only works when the number of
    bound sockets is under a threshold (typically 30000).

    When this threshold is exceeded, we must check whether there is a
    conflict before exiting the first loop in inet_csk_get_port().

    Change inet_csk_bind_conflict() to forbid two reuse-enabled sockets
    from binding to the same (address, port) tuple (with a non-ANY
    address).

    Same change for inet6_csk_bind_conflict().

    Reported-by: Gaspar Chilingarov
    Signed-off-by: Eric Dumazet
    Acked-by: Evgeniy Polyakov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
            return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep to a call to this function.

    Needed for a future RCU conversion: sk_sleep won't be a directly
    available field.
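
    An illustrative call site after the conversion (hypothetical
    function, not a hunk from this patch):

    #include <linux/sched.h>
    #include <linux/wait.h>
    #include <net/sock.h>

    static void example_wait_setup(struct sock *sk, wait_queue_t *wait)
    {
            /* read sites go through the accessor, so a later RCU
             * conversion only has to change sk_sleep() itself */
            prepare_to_wait(sk_sleep(sk), wait, TASK_INTERRUPTIBLE);
    }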

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Jan, 2010

1 commit

  • Currently we don't increment SYN-ACK timeouts & retransmissions
    although we do increment the same stats for SYN. We seem to have lost
    the SYN-ACK accounting with the introduction of tcp_syn_recv_timer
    (commit 2248761e in the netdev-vger-cvs tree).

    This patch fixes this issue. In the process we also rename the v4/v6
    syn/ack retransmit functions for clarity. We also add a new
    request_sock operation (syn_ack_timeout) so we can keep the code in
    inet_connection_sock.c protocol agnostic.
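
    A toy sketch of the idea (stand-in types, not the kernel's
    request_sock_ops): the protocol-agnostic prune/retransmit loop calls
    back through an ops table, so TCP and DCCP can each account SYN-ACK
    timeouts in their own way.

    struct toy_req;

    struct toy_req_ops {
            int  (*rtx_syn_ack)(struct toy_req *req);
            void (*syn_ack_timeout)(struct toy_req *req);
    };

    struct toy_req {
            const struct toy_req_ops *ops;
            int retries;
    };

    static void prune_one(struct toy_req *req, int max_retries)
    {
            if (req->retries < max_retries) {
                    req->ops->syn_ack_timeout(req);  /* protocol bumps its stats */
                    req->ops->rtx_syn_ack(req);      /* retransmit the SYN-ACK */
                    req->retries++;
            }
    }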

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

03 Dec, 2009

1 commit

  • Add optional function parameters associated with sending SYNACK.
    These parameters are not needed after sending SYNACK, and are not
    used for retransmission. Avoids extending struct tcp_request_sock,
    and avoids allocating kernel memory.

    Also affects DCCP as it uses common struct request_sock_ops,
    but this parameter is currently reserved for future use.

    Signed-off-by: William.Allen.Simpson@gmail.com
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    William Allen Simpson
     

26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.
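
    For illustration, the effect on a hypothetical helper (stub type,
    not the kernel's struct net):

    #include <stdbool.h>

    struct net;

    /* toy stand-in for the real helper */
    static bool net_eq(const struct net *n1, const struct net *n2)
    {
            return n1 == n2;
    }

    static bool same_netns(const struct net *n1, const struct net *n2)
    {
            /* was: return n1 == n2; */
            return net_eq(n1, n2);
    }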

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

27 Oct, 2009

1 commit