Eric Lee / smarc-fsl-linux-kernel

01 Aug, 2011

1 commit

31daf0393 dccp ccid-2: use feature-negotiation to report Ack Ratio changes

This uses the new feature-negotiation framework to signal Ack Ratio changes,
as required by RFC 4341, sec. 6.1.2.

That raises some problems with CCID-2, which at the moment can not cope
gracefully with Ack Ratios > 1. Since these issues are not directly related
to feature negotiation, they are marked by a FIXME.

Signed-off-by: Gerrit Renker
Signed-off-by: Samuel Jero
Acked-by: Ian McDonald

Gerrit Renker
2011-08-01 21:52:35 +0800

07 Dec, 2010

2 commits

049102650 dccp qpolicy: Parameter checking of cmsg qpolicy parameters

Ensure that cmsg->cmsg_type value is valid for qpolicy
that is currently in use.

Signed-off-by: Tomasz Grobelny
Signed-off-by: Gerrit Renker

Tomasz Grobelny
2010-12-07 20:47:12 +0800

871a2c16c dccp: Policy-based packet dequeueing infrastructure

This patch adds a generic infrastructure for policy-based dequeueing of
TX packets and provides two policies:
* a simple FIFO policy (which is the default) and
* a priority based policy (set via socket options).
Both policies honour the tx_qlen sysctl for the maximum size of the write
queue (can be overridden via socket options).

The priority policy uses skb->priority internally to assign a u32 priority
identifier, using the same ranking as SO_PRIORITY. The skb->priority field
is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
data using cmsg(3); the patch also provides the requisite parsing routines.

Signed-off-by: Tomasz Grobelny
Signed-off-by: Gerrit Renker

Tomasz Grobelny
2010-12-07 20:47:12 +0800

29 Oct, 2010

1 commit

b1fcf55ee dccp: Refine the wait-for-ccid mechanism

This extends the existing wait-for-ccid routine so that it may be used with
different types of CCID, addressing the following problems:

1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
example has a full TX queue and becomes network-limited just as the
application wants to close, then waiting for CCID-2 to become unblocked
could lead to an indefinite delay (i.e., application "hangs").
2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
in its sending policy while the queue is being drained. This can lead to
further delays during which the application will not be able to terminate.
3) The minimum wait time for CCID-3/4 can be expected to be the queue length
times the current inter-packet delay. For example if tx_qlen=100 and a delay
of 15 ms is used for each packet, then the application would have to wait
for a minimum of 1.5 seconds before being allowed to exit.
4) There is no way for the user/application to control this behaviour. It would
be good to use the timeout argument of dccp_close() as an upper bound. Then
the maximum time that an application is willing to wait for its CCIDs can
be set via the SO_LINGER option.

These problems are addressed by giving the CCID a grace period of up to the
`timeout' value.
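
The arithmetic in point 3 and the SO_LINGER upper bound in point 4 can be sketched from user space as follows. This is only an illustration under stated assumptions: min_drain_ms(), set_linger() and get_linger() are hypothetical helper names, and the helpers work on any socket descriptor rather than specifically on a DCCP socket.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Lower bound on the drain time of a rate-based CCID, in milliseconds:
 * queue length times the current inter-packet delay. */
static unsigned long min_drain_ms(unsigned long tx_qlen, unsigned long delay_ms)
{
    return tx_qlen * delay_ms;
}

/* Ask the kernel to linger for at most `seconds` on close().
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_linger(int fd, int seconds)
{
    struct linger lg = { .l_onoff = 1, .l_linger = seconds };

    assert(seconds >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Read the linger time back; returns -1 if lingering is off or on error. */
static int get_linger(int fd)
{
    struct linger lg;
    socklen_t len = sizeof(lg);

    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) != 0 || !lg.l_onoff)
        return -1;
    return lg.l_linger;
}
```

With tx_qlen=100 and a 15 ms delay, min_drain_ms() gives the 1500 ms quoted above; an application that is not willing to wait that long would call set_linger() with a smaller value before close().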

The wait-for-ccid function is, as before, used when the application
(a) has read all the data in its receive buffer and
(b) if SO_LINGER was set with a non-zero linger time, or
(c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
state (client application closes after receiving CloseReq).

In addition, there is a catch-all case of __skb_queue_purge() after waiting for
the CCID. This is necessary since the write queue may still have data when
(a) the host has been passively-closed,
(b) abnormal termination (unread data, zero linger time),
(c) wait-for-ccid could not finish within the given time limit.

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller

Gerrit Renker
2010-10-29 01:27:01 +0800

12 Oct, 2010

1 commit

2f34b3297 dccp: cosmetics - warning format

This omits the redundant "DCCP:" in warning messages, since DCCP_WARN() already
echoes the function name, avoiding messages like

kernel: [10988.766503] dccp_close: DCCP: ABORT -- 209 bytes unread

Signed-off-by: Gerrit Renker

Gerrit Renker
2010-10-12 12:57:43 +0800

07 Oct, 2010

1 commit

1f4f0f645 dccp: Kill dead code and add static markers.

Remove dead code and make some functions static.
Compile tested only.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

stephen hemminger
2010-10-07 14:12:07 +0800

26 Jun, 2010

1 commit

1823e4c80 snmp: add align parameter to snmp_mib_init()

In preparation for 64bit snmp counters for some mibs,
add an 'align' parameter to snmp_mib_init(), instead
of assuming mibs only contain 'unsigned long' fields.

Callers can use __alignof__(type) to provide correct
alignment.
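
A user-space sketch of the idea, assuming a GCC-compatible compiler for __alignof__: the caller passes the alignment of its own counter type instead of the allocator assuming 'unsigned long'. The struct layouts and mib_alloc() are hypothetical stand-ins and are not the kernel's snmp_mib_init().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical MIB layouts: native-word counters and 64-bit counters. */
struct mib_long { unsigned long counters[8]; };
struct mib_u64  { uint64_t counters[8]; };

/* Allocate a zeroed block of at least `size` bytes at the requested
 * alignment. `align` must be a power of two. */
static void *mib_alloc(size_t size, size_t align)
{
    size_t rounded;
    void *p;

    assert(align != 0 && (align & (align - 1)) == 0);
    /* aligned_alloc() wants the size to be a multiple of the alignment */
    rounded = (size + align - 1) & ~(align - 1);
    p = aligned_alloc(align, rounded);
    if (p != NULL)
        memset(p, 0, rounded);
    return p;
}
```

A caller with 64-bit counters would write mib_alloc(sizeof(struct mib_u64), __alignof__(struct mib_u64)), which mirrors the __alignof__(type) usage the message recommends.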

Signed-off-by: Eric Dumazet
CC: Herbert Xu
CC: Arnaldo Carvalho de Melo
CC: Hideaki YOSHIFUJI
CC: Vlad Yasevich
Signed-off-by: David S. Miller

Eric Dumazet
2010-06-26 12:33:17 +0800

31 May, 2010

1 commit

042604d2a net/dccp: Use memdup_user

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

//
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

- to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+ to = memdup_user(from,size);
if (
- to==NULL
+ IS_ERR(to)
|| ...) {

}
- if (copy_from_user(to, from, size) != 0) {
-
- }
//

Signed-off-by: Julia Lawall
Acked-by: Gerrit Renker
Signed-off-by: David S. Miller

Julia Lawall
2010-05-31 15:24:14 +0800

21 Apr, 2010

1 commit

aa3951451 net: sk_sleep() helper

Define a new function to return the waitqueue of a "struct sock".

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	return sk->sk_sleep;
}

Change all read occurrences of sk_sleep by a call to this function.

Needed for a future RCU conversion. sk_sleep won't be a field directly
available.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-04-21 07:37:13 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

16 Mar, 2010

1 commit

d14a0ebda net-2.6 [Bug-Fix][dccp]: fix oops caused after failed initialisation

dccp: fix panic caused by failed initialisation

This fixes a kernel panic reported thanks to Andre Noll:

if DCCP is compiled into the kernel and any of the initialisation
steps in net/dccp/proto.c:dccp_init() fail, a subsequent attempt to create
a SOCK_DCCP socket will panic, since inet{,6}_create() are not prevented
from creating DCCP sockets.

This patch fixes the problem by propagating a failure in dccp_init() to
dccp_v{4,6}_init_net(), and from there to dccp_v{4,6}_init(), so that the
DCCP protocol is not made available if its initialisation fails.

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller

Gerrit Renker
2010-03-16 07:00:50 +0800

17 Feb, 2010

1 commit

7d720c3e4 percpu: add __percpu sparse annotations to net

Add __percpu sparse annotations to net.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.

The macro and type tricks around snmp stats make things a bit
interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
snmp_mib_*() users which used to cast the argument to (void **) are
updated to cast it to (void __percpu **).

Signed-off-by: Tejun Heo
Acked-by: David S. Miller
Cc: Patrick McHardy
Cc: Arnaldo Carvalho de Melo
Cc: Vlad Yasevich
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller

Tejun Heo
2010-02-17 15:05:38 +0800

13 Feb, 2010

1 commit

55d955902 dccp: support for passing MSG_TRUNC

DCCP is datagram-oriented but lacks UDP's support for MSG_TRUNC as defined in
recvmsg(2)/recv(2). Hence the following 'Hello world\0' receiver

len = recv(fd, buf, 10, MSG_PEEK | MSG_TRUNC);

wrongly (always) returns 10, while in UDP it returns 12 as expected.
This patch adds the missing MSG_TRUNC support to recvmsg().
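
The expected datagram behaviour can be checked from user space. The sketch below assumes Linux and uses an AF_UNIX datagram socket pair as a stand-in transport, so that it needs neither network access nor a DCCP-enabled kernel; peek_datagram_len() is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `len` bytes as one datagram over a local socket pair, then peek at
 * it with a buffer of only `buflen` bytes. Returns what recv() reports,
 * or -1 on error. */
static ssize_t peek_datagram_len(const void *msg, size_t len, size_t buflen)
{
    int sv[2];
    char buf[64];
    ssize_t n = -1;

    assert(buflen <= sizeof(buf));
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    if (send(sv[0], msg, len, 0) == (ssize_t)len)
        n = recv(sv[1], buf, buflen, MSG_PEEK | MSG_TRUNC);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Calling peek_datagram_len("Hello world", 12, 10) on a transport that honours MSG_TRUNC reports the real datagram length rather than the size of the 10-byte buffer.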

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller

Gerrit Renker
2010-02-13 08:51:10 +0800

19 Oct, 2009

1 commit

c720c7e83 inet: rename some inet_sock fields

In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfer fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-10-19 09:52:53 +0800

13 Oct, 2009

1 commit

f373b53b5 tcp: replace ehash_size by ehash_mask

Storing the mask (size - 1) instead of the size allows fast path to be
a bit faster.
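
A minimal sketch of why the stored mask helps, assuming a power-of-two bucket count: the lookup needs a single AND and no "size - 1" computation on each call. The struct and function names are hypothetical and are not the kernel's inet_hashinfo.

```c
#include <assert.h>
#include <stdint.h>

struct hash_table {
    uint32_t mask;   /* bucket count minus one; the count is a power of two */
};

static void hash_table_init(struct hash_table *t, uint32_t nbuckets)
{
    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    t->mask = nbuckets - 1;
}

/* Fast path: one AND, equivalent to hash % nbuckets for power-of-two sizes. */
static uint32_t bucket_index(const struct hash_table *t, uint32_t hash)
{
    return hash & t->mask;
}
```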

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-10-13 18:44:02 +0800

01 Oct, 2009

1 commit

b7058842c net: Make setsockopt() optlen be unsigned.

This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the place, in
each and every implementation.
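
The point can be illustrated with two hypothetical length checks that differ only in the type of the length argument. With a signed length, an upper-bound check alone lets negative values through; with an unsigned length, the same value wraps to a very large number and is rejected.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Signed length: the upper-bound check alone accepts negative values. */
static int len_ok_signed(int optlen, size_t max)
{
    assert(max <= INT_MAX);
    return optlen <= (int)max;
}

/* Unsigned length: a negative value supplied by the caller has already
 * wrapped to a huge number and fails the same upper-bound check. */
static int len_ok_unsigned(unsigned int optlen, size_t max)
{
    return optlen <= max;
}
```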

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller

David S. Miller
2009-10-01 07:12:20 +0800

22 Sep, 2009

1 commit

4481374ce mm: replace various uses of num_physpages by totalram_pages

Sizing of memory allocations shouldn't depend on the number of physical
pages found in a system, as that generally includes (perhaps a huge amount
of) non-RAM pages. The amount of what actually is usable as storage
should instead be used as a basis here.

Some of the calculations (i.e. those not intending to use high memory)
should likely even use (totalram_pages - totalhigh_pages).

Signed-off-by: Jan Beulich
Acked-by: Rusty Russell
Acked-by: Ingo Molnar
Cc: Dave Airlie
Cc: Kyle McMartin
Cc: Jeremy Fitzhardinge
Cc: Pekka Enberg
Cc: Hugh Dickins
Cc: "David S. Miller"
Cc: Patrick McHardy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Beulich
2009-09-22 22:17:38 +0800

13 Aug, 2009

1 commit

aa11d958d Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
arch/microblaze/include/asm/socket.h

David S. Miller
2009-08-13 08:44:53 +0800

10 Aug, 2009

1 commit

f222e8b40 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/

David S. Miller
2009-08-10 12:29:47 +0800

06 Aug, 2009

2 commits

36cbd3dcc net: mark read-only arrays as const

String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.
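
In C this amounts to writing const twice: once for the characters and once for the pointers. The table below is a sketch with illustrative names, not one of the arrays touched by the patch; whether it actually ends up in a read-only section depends on the toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* "const char *" makes the strings read-only; the second const makes the
 * array of pointers read-only as well, so nothing in it can be reassigned. */
static const char *const state_names[] = {
    "OPEN",
    "REQUESTING",
    "CLOSED",
};

static const char *state_name(size_t idx)
{
    assert(idx < sizeof(state_names) / sizeof(state_names[0]));
    return state_names[idx];
}
```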

Signed-off-by: Jan Engelhardt
Signed-off-by: David S. Miller

Jan Engelhardt
2009-08-06 01:42:58 +0800

476181cb0 dccp: missing destroy of percpu counter variable while unload module

The percpu counter dccp_orphan_count is initialised in dccp_init() by
percpu_counter_init() when the dccp module is loaded, but it is never
destroyed when the module is unloaded. This triggers a kernel WARNING,
which can be reproduced with the following commands:

$ modprobe dccp
$ rmmod dccp
$ modprobe dccp

WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
Hardware name: VMware Virtual Platform
list_add corruption. next->prev should be prev (c080c0c4), but was (null). (next=ca7188cc).
Modules linked in: dccp(+) nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc
Pid: 1956, comm: modprobe Not tainted 2.6.31-rc5 #55
Call Trace:
[] warn_slowpath_common+0x6a/0x81
[] ? __list_add+0x27/0x5c
[] warn_slowpath_fmt+0x29/0x2c
[] __list_add+0x27/0x5c
[] __percpu_counter_init+0x4d/0x5d
[] dccp_init+0x19/0x2ed [dccp]
[] do_one_initcall+0x4f/0x111
[] ? dccp_init+0x0/0x2ed [dccp]
[] ? notifier_call_chain+0x26/0x48
[] ? __blocking_notifier_call_chain+0x45/0x51
[] sys_init_module+0xac/0x1bd
[] sysenter_do_call+0x12/0x22

Signed-off-by: Wei Yongjun
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Wei Yongjun
2009-08-06 01:22:03 +0800

30 Jul, 2009

1 commit

1c29b3ff4 net-dccp: suppress warning about large allocations from DCCP

The DCCP protocol tries to allocate some large hash tables during
initialisation using the largest size possible. This can be larger than
what the page allocator can provide so it prints a warning. However, the
caller is able to handle the situation so this patch suppresses the
warning.

Signed-off-by: Mel Gorman
Acked-by: Arnaldo Carvalho de Melo
Cc: "David S. Miller"
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2009-07-30 10:10:36 +0800

10 Jul, 2009

1 commit

a57de0b43 net: adding memory barrier to the poll and receive callbacks

Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding functions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, following race can happen.
The race fires, when following code paths meet, and the tp->rcv_nxt
and __add_wait_queue updates stay in CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp->rcv_nxt
  ...                        ...
  tp->rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                        wake_up_interruptible(sk->sk_sleep)
                                ...
                             }

If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposite to each other.

Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp->rcv_nxt check and sleeps, or will get the new value for
tp->rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then
end up calling schedule and sleep forever if there are no more data on the
socket.
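
The pairing described above can be sketched with C11 atomics. This is a user-space analogy under stated assumptions, not the kernel code: the two flags stand in for the wait-queue entry and the tp->rcv_nxt update, and atomic_thread_fence() stands in for the kernel's full memory barrier. With a full fence between the store and the load on both sides, the outcome "waiter sleeps and waker sees no waiter" is ruled out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool waiter_queued;  /* stands in for the wait-queue entry */
static atomic_bool data_ready;     /* stands in for the tp->rcv_nxt update */

/* CPU1 side: queue ourselves, full barrier, then check for data.
 * Returns true if the caller may go to sleep. */
static bool waiter_may_sleep(void)
{
    atomic_store_explicit(&waiter_queued, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&data_ready, memory_order_relaxed);
}

/* CPU2 side: publish the data, full barrier, then check for a waiter.
 * Returns true if a wake-up must be issued. */
static bool waker_must_wake(void)
{
    atomic_store_explicit(&data_ready, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&waiter_queued, memory_order_relaxed);
}
```

Removing either fence reintroduces the bad case: both sides may then read the old value of the other side's flag.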

Calls to poll_wait in following modules were omitted:
net/bluetooth/af_bluetooth.c
net/irda/af_irda.c
net/irda/irnet/irnet_ppp.c
net/mac80211/rc80211_pid_debugfs.c
net/phonet/socket.c
net/rds/af_rds.c
net/rfkill/core.c
net/sunrpc/cache.c
net/sunrpc/rpc_pipe.c
net/tipc/socket.c

Signed-off-by: Jiri Olsa
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Jiri Olsa
2009-07-10 08:06:57 +0800

22 Jan, 2009

1 commit

792b48780 dccp: Implement both feature-local and feature-remote Sequence Window feature

This adds full support for local/remote Sequence Window feature, from which the
* sequence-number-validity (W) and
* acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.

Specifically, the following is contained in this patch:
* integrated new socket fields into dccp_sk;
* updated the update_gsr/gss routines with regard to these fields;
* updated handler code: the Sequence Window feature is located at the TX side,
so the local feature is meant if the handler-rx flag is false;
* the initialisation of `rcv_wnd' in reqsk is removed, since
- rcv_wnd is not used by the code anywhere;
- sequence number checks are not done in the LISTEN state (cf. 7.5.3);
- dccp_check_req checks the Ack number validity more rigorously;
* the `struct dccp_minisock' became empty and is now removed.

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Signed-off-by: David S. Miller

Gerrit Renker
2009-01-22 06:34:04 +0800

05 Jan, 2009

1 commit

ddebc973c dccp: Lockless integration of CCID congestion-control plugins

Based on Arnaldo's earlier patch, this patch integrates the standardised
CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:

* enables a faster connection path by eliminating the need to always go
through the CCID registration lock;

* updates the implementation to use only a single array whose size equals
the number of configured CCIDs instead of the maximum (256);

* since the CCIDs are now fixed array elements, synchronization is no
longer needed, simplifying use and implementation.

CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller

Gerrit Renker
2009-01-05 13:42:53 +0800

30 Dec, 2008

1 commit

eb4dea585 net: Fix percpu counters deadlock

When we converted the protocol atomic counters such as the orphan
count and the total socket count deadlocks were introduced due to
the mismatch in BH status of the spots that used the percpu counter
operations.

Based on the diagnosis and patch by Peter Zijlstra, this patch
fixes these issues by disabling BH where we may be in process
context.

Reported-by: Jeff Kirsher
Tested-by: Ingo Molnar
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2008-12-30 15:04:08 +0800

08 Dec, 2008

2 commits

6fdd34d43 dccp ccid-2: Phase out the use of boolean Ack Vector sysctl

This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, as it now is handled fully dynamically via feature negotiation
(i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

	if (dccp_msk(sk)->dccpms_send_ack_vector)
		/* ... */
with
	if (dp->dccps_hc_rx_ackvec != NULL)
		/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
* server when the Ack marking the transition RESPOND => OPEN arrives;
* client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been
removed, since
(a) connection establishment implies that the Response has been received;
(b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
this entry will always be ignored;
(c) it can not be used for anything useful - to detect loss for instance, only
packets received after the loss can serve as pseudo-dupacks.

There was a FIXME to change the error code when dccp_ackvec_add() fails.
I removed this after finding out that:
* the check whether ackno < ISN is already made earlier,
* this Response is likely the 1st packet with an Ackno that the client gets,
* so when dccp_ackvec_add() fails, the reason is likely not a packet error.

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Signed-off-by: David S. Miller

Gerrit Renker
2008-12-08 17:19:06 +0800

6eb55d172 dccp: Integration of dynamic feature activation - part 1 (socket setup)

This first patch out of three replaces the hardcoded default settings with
initialisation code for the dynamic feature negotiation.

The patch also ensures that the client feature-negotiation queue is flushed
only when entering the OPEN state.

Since confirmed Change options are removed as soon as they are confirmed
(in the DCCP-Response), this ensures that Confirm options are retransmitted.

Note on retransmitting Confirm options:
---------------------------------------
Implementation experience showed that it is necessary to retransmit Confirm
options. Thanks to Leandro Melo de Sales who reported a bug in an earlier
revision of the patch set, resulting from not retransmitting these options.

As long as the client is in PARTOPEN, it needs to retransmit the Confirm
options for the Change options received on the DCCP-Response from the server.

Otherwise, if the packet containing the Confirm options gets dropped in the
network, the connection aborts due to undefined feature negotiation state.

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Signed-off-by: David S. Miller

Gerrit Renker
2008-12-08 17:15:26 +0800

26 Nov, 2008

1 commit

dd24c0019 net: Use a percpu_counter for orphan_count

Instead of using one atomic_t per protocol, use a percpu_counter
for "orphan_count", to reduce cache line contention on
heavy duty network servers.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2008-11-26 13:17:14 +0800

24 Nov, 2008

2 commits

71c262a3d dccp: API to query the current TX/RX CCID

This provides a function to query the current TX/RX CCID dynamically,
without reliance on the minisock value, using dynamic information
available in the currently loaded CCID module.

This query function is then used to
(a) provide the getsockopt part for getting/setting CCIDs via sockopts;
(b) replace the current test for "which CCID is in use" in probe.c.

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Signed-off-by: David S. Miller

Gerrit Renker
2008-11-24 08:04:59 +0800

b20a9c24d dccp: Set per-connection CCIDs via socket options

With this patch, TX/RX CCIDs can now be changed on a per-connection
basis, which overrides the defaults set by the global sysctl variables
for TX/RX CCIDs.

To make full use of this facility, the remaining patches of this patch
set are needed, which track dependencies and activate negotiated
feature values.

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller

Gerrit Renker
2008-11-24 08:02:31 +0800

20 Nov, 2008

1 commit

5caea4ea7 net: listening_hash get a spinlock per bucket

This patch prepares RCU migration of listening_hash table for
TCP/DCCP protocols.

listening_hash table being small (32 slots per protocol), we add
a spinlock for each slot, instead of a single rwlock for whole table.

This should reduce hold time of readers, and writers concurrency.
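
A user-space sketch of per-slot locking, assuming C11 atomics: each of the 32 slots carries its own small spinlock, so operations on different slots never contend. The names are hypothetical, the atomic_flag spinlock is a stand-in for the kernel's spinlock_t, and the counter stands in for the chain of listening sockets.

```c
#include <assert.h>
#include <stdatomic.h>

#define NSLOTS 32
static_assert((NSLOTS & (NSLOTS - 1)) == 0, "NSLOTS must be a power of two");

struct slot {
    atomic_flag lock;   /* one lock per slot instead of one for the table */
    int count;          /* stand-in for the chain of listening sockets */
};

static struct slot listen_table[NSLOTS];

static void listen_table_init(void)
{
    for (int i = 0; i < NSLOTS; i++) {
        atomic_flag_clear(&listen_table[i].lock);
        listen_table[i].count = 0;
    }
}

static void slot_lock(struct slot *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;   /* spin until the current holder releases the slot */
}

static void slot_unlock(struct slot *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Add one entry to the slot selected by `hash`; only that slot is locked. */
static void slot_add(unsigned int hash)
{
    struct slot *s = &listen_table[hash & (NSLOTS - 1)];

    slot_lock(s);
    s->count++;
    slot_unlock(s);
}

static int slot_count(unsigned int hash)
{
    struct slot *s = &listen_table[hash & (NSLOTS - 1)];
    int n;

    slot_lock(s);
    n = s->count;
    slot_unlock(s);
    return n;
}
```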

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2008-11-20 16:40:07 +0800

17 Nov, 2008

4 commits

191029963 dccp: Tidy up setsockopt calls

This splits the setsockopt calls into two groups, depending on whether an
integer argument (val) is required and whether routines being called do
their own locking.

Some options (such as setting the CCID) use u8 rather than int, so that for
these the test with regard to integer-sizeof can not be used.

The second switch-case statement now only has those statements which need
locking and which make use of `val'.

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Acked-by: Arnaldo Carvalho de Melo
Reviewed-by: Eugene Teo
Signed-off-by: David S. Miller

Gerrit Renker
2008-11-17 14:56:55 +0800

294505598 dccp: Feature negotiation for minimum-checksum-coverage

This provides feature negotiation for server minimum checksum coverage
which so far has been missing.

Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.

Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Signed-off-by: David S. Miller

Gerrit Renker
2008-11-17 14:53:48 +0800

49aebc66d dccp: Deprecate old setsockopt framework

The previous setsockopt interface, which passed socket options via struct
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.

This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation.
These are essentially wrappers around the internal __feat_register functions,
with checking added to avoid

* wrong usage (type);
* changing values while the connection is in progress.

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller

Gerrit Renker
2008-11-17 14:51:23 +0800

3ab5aee7f net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls

RCU was added to UDP lookups, using a fast infrastructure :
- sockets kmem_cache use SLAB_DESTROY_BY_RCU and don't pay the
price of call_rcu() at freeing time.
- hlist_nulls permits to use few memory barriers.

This patch uses same infrastructure for TCP/DCCP established
and timewait sockets.

Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
using short lived TCP connections. A followup patch, converting
rwlocks to spinlocks will even speed up this case.

__inet_lookup_established() is pretty fast now we don't have to
dirty a contended cache line (read_lock/read_unlock)

Only established and timewait hashtable are converted to RCU
(bind table and listen table are still using traditional locking)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2008-11-17 11:40:17 +0800

12 Nov, 2008

3 commits

9eca0a47d dccp: Resolve dependencies of features on choice of CCID

This provides a missing link in the code chain, as several features implicitly
depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

For Send Ack Vector, the situation is as follows:
* since CCID2 mandates the use of Ack Vectors, there is no point in allowing
endpoints which use CCID2 to disable Ack Vector features on such a connection;

* a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

* for all other CCIDs, the use of (Send) Ack Vector is optional and thus
negotiable. However, this implies that the code negotiating the use of Ack
Vectors also supports it (i.e. is able to supply and to either parse or
ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
Vector support), the use of Ack Vectors is here disabled, with a comment
in the source code.

An analogous consideration arises for the Send Loss Event Rate feature,
since the CCID-3 implementation does not support the loss interval options
of RFC 4342. To make such use explicit, corresponding feature-negotiation
options are inserted which signal the use of the loss event rate option,
as it is used by the CCID3 code.

Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

The patch implements this as a function which is called after the user has
made all other registrations for changing default values of features.

The table is variable-length, the reserved (and hence for feature-negotiation
invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
is used to mark the end of the table.
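
The end-marker convention can be sketched as follows. The struct, the table contents and dep_table_len() are hypothetical and only illustrate the idea: since feature number 0 is reserved, it can never appear as a real entry and is therefore safe to use as a terminator.

```c
#include <assert.h>
#include <stddef.h>

struct feat_dep {
    unsigned char feat_num;   /* 0 is reserved, so it can end the table */
    unsigned char value;
};

/* Illustrative entries only; the numbers are not taken from the patch. */
static const struct feat_dep demo_deps[] = {
    { 5, 2 },
    { 6, 1 },
    { 0, 0 },   /* end marker */
};

/* Count the entries of a variable-length table by walking to the marker. */
static size_t dep_table_len(const struct feat_dep *table)
{
    size_t n = 0;

    assert(table != NULL);
    while (table[n].feat_num != 0)
        n++;
    return n;
}
```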

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Signed-off-by: David S. Miller

Gerrit Renker
2008-11-12 16:48:44 +0800

d90ebcbfa dccp: Query supported CCIDs

This provides a data structure to record which CCIDs are locally supported
and three accessor functions:
- a test function for internal use which is used to validate CCID requests
made by the user;
- a copy function so that the list can be used for feature-negotiation;
- documented getsockopt() support so that the user can query capabilities.

The data structure is a table which is filled in at compile-time with the
list of available CCIDs (which in turn depends on the Kconfig choices).

Using the copy function for cloning the list of supported CCIDs is useful for
feature negotiation, since the negotiation is now with the full list of available
CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation
will not fail if the peer requests to use CCID3 instead of CCID2.
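
A sketch of such a compile-time table with a test function and a copy function. DEMO_CCID3_ENABLED is a hypothetical build switch standing in for the Kconfig choice, and the function names are illustrative rather than the ones introduced by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical build switch standing in for a Kconfig option. */
#define DEMO_CCID3_ENABLED 1

/* Filled in at compile time with the CCIDs that were built in. */
static const unsigned char supported_ccids[] = {
    2,
#if DEMO_CCID3_ENABLED
    3,
#endif
};

/* Test function: validates a CCID requested by the user. */
static bool ccid_is_supported(unsigned char id)
{
    for (size_t i = 0; i < sizeof(supported_ccids); i++)
        if (supported_ccids[i] == id)
            return true;
    return false;
}

/* Copy function: hands the full list to feature negotiation.
 * Returns the number of entries copied, or 0 if `dst` is too small. */
static size_t ccid_copy_supported(unsigned char *dst, size_t cap)
{
    assert(dst != NULL);
    if (cap < sizeof(supported_ccids))
        return 0;
    memcpy(dst, supported_ccids, sizeof(supported_ccids));
    return sizeof(supported_ccids);
}
```

With both CCIDs built in, the copied list is {2, 3}, so a peer asking for CCID3 finds a match instead of failing against the default {2}.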

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Signed-off-by: David S. Miller

Gerrit Renker
2008-11-12 16:47:26 +0800

e8ef967a5 dccp: Registration routines for changing feature values

Two registration routines, for SP and NN features, are provided by this patch,
replacing a previous routine which was used for both feature types.

These are internal-only routines and therefore start with `__feat_register'.

It further exports the known limits of Sequence Window and Ack Ratio as symbolic
constants.

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Signed-off-by: David S. Miller

Gerrit Renker
2008-11-12 16:43:40 +0800

05 Nov, 2008

1 commit

d99a7bd21 dccp: Cleanup routines for feature negotiation

This inserts the required de-allocation routines for memory allocated
by feature negotiation in the socket destructors, replacing
dccp_feat_clean() in one instance.

Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
Signed-off-by: David S. Miller

Gerrit Renker
2008-11-05 15:56:30 +0800