01 Aug, 2011

1 commit

  • This uses the new feature-negotiation framework to signal Ack Ratio changes,
    as required by RFC 4341, sec. 6.1.2.

    That raises some problems with CCID-2, which at the moment can not cope
    gracefully with Ack Ratios > 1. Since these issues are not directly related
    to feature negotiation, they are marked by a FIXME.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Samuel Jero
    Acked-by: Ian McDonald

    Gerrit Renker
     

07 Dec, 2010

2 commits

  • Ensure that cmsg->cmsg_type value is valid for qpolicy
    that is currently in use.

    Signed-off-by: Tomasz Grobelny
    Signed-off-by: Gerrit Renker

    Tomasz Grobelny
     
  • This patch adds a generic infrastructure for policy-based dequeueing of
    TX packets and provides two policies:
    * a simple FIFO policy (which is the default) and
    * a priority based policy (set via socket options).
    Both policies honour the tx_qlen sysctl for the maximum size of the write
    queue (can be overridden via socket options).

    The priority policy uses skb->priority internally to assign an u32 priority
    identifier, using the same ranking as SO_PRIORITY. The skb->priority field
    is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
    data using cmsg(3), the patch also provides the requisite parsing routines.

    Signed-off-by: Tomasz Grobelny
    Signed-off-by: Gerrit Renker

    Tomasz Grobelny
     

29 Oct, 2010

1 commit

  • This extends the existing wait-for-ccid routine so that it may be used with
    different types of CCID, addressing the following problems:

    1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID-2 to become unblocked
    could lead to an indefinite delay (i.e., application "hangs").
    2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
    3) The minimum wait time for CCID-3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
    4) There is no way for the user/application to control this behaviour. It would
    be good to use the timeout argument of dccp_close() as an upper bound. Then
    the maximum time that an application is willing to wait for its CCIDs to can
    be set via the SO_LINGER option.

    These problems are addressed by giving the CCID a grace period of up to the
    `timeout' value.

    The wait-for-ccid function is, as before, used when the application
    (a) has read all the data in its receive buffer and
    (b) if SO_LINGER was set with a non-zero linger time, or
    (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
    state (client application closes after receiving CloseReq).

    In addition, there is a catch-all case of __skb_queue_purge() after waiting for
    the CCID. This is necessary since the write queue may still have data when
    (a) the host has been passively-closed,
    (b) abnormal termination (unread data, zero linger time),
    (c) wait-for-ccid could not finish within the given time limit.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

12 Oct, 2010

1 commit

  • This omits the redundant "DCCP:" in warning messages, since DCCP_WARN() already
    echoes the function name, avoiding messages like

    kernel: [10988.766503] dccp_close: DCCP: ABORT -- 209 bytes unread

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     

07 Oct, 2010

1 commit


26 Jun, 2010

1 commit

  • In preparation for 64bit snmp counters for some mibs,
    add an 'align' parameter to snmp_mib_init(), instead
    of assuming mibs only contain 'unsigned long' fields.

    Callers can use __alignof__(type) to provide correct
    alignment.

    Signed-off-by: Eric Dumazet
    CC: Herbert Xu
    CC: Arnaldo Carvalho de Melo
    CC: Hideaki YOSHIFUJI
    CC: Vlad Yasevich
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 May, 2010

1 commit

  • Use memdup_user when user data is immediately copied into the
    allocated region.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@
    expression from,to,size,flag;
    position p;
    identifier l1,l2;
    @@

    - to = \(kmalloc@p\|kzalloc@p\)(size,flag);
    + to = memdup_user(from,size);
    if (
    - to==NULL
    + IS_ERR(to)
    || ...) {

    }
    - if (copy_from_user(to, from, size) != 0) {
    -
    - }
    //

    Signed-off-by: Julia Lawall
    Acked-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Julia Lawall
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
    return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep by a call to this function.

    Needed for a future RCU conversion. sk_sleep wont be a field directly
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

16 Mar, 2010

1 commit

  • dccp: fix panic caused by failed initialisation

    This fixes a kernel panic reported thanks to Andre Noll:

    if DCCP is compiled into the kernel and any out of the initialisation
    steps in net/dccp/proto.c:dccp_init() fail, a subsequent attempt to create
    a SOCK_DCCP socket will panic, since inet{,6}_create() are not prevented
    from creating DCCP sockets.

    This patch fixes the problem by propagating a failure in dccp_init() to
    dccp_v{4,6}_init_net(), and from there to dccp_v{4,6}_init(), so that the
    DCCP protocol is not made available if its initialisation fails.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to net.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    The macro and type tricks around snmp stats make things a bit
    interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
    as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
    snmp_mib_*() users which used to cast the argument to (void **) are
    updated to cast it to (void __percpu **).

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Patrick McHardy
    Cc: Arnaldo Carvalho de Melo
    Cc: Vlad Yasevich
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Tejun Heo
     

13 Feb, 2010

1 commit

  • DCCP is datagram-oriented but lacks UDP's support for MSG_TRUNC as defined in
    recvmsg(2)/recv(2). Hence the following 'Hello world\0' receiver

    len = recv(fd, buf, 10, MSG_PEEK | MSG_TRUNC);

    wrongly (always) returns 10, while in UDP it returns 12 as expected.
    This patch adds the missing MSG_TRUNC support to recvmsg().

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

19 Oct, 2009

1 commit

  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    Goal is to transfert fields used at lookup time in the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by rx path)

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Oct, 2009

1 commit


01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Sep, 2009

1 commit

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge amount
    of) non-RAM pages. The amount of what actually is usable as storage
    should instead be used as a basis here.

    Some of the calculations (i.e. those not intending to use high memory)
    should likely even use (totalram_pages - totalhigh_pages).

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

13 Aug, 2009

1 commit


10 Aug, 2009

1 commit


06 Aug, 2009

2 commits

  • String literals are constant, and usually, we can also tag the array
    of pointers const too, moving it to the .rodata section.

    Signed-off-by: Jan Engelhardt
    Signed-off-by: David S. Miller

    Jan Engelhardt
     
  • percpu counter dccp_orphan_count is init in dccp_init() by
    percpu_counter_init() while dccp module is loaded, but the
    destroy of it is missing while dccp module is unloaded. We
    can get the kernel WARNING about this. Reproduct by the
    following commands:

    $ modprobe dccp
    $ rmmod dccp
    $ modprobe dccp

    WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
    Hardware name: VMware Virtual Platform
    list_add corruption. next->prev should be prev (c080c0c4), but was (null). (next
    =ca7188cc).
    Modules linked in: dccp(+) nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc
    Pid: 1956, comm: modprobe Not tainted 2.6.31-rc5 #55
    Call Trace:
    [] warn_slowpath_common+0x6a/0x81
    [] ? __list_add+0x27/0x5c
    [] warn_slowpath_fmt+0x29/0x2c
    [] __list_add+0x27/0x5c
    [] __percpu_counter_init+0x4d/0x5d
    [] dccp_init+0x19/0x2ed [dccp]
    [] do_one_initcall+0x4f/0x111
    [] ? dccp_init+0x0/0x2ed [dccp]
    [] ? notifier_call_chain+0x26/0x48
    [] ? __blocking_notifier_call_chain+0x45/0x51
    [] sys_init_module+0xac/0x1bd
    [] sysenter_do_call+0x12/0x22

    Signed-off-by: Wei Yongjun
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Wei Yongjun
     

30 Jul, 2009

1 commit

  • The DCCP protocol tries to allocate some large hash tables during
    initialisation using the largest size possible. This can be larger than
    what the page allocator can provide so it prints a warning. However, the
    caller is able to handle the situation so this patch suppresses the
    warning.

    Signed-off-by: Mel Gorman
    Acked-by: Arnaldo Carvalho de Melo
    Cc: "David S. Miller"
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

10 Jul, 2009

1 commit

  • Adding memory barrier after the poll_wait function, paired with
    receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
    to wrap the memory barrier.

    Without the memory barrier, following race can happen.
    The race fires, when following code paths meet, and the tp->rcv_nxt
    and __add_wait_queue updates stay in CPU caches.

    CPU1 CPU2

    sys_select receive packet
    ... ...
    __add_wait_queue update tp->rcv_nxt
    ... ...
    tp->rcv_nxt check sock_def_readable
    ... {
    schedule ...
    if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
    wake_up_interruptible(sk->sk_sleep)
    ...
    }

    If there was no cache the code would work ok, since the wait_queue and
    rcv_nxt are opposit to each other.

    Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
    passed the tp->rcv_nxt check and sleeps, or will get the new value for
    tp->rcv_nxt and will return with new data mask.
    In both cases the process (CPU1) is being added to the wait queue, so the
    waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

    The bad case is when the __add_wait_queue changes done by CPU1 stay in its
    cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then
    endup calling schedule and sleep forever if there are no more data on the
    socket.

    Calls to poll_wait in following modules were ommited:
    net/bluetooth/af_bluetooth.c
    net/irda/af_irda.c
    net/irda/irnet/irnet_ppp.c
    net/mac80211/rc80211_pid_debugfs.c
    net/phonet/socket.c
    net/rds/af_rds.c
    net/rfkill/core.c
    net/sunrpc/cache.c
    net/sunrpc/rpc_pipe.c
    net/tipc/socket.c

    Signed-off-by: Jiri Olsa
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jiri Olsa
     

22 Jan, 2009

1 commit

  • This adds full support for local/remote Sequence Window feature, from which the
    * sequence-number-validity (W) and
    * acknowledgment-number-validity (W') windows
    derive as specified in RFC 4340, 7.5.3.

    Specifically, the following is contained in this patch:
    * integrated new socket fields into dccp_sk;
    * updated the update_gsr/gss routines with regard to these fields;
    * updated handler code: the Sequence Window feature is located at the TX side,
    so the local feature is meant if the handler-rx flag is false;
    * the initialisation of `rcv_wnd' in reqsk is removed, since
    - rcv_wnd is not used by the code anywhere;
    - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
    - dccp_check_req checks the Ack number validity more rigorously;
    * the `struct dccp_minisock' became empty and is now removed.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     

05 Jan, 2009

1 commit

  • Based on Arnaldo's earlier patch, this patch integrates the standardised
    CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:

    * enables a faster connection path by eliminating the need to always go
    through the CCID registration lock;

    * updates the implementation to use only a single array whose size equals
    the number of configured CCIDs instead of the maximum (256);

    * since the CCIDs are now fixed array elements, synchronization is no
    longer needed, simplifying use and implementation.

    CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
    CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

30 Dec, 2008

1 commit

  • When we converted the protocol atomic counters such as the orphan
    count and the total socket count deadlocks were introduced due to
    the mismatch in BH status of the spots that used the percpu counter
    operations.

    Based on the diagnosis and patch by Peter Zijlstra, this patch
    fixes these issues by disabling BH where we may be in process
    context.

    Reported-by: Jeff Kirsher
    Tested-by: Ingo Molnar
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

08 Dec, 2008

2 commits

  • This removes the use of the sysctl and the minisock variable for the Send Ack
    Vector feature, as it now is handled fully dynamically via feature negotiation
    (i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled as per
    RFC 4341, 4.).

    Using a sysctl in parallel to this implementation would open the door to
    crashes, since much of the code relies on tests of the boolean minisock /
    sysctl variable. Thus, this patch replaces all tests of type

    if (dccp_msk(sk)->dccpms_send_ack_vector)
    /* ... */
    with
    if (dp->dccps_hc_rx_ackvec != NULL)
    /* ... */

    The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
    negotiation concluded that Ack Vectors are to be used on the half-connection.
    Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
    so that the test is a valid one.

    The activation handler for Ack Vectors is called as soon as the feature
    negotiation has concluded at the
    * server when the Ack marking the transition RESPOND => OPEN arrives;
    * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

    Adding the sequence number of the Response packet to the Ack Vector has been
    removed, since
    (a) connection establishment implies that the Response has been received;
    (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
    this entry will always be ignored;
    (c) it can not be used for anything useful - to detect loss for instance, only
    packets received after the loss can serve as pseudo-dupacks.

    There was a FIXME to change the error code when dccp_ackvec_add() fails.
    I removed this after finding out that:
    * the check whether ackno < ISN is already made earlier,
    * this Response is likely the 1st packet with an Ackno that the client gets,
    * so when dccp_ackvec_add() fails, the reason is likely not a packet error.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This first patch out of three replaces the hardcoded default settings with
    initialisation code for the dynamic feature negotiation.

    The patch also ensures that the client feature-negotiation queue is flushed
    only when entering the OPEN state.

    Since confirmed Change options are removed as soon as they are confirmed
    (in the DCCP-Response), this ensures that Confirm options are retransmitted.

    Note on retransmitting Confirm options:
    ---------------------------------------
    Implementation experience showed that it is necessary to retransmit Confirm
    options. Thanks to Leandro Melo de Sales who reported a bug in an earlier
    revision of the patch set, resulting from not retransmitting these options.

    As long as the client is in PARTOPEN, it needs to retransmit the Confirm
    options for the Change options received on the DCCP-Response from the server.

    Otherwise, if the packet containing the Confirm options gets dropped in the
    network, the connection aborts due to undefined feature negotiation state.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     

26 Nov, 2008

1 commit


24 Nov, 2008

2 commits

  • This provides function to query the current TX/RX CCID dynamically,
    without reliance on the minisock value, using dynamic information
    available in the currently loaded CCID module.

    This query function is then used to
    (a) provide the getsockopt part for getting/setting CCIDs via sockopts;
    (b) replace the current test for "which CCID is in use" in probe.c.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • With this patch, TX/RX CCIDs can now be changed on a per-connection
    basis, which overrides the defaults set by the global sysctl variables
    for TX/RX CCIDs.

    To make full use of this facility, the remaining patches of this patch
    set are needed, which track dependencies and activate negotiated
    feature values.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

20 Nov, 2008

1 commit

  • This patch prepares RCU migration of listening_hash table for
    TCP/DCCP protocols.

    listening_hash table being small (32 slots per protocol), we add
    a spinlock for each slot, instead of a single rwlock for whole table.

    This should reduce hold time of readers, and writers concurrency.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Nov, 2008

4 commits

  • This splits the setsockopt calls into two groups, depending on whether an
    integer argument (val) is required and whether routines being called do
    their own locking.

    Some options (such as setting the CCID) use u8 rather than int, so that for
    these the test with regard to integer-sizeof can not be used.

    The second switch-case statement now only has those statements which need
    locking and which make use of `val'.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Acked-by: Arnaldo Carvalho de Melo
    Reviewed-by: Eugene Teo
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This provides feature negotiation for server minimum checksum coverage
    which so far has been missing.

    Since sender/receiver coverage values range only from 0...15, their
    type has also been reduced in size from u16 to u4.

    Feature-negotiation options are now generated for both sender and receiver
    coverage, i.e. when the peer has `forgotten' to enable partial coverage
    then feature negotiation will automatically enable (negotiate) the partial
    coverage value for this connection.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • The previous setsockopt interface, which passed socket options via struct
    dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
    ugly code since the old approach did not distinguish between NN and SP values.

    This patch removes the old setsockopt interface and replaces it with two new
    functions to register NN/SP values for feature negotiation.
    These are essentially wrappers around the internal __feat_register functions,
    with checking added to avoid

    * wrong usage (type);
    * changing values while the connection is in progress.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • RCU was added to UDP lookups, using a fast infrastructure :
    - sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
    price of call_rcu() at freeing time.
    - hlist_nulls permits to use few memory barriers.

    This patch uses same infrastructure for TCP/DCCP established
    and timewait sockets.

    Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
    using short lived TCP connections. A followup patch, converting
    rwlocks to spinlocks will even speedup this case.

    __inet_lookup_established() is pretty fast now we dont have to
    dirty a contended cache line (read_lock/read_unlock)

    Only established and timewait hashtable are converted to RCU
    (bind table and listen table are still using traditional locking)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Nov, 2008

3 commits

  • This provides a missing link in the code chain, as several features implicitly
    depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
    feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

    For Send Ack Vector, the situation is as follows:
    * since CCID2 mandates the use of Ack Vectors, there is no point in allowing
    endpoints which use CCID2 to disable Ack Vector features such a connection;

    * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
    with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

    * for all other CCIDs, the use of (Send) Ack Vector is optional and thus
    negotiable. However, this implies that the code negotiating the use of Ack
    Vectors also supports it (i.e. is able to supply and to either parse or
    ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
    Vector support), the use of Ack Vectors is here disabled, with a comment
    in the source code.

    An analogous consideration arises for the Send Loss Event Rate feature,
    since the CCID-3 implementation does not support the loss interval options
    of RFC 4342. To make such use explicit, corresponding feature-negotiation
    options are inserted which signal the use of the loss event rate option,
    as it is used by the CCID3 code.

    Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

    The patch implements this as a function which is called after the user has
    made all other registrations for changing default values of features.

    The table is variable-length, the reserved (and hence for feature-negotiation
    invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
    is used to mark the end of the table.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This provides a data structure to record which CCIDs are locally supported
    and three accessor functions:
    - a test function for internal use which is used to validate CCID requests
    made by the user;
    - a copy function so that the list can be used for feature-negotiation;
    - documented getsockopt() support so that the user can query capabilities.

    The data structure is a table which is filled in at compile-time with the
    list of available CCIDs (which in turn depends on the Kconfig choices).

    Using the copy function for cloning the list of supported CCIDs is useful for
    feature negotiation, since the negotiation is now with the full list of available
    CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation
    will not fail if the peer requests to use CCID3 instead of CCID2.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • Two registration routines, for SP and NN features, are provided by this patch,
    replacing a previous routine which was used for both feature types.

    These are internal-only routines and therefore start with `__feat_register'.

    It further exports the known limits of Sequence Window and Ack Ratio as symbolic
    constants.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     

05 Nov, 2008

1 commit