11 Jan, 2009

2 commits

  • Thanks to Wei and Arnaldo for pointing out the correct
    new reference for CCID-3.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • Removed the __exit annotation of tfrc_lib_exit(), in order to suppress the following section mismatch messages:

    WARNING: net/dccp/dccp.o(.text+0xd9): Section mismatch in reference from the function ccid_cleanup_builtins() to the function .exit.text:tfrc_lib_exit()
    The function ccid_cleanup_builtins() references a function in an exit section.
    Often the function tfrc_lib_exit() has valid usage outside the exit section
    and the fix is to remove the __exit annotation of tfrc_lib_exit.

    WARNING: net/dccp/dccp.o(.init.text+0x48): Section mismatch in reference from the function ccid_initialize_builtins() to the function .exit.text:tfrc_lib_exit()
    The function __init ccid_initialize_builtins() references
    a function __exit tfrc_lib_exit().
    This is often seen when error handling in the init function
    uses functionality in the exit path.
    The fix is often to remove the __exit annotation of
    tfrc_lib_exit() so it may be used outside an exit section.

    Signed-off-by: Leonardo Potenza
    Acked-by: Arnaldo Carvalho de Melo
    Acked-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Leonardo Potenza
     

05 Jan, 2009

3 commits

  • This patch integrates the TFRC library, which is a dependency of CCID-3 (and
    CCID-4), with the new use of CCIDs in the DCCP module.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This patch cleans up after integrating the CCID modules and, in addition,

    * moves the if/else cases from ccid_delete() into ccid_hc_{tx,rx}_delete();
    * removes the 'gfp' argument to ccid_new() - since it is always gfp_any().

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • Based on Arnaldo's earlier patch, this patch integrates the standardised
    CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:

    * enables a faster connection path by eliminating the need to always go
    through the CCID registration lock;

    * updates the implementation to use only a single array whose size equals
    the number of configured CCIDs instead of the maximum (256);

    * since the CCIDs are now fixed array elements, synchronization is no
    longer needed, simplifying use and implementation.

    CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
    CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

30 Dec, 2008

1 commit

  • When we converted the protocol atomic counters such as the orphan
    count and the total socket count deadlocks were introduced due to
    the mismatch in BH status of the spots that used the percpu counter
    operations.

    Based on the diagnosis and patch by Peter Zijlstra, this patch
    fixes these issues by disabling BH where we may be in process
    context.

    Reported-by: Jeff Kirsher
    Tested-by: Ingo Molnar
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

18 Dec, 2008

1 commit

  • And thus when we try to use 'ss -danemi' on these sockets that have no
    ccid blocks (data collected using systemtap after I fixed the problem):

    dccp_diag_get_info sk=0xffff8801220a3100, dp->dccps_hc_rx_ccid=0x0000000000000000, dp->dccps_hc_tx_ccid=0x0000000000000000

    We get an OOPS:

    mica.ghostprotocols.net login: BUG: unable to handle kernel NULL pointer
    dereferenc0
    IP: [] dccp_diag_get_info+0x82/0xc0 [dccp_diag]
    PGD 12106f067 PUD 122488067 PMD 0
    Oops: 0000 [#1] PREEMPT

    Fix is trivial, and 'ss -d' is working again:

    [root@mica ~]# ss -danemi
    State Recv-Q Send-Q Local Address:Port Peer Address:Port
    LISTEN 0 0 *:5001 *:*
    ino:7288 sk:220a3100ffff8801
    mem:(r0,w0,f0,t0) cwnd:0 ssthresh:0
    [root@mica ~]#

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

08 Dec, 2008

7 commits

  • This removes the use of the sysctl and the minisock variable for the Send Ack
    Vector feature, as it now is handled fully dynamically via feature negotiation
    (i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled as per
    RFC 4341, 4.).

    Using a sysctl in parallel to this implementation would open the door to
    crashes, since much of the code relies on tests of the boolean minisock /
    sysctl variable. Thus, this patch replaces all tests of type

    if (dccp_msk(sk)->dccpms_send_ack_vector)
    /* ... */
    with
    if (dp->dccps_hc_rx_ackvec != NULL)
    /* ... */

    The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
    negotiation concluded that Ack Vectors are to be used on the half-connection.
    Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
    so that the test is a valid one.

    The activation handler for Ack Vectors is called as soon as the feature
    negotiation has concluded at the
    * server when the Ack marking the transition RESPOND => OPEN arrives;
    * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

    Adding the sequence number of the Response packet to the Ack Vector has been
    removed, since
    (a) connection establishment implies that the Response has been received;
    (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
    this entry will always be ignored;
    (c) it can not be used for anything useful - to detect loss for instance, only
    packets received after the loss can serve as pseudo-dupacks.

    There was a FIXME to change the error code when dccp_ackvec_add() fails.
    I removed this after finding out that:
    * the check whether ackno < ISN is already made earlier,
    * this Response is likely the 1st packet with an Ackno that the client gets,
    * so when dccp_ackvec_add() fails, the reason is likely not a packet error.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • Updating the NDP count feature is handled automatically now:
    * for CCID-2 it is disabled, since the code does not use NDP counts;
    * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

    Allowing the user to change NDP values leads to unpredictable and failing
    behaviour, since it is then possible to disable NDP counts even when they
    are needed (e.g. in CCID-3).

    This means that only those user settings are sensible that agree with the
    values for Send NDP Count implied by the choice of CCID. But those settings
    are already activated by the feature negotiation (CCID dependency tracking),
    hence this form of support is redundant.

    At startup the initialisation of the NDP count feature uses the default
    value of 0, which is done implicitly by the zeroing-out of the socket when
    it is allocated. If the choice of CCID or feature negotiation enables NDP
    count, this will then be updated via the NDP activation handler.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
    case, their value equals initially that of the sysctl, but at the end of
    feature negotiation may be something different.

    The old interface removed by this patch thus has been replaced by the newer
    interface to dynamically query the currently loaded CCIDs.

    Also removed are the constructors for the TX CCID and the RX CCID, since the
    switch "rx non-rx" is done by the handler in minisocks.c (and the handler
    is the only place in the code where CCIDs are loaded).

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • The code removed by this patch is no longer referenced or used, the added
    lines update documentation and copyrights.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This integrates feature-activation in the client:

    1. When dccp_parse_options() fails, the reset code is already set; request_sent\
    _state_process() currently overrides this with `Packet Error', which is not
    intended - changed to use the reset code supplied by dccp_parse_options().

    2. When feature negotiation fails, the socket should be marked as not usable,
    so that the application is notified that an error occurred. This is achieved
    by a new label 'unable_to_proceed': generating an error code of `Aborted',
    setting the socket state to CLOSED, returning with ECOMM in sk_err.

    3. Avoids parsing the Ack twice in Respond state by not doing option processing
    again in dccp_rcv_respond_partopen_state_process (as option processing has
    already been done on the request_sock in dccp_check_req).

    Since this addresses congestion-control initialisation, a corresponding
    FIXME has been removed.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This patch integrates the activation of features at the end of negotiation
    into the server-side code.

    Note regarding the removal of 'const':
    --------------------------------------
    The 'const' attribute has been removed from 'dreq' since dccp_activate_values()
    needs to operate on dreq's feature list. Part of the activation is to remove
    those options from the list that have already been confirmed, hence it is not
    purely read-only.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This first patch out of three replaces the hardcoded default settings with
    initialisation code for the dynamic feature negotiation.

    The patch also ensures that the client feature-negotiation queue is flushed
    only when entering the OPEN state.

    Since confirmed Change options are removed as soon as they are confirmed
    (in the DCCP-Response), this ensures that Confirm options are retransmitted.

    Note on retransmitting Confirm options:
    ---------------------------------------
    Implementation experience showed that it is necessary to retransmit Confirm
    options. Thanks to Leandro Melo de Sales who reported a bug in an earlier
    revision of the patch set, resulting from not retransmitting these options.

    As long as the client is in PARTOPEN, it needs to retransmit the Confirm
    options for the Change options received on the DCCP-Response from the server.

    Otherwise, if the packet containing the Confirm options gets dropped in the
    network, the connection aborts due to undefined feature negotiation state.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     

06 Dec, 2008

1 commit


02 Dec, 2008

6 commits

  • This patch provides the post-processing of feature negotiation state, after
    the negotiation has completed.

    To this purpose, handlers are used and added to the dccp_feat_table. Each
    handler is passed a boolean flag whether the RX or TX side of the feature
    is meant.

    Several handlers are provided already, new handlers can easily be added.

    The initialisation is now fully dynamic, i.e. CCIDs are activated only
    after the feature negotiation. The integration of this dynamic activation
    is done in the subsequent patches.

    Thanks to Wei Yongjun for pointing out the necessity of skipping over empty
    Confirm options while copying the negotiated feature values.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • Analogous to the previous patch, this adds code to interpret incoming Confirm
    feature-negotiation options. Both functions operate on the feature-negotiation
    list of either the request_sock (server) or the dccp_sock (client).

    Thanks to Wei Yongjun for pointing out that it is overly restrictive to check
    the entire list of confirmed SP values.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This adds/replaces code for processing incoming ChangeL/R options.
    The main difference is that:
    * mandatory FN options are now interpreted inside the function
    (there are too many individual cases to do this externally);
    * the function returns an appropriate Reset code or 0,
    which is then used to fill in the data for the Reset packet.

    Old code, which is no longer used or referenced, has been removed.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This provides two functions to
    * reconcile preference lists (with appropriate return codes) and
    * reorder the preference list if successful reconciliation changed the
    preferred value.

    The patch also removes the old code for processing SP/NN Change options, since
    new code to process these is mostly there already; related references have been
    commented out.

    The code for processing Change options follows in the next patch.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • The patch implements insertion of feature negotiation at the server (listening
    and request socket) and the client (connecting socket).

    In dccp_insert_options(), several statements have been grouped together now
    to achieve (it is hoped) better efficiency by reducing the number of tests
    each packet has to go through:
    - Ack Vectors are sent if the packet is neither a Data or a Request packet;
    - a previous issue is corrected - feature negotiation options are allowed
    on DataAck packets (5.8).

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This patch replaces the earlier insertion routine from options.c, so that
    code specific to feature negotiation can remain in feat.c. This is possible
    by calling a function already existing in options.c.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     

26 Nov, 2008

3 commits

  • Instead of using one atomic_t per protocol, use a percpu_counter
    for "orphan_count", to reduce cache line contention on
    heavy duty network servers.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
    to flow_cache_lookup() and resolver callback.

    Take it from socket or netdevice. Stub DECnet to init_net.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • this warning:

    net/dccp/options.c: In function ‘dccp_parse_options’:
    net/dccp/options.c:67: warning: ‘value’ may be used uninitialized in this function

    is a bogus GCC warning. The compiler does not recognize the relation
    between "value" and "mandatory" variables: the code flow can ever reach
    the "out_invalid_option:" label if 'mandatory' is set to 1, and when
    'mandatory' is non-zero, we'll always have 'value' initialized.

    Help out the compiler by annotating the variable.

    Signed-off-by: Ingo Molnar
    Signed-off-by: David S. Miller

    Ingo Molnar
     

24 Nov, 2008

5 commits

  • The patch extends existing code:
    * Confirm options divide into the confirmed value plus an optional preference
    list for SP values. Previously only the preference list was echoed for SP
    values, now the confirmed value is added as per RFC 4340, 6.1;
    * length and sanity checks are added to avoid illegal memory (or NULL) access.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • Support for Mandatory options is provided by this patch, which will
    be used by subsequent feature-negotiation patches.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This extends the scope of two available functions,
    encode|decode_value_var, to work up to 6 (8) bytes, to match maximum
    requirements in the RFC.

    These functions are going to be used both by general option processing
    and feature negotiation code, hence declarations have been put into
    feat.h.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This provides function to query the current TX/RX CCID dynamically,
    without reliance on the minisock value, using dynamic information
    available in the currently loaded CCID module.

    This query function is then used to
    (a) provide the getsockopt part for getting/setting CCIDs via sockopts;
    (b) replace the current test for "which CCID is in use" in probe.c.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • With this patch, TX/RX CCIDs can now be changed on a per-connection
    basis, which overrides the defaults set by the global sysctl variables
    for TX/RX CCIDs.

    To make full use of this facility, the remaining patches of this patch
    set are needed, which track dependencies and activate negotiated
    feature values.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

20 Nov, 2008

3 commits


17 Nov, 2008

6 commits

  • This splits the setsockopt calls into two groups, depending on whether an
    integer argument (val) is required and whether routines being called do
    their own locking.

    Some options (such as setting the CCID) use u8 rather than int, so that for
    these the test with regard to integer-sizeof can not be used.

    The second switch-case statement now only has those statements which need
    locking and which make use of `val'.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Acked-by: Arnaldo Carvalho de Melo
    Reviewed-by: Eugene Teo
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This patch deprecates the Ack Ratio sysctl, since
    * Ack Ratio is entirely ignored by CCID-3 and CCID-4,
    * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
    * even if it would work in CCID-2, there is no point for a user to change it:
    - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
    - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
    (since waiting for Acks which will never arrive in this window),
    - cwnd is not a user-configurable value.

    The only reasonable place for Ack Ratio is to print it for debugging. It is
    planned to do this later on, as part of e.g. dccp_probe.

    With this patch Ack Ratio is now under full control of feature negotiation:
    * Ack Ratio is resolved as a dependency of the selected CCID;
    * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
    the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
    Ratio 2 for both endpoints";
    * what happens then is part of another patch set, since it concerns the
    dynamic update of Ack Ratio while the connection is in full flight.

    Thanks to Tomasz Grobelny for discussion leading up to this patch.

    Signed-off-by: Gerrit Renker
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This provides feature negotiation for server minimum checksum coverage
    which so far has been missing.

    Since sender/receiver coverage values range only from 0...15, their
    type has also been reduced in size from u16 to u4.

    Feature-negotiation options are now generated for both sender and receiver
    coverage, i.e. when the peer has `forgotten' to enable partial coverage
    then feature negotiation will automatically enable (negotiate) the partial
    coverage value for this connection.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • The previous setsockopt interface, which passed socket options via struct
    dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
    ugly code since the old approach did not distinguish between NN and SP values.

    This patch removes the old setsockopt interface and replaces it with two new
    functions to register NN/SP values for feature negotiation.
    These are essentially wrappers around the internal __feat_register functions,
    with checking added to avoid

    * wrong usage (type);
    * changing values while the connection is in progress.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This adds a hook to resolve features whose value depends on the choice of
    CCID. It is done at the server since it can only be done after the CCID
    values have been negotiated; i.e. the client will add its CCID preference
    list on the Change options sent in the Request, which will be reconciled
    with the local preference list of the server.

    The concept is documented on
    http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
    implementation_notes.html#ccid_dependencies

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • RCU was added to UDP lookups, using a fast infrastructure :
    - sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
    price of call_rcu() at freeing time.
    - hlist_nulls permits to use few memory barriers.

    This patch uses same infrastructure for TCP/DCCP established
    and timewait sockets.

    Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
    using short lived TCP connections. A followup patch, converting
    rwlocks to spinlocks will even speedup this case.

    __inet_lookup_established() is pretty fast now we dont have to
    dirty a contended cache line (read_lock/read_unlock)

    Only established and timewait hashtable are converted to RCU
    (bind table and listen table are still using traditional locking)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Nov, 2008

2 commits

  • This provides a missing link in the code chain, as several features implicitly
    depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
    feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

    For Send Ack Vector, the situation is as follows:
    * since CCID2 mandates the use of Ack Vectors, there is no point in allowing
    endpoints which use CCID2 to disable Ack Vector features such a connection;

    * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
    with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

    * for all other CCIDs, the use of (Send) Ack Vector is optional and thus
    negotiable. However, this implies that the code negotiating the use of Ack
    Vectors also supports it (i.e. is able to supply and to either parse or
    ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
    Vector support), the use of Ack Vectors is here disabled, with a comment
    in the source code.

    An analogous consideration arises for the Send Loss Event Rate feature,
    since the CCID-3 implementation does not support the loss interval options
    of RFC 4342. To make such use explicit, corresponding feature-negotiation
    options are inserted which signal the use of the loss event rate option,
    as it is used by the CCID3 code.

    Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

    The patch implements this as a function which is called after the user has
    made all other registrations for changing default values of features.

    The table is variable-length, the reserved (and hence for feature-negotiation
    invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
    is used to mark the end of the table.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This provides a data structure to record which CCIDs are locally supported
    and three accessor functions:
    - a test function for internal use which is used to validate CCID requests
    made by the user;
    - a copy function so that the list can be used for feature-negotiation;
    - documented getsockopt() support so that the user can query capabilities.

    The data structure is a table which is filled in at compile-time with the
    list of available CCIDs (which in turn depends on the Kconfig choices).

    Using the copy function for cloning the list of supported CCIDs is useful for
    feature negotiation, since the negotiation is now with the full list of available
    CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation
    will not fail if the peer requests to use CCID3 instead of CCID2.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker