26 May, 2018

1 commit


28 Nov, 2017

1 commit


28 Apr, 2016

2 commits


23 Oct, 2015

1 commit

  • Multiple cpus can process duplicates of incoming ACK messages
    matching a SYN_RECV request socket. This is a rare event under
    normal operations, but definitely can happen.

    Only one must win the race, otherwise corruption would occur.

    To fix this without adding new atomic ops, we use logic in
    inet_ehash_nolisten() to detect the request was present in the same
    ehash bucket where we try to insert the new child.

    If request socket was not found, we have to undo the child creation.

    This actually removes a spin_lock()/spin_unlock() pair in
    reqsk_queue_unlink() for the fast path.

    Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
    Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Oct, 2015

1 commit

  • This patch makes dccp_bad_service_code return bool, since the
    function only ever returns one or zero.

    dccp_list_has_service is also made to return bool in this patchset.

    No functional change.

    Signed-off-by: Yaowei Bai
    Signed-off-by: David S. Miller

    Yaowei Bai
     

30 Sep, 2015

3 commits


26 Sep, 2015

1 commit

  • Like tcp_make_synack(), the only time we might change the socket is
    when calling sock_wmalloc(), which uses an atomic operation to
    update sk->sk_wmem_alloc.

    Also use MAX_DCCP_HEADER as both IPv4/IPv6 use this value for max_header.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Mar, 2015

1 commit


21 Mar, 2015

1 commit

  • When request socks are put in the ehash table, the whole notion
    of having a previous request to update dl_next is pointless.

    Also, a following patch will get rid of the big purge timer,
    so we want to delete a request sock without holding the listener lock.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Mar, 2015

1 commit

  • Now that TIPC no longer depends on the iocb argument in its internal
    implementations of the sendmsg() and recvmsg() hooks defined in the
    proto structure, no user of those hooks uses the iocb argument at all.
    We can therefore drop the redundant iocb argument from every
    implementation of sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

09 Nov, 2014

1 commit

  • Remove the dependency on the "warning" sysctl (net_msg_warn)
    which is only used by the LIMIT_NETDEBUG macro.

    Convert the LIMIT_NETDEBUG use in DCCP_WARN to the more
    common net_warn_ratelimited mechanism.

    This still ratelimits based on the net_ratelimit()
    function, but removes the check for the sysctl.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

05 Jan, 2014

1 commit


20 Oct, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
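
As a small illustration of the point made in the commit above, the following sketch declares one function with `extern` and one without. Both names are hypothetical and exist only for this example; in standard C the two prototypes have the same (external) linkage.

```c
/* Both prototypes declare a function with external linkage.
 * The extern keyword on a function prototype adds nothing. */
extern int add_with_extern(int a, int b);   /* older style */
int add_without_extern(int a, int b);       /* style the commit standardizes on */

int add_with_extern(int a, int b)
{
    return a + b;
}

int add_without_extern(int a, int b)
{
    return a + b;
}
```

Either function can be called from another translation unit in exactly the same way.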
     

11 Jul, 2012

1 commit


16 Apr, 2012

1 commit


20 Dec, 2011

1 commit

  • module_param(bool) used to counter-intuitively take an int. In
    fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
    trick.

    It's time to remove the int/unsigned int option. For this version
    it'll simply give a warning, but it'll break next kernel version.

    (Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Rusty Russell
    Signed-off-by: David S. Miller

    Rusty Russell
     

12 Dec, 2011

1 commit


01 Aug, 2011

1 commit

  • In contrast to static feature negotiation at the beginning of a connection, this
    patch introduces support for exchange of dynamically changing options.

    Such an update/exchange is necessary in at least two cases:
    * CCID-2's Ack Ratio (RFC 4341, 6.1.2) which changes during the connection;
    * Sequence Window values that, as per RFC 4340, 7.5.2, should be sent "as
    the connection progresses".

    Both are non-negotiable (NN) features, which means that no new capabilities
    are negotiated, but rather that changes in known parameters are brought
    up-to-date at either end.

    These characteristics are reflected by the implementation:
    * only NN options can be exchanged after connection setup;
    * an ack is scheduled directly after activation to speed up the update;
    * CCIDs may request changes to an NN feature even if a negotiation for that
    feature is already underway: this is required by CCID-2, where changes in
    cwnd necessitate Ack Ratio changes, such that the previous Ack Ratio (which
    is still being negotiated) would cause irrecoverable RTO timeouts (thanks
    to work by Samuel Jero).

    Signed-off-by: Gerrit Renker
    Signed-off-by: Samuel Jero
    Acked-by: Ian McDonald

    Gerrit Renker
     

07 Jan, 2011

1 commit

  • Currently dccp_check_seqno allows any valid packet to update the Greatest
    Sequence Number Received, even if that packet's sequence number is less than
    the current GSR. This patch adds a check to make sure that the new packet's
    sequence number is greater than GSR.

    Signed-off-by: Samuel Jero
    Signed-off-by: Gerrit Renker

    Samuel Jero
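
The check described in the commit above can be sketched roughly as follows. This is a minimal user-space illustration, not the kernel code: the helper names `seq48_after` and `gsr_update_if_newer` are assumptions made for this example, and the comparison treats DCCP sequence numbers as circular 48-bit values.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* True if s1 is "after" s2 in circular 48-bit sequence space. */
static inline bool seq48_after(uint64_t s1, uint64_t s2)
{
    uint64_t delta = (s1 - s2) & SEQ48_MASK;

    return delta != 0 && delta < (UINT64_C(1) << 47);
}

/* Advance *gsr only when seqno is newer than the current GSR.
 * Returns true if GSR was updated (hypothetical helper). */
static inline bool gsr_update_if_newer(uint64_t *gsr, uint64_t seqno)
{
    if (!seq48_after(seqno, *gsr))
        return false;
    *gsr = seqno & SEQ48_MASK;
    return true;
}
```

With this guard, an old but otherwise valid packet leaves GSR untouched, while a packet that wraps past 2^48 - 1 is still recognised as newer.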
     

10 Dec, 2010

1 commit

  • Remove macros which have been unused since the initial implementation
    (commit 7c657876b63cb1d8a2ec06f8fc6c37bb8412e66c, [DCCP]: Initial
    implementation from Tue Aug 9 20:14:34 2005 -0700).

    Signed-off-by: Shan Wei
    Acked-by: Gerrit Renker

    Shan Wei
     

07 Dec, 2010

2 commits

  • Ensure that cmsg->cmsg_type value is valid for qpolicy
    that is currently in use.

    Signed-off-by: Tomasz Grobelny
    Signed-off-by: Gerrit Renker

    Tomasz Grobelny
     
  • This patch adds a generic infrastructure for policy-based dequeueing of
    TX packets and provides two policies:
    * a simple FIFO policy (which is the default) and
    * a priority based policy (set via socket options).
    Both policies honour the tx_qlen sysctl for the maximum size of the write
    queue (can be overridden via socket options).

    The priority policy uses skb->priority internally to assign a u32 priority
    identifier, using the same ranking as SO_PRIORITY. The skb->priority field
    is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
    data using cmsg(3); the patch also provides the requisite parsing routines.

    Signed-off-by: Tomasz Grobelny
    Signed-off-by: Gerrit Renker

    Tomasz Grobelny
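
From user space, supplying a per-packet priority as ancillary data might look like the sketch below. It only builds the control message with the standard cmsg(3) macros and does not send anything. The helper name `attach_priority` is hypothetical, and the fallback values for `SOL_DCCP` and `DCCP_SCM_PRIORITY` are assumptions intended to mirror the Linux headers; check them against your system before relying on them.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_DCCP
#define SOL_DCCP 269            /* assumed value, as in the Linux socket headers */
#endif
#ifndef DCCP_SCM_PRIORITY
#define DCCP_SCM_PRIORITY 1     /* assumed value, as in <linux/dccp.h> */
#endif

/* Point msg's control area at buf and fill it with a single priority
 * cmsg. buf must be suitably aligned for struct cmsghdr.
 * Returns 0 on success, -1 if buf is too small. */
static inline int attach_priority(struct msghdr *msg, void *buf,
                                  size_t buflen, uint32_t prio)
{
    struct cmsghdr *cmsg;

    if (buflen < CMSG_SPACE(sizeof(prio)))
        return -1;

    memset(buf, 0, buflen);
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(sizeof(prio));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = SOL_DCCP;
    cmsg->cmsg_type = DCCP_SCM_PRIORITY;
    cmsg->cmsg_len = CMSG_LEN(sizeof(prio));
    memcpy(CMSG_DATA(cmsg), &prio, sizeof(prio));
    return 0;
}
```

A caller would fill in `msg_iov` with the payload and pass the resulting `struct msghdr` to sendmsg(2) on a DCCP socket that has the priority policy selected.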
     

11 Nov, 2010

1 commit

  • This completes the implementation of a circular buffer for Ack Vectors, by
    extending the current (linear array-based) implementation. The changes are:

    (a) An `overflow' flag to deal with the case of overflow. As before, dynamic
    growth of the buffer will not be supported; but code will be added to deal
    robustly with overflowing Ack Vector buffers.

    (b) A `tail_seqno' field. When naively implementing the algorithm of Appendix A
    in RFC 4340, problems arise whenever subsequent Ack Vector records overlap,
    which can bring the entire run length calculation completely out of synch.
    (This is documented at
    http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ack_vectors/tracking_tail_ackno/ .)

    (c) The buffer length is now computed dynamically (i.e. current fill level),
    as the span between head to tail.

    As a result, dccp_ackvec_pending() is now simpler - the #ifdef is no longer
    necessary since buf_empty is always true when IP_DCCP_ACKVEC is not configured.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
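
Point (c) of the commit above, computing the fill level as the span between head and tail, can be sketched as follows. The structure, the field names and the capacity `AV_MAX_LEN` are hypothetical stand-ins chosen for this example rather than the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define AV_MAX_LEN 253u   /* hypothetical buffer capacity */

struct ackvec_buf {
    uint16_t head;        /* index of the first used cell */
    uint16_t tail;        /* index one past the last used cell */
    bool overflow;        /* set once the buffer has wrapped onto itself */
};

/* Circular index subtraction: (a - b) modulo the buffer capacity. */
static inline uint16_t av_idx_sub(uint16_t a, uint16_t b)
{
    return (uint16_t)((a + AV_MAX_LEN - b) % AV_MAX_LEN);
}

/* Current fill level: the span from head to tail, or the full
 * capacity when the overflow flag is set. */
static inline uint16_t av_buflen(const struct ackvec_buf *av)
{
    if (av->overflow)
        return AV_MAX_LEN;
    return av_idx_sub(av->tail, av->head);
}
```

Note that `head == tail` alone cannot distinguish an empty buffer from a completely full one, which is one reason a separate overflow flag is useful.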
     

29 Oct, 2010

1 commit

  • This extends the existing wait-for-ccid routine so that it may be used with
    different types of CCID, addressing the following problems:

    1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID-2 to become unblocked
    could lead to an indefinite delay (i.e., application "hangs").
    2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
    3) The minimum wait time for CCID-3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
    4) There is no way for the user/application to control this behaviour. It would
    be good to use the timeout argument of dccp_close() as an upper bound. Then
    the maximum time that an application is willing to wait for its CCIDs can
    be set via the SO_LINGER option.

    These problems are addressed by giving the CCID a grace period of up to the
    `timeout' value.

    The wait-for-ccid function is, as before, used when the application
    (a) has read all the data in its receive buffer and
    (b) if SO_LINGER was set with a non-zero linger time, or
    (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
    state (client application closes after receiving CloseReq).

    In addition, there is a catch-all case of __skb_queue_purge() after waiting for
    the CCID. This is necessary since the write queue may still have data when
    (a) the host has been passively-closed,
    (b) abnormal termination (unread data, zero linger time),
    (c) wait-for-ccid could not finish within the given time limit.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
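
The arithmetic in points 3) and 4) of the commit above can be written out as a short sketch. The function names are hypothetical; the code only illustrates that the expected drain time is the queue length times the inter-packet delay, and that the timeout acts as an upper bound on the wait.

```c
#include <stdint.h>

/* Expected time to drain the TX queue of a rate-based CCID:
 * queue length times the current inter-packet delay. */
static inline uint64_t drain_time_ms(uint32_t tx_qlen, uint32_t ipd_ms)
{
    return (uint64_t)tx_qlen * ipd_ms;
}

/* Wait granted to the CCID: the drain time, capped by the timeout
 * (for example one derived from SO_LINGER). */
static inline uint64_t bounded_wait_ms(uint32_t tx_qlen, uint32_t ipd_ms,
                                       uint64_t timeout_ms)
{
    uint64_t need = drain_time_ms(tx_qlen, ipd_ms);

    return need < timeout_ms ? need : timeout_ms;
}
```

For the example in the commit message, `drain_time_ms(100, 15)` gives 1500 ms, that is, 1.5 seconds; with a 1000 ms timeout the wait would be capped at 1000 ms.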
     

12 Oct, 2010

3 commits

  • This schedules an Ack when receiving a timestamp, exploiting the
    existing inet_csk_schedule_ack() function, saving one case in the
    `dccp_ack_pending()' function.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     
  • This patch generalises the task of determining data loss from RFC 4340, 7.7.1.

    Let S_A, S_B be sequence numbers such that S_B is "after" S_A, and let
    N_B be the NDP count of packet S_B. Then, using modulo-2^48 arithmetic,
    D = S_B - S_A - 1 is an upper bound of the number of lost data packets,
    D - N_B is an approximation of the number of lost data packets
    (there are cases where this is not exact).

    The patch implements this as
    dccp_loss_count(S_A, S_B, N_B) := max(S_B - S_A - 1 - N_B, 0)

    Signed-off-by: Ivo Calado
    Signed-off-by: Erivaldo Xavier
    Signed-off-by: Leandro Sales
    Signed-off-by: Gerrit Renker

    Ivo Calado
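
The formula from the commit above, dccp_loss_count(S_A, S_B, N_B) := max(S_B - S_A - 1 - N_B, 0), can be sketched in user-space C as follows. This is an illustration under the stated modulo-2^48 arithmetic, not the kernel function; the name `loss_count48` is an assumption made for this example, and the caller is expected to pass S_B "after" S_A.

```c
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Approximate number of lost data packets between s_a and s_b,
 * where ndp is the NDP count carried by packet s_b. */
static inline uint64_t loss_count48(uint64_t s_a, uint64_t s_b, uint64_t ndp)
{
    /* Distance from s_a to s_b in circular 48-bit space. */
    uint64_t delta = (s_b - s_a) & SEQ48_MASK;
    uint64_t gap;

    if (delta == 0)          /* same sequence number: nothing in between */
        return 0;

    gap = delta - 1;         /* D: upper bound on lost data packets */
    return gap > ndp ? gap - ndp : 0;   /* max(D - N_B, 0) */
}
```

For instance, with S_A = 10, S_B = 15 and N_B = 1 there are four sequence numbers in between, one of which was a non-data packet, so the estimate is three lost data packets.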
     
  • This fixes a problem and a potential loophole with regard to seqno/ackno
    validity: currently the initial adjustments to AWL/SWL are only performed
    once at the beginning of the connection, during the handshake.

    Since the Sequence Window feature is always greater than Wmin=32 (7.5.2),
    it is however necessary to perform these adjustments at least for the first
    W/W' (variables as per 7.5.1) packets in the lifetime of a connection.

    This requirement is complicated by the fact that W/W' can change at any time
    during the lifetime of a connection.

    Therefore it is better to perform that safety check each time SWL/AWL are
    updated, as implemented by the patch.

    A second problem solved by this patch is that the remote/local Sequence Window
    feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
    feature negotiation has completed.

    During the initial handshake we have more stringent sequence number protection;
    the changes added by this patch ensure that {A,S}W{L,H} are within the correct
    bounds at the instant that feature negotiation completes (since the SeqWin
    feature activation handlers call dccp_update_gsr/gss()).

    Signed-off-by: Gerrit Renker

    Gerrit Renker
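
As a rough illustration of the adjustment discussed in the commit above, the sketch below recomputes the sequence-number window each time GSR changes, following the RFC 4340, 7.5.1 definitions SWL = max(GSR + 1 - floor(W/4), ISR) and SWH = GSR + ceil(3W/4). The struct and function names are hypothetical, and the code is a user-space sketch rather than the kernel routine.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

static inline uint64_t add48(uint64_t a, uint64_t b)
{
    return (a + b) & SEQ48_MASK;
}

static inline uint64_t sub48(uint64_t a, uint64_t b)
{
    return (a - b) & SEQ48_MASK;
}

/* True if s1 is "before" s2 in circular 48-bit sequence space. */
static inline bool before48(uint64_t s1, uint64_t s2)
{
    uint64_t delta = (s2 - s1) & SEQ48_MASK;

    return delta != 0 && delta < (UINT64_C(1) << 47);
}

struct seq_window {
    uint64_t swl;   /* Sequence Window Low */
    uint64_t swh;   /* Sequence Window High */
};

/* Recompute SWL/SWH from GSR, the initial sequence number received
 * (ISR) and the Sequence Window value W. SWL is clamped to ISR so it
 * never falls before the start of the connection. */
static inline struct seq_window seq_window_from_gsr(uint64_t gsr,
                                                    uint64_t isr, uint64_t w)
{
    struct seq_window win;

    win.swl = sub48(add48(gsr, 1), w / 4);
    if (before48(win.swl, isr))
        win.swl = isr & SEQ48_MASK;
    win.swh = add48(gsr, (3 * w + 3) / 4);
    return win;
}
```

Doing the clamp on every update, rather than once during the handshake, keeps the window valid even if W changes later in the connection.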
     

07 Oct, 2010

1 commit


26 Jun, 2010

1 commit


12 Apr, 2010

1 commit


22 Mar, 2010

1 commit

  • There is no point in aligning or padding mibs to cache lines: they are
    allocated per cpu with an 8-byte alignment anyway.
    This wastes space for no gain. This patch removes __SNMP_MIB_ALIGN__.

    Since SNMP mibs contain "unsigned long" fields only, we can relax the
    allocation alignment from "unsigned long long" to "unsigned long".

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
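
The idea of enforcing this at the type level can be illustrated with a small sketch. Both function names are hypothetical. The first shows how a negative length slips past a signed upper-bound check; the second shows that with an unsigned parameter the same value becomes a very large number and is rejected.

```c
#include <stdbool.h>

/* Signed length: an upper-bound check alone lets -1 through. */
static inline bool len_ok_signed(int optlen, int max_len)
{
    return optlen <= max_len;
}

/* Unsigned length: a negative value cannot be represented.
 * -1 converts to UINT_MAX and fails the same bound check. */
static inline bool len_ok_unsigned(unsigned int optlen, unsigned int max_len)
{
    return optlen <= max_len;
}
```

With the signed variant every call site also needs an explicit `optlen < 0` check, which is the kind of scattered test the commit makes unnecessary.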
     

02 Mar, 2009

1 commit

  • This fixes a problem caused by the overlap of the connection-setup and
    established-state phases of DCCP connections.

    During connection setup, the client retransmits Confirm Feature-Negotiation
    options until a response from the server signals that it can move from the
    half-established PARTOPEN into the OPEN state, whereupon the connection is
    fully established on both ends (RFC 4340, 8.1.5).

    However, since the client may already send data while it is in the PARTOPEN
    state, consequences arise for the Maximum Packet Size: the problem is that the
    initial option overhead is much higher than for the subsequent established
    phase, as it involves potentially many variable-length list-type options
    (server-priority options, RFC 4340, 6.4).

    Applying the standard MPS is insufficient here: especially with larger
    payloads this can lead to annoying, counter-intuitive EMSGSIZE errors.

    On the other hand, reducing the MPS available for the established phase by
    the added initial overhead is highly wasteful and inefficient.

    The solution chosen therefore is a two-phase strategy:

    If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
    to carry the options, and the feature-negotiation list is then flushed.

    This means that the server gets two Acks for one Response. If both Acks get
    lost, it is probably better to restart the connection anyway, and devising yet
    another special case does not seem worth the extra complexity.

    The result is a higher utilisation of the available packet space for the data
    transmission phase (established state) of a connection.

    The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly
    seen values were around 90 bytes for initial feature-negotiation options.

    It uses sizeof(u32) to mean "aligned units of 4 bytes".
    For consistency, another use of 4-byte alignment is adapted.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
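
The decision at the heart of the two-phase strategy above might be sketched like this. The macro and function names are hypothetical; the only figure taken from the commit message is the (over-)estimated overhead of 32 units of 4 bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Estimated initial feature-negotiation overhead: 32 aligned units
 * of 4 bytes, i.e. 128 bytes (hypothetical macro name). */
#define FEATNEG_OVERHEAD (32 * sizeof(uint32_t))

/* In PARTOPEN, decide whether the payload leaves too little room for
 * the feature-negotiation options, so that a separate Ack should
 * carry the options before the data is sent. */
static inline bool partopen_needs_separate_ack(size_t payload_len, size_t mps)
{
    if (mps < FEATNEG_OVERHEAD)
        return true;
    return payload_len > mps - FEATNEG_OVERHEAD;
}
```

When the function returns true, the sender would first emit an Ack with the options and flush the feature-negotiation list, after which the full MPS is available for data.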
     

22 Jan, 2009

3 commits

  • Since all feature-negotiation processing now takes place in feat.c,
    functions for producing verbose debugging output are concentrated
    there.

    New functions to print out values, entry records, and options are
    provided, and also a macro is defined to not always have the function
    name in the output line.

    Thanks a lot to Wei Yongjun and Giuseppe Galeota for help and
    discussion with an earlier revision of this patch.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This patch takes care of initialising and type-checking sysctls
    related to feature negotiation. Type checking is important since some
    of the sysctls now directly impact the feature-negotiation process.

    The sysctls are initialised with the known default values for each
    feature. For the type-checking the value constraints from RFC 4340
    are used:

    * Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes),
    tested and confirmed that it works up to 4294967295 - for Gbps speed;
    * Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
    * CCIDs are between 0 .. 255;
    * request_retries, retries1, retries2 also between 0..255 for good measure;
    * tx_qlen is checked to be non-negative;
    * sync_ratelimit remains as before.

    Notes:
    ------
    1. Die s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.
    2. As pointed out by Arnaldo, the pattern of type-checking repeats itself in
    other places, sometimes with exactly the same kind of definitions (e.g.
    "static int zero;"). It may be a good idea (kernel janitors?) to consolidate
    type checking. For the sake of keeping the changeset small and in order not
    to affect other subsystems, I have not strived to generalise here.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
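
The value constraints listed in the commit above can be summarised as a range table with a simple check. This is only a user-space sketch: the entry names are illustrative labels rather than confirmed sysctl names, the upper bound for tx_qlen is an assumption (the commit only requires it to be non-negative), and sync_ratelimit is omitted because its check is unchanged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sysctl_range {
    const char *name;   /* illustrative label */
    int64_t min;
    int64_t max;
};

static const struct sysctl_range dccp_ranges[] = {
    { "seq_window",      32, 4294967295LL }, /* Wmin=32 up to 4-byte max */
    { "ack_ratio",        0, 0xffff },       /* 2-byte unsigned integer */
    { "rx_ccid",          0, 255 },
    { "tx_ccid",          0, 255 },
    { "request_retries",  0, 255 },
    { "retries1",         0, 255 },
    { "retries2",         0, 255 },
    { "tx_qlen",          0, INT32_MAX },    /* non-negative; max assumed */
};

/* True if value lies within the range registered for name.
 * Unknown names are rejected. */
static inline bool sysctl_value_ok(const char *name, int64_t value)
{
    size_t i;

    for (i = 0; i < sizeof(dccp_ranges) / sizeof(dccp_ranges[0]); i++) {
        if (strcmp(dccp_ranges[i].name, name) == 0)
            return value >= dccp_ranges[i].min &&
                   value <= dccp_ranges[i].max;
    }
    return false;
}
```

Keeping the bounds in one table makes it easy to see at a glance which values feed directly into feature negotiation.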
     
  • This adds full support for local/remote Sequence Window feature, from which the
    * sequence-number-validity (W) and
    * acknowledgment-number-validity (W') windows
    derive as specified in RFC 4340, 7.5.3.

    Specifically, the following is contained in this patch:
    * integrated new socket fields into dccp_sk;
    * updated the update_gsr/gss routines with regard to these fields;
    * updated handler code: the Sequence Window feature is located at the TX side,
    so the local feature is meant if the handler-rx flag is false;
    * the initialisation of `rcv_wnd' in reqsk is removed, since
    - rcv_wnd is not used by the code anywhere;
    - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
    - dccp_check_req checks the Ack number validity more rigorously;
    * the `struct dccp_minisock' became empty and is now removed.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     

05 Jan, 2009

1 commit

  • Based on Arnaldo's earlier patch, this patch integrates the standardised
    CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:

    * enables a faster connection path by eliminating the need to always go
    through the CCID registration lock;

    * updates the implementation to use only a single array whose size equals
    the number of configured CCIDs instead of the maximum (256);

    * since the CCIDs are now fixed array elements, synchronization is no
    longer needed, simplifying use and implementation.

    CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
    CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
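
The fixed-array approach described in the commit above can be sketched as follows. The struct layout and names are hypothetical; the point is that a constant array sized to the configured CCIDs can be searched without any registration lock, because it never changes at run time.

```c
#include <stddef.h>
#include <stdint.h>

struct ccid_ops {
    uint8_t id;
    const char *name;
};

/* One entry per built-in CCID instead of 256 slots. */
static const struct ccid_ops ccids[] = {
    { 2, "TCP-like" },   /* CCID-2, RFC 4341 */
    { 3, "TFRC" },       /* CCID-3, RFC 4342 */
};

/* Look up a CCID by number; returns NULL if it is not built in.
 * No locking is needed since the array is read-only. */
static inline const struct ccid_ops *ccid_lookup(uint8_t id)
{
    size_t i;

    for (i = 0; i < sizeof(ccids) / sizeof(ccids[0]); i++) {
        if (ccids[i].id == id)
            return &ccids[i];
    }
    return NULL;
}
```

A linear scan is adequate here because the array holds only a handful of entries.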