01 Dec, 2015

1 commit

  • The memory barrier in the helper wq_has_sleeper is needed by just
    about every user of waitqueue_active. This patch generalises it
    by making it take a wait_queue_head_t directly. The existing
    helper is renamed to skwq_has_sleeper.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
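
A minimal userspace sketch of the split described in this commit message, assuming C11 atomics as a stand-in for the kernel's smp_mb(). Every demo_ name is hypothetical and only mirrors the shape of the helpers: a generic check that takes the wait queue head directly, and a socket-specific wrapper around it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for wait_queue_head_t: it only counts waiters. */
struct demo_wait_queue_head {
	atomic_int nr_waiters;
};

/* Hypothetical stand-in for struct socket_wq, which embeds a wait queue. */
struct demo_socket_wq {
	struct demo_wait_queue_head wait;
};

/* Generic helper: full barrier first, then check whether anyone waits. */
static bool demo_wq_has_sleeper(struct demo_wait_queue_head *wq_head)
{
	/* Stands in for smp_mb(); pairs with a barrier on the waiter side. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&wq_head->nr_waiters,
				    memory_order_relaxed) > 0;
}

/* Socket-specific wrapper: tolerates a missing socket_wq. */
static bool demo_skwq_has_sleeper(struct demo_socket_wq *wq)
{
	return wq != NULL && demo_wq_has_sleeper(&wq->wait);
}
```

Keeping the barrier inside the generic helper means a caller cannot forget it when testing for waiters.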
     

26 Sep, 2015

1 commit

  • Like tcp_make_synack(), the only time we might change the socket is
    when calling sock_wmalloc(), which uses an atomic operation to
    update sk->sk_wmem_alloc.

    Also use MAX_DCCP_HEADER as both IPv4/IPv6 use this value for max_header.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Apr, 2014

1 commit

  • ip_queue_xmit() assumes the skb it has to transmit is attached to an
    inet socket. Commit 31c70d5956fc ("l2tp: keep original skb ownership")
    changed l2tp to not change skb ownership and thus broke this assumption.

    One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
    so that we do not assume skb->sk points to the socket used by l2tp
    tunnel.

    Fixes: 31c70d5956fc ("l2tp: keep original skb ownership")
    Reported-by: Zhan Jianyu
    Tested-by: Zhan Jianyu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Oct, 2013

1 commit

  • In commit 634fb979e8f ("inet: includes a sock_common in request_sock")
    I forgot that the two ports in sock_common do not have the same byte order:

    skc_dport is __be16 (network order), but skc_num is __u16 (host order)

    So sparse complains because ir_loc_port (mapped into skc_num) is
    considered as __u16 while it should be __be16

    Let's rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num),
    and perform appropriate htons/ntohs conversions.

    Signed-off-by: Eric Dumazet
    Reported-by: Wu Fengguang
    Signed-off-by: David S. Miller

    Eric Dumazet
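
The byte-order mismatch can be illustrated with a small userspace sketch. The struct and function names are hypothetical; only htons() and ntohs() are real library calls. It assumes the remote port is kept in network order and the local port in host order, as the message describes for skc_dport and skc_num.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical miniature of the two port fields discussed above. */
struct demo_req_ports {
	uint16_t rmt_port_be;	/* remote port, network order (like skc_dport) */
	uint16_t num;		/* local port, host order (like skc_num) */
};

/* Fill the pair from header fields that arrive in network byte order. */
static struct demo_req_ports demo_ports_from_header(uint16_t src_be,
						     uint16_t dst_be)
{
	struct demo_req_ports p;

	p.rmt_port_be = src_be;		/* already network order: keep as-is */
	p.num = ntohs(dst_be);		/* convert once on the way in */
	return p;
}

/* Local port as it must appear in an outgoing header. */
static uint16_t demo_local_port_be(const struct demo_req_ports *p)
{
	return htons(p->num);
}
```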
     

10 Oct, 2013

1 commit

  • TCP listener refactoring, part 5 :

    We want to be able to insert request sockets (SYN_RECV) into main
    ehash table instead of the per listener hash table to allow RCU
    lookups and remove listener lock contention.

    This patch includes the needed struct sock_common in front
    of struct request_sock

    This means there is no more inet6_request_sock IPv6 specific
    structure.

    Following inet_request_sock fields were renamed as they became
    macros to reference fields from struct sock_common.
    Prefix ir_ was chosen to avoid name collisions.

    loc_port -> ir_loc_port
    loc_addr -> ir_loc_addr
    rmt_addr -> ir_rmt_addr
    rmt_port -> ir_rmt_port
    iif -> ir_iif

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Jul, 2012

1 commit


04 Mar, 2012

1 commit

  • This fixes a bug in the sequence number validation during the initial handshake.

    The code did not treat the initial sequence numbers ISS and ISR as read-only and
    did not keep state for GSR and GSS as required by the specification. This causes
    problems with retransmissions during the initial handshake, causing the
    budding connection to be reset.

    This patch now treats ISS/ISR as read-only and tracks GSS/GSR as required.

    Signed-off-by: Samuel Jero
    Signed-off-by: Gerrit Renker

    Samuel Jero
     

05 Jul, 2011

1 commit


09 May, 2011

1 commit

  • This allows us to acquire the exact route keying information from the
    protocol, however that might be managed.

    It handles all of the possibilities, from the simplest case of storing
    the key in inet->cork.fl to the more complex setup SCTP has where
    individual transports determine the flow.

    Signed-off-by: David S. Miller

    David S. Miller
     

31 Mar, 2011

1 commit


07 Dec, 2010

1 commit

  • This patch adds a generic infrastructure for policy-based dequeueing of
    TX packets and provides two policies:
    * a simple FIFO policy (which is the default) and
    * a priority based policy (set via socket options).
    Both policies honour the tx_qlen sysctl for the maximum size of the write
    queue (can be overridden via socket options).

    The priority policy uses skb->priority internally to assign an u32 priority
    identifier, using the same ranking as SO_PRIORITY. The skb->priority field
    is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
    data using cmsg(3), the patch also provides the requisite parsing routines.

    Signed-off-by: Tomasz Grobelny
    Signed-off-by: Gerrit Renker

    Tomasz Grobelny
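
A rough sketch of the two dequeueing policies, using a plain array instead of an skb list. All names are hypothetical. Two choices here are assumptions of the sketch, not statements about the patch: a larger priority value wins, and a tx_qlen of 0 means "no limit".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_pkt {
	uint32_t priority;	/* assumption: larger value = more urgent */
};

enum demo_policy { DEMO_POLICY_FIFO, DEMO_POLICY_PRIO };

/* Index of the packet to dequeue next, or -1 if the queue is empty. */
static int demo_qpolicy_top(const struct demo_pkt *q, size_t len,
			    enum demo_policy policy)
{
	size_t best = 0;

	if (len == 0)
		return -1;
	if (policy == DEMO_POLICY_FIFO)
		return 0;		/* head of the queue */
	for (size_t i = 1; i < len; i++)
		if (q[i].priority > q[best].priority)
			best = i;	/* earliest packet wins a tie */
	return (int)best;
}

/* Both policies refuse new packets once the queue limit is reached. */
static bool demo_qpolicy_full(size_t len, size_t tx_qlen)
{
	return tx_qlen != 0 && len >= tx_qlen;
}
```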
     

15 Nov, 2010

1 commit

  • The problem with Ack Vectors is that
    i) their length is variable and can in principle grow quite large,
    ii) it is hard to predict exactly how large they will be.

    Due to the second point it seems not a good idea to reduce the MPS; in
    particular when on average there is enough room for the Ack Vector and an
    increase in length is momentarily due to some burst loss, after which the
    Ack Vector returns to its normal/average length.

    The solution taken by this patch is to subtract a minimum-expected Ack Vector
    length from the MPS, and to defer any larger Ack Vectors onto a separate
    Sync - but only if indeed there is no space left on the skb.

    This patch provides the infrastructure to schedule Sync-packets for transporting
    (urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since
    it does not need to wait for new application data.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
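
The placement decision described above can be sketched as a single comparison. This is a simplified illustration with hypothetical names and byte counts supplied by the caller; it does not reproduce the kernel's option-insertion code.

```c
#include <stddef.h>

enum demo_ackvec_action {
	DEMO_ACKVEC_INLINE,	/* Ack Vector fits: carry it on this packet */
	DEMO_ACKVEC_VIA_SYNC,	/* no room: schedule a separate Sync for it */
};

/*
 * pkt_capacity: bytes the packet may hold in total,
 * bytes_used:   bytes already taken by headers, options and payload,
 * ackvec_len:   current (variable) length of the Ack Vector option.
 */
static enum demo_ackvec_action demo_place_ackvec(size_t pkt_capacity,
						 size_t bytes_used,
						 size_t ackvec_len)
{
	size_t space_left = bytes_used < pkt_capacity ?
			    pkt_capacity - bytes_used : 0;

	return ackvec_len <= space_left ? DEMO_ACKVEC_INLINE
					: DEMO_ACKVEC_VIA_SYNC;
}
```

Because the MPS already reserves room for a minimum-expected Ack Vector, the Sync path should only be taken for unusually long vectors.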
     

29 Oct, 2010

2 commits

  • This extends the existing wait-for-ccid routine so that it may be used with
    different types of CCID, addressing the following problems:

    1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID-2 to become unblocked
    could lead to an indefinite delay (i.e., application "hangs").
    2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
    3) The minimum wait time for CCID-3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
    4) There is no way for the user/application to control this behaviour. It would
    be good to use the timeout argument of dccp_close() as an upper bound. Then
    the maximum time that an application is willing to wait for its CCIDs can
    be set via the SO_LINGER option.

    These problems are addressed by giving the CCID a grace period of up to the
    `timeout' value.

    The wait-for-ccid function is, as before, used when the application
    (a) has read all the data in its receive buffer and
    (b) if SO_LINGER was set with a non-zero linger time, or
    (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
    state (client application closes after receiving CloseReq).

    In addition, there is a catch-all case of __skb_queue_purge() after waiting for
    the CCID. This is necessary since the write queue may still have data when
    (a) the host has been passively-closed,
    (b) abnormal termination (unread data, zero linger time),
    (c) wait-for-ccid could not finish within the given time limit.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This extends the packet dequeuing interface of dccp_write_xmit() to allow
    1. CCIDs to take care of timing when the next packet may be sent;
    2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).

    The main purpose is to take CCID-2 out of its polling mode (when it is network-
    limited, it tries every millisecond to send, without interruption).

    The mode of operation for (2) is as follows:
    * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
    * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
    * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
    * dccp_write_xmit() returns without further action;
    * after some time the wait-condition for CCID becomes true,
    * that CCID schedules the tasklet,
    * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
    * since the wait-condition is now true, ccid_hc_tx_send_packet() returns "send now",
    * packet is sent, and possibly more (since dccp_write_xmit() loops).

    Code reuse: the tasklet function calls dccp_write_xmit(), the timer function
    reduces to a wrapper around the same code.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
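
One way to picture the extended dequeuing interface is a return code whose low 16 bits carry a delay in milliseconds, which would account for the 65.535 second upper bound. The constants and names below are hypothetical and chosen for this sketch only.

```c
#include <stdint.h>

/* Hypothetical verdict codes; the low 16 bits hold a delay in ms. */
#define DEMO_SEND_AT_ONCE	0x00000u
#define DEMO_DELAYED		0x10000u
#define DEMO_WILL_DEQUEUE_LATER	0x20000u
#define DEMO_DELAY_MAX_MS	0xFFFFu		/* 65535 ms = 65.535 s */

enum demo_xmit_action {
	DEMO_XMIT_NOW,			/* send the packet immediately */
	DEMO_XMIT_ARM_TIMER,		/* retry after *delay_ms */
	DEMO_XMIT_WAIT_FOR_CCID,	/* return; the CCID's tasklet resumes */
};

/* Encode a CCID's "send after delay_ms" verdict, clamped to the maximum. */
static uint32_t demo_ccid_delay(uint32_t delay_ms)
{
	if (delay_ms > DEMO_DELAY_MAX_MS)
		delay_ms = DEMO_DELAY_MAX_MS;
	return DEMO_DELAYED | delay_ms;
}

/* What the transmit loop does with a verdict from the CCID. */
static enum demo_xmit_action demo_xmit_decide(uint32_t rc, uint32_t *delay_ms)
{
	*delay_ms = 0;
	if (rc == DEMO_SEND_AT_ONCE)
		return DEMO_XMIT_NOW;
	if (rc & DEMO_WILL_DEQUEUE_LATER)
		return DEMO_XMIT_WAIT_FOR_CCID;
	*delay_ms = rc & DEMO_DELAY_MAX_MS;
	return DEMO_XMIT_ARM_TIMER;
}
```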
     

12 Oct, 2010

2 commits


02 May, 2010

1 commit

  • sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
    need two atomic operations (and associated dirtying) per incoming
    packet.

    RCU conversion is pretty much needed :

    1) Add a new structure, called "struct socket_wq" to hold all fields
    that will need rcu_read_lock() protection (currently: a
    wait_queue_head_t and a struct fasync_struct pointer).

    [Future patch will add a list anchor for wakeup coalescing]

    2) Attach one of such structure to each "struct socket" created in
    sock_alloc_inode().

    3) Respect RCU grace period when freeing a "struct socket_wq"

    4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer to
    "struct socket_wq"

    5) Change sk_sleep() function to use new sk->sk_wq instead of
    sk->sk_sleep

    6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
    a rcu_read_lock() section.

    7) Change all sk_has_sleeper() callers to :
    - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
    - Use wq_has_sleeper() to wake up tasks if needed.
    - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

    8) sock_wake_async() is modified to use rcu protection as well.

    9) Exceptions :
    macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
    instead of dynamically allocated ones. They don't need rcu freeing.

    Some cleanups or followups are probably needed, (possible
    sk_callback_lock conversion to a spinlock for example...).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
            return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep by a call to this function.

    Needed for a future RCU conversion. sk_sleep won't be a field directly
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Apr, 2010

1 commit

  • As Herbert Xu said: we should be able to simply replace ipfragok
    with skb->local_df. Commit f88037 ("sctp: Drop ipfargok in sctp_xmit function")
    has dropped ipfragok and set the local_df value properly.

    The patch kills the ipfragok parameter of .queue_xmit().

    Signed-off-by: Shan Wei
    Signed-off-by: David S. Miller

    Shan Wei
     

12 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

19 Oct, 2009

1 commit

  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    Goal is to transfer fields used at lookup time into the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by rx path)

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Jul, 2009

1 commit

  • Adding a memory barrier after the poll_wait function, paired with the
    receive callbacks. Adding functions sock_poll_wait and sk_has_sleeper
    to wrap the memory barrier.

    Without the memory barrier, following race can happen.
    The race fires, when following code paths meet, and the tp->rcv_nxt
    and __add_wait_queue updates stay in CPU caches.

    CPU1                         CPU2

    sys_select                   receive packet
      ...                        ...
      __add_wait_queue           update tp->rcv_nxt
      ...                        ...
      tp->rcv_nxt check          sock_def_readable
      ...                        {
      schedule                     ...
                                   if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                           wake_up_interruptible(sk->sk_sleep)
                                   ...
                                 }

    If there was no cache the code would work ok, since the wait_queue and
    rcv_nxt are opposite to each other.

    Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
    passed the tp->rcv_nxt check and sleeps, or will get the new value for
    tp->rcv_nxt and will return with new data mask.
    In both cases the process (CPU1) is being added to the wait queue, so the
    waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

    The bad case is when the __add_wait_queue changes done by CPU1 stay in its
    cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then
    end up calling schedule and sleep forever if there is no more data on the
    socket.

    Calls to poll_wait in the following modules were omitted:
    net/bluetooth/af_bluetooth.c
    net/irda/af_irda.c
    net/irda/irnet/irnet_ppp.c
    net/mac80211/rc80211_pid_debugfs.c
    net/phonet/socket.c
    net/rds/af_rds.c
    net/rfkill/core.c
    net/sunrpc/cache.c
    net/sunrpc/rpc_pipe.c
    net/tipc/socket.c

    Signed-off-by: Jiri Olsa
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jiri Olsa
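
The pairing of the two barriers can be sketched in userspace with C11 atomics. The struct and function names are hypothetical, and a sequentially consistent fence stands in for the kernel barrier. Each side writes its own variable, issues the fence, and then reads the other side's variable.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_sock {
	atomic_int on_wait_queue;	/* stands in for the wait-queue entry */
	atomic_int rcv_nxt;		/* stands in for tp->rcv_nxt */
};

/* CPU1 (poll side): enqueue, barrier, then check for new data.
 * Returns true if no new data was seen and the caller would sleep. */
static bool demo_poll_should_sleep(struct demo_sock *sk, int seen_rcv_nxt)
{
	atomic_store_explicit(&sk->on_wait_queue, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* poll-side barrier */
	return atomic_load_explicit(&sk->rcv_nxt,
				    memory_order_relaxed) == seen_rcv_nxt;
}

/* CPU2 (receive side): publish data, barrier, then check for waiters.
 * Returns true if a wake-up has to be issued. */
static bool demo_rx_should_wake(struct demo_sock *sk, int new_rcv_nxt)
{
	atomic_store_explicit(&sk->rcv_nxt, new_rcv_nxt, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* receive-side barrier */
	return atomic_load_explicit(&sk->on_wait_queue,
				    memory_order_relaxed) != 0;
}
```

With both fences in place, the outcome "CPU1 decides to sleep and CPU2 sees no waiter" cannot occur. Without them, each store may still be invisible to the other CPU when the load runs, which is the race the message describes.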
     

03 Jun, 2009

1 commit

  • Define three accessors to get/set dst attached to a skb

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of :
    dst_release(skb->dst)
    skb->dst = NULL;

    Delete skb->dst field

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
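
A userspace sketch of the three accessors, with a plain integer as the reference count. The demo_ types and the dst_private field name are hypothetical. The point is that callers no longer touch the pointer field directly.

```c
#include <stddef.h>

struct demo_dst_entry {
	int refcnt;
};

struct demo_sk_buff {
	struct demo_dst_entry *dst_private;	/* only touched via accessors */
};

static struct demo_dst_entry *demo_skb_dst(const struct demo_sk_buff *skb)
{
	return skb->dst_private;
}

static void demo_skb_dst_set(struct demo_sk_buff *skb,
			     struct demo_dst_entry *dst)
{
	skb->dst_private = dst;
}

/* Drop one reference; a NULL dst is tolerated. */
static void demo_dst_release(struct demo_dst_entry *dst)
{
	if (dst)
		dst->refcnt--;
}

/* Replaces the open-coded pair "dst_release(skb->dst); skb->dst = NULL;". */
static void demo_skb_dst_drop(struct demo_sk_buff *skb)
{
	demo_dst_release(skb->dst_private);
	skb->dst_private = NULL;
}
```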
     

02 Mar, 2009

2 commits

  • This fixes a problem caused by the overlap of the connection-setup and
    established-state phases of DCCP connections.

    During connection setup, the client retransmits Confirm Feature-Negotiation
    options until a response from the server signals that it can move from the
    half-established PARTOPEN into the OPEN state, whereupon the connection is
    fully established on both ends (RFC 4340, 8.1.5).

    However, since the client may already send data while it is in the PARTOPEN
    state, consequences arise for the Maximum Packet Size: the problem is that the
    initial option overhead is much higher than for the subsequent established
    phase, as it involves potentially many variable-length list-type options
    (server-priority options, RFC 4340, 6.4).

    Applying the standard MPS is insufficient here: especially with larger
    payloads this can lead to annoying, counter-intuitive EMSGSIZE errors.

    On the other hand, reducing the MPS available for the established phase by
    the added initial overhead is highly wasteful and inefficient.

    The solution chosen therefore is a two-phase strategy:

    If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
    to carry the options, and the feature-negotiation list is then flushed.

    This means that the server gets two Acks for one Response. If both Acks get
    lost, it is probably better to restart the connection anyway and devising yet
    another special-case does not seem worth the extra complexity.

    The result is a higher utilisation of the available packet space for the data
    transmission phase (established state) of a connection.

    The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly
    seen values were around 90 bytes for initial feature-negotiation options.

    It uses sizeof(u32) to mean "aligned units of 4 bytes".
    For consistency, another use of 4-byte alignment is adapted.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This patch resolves a long-standing FIXME to dynamically update the Maximum
    Packet Size depending on actual options usage.

    It uses the flags set by the feature-negotiation infrastructure to compute
    the required header option size.

    Most options are fixed-size; a notable exception is the Ack Vector option
    (currently required only by CCID-2). It can have any length between 3 and 1020
    bytes. As a result of testing, 16 bytes (2 bytes for type/length plus 14 Ack
    Vector cells) have been found to be sufficient for loss-free situations.

    There are currently no CCID-specific header options which may appear on data
    packets, thus it is not necessary to define a corresponding CCID field as
    suggested in the old comment.

    Further changes:
    ----------------
    Adjusted the type of 'cur_mps' to match the unsigned return type of the
    function.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
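
The arithmetic in the two messages above can be sketched as follows. The function names and the parameters (path MTU, header length, length of the fixed-size options) are hypothetical inputs for this sketch. Only the 16-byte Ack Vector allowance, the 32*4-byte PARTOPEN estimate and the rounding to multiples of 4 bytes come from the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Round up to "aligned units of 4 bytes", i.e. multiples of sizeof(u32). */
static size_t demo_round_up_u32(size_t len)
{
	return (len + sizeof(uint32_t) - 1) / sizeof(uint32_t)
	       * sizeof(uint32_t);
}

#define DEMO_ACKVEC_MIN_LEN	(2 + 14)	/* type/length + 14 cells */
#define DEMO_PARTOPEN_OVERHEAD	(32 * sizeof(uint32_t))	/* 128 bytes */

/* Payload budget once headers and the expected options are subtracted. */
static size_t demo_cur_mps(size_t pmtu, size_t hdr_len,
			   size_t fixed_opt_len, int send_ackvec)
{
	size_t overhead = hdr_len + demo_round_up_u32(fixed_opt_len);

	if (send_ackvec)
		overhead += demo_round_up_u32(DEMO_ACKVEC_MIN_LEN);
	return overhead < pmtu ? pmtu - overhead : 0;
}
```

As an illustration with made-up inputs: a path MTU of 1500, 36 bytes of headers and 17 bytes of fixed options leave 1444 bytes without Ack Vectors and 1428 bytes with them.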
     

06 Dec, 2008

1 commit


17 Nov, 2008

1 commit

  • This adds a hook to resolve features whose value depends on the choice of
    CCID. It is done at the server since it can only be done after the CCID
    values have been negotiated; i.e. the client will add its CCID preference
    list on the Change options sent in the Request, which will be reconciled
    with the local preference list of the server.

    The concept is documented on
    http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
    implementation_notes.html#ccid_dependencies

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
     

12 Nov, 2008

1 commit

  • This provides a missing link in the code chain, as several features implicitly
    depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
    feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

    For Send Ack Vector, the situation is as follows:
    * since CCID2 mandates the use of Ack Vectors, there is no point in allowing
    endpoints which use CCID2 to disable Ack Vector features on such a connection;

    * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
    with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

    * for all other CCIDs, the use of (Send) Ack Vector is optional and thus
    negotiable. However, this implies that the code negotiating the use of Ack
    Vectors also supports it (i.e. is able to supply and to either parse or
    ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
    Vector support), the use of Ack Vectors is here disabled, with a comment
    in the source code.

    An analogous consideration arises for the Send Loss Event Rate feature,
    since the CCID-3 implementation does not support the loss interval options
    of RFC 4342. To make such use explicit, corresponding feature-negotiation
    options are inserted which signal the use of the loss event rate option,
    as it is used by the CCID3 code.

    Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

    The patch implements this as a function which is called after the user has
    made all other registrations for changing default values of features.

    The table is variable-length, the reserved (and hence for feature-negotiation
    invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
    is used to mark the end of the table.

    Signed-off-by: Gerrit Renker
    Acked-by: Ian McDonald
    Signed-off-by: David S. Miller

    Gerrit Renker
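
A zero-terminated dependency table of the kind described above might look like the sketch below. The struct layout, table contents and lookup helper are hypothetical. The feature numbers used (6 for Send Ack Vector, 192 for Send Loss Event Rate) are quoted from memory of RFC 4340 and RFC 4342 and should be checked against those documents.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_feat_dep {
	uint8_t feat_num;	/* 0 is reserved and marks the end */
	uint8_t val;
};

/* Illustrative per-CCID tables; the entry with feature number 0 ends each. */
static const struct demo_feat_dep demo_ccid2_deps[] = {
	{ 6, 1 },	/* Send Ack Vector: on */
	{ 0, 0 },	/* terminator */
};

static const struct demo_feat_dep demo_ccid3_deps[] = {
	{ 6, 0 },	/* Send Ack Vector: off */
	{ 192, 1 },	/* Send Loss Event Rate: on */
	{ 0, 0 },	/* terminator */
};

/* Value a table requires for a feature, or -1 if it has no entry. */
static int demo_dep_lookup(const struct demo_feat_dep *table,
			   uint8_t feat_num)
{
	for (; table->feat_num != 0; table++)
		if (table->feat_num == feat_num)
			return table->val;
	return -1;
}
```

Using the reserved number as a sentinel keeps the tables variable-length without a separate length field.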
     

20 Oct, 2008

1 commit

  • Commit a3116ac5c216fc3c145906a46df9ce542ff7dcf2 from 1st October ("tcp: Port
    redirection support for TCP") broke DCCP skb lookup by changing inet_csk_clone,
    which is used by DCCP to generate the child socket after the handshake.

    This patch updates DCCP to use 'loc_port' instead of 'sport', which fixes the
    problem and also inherits port redirection support via the new interface.

    Signed-off-by: Gerrit Renker
    Signed-off-by: KOVACS Krisztian
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Gerrit Renker
     

26 Jul, 2008

2 commits

  • The AWL lower Ack validity window advances in proportion to GSS, the greatest
    sequence number sent. Updating AWL other than at connection setup (in the
    DCCP-Request sent by dccp_v{4,6}_connect()) was missing in the DCCP code.

    This bug led to syslog messages such as

    "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...]
    P.ackno exists or LAWL(82947089)
    Acked-by: Ian McDonald

    Gerrit Renker
     
  • This patch allows the sender to distinguish original and retransmitted packets,
    which is in particular needed for the retransmission of DCCP-Requests:
    * the first Request uses ISS (generated in net/dccp/ip*.c), and sets GSS = ISS;
    * all retransmitted Requests use GSS' = GSS + 1, so that the n-th retransmitted
    Request has sequence number ISS + n (mod 2^48).

    To add generic support, the patch reorganises existing code so that:
    * icsk_retransmits == 0 for the original packet and
    * icsk_retransmits = n > 0 for the n-th retransmitted packet
    at the time dccp_transmit_skb() is called, via dccp_retransmit_skb().

    Thanks to Wei Yongjun for pointing this problem out.

    Further changes:
    ----------------
    * removed the `skb' argument from dccp_retransmit_skb(), since sk_send_head
    is used for all retransmissions (the exception is client-Acks in PARTOPEN
    state, but these do not use sk_send_head);
    * since sk_send_head always contains the original skb (via dccp_entail()),
    skb_cloned() never evaluated to true and thus pskb_copy() was never used.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
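
The sequence numbering of retransmitted Requests can be sketched as 48-bit modular addition, assuming DCCP's 48-bit sequence number space. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_UINT48_MAX	0xFFFFFFFFFFFFull

/* Add within the 48-bit sequence space (wraps modulo 2^48). */
static uint64_t demo_add48(uint64_t seq, uint64_t n)
{
	return (seq + n) & DEMO_UINT48_MAX;
}

/*
 * Sequence number of a Request: retransmits == 0 for the original
 * packet, retransmits == n > 0 for the n-th retransmission.
 */
static uint64_t demo_request_seqno(uint64_t iss, unsigned int retransmits)
{
	return demo_add48(iss, retransmits);
}
```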
     

11 Jun, 2008

1 commit

  • This patch fixes the following sparse warnings:
    * nested min(max()) expression:
    net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
    net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one

    * Declaration of function prototypes in .c instead of .h file, resulting in
    "should it be static?" warnings.

    * Declared "struct dccpw" static (local to dccp_probe).

    * Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
    ("Receivers SHOULD implement delayed acknowledgement timers ...").

    * Used a different local variable name to avoid
    net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
    net/dccp/ackvec.c:238:33: originally declared here

    * Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     

14 Apr, 2008

1 commit


13 Apr, 2008

1 commit

  • dev_queue_xmit() and the other IP output functions expect to get a skb
    with clear or properly initialized skb->cb. Unlike TCP and UDP, the
    dccp_skb_cb doesn't contain a struct inet_skb_parm at the beginning,
    so the DCCP-specific data is interpreted by the IP output functions.
    This can cause false negatives for the conditional POST_ROUTING hook
    invocation, making the packet bypass the hook.

    Add a inet_skb_parm/inet6_skb_parm union to the beginning of
    dccp_skb_cb to avoid clashes. Also add a BUILD_BUG_ON to make
    sure it fits in the cb.

    [ Combined with patch from Gerrit Renker to remove two now unnecessary
    memsets of IPCB(skb)->opt ]

    Signed-off-by: Patrick McHardy
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Patrick McHardy
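
The layout fix can be sketched with a union placed first in the protocol's control block and a compile-time size check, here C11 _Static_assert in the spirit of BUILD_BUG_ON. All struct names and sizes are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CB_SIZE 48	/* hypothetical size of the per-packet scratch area */

/* Placeholders for the IPv4/IPv6 per-packet parameters; sizes are made up. */
struct demo_inet_skb_parm  { uint8_t opaque[16]; };
struct demo_inet6_skb_parm { uint8_t opaque[24]; };

struct demo_sk_buff {
	uint8_t cb[DEMO_CB_SIZE];
};

struct demo_dccp_skb_cb {
	union {
		struct demo_inet_skb_parm  h4;
		struct demo_inet6_skb_parm h6;
	} header;	/* must stay first: IP output reads cb from offset 0 */
	uint8_t  pkt_type;
	uint64_t seq;
};

/* Compile-time checks: fail the build instead of corrupting cb at runtime. */
_Static_assert(offsetof(struct demo_dccp_skb_cb, header) == 0,
	       "IP parameters must come first");
_Static_assert(sizeof(struct demo_dccp_skb_cb) <= DEMO_CB_SIZE,
	       "DCCP control block must fit in cb");
```

With the IP-layer parameters at offset 0, the protocol-specific fields can no longer be misread by the IP output path.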
     

04 Apr, 2008

1 commit


29 Jan, 2008

4 commits

  • This introduces a CCMPS field for setting a CCID-specific upper bound on the application payload
    size, as is defined in RFC 4340, section 14.

    Only the TX CCID is considered in setting this limit, since the RX CCID generates comparatively
    small (DCCP-Ack) feedback packets. The CCMPS field includes network and transport layer header
    lengths. The only current CCMPS customer is CCID4 (via RFC 4828).

    A wrapper is used to allow querying the CCMPS even at times where the CCID modules may not have
    been fully negotiated yet.

    In dccp_sync_mss() the variable `mss_now' has been renamed into `cur_mps', to reflect that we are
    dealing with an MPS, but not an MSS.
    Since the DCCP code closely follows the TCP code, the identifiers `dccp_sync_mss' and
    `dccps_mss_cache' have been kept, as they have direct TCP counterparts.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Ian McDonald
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This provides a separate routine to insert options during the initial handshake.
    The main purpose is to conduct feature negotiation, for the moment the only user
    is the timestamp echo needed for the (CCID3) handshake RTT sample.

    Padding of options has been put into a small separate routine, to be shared among
    the two functions. This could also be used as a generic routine to finish inserting
    options.

    Also removed an `XXX' comment since its content was obvious.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Ian McDonald
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This adds a socket option and signalling support for the case where the server
    holds timewait state on closing the connection, as described in RFC 4340, 8.3.

    Since holding timewait state at the server is the non-usual case, it is enabled
    via a socket option. Documentation for this socket option has been added.

    The setsockopt statement has been made resilient against different possible cases
    of expressing boolean `true' values using a suggestion by Ian McDonald.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Ian McDonald
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • When performing an active close, RFC 4340, 8.3 requires retransmitting the
    Close/CloseReq with a backoff-retransmit timer starting at initially 2 RTTs.

    This patch shifts the existing code for active-close retransmit timer
    into output.c, so that the retransmit timer is started when the first
    Close/CloseReq is sent. Previously, the timer was started when, after
    releasing the socket in dccp_close(), the actively-closing side had not yet
    reached the CLOSED/TIMEWAIT state.

    The patch further reduces the initial timeout from 3 seconds to the required
    2 RTTs, where - in absence of a known RTT - the fallback value specified in
    RFC 4340, 3.4 is used.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Ian McDonald
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Gerrit Renker
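
A small sketch of the timeout computation for the active-close retransmit timer. The names are hypothetical. The 0.2 second fallback is the default RTT of RFC 4340, 3.4. Doubling per retransmission and the cap on the shift are assumptions of this sketch, since the message only says "backoff".

```c
#include <stdint.h>

#define DEMO_FALLBACK_RTT_US	200000u	/* 0.2 s default RTT, RFC 4340, 3.4 */
#define DEMO_MAX_BACKOFF_SHIFT	16u	/* arbitrary cap, avoids overflow */

/*
 * Timeout in microseconds before the next Close/CloseReq retransmission.
 * rtt_us == 0 means "no RTT sample known"; retransmits counts the
 * retransmissions already sent.
 */
static uint64_t demo_close_rto_us(uint32_t rtt_us, unsigned int retransmits)
{
	uint64_t rto = 2ull * (rtt_us ? rtt_us : DEMO_FALLBACK_RTT_US);

	if (retransmits > DEMO_MAX_BACKOFF_SHIFT)
		retransmits = DEMO_MAX_BACKOFF_SHIFT;
	return rto << retransmits;	/* assumed: double per retransmission */
}
```

With no RTT sample the first timeout is 0.4 seconds, well below the former fixed 3 seconds.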