21 Mar, 2015

1 commit

  • One of the major issues for TCP is the SYNACK rtx handling,
    done by inet_csk_reqsk_queue_prune(), fired by the keepalive
    timer of a TCP_LISTEN socket.

    This function runs for awfully long times with the socket lock held,
    meaning that other cpus needing this lock have to spin for hundreds
    of milliseconds.

    SYNACKs are sent in huge bursts, likely to cause severe drops anyway.

    This model was OK 15 years ago when memory was very tight.

    We now can afford to have a timer per request sock.

    Timer invocations no longer need to lock the listener,
    and can be run from all cpus in parallel.

    With the following patch increasing somaxconn width to 32 bits,
    I tested a listener with more than 4 million active request sockets,
    and a steady SYNFLOOD of ~200,000 SYN per second.
    Host was sending ~830,000 SYNACK per second.

    This is ~100 times more than what we could achieve before this patch.

    Later, we will get rid of the listener hash and use ehash instead.
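
    For illustration, a minimal userspace sketch of the idea using POSIX
    timers: instead of one listener timer scanning every pending request
    under a lock, each request owns its own timer whose callback can fire
    on any CPU. All names here (pending_req, synack_rtx, arm_req_timer)
    are hypothetical, not the kernel's.

      /* Build: cc sketch.c -lrt -lpthread */
      #include <signal.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>

      struct pending_req {
              int id;
              int rtx_count;          /* SYNACKs retransmitted so far */
              timer_t timer;          /* this request's own timer */
      };

      /* Per-request callback: runs from a timer thread, so callbacks
       * for different requests may run in parallel. */
      static void synack_rtx(union sigval sv)
      {
              struct pending_req *req = sv.sival_ptr;

              printf("req %d: retransmit SYNACK (attempt %d)\n",
                     req->id, ++req->rtx_count);
      }

      static void arm_req_timer(struct pending_req *req, time_t secs)
      {
              struct sigevent sev = { 0 };
              struct itimerspec its = { 0 };

              sev.sigev_notify = SIGEV_THREAD;
              sev.sigev_notify_function = synack_rtx;
              sev.sigev_value.sival_ptr = req;
              timer_create(CLOCK_MONOTONIC, &sev, &req->timer);

              its.it_value.tv_sec = secs;         /* first retransmit */
              its.it_interval.tv_sec = secs;      /* then periodically */
              timer_settime(req->timer, 0, &its, NULL);
      }

      int main(void)
      {
              struct pending_req reqs[3] = { { .id = 0 }, { .id = 1 }, { .id = 2 } };
              int i;

              for (i = 0; i < 3; i++)
                      arm_req_timer(&reqs[i], 1);
              sleep(4);       /* let the per-request timers fire a few times */
              return 0;
      }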

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 May, 2014

1 commit

  • 'dccp_timestamp_seed' is initialized once by ktime_get_real() in
    dccp_timestamping_init(). It is always less than ktime_get_real()
    in dccp_timestamp().

    Therefore, ktime_us_delta() in dccp_timestamp() will always return a
    positive number, so an explicit cast can be used to let the compiler
    and do_div() know about it and avoid the warning.

    The related warning (with allmodconfig under unicore32):

    CC [M] net/dccp/timer.o
    net/dccp/timer.c: In function ‘dccp_timestamp’:
    net/dccp/timer.c:285: warning: comparison of distinct pointer types lacks a cast
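
    The shape of the fix is to copy the delta through a u64 before
    do_div(). The mock below (userspace only, not the kernel macro)
    reproduces just do_div()'s pointer-type check, to show why handing it
    a signed 64-bit value draws this warning and why the explicit cast
    silences it.

      /* Build: cc -Wall div_warn.c */
      #include <stdint.h>
      #include <stdio.h>

      #define mock_do_div(n, base) do {                                \
              /* same trick as the kernel's do_div(): comparing     */ \
              /* pointers of distinct types makes the compiler warn */ \
              (void)(((__typeof__((n)) *)0) == ((uint64_t *)0));       \
              (n) /= (base);                                           \
      } while (0)

      int main(void)
      {
              int64_t delta = 12345;  /* ktime_us_delta() returns s64 */
              uint64_t u;

              /* mock_do_div(delta, 10); would warn:
               * "comparison of distinct pointer types lacks a cast" */

              u = (uint64_t)delta;    /* delta is known non-negative, */
              mock_do_div(u, 10);     /* so the cast is safe          */
              printf("%llu\n", (unsigned long long)u);
              return 0;
      }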

    Signed-off-by: Chen Gang
    Signed-off-by: David S. Miller

    Chen Gang
     

29 Oct, 2010

2 commits

  • This extends the existing wait-for-ccid routine so that it may be used with
    different types of CCID, addressing the following problems:

    1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID-2 to become unblocked
    could lead to an indefinite delay (i.e., application "hangs").
    2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
    3) The minimum wait time for CCID-3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
    4) There is no way for the user/application to control this behaviour.
    It would be good to use the timeout argument of dccp_close() as an
    upper bound; the maximum time that an application is willing to wait
    for its CCIDs can then be set via the SO_LINGER option (a usage
    sketch follows this message).

    These problems are addressed by giving the CCID a grace period of up to the
    `timeout' value.

    The wait-for-ccid function is, as before, used when the application
    (a) has read all the data in its receive buffer and
    (b) SO_LINGER was set with a non-zero linger time, or
    (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
    state (client application closes after receiving CloseReq).

    In addition, there is a catch-all case of __skb_queue_purge() after waiting for
    the CCID. This is necessary since the write queue may still have data when
    (a) the host has been passively-closed,
    (b) termination was abnormal (unread data, zero linger time), or
    (c) wait-for-ccid could not finish within the given time limit.
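
    As a usage sketch of the knob this exposes: setting SO_LINGER bounds
    how long close() may block while the CCID drains the queue. Shown on
    a plain TCP socket for brevity; a DCCP socket (SOCK_DCCP with
    IPPROTO_DCCP) would be configured the same way.

      #include <stdio.h>
      #include <sys/socket.h>
      #include <unistd.h>

      int main(void)
      {
              int fd = socket(AF_INET, SOCK_STREAM, 0);
              struct linger lg = {
                      .l_onoff  = 1,  /* linger on close()                 */
                      .l_linger = 3,  /* wait at most 3 s for pending data */
              };

              if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
                      perror("setsockopt");

              /* ... connect, send data ... */

              close(fd);      /* blocks at most ~3 s while the queue drains */
              return 0;
      }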

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This extends the packet dequeuing interface of dccp_write_xmit() to allow
    1. CCIDs to take care of timing when the next packet may be sent;
    2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).

    The main purpose is to take CCID-2 out of its polling mode (when it is network-
    limited, it tries every millisecond to send, without interruption).

    The mode of operation for (2) is as follows:
    * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
    * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
    * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
    * dccp_write_xmit() returns without further action;
    * after some time the wait-condition for CCID becomes true,
    * that CCID schedules the tasklet,
    * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
    * since the wait-condition is now true, ccid_hc_tx_packet() returns "send now",
    * packet is sent, and possibly more (since dccp_write_xmit() loops).

    Code reuse: the tasklet function calls dccp_write_xmit(); the timer
    function reduces to a wrapper around the same code.
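
    The same pattern, reduced to a hypothetical userspace sketch (a plain
    worker thread stands in for the tasklet; all names are illustrative):
    the send path returns instead of polling while the window is closed,
    and the feedback path later re-runs the identical dequeue code.

      /* Build: cc sketch.c -lpthread */
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      enum tx_verdict { SEND_NOW, WILL_DEQUEUE_LATER };

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static int window_open;         /* "congestion window has room" */
      static int queued = 3;          /* packets in the TX queue */

      static enum tx_verdict ccid_tx_send_packet(void)
      {
              return window_open ? SEND_NOW : WILL_DEQUEUE_LATER;
      }

      static void write_xmit(void)    /* analogue of dccp_write_xmit() */
      {
              pthread_mutex_lock(&lock);
              while (queued > 0 && ccid_tx_send_packet() == SEND_NOW) {
                      queued--;
                      printf("sent one packet, %d left\n", queued);
              }
              /* WILL_DEQUEUE_LATER: return without polling */
              pthread_mutex_unlock(&lock);
      }

      static void *feedback_worker(void *arg)  /* stands in for the tasklet */
      {
              sleep(1);               /* feedback arrives: window opens */
              pthread_mutex_lock(&lock);
              window_open = 1;
              pthread_mutex_unlock(&lock);
              write_xmit();           /* re-run the same dequeue code */
              return arg;
      }

      int main(void)
      {
              pthread_t t;

              write_xmit();           /* window closed: returns at once */
              pthread_create(&t, NULL, feedback_worker, NULL);
              pthread_join(t, NULL);
              return 0;
      }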

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

13 Apr, 2010

1 commit

  • With the latest CONFIG_PROVE_RCU infrastructure, I felt more
    comfortable making this change.

    sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock).

    This rwlock is read-locked for a very short time, and dst entries are
    already freed after an RCU grace period. This calls for RCU again :)

    This patch converts sk_dst_lock to a spinlock and uses RCU for readers.

    __sk_dst_get() is supposed to be called with rcu_read_lock() held, or
    with the socket locked by the user, so use the appropriate
    rcu_dereference_check() condition:
    (rcu_read_lock_held() || sock_owned_by_user(sk))

    This patch avoids two atomic ops per tx packet on connected UDP
    sockets, for example, and allows sk_dst_lock to be dirtied far less
    often.
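
    A userspace analogue of the new pattern, written against liburcu
    (assumed available; build with -lurcu): readers take rcu_read_lock()
    plus rcu_dereference() with no atomic ops, while the writer publishes
    a new pointer and waits out a grace period before freeing the old
    one, mirroring how dst entries are freed after an RCU grace period.

      /* Build: cc rcu_dst.c -lurcu */
      #include <stdio.h>
      #include <stdlib.h>
      #include <urcu.h>

      struct dst { int ifindex; };

      static struct dst *dst_cache;   /* plays the role of sk_dst_cache */

      static void reader(void)
      {
              struct dst *d;

              rcu_read_lock();        /* no atomic ops, no rwlock */
              d = rcu_dereference(dst_cache);
              if (d)
                      printf("tx via ifindex %d\n", d->ifindex);
              rcu_read_unlock();
      }

      static void writer(int ifindex)
      {
              struct dst *nd = malloc(sizeof(*nd));
              struct dst *old = dst_cache;

              nd->ifindex = ifindex;
              rcu_assign_pointer(dst_cache, nd);  /* publish new dst */
              synchronize_rcu();      /* wait out existing readers */
              free(old);              /* freed only after grace period */
      }

      int main(void)
      {
              rcu_register_thread();
              writer(2);
              reader();
              writer(3);
              reader();
              rcu_unregister_thread();
              return 0;
      }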

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Oct, 2009

1 commit

  • dst_negative_advice() should check for changed dst and reset
    sk_tx_queue_mapping accordingly. Pass sock to the callers of
    dst_negative_advice.

    (sk_reset_txq is defined just for use by dst_negative_advice. The
    only way I could find to get around this is to move
    dst_negative_advice() from dst.h to dst.c, include sock.h in dst.c,
    etc.)
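
    A hypothetical miniature of the fix (names are illustrative, not the
    kernel's): once negative advice drops a stale cached route, the TX
    queue mapping cached alongside it must be reset as well, which is why
    the sock now has to be passed down.

      #include <stdio.h>

      struct route { int id; int obsolete; };

      struct conn {                   /* stands in for struct sock */
              struct route *dst;      /* cached route (sk_dst_cache) */
              int tx_queue;           /* cached TX queue mapping */
      };

      static void negative_advice(struct conn *c)
      {
              struct route *old = c->dst;

              if (old && old->obsolete) {
                      c->dst = NULL;          /* drop the stale route */
                      c->tx_queue = -1;       /* ...and its mapping   */
              }
      }

      int main(void)
      {
              struct route r = { .id = 1, .obsolete = 1 };
              struct conn c = { .dst = &r, .tx_queue = 5 };

              negative_advice(&c);
              printf("dst=%p txq=%d\n", (void *)c.dst, c.tx_queue);
              return 0;
      }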

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar