25 Sep, 2006

3 commits

  • This adds DCCP probing shamelessly ripped off from TCP probes by Stephen
    Hemminger.

    I've put in here support for further CCID3 variables as well.
    Andrea/Arnaldo might look to extend for CCID2.

    Signed-off-by: Ian McDonald
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian McDonald
     
  • Now that there are constants for the CCID numbers, use them in some places.

    Signed-off-by: Ian McDonald
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian McDonald
     
  • This has been discussed on dccp@vger and removes the necessity for applications
    to supply service codes in each and every case.

    If an application does not want to provide a service code, that's fine, it will
    be given 0. Otherwise, service codes can be set via socket options as before.

    This patch has been tested using various client/server configurations
    (including listening on multiple service codes).

    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Gerrit Renker
     

23 Sep, 2006

18 commits

  • Introduce methods which manipulate interesting congestion control
    state such as pipe and rtt estimate. This is useful for people
    wishing to monitor the variables of CCID and instrument the code
    [perhaps using Kprobes]. Personally, I am a fan of
    encapsulation---that justifies this change =D.

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
     
  • When multiple losses occur in one RTT, the window should be halved
    only once [a single "congestion event"]. This is now implemented,
    although not perfectly. Slightly changed the interface for changing
    the cwnd: pass hctx instead of dp. This is required in order to allow
    for change_cwnd to be called from _init().

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
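
The "halve only once per congestion event" rule described above can be sketched as follows. This is a minimal illustration, not the CCID2 code: the struct, its fields and the function names are hypothetical, and it assumes that a loss belongs to an already-handled event when the lost packet was sent before the last window reduction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sender state; field names are illustrative only. */
struct cc_state {
    unsigned int  cwnd;       /* congestion window, in packets */
    unsigned int  ssthresh;   /* slow-start threshold, in packets */
    unsigned long last_cong;  /* tick of the last congestion event */
};

/* Wrap-safe "a is earlier than b" for free-running tick counters. */
static int tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Handle the loss of a packet that was sent at tick `sent`.
 * Returns 1 if the window was halved, 0 if the loss was folded into
 * a congestion event that has already been handled.
 */
static int cc_on_loss(struct cc_state *s, unsigned long sent,
                      unsigned long now)
{
    assert(s != NULL);

    /* Sent before the last reduction: still the same congestion event. */
    if (tick_before(sent, s->last_cong))
        return 0;

    s->last_cong = now;
    s->cwnd = (s->cwnd / 2) ? (s->cwnd / 2) : 1;  /* never below 1 */
    s->ssthresh = s->cwnd;
    return 1;
}
```

With a window of 16, two losses of packets sent within the same round trip reduce the window to 8 once; a later loss of a packet sent after that reduction halves it again.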
     
  • Allocate more sequence state on demand. Each time a packet is sent
    out by CCID2, a record of it needs to be kept. This list of records
    grows proportionally to cwnd. Previously, the length of this list was
    hardcoded and therefore the cwnd could only grow to this value (of
    128). Now, records are allocated on demand as necessary---cwnd may
    grow as it wishes. The exceptional case of when memory is not
    available is not handled gracefully. Perhaps, cwnd should be capped
    at that point.

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
     
  • Allow the user to choose whether or not to enable CCID2 debugging via
    Kconfig.

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
     
  • If not enough cwnd is available, tell the sender to check again as
    soon as possible. This will increase CPU utilization (polling
    frequently for cwnd) but will improve network performance. That is,
    the sender will need to wait less before detecting the increase of
    cwnd. A better architecture would be for the CCID to call-back (or
    dequeue) from DCCP when it is able to transmit traffic -- not the
    other way around as it currently occurs.

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
     
  • Initialize the slow-start threshold to infinity. This way, upon connection
    initiation, slow-start will be exited only upon a packet loss. This patch will
    allow connections to quickly gain speed.

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
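
To illustrate why an "infinite" slow-start threshold lets a connection ramp up until the first loss, here is a small sketch. It is not the CCID2 implementation: the names are hypothetical, "infinity" is represented as UINT_MAX, and the growth rules are the textbook ones (one packet per ACK in slow start, one packet per window otherwise).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical sender state; field names are illustrative only. */
struct cc_state {
    unsigned int cwnd;      /* congestion window, in packets */
    unsigned int ssthresh;  /* slow-start threshold, in packets */
    unsigned int acked;     /* ACKs counted towards the next increase */
};

static void cc_init(struct cc_state *s)
{
    assert(s != NULL);
    s->cwnd = 1;
    s->ssthresh = UINT_MAX;  /* "infinity": only a loss ends slow start */
    s->acked = 0;
}

static void cc_on_ack(struct cc_state *s)
{
    assert(s != NULL);
    if (s->cwnd < s->ssthresh) {
        s->cwnd++;                    /* slow start: +1 per ACK */
    } else if (++s->acked >= s->cwnd) {
        s->cwnd++;                    /* congestion avoidance: +1 per window */
        s->acked = 0;
    }
}
```

With the threshold at UINT_MAX every ACK grows the window; with a small finite threshold the same ACK stream switches to the much slower per-window increase as soon as cwnd reaches it.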
     
  • Jiffies are now handled correctly (I hope) in CCID2. If they wrap, no
    problem.

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
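
For readers unfamiliar with the wrap problem, the usual technique is to compare tick counters through a signed difference rather than with `<` or `>`. The sketch below is a userspace illustration in the spirit of the kernel's time_after() helpers; the function names are hypothetical, and it relies on the unsigned-to-signed conversion behaving as two's complement, which is the case with the compilers the kernel supports.

```c
#include <assert.h>
#include <limits.h>

/* Wrap-safe "a is later than b" for free-running tick counters. */
static int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Has a timer armed for `deadline` expired at tick `now`? */
static int timer_expired(unsigned long now, unsigned long deadline)
{
    return !tick_after(deadline, now);
}

/* Exercise the case where the counter wraps between arming and expiry. */
static void tick_wrap_selfcheck(void)
{
    unsigned long start = ULONG_MAX - 5;
    unsigned long deadline = start + 10;   /* wraps around to 4 */

    assert(deadline == 4);
    assert(start >= deadline);             /* a naive compare says "expired" */
    assert(!timer_expired(start, deadline));
    assert(timer_expired(deadline, deadline));
    assert(timer_expired(deadline + 1, deadline));
}
```

The comparison is only meaningful while the two values are less than half the counter range apart, which holds for any realistic timeout.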
     
  • Get rid of unused variables in ackvector state.

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
     
  • Fix the way state is masked out. DCCP_ACKVEC_STATE_NOT_RECEIVED is
    defined as appears in the packet, therefore bit shifting is not
    required. This fix allows CCID2 to correctly detect losses.

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
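
The masking bug can be illustrated with a short sketch. The constant names below are hypothetical stand-ins, not the kernel's definitions; the values follow the Ack Vector cell layout of RFC 4340 (state in the top two bits, run length in the low six, with a run length of n covering n + 1 packets).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State values as they appear in the packet, i.e. already in bits 7..6. */
#define ACKVEC_STATE_RECEIVED      0x00u
#define ACKVEC_STATE_ECN_MARKED    0x40u   /* 1 << 6 */
#define ACKVEC_STATE_NOT_RECEIVED  0xC0u   /* 3 << 6 */
#define ACKVEC_STATE_MASK          0xC0u
#define ACKVEC_LEN_MASK            0x3Fu

/* Buggy variant: shifts the state down, so it can never equal 0xC0. */
static int ackvec_is_lost_buggy(uint8_t cell)
{
    return ((cell & ACKVEC_STATE_MASK) >> 6) == ACKVEC_STATE_NOT_RECEIVED;
}

/* Fixed variant: mask only, then compare against the in-packet value. */
static int ackvec_is_lost(uint8_t cell)
{
    return (cell & ACKVEC_STATE_MASK) == ACKVEC_STATE_NOT_RECEIVED;
}

/* Count packets reported as not received in an ack vector. */
static unsigned int ackvec_count_lost(const uint8_t *vec, size_t len)
{
    unsigned int lost = 0;
    size_t i;

    assert(vec != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (ackvec_is_lost(vec[i]))
            lost += (vec[i] & ACKVEC_LEN_MASK) + 1;
    return lost;
}
```

With the shifted comparison no cell ever matches, so losses go undetected, which is consistent with the symptom the commit describes.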
     
  • Fix ackvector length calculation upon receiving an "ack-of-ack". This
    patch prevents the ackvector from growing too large, which would cause
    it not to be inserted into packets.

    Signed-off-by: Andrea Bittau
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Andrea Bittau
     
  • sk_filter() is called from the tcp_v{4,6}_rcv() functions with
    needlock = 0, although the socket is not locked at that moment. To avoid
    this and similar issues in the future, use RCU to protect reads of the
    sk->sk_filter field.

    Signed-off-by: Dmitry Mishin
    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: Kirill Korotaev

    Dmitry Mishin
     
  • As Arnaldo Carvalho de Melo points out, I should be using list_entry in
    case the structure changes in the future. The current code works, but it
    relies on the member's position and requires a type cast.

    While doing this I noticed that I had one more variable than I needed, so
    this removes it as well.

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
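
The difference between casting a list pointer and using list_entry can be sketched in userspace as below. The macro is a simplified stand-in for the kernel's list_entry/container_of, and the structure is hypothetical; the point is that the lookup keeps working when the list member is not the first field.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

/* Simplified stand-in for the kernel's list_entry(). */
#define list_entry_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical record; the list member is deliberately not first. */
struct tx_record {
    unsigned long    seqno;
    struct list_node node;
};

static struct tx_record *tx_record_from_node(struct list_node *n)
{
    assert(n != NULL);
    /* A plain (struct tx_record *)n cast would only be right at offset 0. */
    return list_entry_sketch(n, struct tx_record, node);
}
```

Because the offset is computed by the compiler, reordering or adding fields in the structure does not break the conversion.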
     
  • This adds transmit buffering to DCCP.

    I have tested with CCID2/3 and with loss and rate limiting.

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
     
  • This shifts further sysctls into feat.h. No change in
    functionality - shifting code only.

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
     
  • Based on MIPL2 kernel patch.

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: Ville Nuorvala
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • Right now most inet_lookup_* functions take a host-order hnum instead
    of a network-order dport because that's how it is represented
    internally.

    This means that users of these functions have to be careful about
    using the right byte-order. To add more confusion, inet_lookup takes
    a network-order dport unlike all other functions.

    So this patch changes all visible inet_lookup functions to take a
    dport and move all dport->hnum conversion inside them.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
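
The idea of moving the dport->hnum conversion inside the lookup can be sketched like this. The table, structure and function names are hypothetical and unrelated to the real inet_lookup code; only ntohs()/htons() are real library calls.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bound-socket entry, keyed by a host-order port number. */
struct bound_sock {
    uint16_t    hnum;   /* host byte order, as stored internally */
    const char *name;
};

/* Internal helper: expects a host-order port. */
static const struct bound_sock *
lookup_hnum(const struct bound_sock *tbl, size_t n, uint16_t hnum)
{
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (tbl[i].hnum == hnum)
            return &tbl[i];
    return NULL;
}

/*
 * Visible entry point: takes the port exactly as it appears in the packet
 * header (network order) and converts in one place, so callers cannot
 * pass the wrong byte order by accident.
 */
static const struct bound_sock *
lookup_dport(const struct bound_sock *tbl, size_t n, uint16_t dport)
{
    return lookup_hnum(tbl, n, ntohs(dport));
}
```

A caller holding a header field simply passes it through, for example lookup_dport(tbl, n, hdr_dport), without needing to know how the table stores its keys.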
     
  • This automatically labels the TCP, Unix stream, and dccp child sockets
    as well as openreqs to be at the same MLS level as the peer. This will
    result in the selection of appropriately labeled IPSec Security
    Associations.

    This also uses the sock's sid (as opposed to the isec sid) in SELinux
    enforcement of secmark in rcv_skb and postroute_last hooks.

    Signed-off-by: Venkat Yekkirala
    Signed-off-by: David S. Miller

    Venkat Yekkirala
     
  • This labels the flows that could utilize IPSec xfrms at the points the
    flows are defined so that IPSec policy and SAs at the right label can
    be used.

    The following protos are currently not handled, but they should
    continue to be able to use single-labeled IPSec like they currently
    do.

    ipmr
    ip_gre
    ipip
    igmp
    sit
    sctp
    ip6_tunnel (IPv6 over IPv6 tunnel device)
    decnet

    Signed-off-by: Venkat Yekkirala
    Signed-off-by: David S. Miller

    Venkat Yekkirala
     

27 Aug, 2006

5 commits

  • This fixes CCID3 so that its performance is much closer to that expected
    under RFC4342.

    CCID3 is meant to alter sending rate based on RTT and loss.

    The performance was verified against:
    http://wand.net.nz/~perry/max_download.php

    For example I tested with netem and had the following parameters:
    Delayed Acks 1, MSS 256 bytes, RTT 105 ms, packet loss 5%.

    This gives a theoretical speed of 71.9 Kbits/s. I measured across three
    runs with this patch set and got 70.1 Kbits/s. Without this patchset the
    average was 232 Kbits/s which means Linux can't be used for CCID3 research
    properly.

    I also tested with netem turned off, so the box was just acting as a
    router with 1.2 msec RTT. The performance with this is the same with or
    without the patch, at around 30 Mbit/s.

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
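
The theoretical figure quoted above can be reproduced from the TCP throughput equation of RFC 3448 (TFRC), which CCID3 builds on. The worked calculation below assumes b = 1 packet acknowledged per ACK and the common simplification t_RTO = 4R; neither value is stated in the commit message, although "Delayed Acks 1" suggests the former.

```latex
\begin{align*}
X &= \frac{s}{R\sqrt{\tfrac{2bp}{3}}
      + t_{RTO}\left(3\sqrt{\tfrac{3bp}{8}}\right) p\,\left(1 + 32p^{2}\right)} \\[4pt]
s &= 256~\text{bytes},\quad R = 0.105~\text{s},\quad p = 0.05,\quad
     b = 1,\quad t_{RTO} = 4R = 0.42~\text{s} \\[4pt]
R\sqrt{\tfrac{2bp}{3}} &= 0.105 \times 0.18257 \approx 0.019170 \\
t_{RTO}\left(3\sqrt{\tfrac{3bp}{8}}\right) p\,(1 + 32p^{2})
  &= 0.42 \times 0.41079 \times 0.05 \times 1.08 \approx 0.009317 \\[4pt]
X &\approx \frac{256}{0.019170 + 0.009317}
   \approx 8987~\text{bytes/s} \approx 71.9~\text{kbit/s}
\end{align*}
```

Under these assumptions the result agrees with the 71.9 Kbits/s stated in the commit message.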
     
  • This adds a new function dccp_rx_hist_find_entry.

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
     
  • This adds a new function to see if two sequence numbers follow each
    other.

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
     
  • Just updating copyright and contacts

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
     
  • This fixes a small typo in net/dccp/libs/packet_history.c

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
     

03 Aug, 2006

1 commit

  • The current users of ip6_dst_lookup can be divided into two classes:

    1) The caller holds no locks and is in user-context (UDP).
    2) The caller does not want to lookup the dst cache at all.

    The second class covers everyone except UDP because most people do
    the cache lookup directly before calling ip6_dst_lookup. This patch
    adds ip6_sk_dst_lookup for the first class.

    Similarly ip6_dst_store users can be divided into those that need to
    take the socket dst lock and those that don't. This patch adds
    __ip6_dst_store for those (everyone except UDP/datagram) that don't
    need an extra lock.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

25 Jul, 2006

1 commit

  • When using the default sequence window size (100) I got the following in
    my logs:

    Jun 22 14:24:09 localhost kernel: [ 1492.114775] DCCP: Step 6 failed for
    DATA packet, (LSWL(6279674225)
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Ian McDonald
     

11 Jul, 2006

1 commit


01 Jul, 2006

1 commit


23 Jun, 2006

1 commit


18 Jun, 2006

1 commit


12 Jun, 2006

1 commit


06 May, 2006

1 commit

  • Calling sock_orphan inside bh_lock_sock in dccp_close can lead to
    deadlocks. For example, the inet_diag code holds sk_callback_lock without
    disabling BH. If an inbound packet arrives during that admittedly tiny
    window, it will cause a deadlock on bh_lock_sock. Another possible
    path would be through sock_wfree if the network device driver frees the
    tx skb in process context with BH enabled.

    We can fix this by moving sock_orphan out of bh_lock_sock.

    The tricky bit is to work out when we need to destroy the socket
    ourselves and when it has already been destroyed by someone else.

    By moving sock_orphan before the release_sock we can solve this
    problem. This is because as long as we own the socket lock its
    state cannot change.

    So we simply record the socket state before the release_sock
    and then check the state again after we regain the socket lock.
    If the socket state has transitioned to DCCP_CLOSED in the time being,
    we know that the socket has been destroyed. Otherwise the socket is
    still ours to keep.

    This problem was discovered by Ingo Molnar using his lock validator.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

12 Apr, 2006

1 commit


30 Mar, 2006

1 commit

  • From: Randy Dunlap

    Use NULL instead of 0 for pointers.
    Fix these sparse warnings:
    net/dccp/feat.c:207:20: warning: Using plain integer as NULL pointer
    net/dccp/feat.c:325:21: warning: Using plain integer as NULL pointer
    net/dccp/feat.c:526:20: warning: Using plain integer as NULL pointer

    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Randy Dunlap
     

26 Mar, 2006

1 commit

  • Implement the half-closed devices notification by adding a new POLLRDHUP
    (and its alias EPOLLRDHUP) bit to the existing poll/select sets. The
    existing POLLHUP handling does not report half-closed devices correctly,
    but there were concerns about changing it, so this implementation leaves
    the current POLLHUP reporting unchanged and simply adds a new bit that is
    set in the few places where it makes sense. The same thing was discussed
    and conceptually agreed on quite some time ago:

    http://lkml.org/lkml/2003/7/12/116

    Since this new event bit is added to the existing Linux poll infrastructure,
    even the existing poll/select system calls will be able to use it. As far
    as the existing POLLHUP handling, the patch leaves it as is. The
    pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
    archs and sets the bit in the six relevant files. The other attached diff
    is the simple change required to sys/epoll.h to add the EPOLLRDHUP
    definition.

    There is "a stupid program" to test POLLRDHUP delivery here:

    http://www.xmailserver.org/pollrdhup-test.c

    It tests poll(2), but since delivery is the same, epoll(2) will work
    equally well.

    Signed-off-by: Davide Libenzi
    Cc: "David S. Miller"
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
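
As a rough illustration of how an application might use the new bit, the sketch below polls one end of a Unix stream socket pair after the other end shuts down its writing half. It is not the test program linked above; the function name is hypothetical, and the fallback value for POLLRDHUP is the asm-generic one, used here only as an assumption for the case where the C library headers do not expose the constant.

```c
#define _GNU_SOURCE            /* glibc exposes POLLRDHUP only with this */
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0x2000       /* asm-generic value; assumption, see above */
#endif

/*
 * Returns 1 if poll(2) reports POLLRDHUP on one end of a socket pair after
 * the peer shuts down its writing half, 0 if it does not, -1 on error.
 */
static int peer_half_close_reported(void)
{
    int sv[2];
    struct pollfd pfd;
    int n, seen = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (shutdown(sv[1], SHUT_WR) == 0) {
        pfd.fd = sv[0];
        pfd.events = POLLIN | POLLRDHUP;   /* POLLRDHUP must be requested */
        pfd.revents = 0;
        n = poll(&pfd, 1, 1000);
        assert(n <= 1);                    /* only one descriptor is polled */
        seen = (n < 0) ? -1 : (n == 1 && (pfd.revents & POLLRDHUP) != 0);
    } else {
        seen = -1;
    }

    close(sv[0]);
    close(sv[1]);
    return seen;
}
```

Unlike POLLHUP, POLLRDHUP is only reported when it is set in the requested events, so existing callers that do not ask for it see no change in behaviour.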
     

21 Mar, 2006

3 commits