16 Apr, 2012

1 commit


21 Dec, 2011

1 commit

  • When checking whether a DATA chunk fits into the estimated rwnd a
    full sizeof(struct sk_buff) is added to the needed chunk size. This
    quickly exhausts the available rwnd space and leads to packets being
    sent which are much below the PMTU limit. This can lead to much worse
    performance.

    The reason for this behaviour was to avoid putting too much memory
    pressure on the receiver. The concept is not completely irrational
    because a Linux receiver does in fact clone an skb for each DATA chunk
    delivered. However, Linux also reserves half the available socket
    buffer space for data structures, so this usage is already
    accounted for.
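
    A minimal sketch of the check described above, not the kernel's
    actual code. The helpers chunk_fits_rwnd() and chunks_that_fit() are
    hypothetical, and SKB_OVERHEAD is an assumed stand-in for
    sizeof(struct sk_buff), whose real value depends on the kernel build.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for sizeof(struct sk_buff); the real size varies. */
#define SKB_OVERHEAD 256u

/* Hypothetical helper: does a DATA chunk of data_len bytes fit into the
 * estimated peer rwnd? With charge_overhead set, the per-chunk sk_buff
 * cost is added to the needed size, as in the unpatched behaviour. */
static bool chunk_fits_rwnd(size_t rwnd, size_t data_len, bool charge_overhead)
{
    size_t needed = data_len + (charge_overhead ? SKB_OVERHEAD : 0);

    return needed <= rwnd;
}

/* Hypothetical helper: how many equally sized chunks fit into rwnd. */
static size_t chunks_that_fit(size_t rwnd, size_t data_len, bool charge_overhead)
{
    size_t count = 0;

    if (data_len == 0)
        return 0; /* avoid looping forever on empty chunks */

    while (chunk_fits_rwnd(rwnd, data_len, charge_overhead)) {
        rwnd -= data_len + (charge_overhead ? SKB_OVERHEAD : 0);
        count++;
    }
    return count;
}
```

    With these assumed numbers, a 1500 byte window holds 5 chunks of 16
    bytes when the overhead is charged and 93 when it is not, which
    illustrates how small chunks exhaust the rwnd estimate early.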

    When proposing to change this the last time it was noted that this
    behaviour was introduced to solve a performance issue caused by rwnd
    overusage in combination with small DATA chunks.

    Trying to reproduce this I found that with the sk_buff overhead removed,
    the performance would improve significantly unless socket buffer limits
    are increased.

    The following numbers have been gathered using a patched iperf
    supporting SCTP over a live 1 Gbit ethernet network. The -l option
    was used to limit DATA chunk sizes. The numbers listed are based on
    the average of 3 test runs each. Default values have been used for
    sk_(r|w)mem.

    Chunk
    Size    Unpatched         No Overhead
    ---------------------------------------
       4     15.2 Kbit [!]     12.2 Mbit [!]
       8     35.8 Kbit [!]     26.0 Mbit [!]
      16     95.5 Kbit [!]     54.4 Mbit [!]
      32    106.7 Mbit        102.3 Mbit
      64    189.2 Mbit        188.3 Mbit
     128    331.2 Mbit        334.8 Mbit
     256    537.7 Mbit        536.0 Mbit
     512    766.9 Mbit        766.6 Mbit
    1024    810.1 Mbit        808.6 Mbit

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

25 Aug, 2011

1 commit


14 Jul, 2011

1 commit


08 Jul, 2011

1 commit

  • When initiating a graceful shutdown while having data chunks
    on the retransmission queue with a peer which is in zero
    window mode the shutdown is never completed because the
    retransmission error count is reset periodically by the
    following two rules:

    - Do not timeout association while doing zero window probe.
    - Reset overall error count when a heartbeat request has
    been acknowledged.

    The graceful shutdown will wait for all outstanding TSN to
    be acknowledged before sending the SHUTDOWN request. This
    never happens due to the peer's zero window not acknowledging
    the continuously retransmitted data chunks. Although the
    error counter is incremented for each failed retransmission,
    the receiving of the SACK announcing the zero window clears
    the error count again immediately. Also heartbeat requests
    continue to be sent periodically. The peer acknowledges these
    requests causing the error counter to be reset as well.

    This patch changes behaviour to only reset the overall error
    counter for the above rules while not in shutdown. After
    reaching the maximum number of retransmission attempts, the
    T5 shutdown guard timer is scheduled to give the receiver
    some additional time to recover. The timer is stopped as soon
    as the receiver acknowledges any data.
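
    The changed behaviour can be sketched roughly as follows. This is a
    simplified model under stated assumptions, not the kernel code: the
    struct, the state names and the three handlers are hypothetical, and
    whether the guard timer starts at or after the maximum count is an
    assumption.

```c
#include <stdbool.h>

/* Hypothetical, simplified association state. */
enum assoc_state { ESTABLISHED, SHUTDOWN_PENDING };

struct assoc {
    enum assoc_state state;
    unsigned overall_error_count;
    unsigned max_retrans;
    bool t5_guard_running;
};

/* A zero window SACK or a HEARTBEAT ACK arrived: the peer is alive.
 * The error count is reset only while not in shutdown. */
static void on_peer_alive(struct assoc *a)
{
    if (a->state != SHUTDOWN_PENDING)
        a->overall_error_count = 0;
}

/* A retransmission failed. During shutdown, reaching the maximum
 * schedules the T5 shutdown guard timer (assumed threshold: >=). */
static void on_retransmit_failure(struct assoc *a)
{
    a->overall_error_count++;
    if (a->state == SHUTDOWN_PENDING &&
        a->overall_error_count >= a->max_retrans)
        a->t5_guard_running = true;
}

/* The receiver acknowledged data: stop the guard timer. */
static void on_data_acked(struct assoc *a)
{
    a->t5_guard_running = false;
}
```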

    The issue can be easily reproduced by establishing an SCTP
    association over the loopback device, constantly queueing
    data at the sender while not reading any at the receiver.
    Wait for the window to reach zero, then initiate a shutdown
    by killing both processes simultaneously. The association
    will never be freed and the chunks on the retransmission
    queue will be retransmitted indefinitely.

    Signed-off-by: Thomas Graf
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Thomas Graf
     

02 Jun, 2011

1 commit

  • In this case, the SCTP association transmits an ASCONF packet
    including addition of the new IP address and deletion of the old
    address. This patch implements this functionality.
    In this case, the ASCONF chunk is added to the beginning of the
    queue, because the other chunks cannot be transmitted in this state.

    Signed-off-by: Michio Honda
    Signed-off-by: YOSHIFUJI Hideaki
    Acked-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Michio Honda
     

20 Apr, 2011

3 commits

  • If data is still waiting on the retransmit queue when the
    next retransmit is attempted and a chunk has since been
    abandoned, that chunk should be moved to the abandoned list.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • SCTP does not use SCTP_STATE_EMPTY and we can never be in
    that state. Remove useless code.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • When we have to remove a transport due to ASCONF, we move
    the data to a new active path. This can trigger the CACC algorithm
    to not mark that data as missing when SACKs arrive. This is
    because the transport passed to the CACC algorithm is the one
    this data is sitting on, not the one it was sent on (that one
    may be gone). So, by sending the original transport (even if
    it's NULL), we may start marking data as missing.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

31 Mar, 2011

1 commit


08 Mar, 2011

1 commit


27 Aug, 2010

1 commit

  • Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to
    use do { print } while (0) guards.
    Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when
    lines were continued.
    Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    Add a missing newline in "Failed bind hash alloc"
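
    As a reminder of why the do { } while (0) guard matters, here is a
    small sketch. DEBUG_PRINT is a hypothetical macro used only for
    illustration; it is not SCTP_DEBUG_PRINTK. Without the guard, the
    macro would expand to a bare if statement and the caller's else
    would bind to it.

```c
#include <stdio.h>

static int debug_enabled = 1;
static int printed; /* counts how often the macro body ran */

/* Hypothetical debug macro. The do/while (0) wrapper makes the
 * expansion a single statement that still needs its trailing ';'. */
#define DEBUG_PRINT(msg)            \
do {                                \
    if (debug_enabled) {            \
        printed++;                  \
        fputs(msg, stderr);         \
    }                               \
} while (0)

static int classify(int value)
{
    if (value > 0)
        DEBUG_PRINT("positive\n");
    else            /* binds to the outer if, as intended */
        return -1;
    return 1;
}
```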

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

18 May, 2010

1 commit

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

01 May, 2010

7 commits

  • Right now, if the highest tsn in the SACK doesn't change, we'll
    end up scanning the transmitted lists on the transports twice:
    once for locating the highest _new_ tsn, and once for actually
    tagging chunks as acked. This is a waste, since we can record
    the highest _new_ tsn at the same time as tagging chunks. Long
    ago this was not possible because we would try to mark chunks
    as missing at the same time as tagging them acked and this approach
    didn't work. Now that the two steps are separate, we can re-use
    the old approach.
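
    The single pass can be sketched like this. The chunk struct and
    tag_acked() are hypothetical simplifications; the real code walks
    per-transport lists and SACK gap blocks rather than a flat array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Serial number comparison so that 32-bit TSNs may wrap around. */
static bool tsn_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

struct chunk {
    uint32_t tsn;
    bool acked;
};

/* Hypothetical helper: tag every chunk whose TSN appears in sacked[]
 * and record the highest newly acked TSN in the same pass. Returns
 * false if the SACK acknowledged nothing new. */
static bool tag_acked(struct chunk *chunks, size_t n,
                      const uint32_t *sacked, size_t n_sacked,
                      uint32_t *highest_new)
{
    bool any_new = false;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n_sacked; j++) {
            if (chunks[i].tsn != sacked[j] || chunks[i].acked)
                continue;
            chunks[i].acked = true;
            if (!any_new || tsn_lt(*highest_new, chunks[i].tsn))
                *highest_new = chunks[i].tsn;
            any_new = true;
        }
    }
    return any_new;
}
```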

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • According to RFC 4960 Section 7.2.4:
    If an endpoint is in Fast
    Recovery and a SACK arrives that advances the Cumulative TSN Ack
    Point, the miss indications are incremented for all TSNs reported
    missing in the SACK.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • We don't need to force the T3 timer any more and it's
    actually wrong to do as it causes too long of a delay.
    The timer will be started if one is not running, but if
    one is running, we leave it alone.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • The 'resent' bit is used to make sure that we don't update
    rto estimate based on retransmitted chunks. However, we already
    have the 'rto_pending' bit that we test when we need to update rto,
    so 'resent' bit is just extra. Additionally, we currently have
    a bug in that we always set a 'resent' bit and thus rto estimate
    is only updated by Heartbeats.
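
    A rough sketch of how a single pending flag can gate RTT sampling
    without a separate 'resent' bit. The structs and handlers below are
    hypothetical simplifications and only assume that one measurement is
    in progress per transport at a time.

```c
#include <stdbool.h>

struct transport {
    bool rto_pending;       /* an RTT measurement is in progress */
    unsigned rtt_samples;   /* number of RTT samples taken */
};

struct chunk {
    bool rtt_in_progress;   /* this chunk carries the measurement */
};

/* Hypothetical transmit hook. A retransmission cancels any measurement
 * the chunk carried, so its ack can never yield an RTT sample. */
static void on_transmit(struct transport *t, struct chunk *c, bool is_retransmit)
{
    if (is_retransmit) {
        if (c->rtt_in_progress) {
            c->rtt_in_progress = false;
            t->rto_pending = false;
        }
        return;
    }
    if (!t->rto_pending) {
        c->rtt_in_progress = true;
        t->rto_pending = true;
    }
}

/* Hypothetical ack hook: only the measuring chunk updates the estimate. */
static void on_ack(struct transport *t, struct chunk *c)
{
    if (!c->rtt_in_progress)
        return;
    c->rtt_in_progress = false;
    t->rto_pending = false;
    t->rtt_samples++;
}
```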

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • The sctp_chunk_is_data macro is defined to decide
    whether a chunk is a DATA chunk or not.

    Signed-off-by: Shan Wei
    Signed-off-by: Vlad Yasevich

    Shan Wei
     
  • While doing a retransmit, if a control chunk exists, such as a
    FORWARD TSN chunk, and the DATA chunk cannot be bundled with
    this control chunk because of the PMTU limit, no DATA chunk
    will be retransmitted in the current implementation. This
    patch makes sure to retransmit at least one DATA chunk in this case.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     
  • PR-SCTP extension section 3.5 Sender Side Implementation of PR-SCTP:
    C5) If a FORWARD TSN is sent, the sender MUST assure that at
    least one T3-rtx timer is running.

    So this patch makes sure that at least one T3-rtx timer is running
    if a FORWARD TSN is sent or is about to be sent.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

30 Nov, 2009

1 commit


29 Nov, 2009

2 commits

  • Conflicts:
    drivers/ieee802154/fakehard.c
    drivers/net/e1000e/ich8lan.c
    drivers/net/e1000e/phy.c
    drivers/net/netxen/netxen_nic_init.c
    drivers/net/wireless/ath/ath9k/main.c

    David S. Miller
     
  • When retransmitting due to T3 timeout, retransmit all the
    in-flight chunks for the corresponding transport/path, including
    chunks sent less than 1 rto ago.
    This is the correct behaviour according to rfc4960 section 6.3.3
    E3 and
    "Note: Any DATA chunks that were sent to the address for which the
    T3-rtx timer expired but did not fit in one MTU (rule E3 above)
    should be marked for retransmission and sent as soon as cwnd
    allows (normally, when a SACK arrives). ".

    This fixes problems when more than one path is present and the T3
    retransmission of the first chunk that times out stops the T3 timer
    for the initial active path, leaving all the other in-flight
    chunks waiting forever or until a new chunk is transmitted on the
    same path and times out (and this will happen only if the cwnd
    allows sending new chunks, but since cwnd was dropped to MTU by
    the timeout => it will wait until the first heartbeat).

    Example: 10 packets in flight, sent at 0.1 s intervals on the
    primary path. The primary path is down and the first packet
    times out. The first packet is retransmitted on another path, the
    T3 timer for the primary path is stopped and cwnd is set to MTU.
    All the other 9 in-flight packets will not be retransmitted
    (unless more new packets are sent on the primary path, which depends
    on cwnd allowing it, and even in this case the 9 packets will be
    retransmitted only after a new packet times out, which even in the
    best case would be more than RTO).
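
    The intended behaviour for the example above can be sketched as
    follows. The chunk struct and mark_for_t3_rtx() are hypothetical and
    ignore timers and cwnd; they only show that every in-flight chunk on
    the timed-out transport is marked, regardless of when it was sent.

```c
#include <stdbool.h>
#include <stddef.h>

struct chunk {
    int transport_id;   /* path the chunk was last sent on */
    bool in_flight;
    bool needs_rtx;     /* marked for retransmission */
};

/* Hypothetical helper: on a T3 timeout for transport_id, mark all of
 * its in-flight chunks for retransmission. Returns how many chunks
 * were marked. */
static size_t mark_for_t3_rtx(struct chunk *chunks, size_t n, int transport_id)
{
    size_t marked = 0;

    for (size_t i = 0; i < n; i++) {
        if (!chunks[i].in_flight || chunks[i].transport_id != transport_id)
            continue;
        chunks[i].in_flight = false;
        chunks[i].needs_rtx = true;
        marked++;
    }
    return marked;
}
```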

    This commit reverts d0ce92910bc04e107b2f3f2048f07e94f570035d and
    also removes the now unused transport->last_rto, introduced in
    b6157d8e03e1e780660a328f7183bcbfa4a93a19.

    P.S. The problem is not limited to multiple paths. It can also
    happen in a single-homed environment: if the application
    stops sending data, it is possible to have a hung association.

    Signed-off-by: Andrei Pelinescu-Onciul
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Andrei Pelinescu-Onciul
     

24 Nov, 2009

2 commits

  • Current implementation of max.burst ends up limiting new
    data during cwnd decay period. The decay is happening because
    the connection is idle and we are allowed to fill the congestion
    window. The point of max.burst is to limit micro-bursts in response
    to large acks. This still happens, as max.burst is still applied
    to each transmit opportunity. It will also apply if a very large
    send is made (greater than allowed by burst).
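
    A minimal sketch of a per-opportunity burst limit, following the
    formula suggested in RFC 4960, Section 6.1 (cap cwnd at flightsize +
    Max.Burst * MTU). burst_limited_cwnd() is a hypothetical helper, not
    the function this patch touches.

```c
#include <stdint.h>

/* Hypothetical helper: the window usable at one transmit opportunity.
 * The stored cwnd itself is left untouched by the caller. */
static uint32_t burst_limited_cwnd(uint32_t cwnd, uint32_t flight_size,
                                   uint32_t max_burst, uint32_t pmtu)
{
    uint32_t limit = flight_size + max_burst * pmtu;

    return cwnd > limit ? limit : cwnd;
}
```

    For example, with cwnd 60000, 3000 bytes in flight, Max.Burst 4 and
    a PMTU of 1500, the usable window is 9000 bytes; a cwnd of 6000
    would pass through unchanged.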

    Tested-by: Florian Niederbacher
    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • This patch implements the sender side for the SACK-IMMEDIATELY
    extension.

    Section 4.1. Sender Side Considerations

    Whenever the sender of a DATA chunk can benefit from the
    corresponding SACK chunk being sent back without delay, the sender
    MAY set the I-bit in the DATA chunk header.

    Reasons for setting the I-bit include

    o The sender is in the SHUTDOWN-PENDING state.

    o The application requests to set the I-bit of the last DATA chunk
    of a user message when providing the user message to the SCTP
    implementation.
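
    The two conditions above can be sketched as below. The flag macros
    use the DATA chunk bit positions (E, B, U, then I at 0x08), while
    data_chunk_flags() and its parameters are hypothetical names chosen
    for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* DATA chunk flag bits. */
#define DATA_FLAG_E 0x01u   /* ending fragment */
#define DATA_FLAG_B 0x02u   /* beginning fragment */
#define DATA_FLAG_U 0x04u   /* unordered */
#define DATA_FLAG_I 0x08u   /* SACK immediately */

/* Hypothetical helper: decide whether to set the I-bit. It is set when
 * the sender is in SHUTDOWN-PENDING, or when the application asked for
 * it and this is the last DATA chunk of the user message. */
static uint8_t data_chunk_flags(uint8_t flags, bool shutdown_pending,
                                bool app_requested)
{
    bool last_fragment = (flags & DATA_FLAG_E) != 0;

    if (shutdown_pending || (app_requested && last_fragment))
        flags |= DATA_FLAG_I;
    return flags;
}
```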

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     

05 Sep, 2009

1 commit


14 Mar, 2009

1 commit

  • RFC3758 Section 3.3.1. Sending Forward-TSN-Supported param in INIT

    Note that if the endpoint chooses NOT to include the parameter, then
    at no time during the life of the association can it send or process
    a FORWARD TSN.

    If the peer is not PR-SCTP capable, don't send a FORWARD-TSN chunk
    to the peer.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     

03 Mar, 2009

1 commit


23 Jan, 2009

1 commit

  • Commit 62aeaff5ccd96462b7077046357a6d7886175a57
    (sctp: Start T3-RTX timer when fast retransmitting lowest TSN)
    introduced a regression where it was possible to forcibly
    restart the sctp retransmit timer at the transmission of any
    new chunk. This resulted in much longer timeout times and
    sometimes hung sctp connections.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

01 Oct, 2008

4 commits


23 Jul, 2008

1 commit


20 Jun, 2008

1 commit

  • RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts

    When an SCTP stack receives a packet containing multiple control or
    DATA chunks and the processing of the packet requires the sending of
    multiple chunks in response, the sender of the response chunk(s) MUST
    NOT send more than one packet. If bundling is supported, multiple
    response chunks that fit into a single packet MAY be bundled together
    into one single response packet. If bundling is not supported, then
    the sender MUST NOT send more than one response chunk and MUST
    discard all other responses. Note that this rule does NOT apply to a
    SACK chunk, since a SACK chunk is, in itself, a response to DATA and
    a SACK does not require a response of more DATA.

    We implement this by not servicing our outqueue until we reach the end
    of the packet. This enables maximum bundling. We also identify
    'response' chunks and make sure that we only send 1 packet when sending
    such chunks.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

05 Jun, 2008

2 commits

  • When fast retransmit is triggered by a sack, we should flush the queue
    only once so that only 1 retransmit happens. Also, since we could
    potentially have non-fast-rtx chunks on the retransmit queue, we need
    to make sure any chunks eligible for fast retransmit are sent first
    during fast retransmission.

    Signed-off-by: Vlad Yasevich
    Tested-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • When we are trying to fast retransmit the lowest outstanding TSN, we
    need to restart the T3-RTX timer, so that subsequent timeouts will
    correctly tag all the packets necessary for retransmissions.

    Signed-off-by: Vlad Yasevich
    Tested-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

18 Apr, 2008

1 commit


14 Apr, 2008

1 commit