15 Sep, 2009

2 commits


05 Sep, 2009

27 commits

  • Since our TSN map is capable of holding at most a 4K chunk gap,
    there is no way that during this gap, a stream sequence number
    (unsigned short) can wrap such that the new number is smaller
    than the next expected one. If such a case is encountered,
    this is a protocol violation.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
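
    The reasoning relies on 16-bit serial-number comparison. Below is a
    minimal sketch in plain C, assuming RFC 1982-style arithmetic.
    ssn_lt() and ssn_is_violation() are hypothetical names, not the
    kernel's helpers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if 16-bit SSN a precedes b in serial
 * (RFC 1982-style) order. A value that wrapped past 65535 still
 * compares as "after" its predecessor. */
static bool ssn_lt(uint16_t a, uint16_t b)
{
	return (int16_t)(uint16_t)(a - b) < 0;
}

/* Hypothetical check: the TSN map bounds the gap to about 4K chunks,
 * so an ordered DATA chunk whose SSN precedes the next expected SSN
 * cannot be a legitimate wrap. It is treated as a protocol violation. */
static bool ssn_is_violation(uint16_t ssn, uint16_t next_expected)
{
	return ssn_lt(ssn, next_expected);
}
```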
     
  • Use sctp_packet_reset() instead of duplicated code.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     
  • This patch introduces a new sysctl option to make IPv4 address scoping
    configurable.

    In networking environments where DNAT rules in iptables prerouting
    chains convert destination IPs to link-local/private IP addresses,
    SCTP connections fail to establish because the INIT chunk is dropped
    by the kernel due to an address scope match failure.
    For example, to support overlapping IP addresses (the same IP address
    with different VLAN IDs), a Layer-5 application listens on link-local
    IPs, and a DNAT rule maps the destination IP to a link-local IP. Such
    applications never get the SCTP INIT if the address-scoping draft is
    strictly followed.

    This sysctl configuration allows SCTP to function in such
    unconventional networking environments.

    Sysctl options:
    0 - Disable IPv4 address scoping draft altogether
    1 - Enable IPv4 address scoping (default, current behavior)
    2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
    3 - Enable address scoping but allow IPv4 link-local addresses in init/init-ack

    Signed-off-by: Bhaskar Dutta
    Signed-off-by: Vlad Yasevich

    Bhaskar Dutta
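
    The four sysctl values can be read as a decision table. The sketch
    below is a hypothetical model, not the kernel code. The enum names,
    the scope ordering (widest first) and addr_in_scope() are all
    assumptions, and so is the rule that unusable addresses are rejected
    under every policy:

```c
#include <stdbool.h>

/* Hypothetical scope levels, ordered from widest to narrowest. */
enum addr_scope { SCOPE_GLOBAL, SCOPE_PRIVATE, SCOPE_LINK, SCOPE_LOOPBACK, SCOPE_UNUSABLE };

/* The sysctl values 0..3 described above (names are hypothetical). */
enum scope_policy { POLICY_DISABLE, POLICY_ENABLE, POLICY_PRIVATE, POLICY_LINK };

/* May an address of scope addr_scope be used when the association's
 * scope is 'scope'? Assumption: unusable addresses never qualify. */
static bool addr_in_scope(enum scope_policy policy,
			  enum addr_scope addr_scope, enum addr_scope scope)
{
	if (addr_scope == SCOPE_UNUSABLE)
		return false;

	switch (policy) {
	case POLICY_DISABLE:		/* 0: no scoping at all */
		return true;
	case POLICY_ENABLE:		/* 1: strict scoping */
		return addr_scope <= scope;
	case POLICY_PRIVATE:		/* 2: also allow private addresses */
		return addr_scope <= scope || addr_scope == SCOPE_PRIVATE;
	case POLICY_LINK:		/* 3: also allow link-local addresses */
		return addr_scope <= scope || addr_scope == SCOPE_LINK;
	}
	return false;
}
```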
     
  • We used to perform 2 routing lookups for a new transport: one
    just for path mtu detection, and one to actually route to destination
    and path mtu update when sending a packet. There is no point in doing
    both of them, especially since the first one (just for path mtu) doesn't
    take into account the source address and sometimes gives the wrong route,
    causing path mtu updates anyway.

    We now do just the one call to do both route to destination and get
    path mtu updates.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • We currently track if AUTH has been bundled using the 'auth'
    pointer to the chunk. However, AUTH is disallowed after DATA
    is already in the packet, so we need to instead use the
    'has_auth' field.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • The packet information is not reset after packet transmit. This
    may cause problems such as a following DATA chunk being sent without
    an AUTH chunk, even if the peer has requested authentication of DATA
    chunks.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
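
    Below is a minimal sketch of the reset-after-transmit rule, assuming
    a simplified packet structure. struct packet, packet_reset() and
    packet_transmit() are hypothetical stand-ins, not the kernel's
    structures:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-packet bundling state. */
struct packet {
	size_t overhead;	/* IP + SCTP common header bytes */
	size_t size;		/* bytes currently in the packet */
	bool has_auth;
	bool has_sack;
	bool has_data;
	bool has_cookie_echo;
};

/* Return the packet to its empty state so that no bundling decision
 * made for the previous packet leaks into the next one. */
static void packet_reset(struct packet *p)
{
	p->size = p->overhead;
	p->has_auth = false;
	p->has_sack = false;
	p->has_data = false;
	p->has_cookie_echo = false;
}

/* Hypothetical transmit path: the actual send is omitted; the point
 * is that the reset always happens afterwards. */
static void packet_transmit(struct packet *p)
{
	/* ... build and hand the datagram to the lower layer ... */
	packet_reset(p);
}
```

    Because has_auth is cleared, the next DATA chunk triggers a fresh AUTH
    bundling decision instead of inheriting stale state.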
     
  • The Add-IP feature allows users to delete an active transport. If that
    transport has chunks in flight, those chunks need to be moved to another
    transport, or the association may get into an unrecoverable state.

    Reported-by: Rafael Laufer
    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • We had a bug where we never stored the user-defined value for
    MAXSEG when setting the value on an association. Thus future
    PMTU events ended up rewriting the frag point and increasing
    it past the user limit. Additionally, when setting the option on
    the socket/endpoint, we affected all current associations, which
    is against the spec.

    Now, we store the user 'maxseg' value along with the computed
    'frag_point'. We inherit 'maxseg' from the socket at association
    creation and use it as an upper limit for 'frag_point' when it's
    set.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
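
    The frag_point computation can be sketched as follows, assuming IPv4
    with a 20-byte IP header, the 12-byte SCTP common header and the
    16-byte DATA chunk header. frag_point() is a hypothetical helper, and
    a user_maxseg of 0 is assumed to mean "not set":

```c
#include <stddef.h>

/* Assumed IPv4 overhead: IP header + SCTP common header + DATA chunk header. */
#define DATA_OVERHEAD (20 + 12 + 16)

/* Hypothetical: derive frag_point from the current path MTU. The stored
 * user 'maxseg' (0 = unset) acts as an upper limit, so a later PMTU
 * event cannot raise frag_point past what the user asked for. */
static size_t frag_point(size_t pmtu, size_t user_maxseg)
{
	size_t frag = pmtu - DATA_OVERHEAD;

	if (user_maxseg && user_maxseg < frag)
		frag = user_maxseg;
	return frag;
}
```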
     
  • SCTP will delay the last part of a large write due to Nagle if that
    part is smaller than the MTU. Since we are doing large writes, we might
    as well send the last portion now instead of waiting until the next
    large write happens. The small portion will be sent as is regardless,
    so it's better not to delay it.

    This is the result of many discussions with Wei Yongjun
    and Doug Graham. Many thanks go out to them.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • The decision to delay due to Nagle should be based on the path mtu
    and future packet size. We currently incorrectly base it on
    'frag_point' which is the SCTP DATA segment size, and also we do
    not count DATA chunk header overhead in the computation. This
    actually allows situations where a user can set a low 'frag_point'
    and then send small messages without delay.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
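
    Below is a minimal sketch of the corrected test. It assumes a 16-byte
    DATA chunk header and a queued_len that already includes chunk
    headers. nagle_delay() and its parameters are hypothetical, not the
    kernel's interface:

```c
#include <stdbool.h>
#include <stddef.h>

#define DATA_CHUNK_HDR 16	/* assumed DATA chunk header size */

/* Hypothetical Nagle test. The size of the packet we could build (this
 * chunk with its header plus everything still queued) is compared
 * against the room left in a path-MTU-sized packet, not against
 * frag_point. */
static bool nagle_delay(bool nodelay, size_t outstanding_bytes,
			size_t pmtu, size_t packet_overhead,
			size_t chunk_data_len, size_t queued_len)
{
	size_t max = pmtu - packet_overhead;
	size_t len = DATA_CHUNK_HDR + chunk_data_len + queued_len;

	if (nodelay || outstanding_bytes == 0)
		return false;	/* nothing to wait for: send now */
	return len < max;	/* delay only if the packet would be short */
}
```

    Because the limit comes from the path MTU, lowering frag_point no longer
    lets small messages skip the delay.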
     
  • We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
    This results in a hung association if the remote only uses
    SHUTDOWNs (which it's allowed to do) to acknowledge DATA when
    closing. The reason is that we simply honor the a_rwnd
    from the SACK, but since we faked it to be 0, we enter 0-window
    probing. The fix is to use the peer's old rwnd and add our flight
    size to it.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
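
    The fix amounts to simple arithmetic on the faked SACK. It can be
    expressed as a hypothetical one-line helper:

```c
#include <stdint.h>

/* Hypothetical: a_rwnd to place in the SACK synthesized from a SHUTDOWN.
 * Instead of 0 (which forces zero-window probing), use the peer's last
 * known rwnd plus our current flight size, as the fix describes. */
static uint32_t fake_sack_arwnd(uint32_t peer_rwnd, uint32_t flight_size)
{
	return peer_rwnd + flight_size;
}
```

    For example, with a peer rwnd of 4000 bytes and 1500 bytes in flight,
    the faked SACK would carry an a_rwnd of 5500 instead of 0.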
     
  • SCTP has a problem: when small chunks are used, it is possible
    to exhaust the receiver buffer without fully closing the receive window.
    This happens because of all the overhead we have to account for with
    small messages. To fix this, when the receive buffer is exceeded, we'll
    drop the window to 0 and save the 'drop' portion. When the application
    starts reading data and freeing up receive buffer space, we'll wait until
    we've reached the 'drop' window and then add back this 'drop' one
    MTU at a time. This worked well in testing and under stress produced
    rather even recovery.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
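
    The 'drop' bookkeeping can be sketched as below. The structure and
    function names are hypothetical, and the sketch assumes the dropped
    amount is returned in at most one path-MTU step per read:

```c
#include <stdint.h>

/* Hypothetical receive-window state. */
struct rwnd_state {
	uint32_t rwnd;		/* window we advertise */
	uint32_t rwnd_press;	/* window "dropped" when the buffer filled */
	uint32_t pathmtu;
};

/* Receive buffer exceeded: close the window and remember what was dropped. */
static void rwnd_buffer_full(struct rwnd_state *s)
{
	s->rwnd_press += s->rwnd;
	s->rwnd = 0;
}

/* The application read 'freed' bytes. Once the window has grown back to
 * the dropped amount, return the dropped portion one MTU at a time. */
static void rwnd_data_read(struct rwnd_state *s, uint32_t freed)
{
	s->rwnd += freed;
	if (s->rwnd_press && s->rwnd >= s->rwnd_press) {
		uint32_t change = s->pathmtu < s->rwnd_press ? s->pathmtu : s->rwnd_press;

		s->rwnd += change;
		s->rwnd_press -= change;
	}
}
```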
     
  • If the T3 timer expires, we are retransmitting data due to a timeout,
    so any fast recovery is null and void. We can clear the fast recovery
    flag.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • SCTP RFC 4960 states that unacknowledged HEARTBEATs count as
    errors against a given transport or endpoint. As such, we
    should increment the error counts only for unacknowledged
    HBs; otherwise we detect failure too soon. This goes for both
    the overall error count and the path error count.

    Now, there is a difference in how the detection is done
    between the two. The path error detection is done after
    the increment, so to detect it properly, we actually need
    to exceed the path threshold. The overall error detection
    is done _BEFORE_ the increment. Thus to detect the failure,
    it's enough for the error count to match the threshold.
    This is why all the state functions use '>=' to detect failure,
    while path detection uses '>'.

    Thanks go to Chunbo Luo, who first
    proposed patches to fix this issue and made me re-read the spec
    and the code to figure out how this cruft really works.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
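
    The sketch below shows that the two checks fire on the same timeout.
    The function names are hypothetical; each handler returns true when
    failure is detected:

```c
#include <stdbool.h>

/* Path check runs AFTER the increment, so the count must exceed the threshold. */
static bool path_hb_timeout(unsigned *error_count, unsigned path_max_rxt)
{
	(*error_count)++;
	return *error_count > path_max_rxt;
}

/* Overall check runs BEFORE the increment, so reaching the threshold is enough. */
static bool assoc_hb_timeout(unsigned *overall_error_count, unsigned max_retrans)
{
	if (*overall_error_count >= max_retrans)
		return true;
	(*overall_error_count)++;
	return false;
}

/* Count unacknowledged HEARTBEATs until the given check reports failure. */
static unsigned timeouts_until_failure(bool (*on_timeout)(unsigned *, unsigned),
				       unsigned threshold)
{
	unsigned count = 0, events = 0;

	do {
		events++;
	} while (!on_timeout(&count, threshold));
	return events;
}
```

    With a threshold of 5, both variants report failure on the sixth
    unacknowledged HEARTBEAT.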
     
  • create_proc_entry() is deprecated (not formally, though).

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Vlad Yasevich

    Alexey Dobriyan
     
  • The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK
    that contains the Heartbeat Information field copied from the
    received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk
    must have a length of:
    sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t)

    With a badly formatted HB-ACK chunk, it is possible that we may access
    invalid memory. We should really make sure that the chunk format
    is what we expect before attempting to touch the data.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
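
    Below is a minimal length check in plain C. The structures are
    hypothetical simplifications: the length is kept in host byte order,
    and the heartbeat-information layout is invented for illustration.
    The check compares the chunk length against the combined size before
    any field is read:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; the real kernel structures differ. */
struct chunk_hdr {
	uint8_t type;
	uint8_t flags;
	uint16_t length;	/* host byte order in this sketch */
};

struct sender_hb_info {
	uint16_t param_type;
	uint16_t param_length;
	uint8_t daddr[16];
	uint32_t sent_at;
	uint64_t nonce;
};

/* Only touch the Heartbeat Information after confirming that the chunk
 * is long enough to contain it; otherwise the HEARTBEAT ACK is discarded. */
static bool hback_length_valid(const struct chunk_hdr *hdr)
{
	return (size_t)hdr->length >=
	       sizeof(struct chunk_hdr) + sizeof(struct sender_hb_info);
}
```

    A real receiver would also make sure the claimed length does not exceed
    the bytes actually received.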
     
  • If the Cumulative TSN Ack field of a SHUTDOWN chunk is less than the
    Cumulative TSN Ack Point, then drop the SHUTDOWN chunk.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
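
    TSN comparisons wrap at 2^32, so the check uses serial-number
    arithmetic. Below is a hypothetical sketch; tsn_lt() and
    shutdown_is_stale() are illustrative names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial-number "less than" for 32-bit TSNs. */
static bool tsn_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* A SHUTDOWN whose Cumulative TSN Ack is behind our Cumulative TSN Ack
 * Point is stale and is dropped. */
static bool shutdown_is_stale(uint32_t shutdown_ctsn, uint32_t ctsn_ack_point)
{
	return tsn_lt(shutdown_ctsn, ctsn_ack_point);
}
```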
     
  • Currently, sctp breaks up user messages into fragments and
    sends each fragment to the lower layer by itself. This means
    that for each fragment we go all the way down the stack
    and back up. This also discourages bundling of multiple
    fragments when they can fit into a single packet (e.g., due
    to the user setting a low fragmentation threshold).

    We introduce a new command SCTP_CMD_SND_MSG and hand the
    whole message down to the state machine. The state machine and
    the side-effect parser will cork the queue, add all chunks
    from the message to the queue, and then un-cork the queue
    thus causing the chunks to get transmitted.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
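
    The toy model below shows why corking helps bundling. The queue
    structure, the greedy packing and the function names are all
    hypothetical. The sketch contrasts flushing after every fragment with
    flushing once after the whole message has been queued:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 64

/* Hypothetical outbound queue model. */
struct outq {
	bool corked;
	size_t pending[MAX_PENDING];	/* chunk lengths waiting to go out */
	size_t n;
	size_t mtu_room;		/* bytes available for chunks per packet */
	size_t packets_sent;
};

/* Bundle everything pending into as few packets as possible (greedy). */
static void outq_flush(struct outq *q)
{
	size_t used = 0;

	for (size_t i = 0; i < q->n; i++) {
		if (used && used + q->pending[i] > q->mtu_room) {
			q->packets_sent++;
			used = 0;
		}
		used += q->pending[i];
	}
	if (used)
		q->packets_sent++;
	q->n = 0;
}

static void outq_tail(struct outq *q, size_t chunk_len)
{
	if (q->n == MAX_PENDING)
		outq_flush(q);		/* make room rather than drop */
	q->pending[q->n++] = chunk_len;
	if (!q->corked)
		outq_flush(q);
}

/* Queue one message as 'nfrags' fragments, optionally corked, then uncork. */
static void send_msg(struct outq *q, size_t nfrags, size_t frag_len, bool cork)
{
	q->corked = cork;
	for (size_t i = 0; i < nfrags; i++)
		outq_tail(q, frag_len);
	q->corked = false;
	outq_flush(q);
}
```

    With a 1200-byte budget, four 300-byte fragments go out as four packets
    when uncorked but as a single packet when corked.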
     
  • If the association has a SACK timer pending and now has DATA queued
    to be sent, we'll try to bundle the SACK with the next application send.
    As such, we try to encourage bundling by accounting for the SACK in the
    size of the first chunk fragment.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
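
    The first-fragment sizing can be sketched as follows. The sketch
    assumes a SACK without gap or duplicate reports (16 bytes) and
    reduces the first fragment only when the message would be fragmented
    anyway, which is also an assumption. first_frag_len() is
    hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed SACK size: 4-byte chunk header + 12 bytes of SACK fields. */
#define SACK_CHUNK_LEN 16

/* Hypothetical: length of a message's first fragment. If a SACK is
 * pending and the message will be fragmented anyway, leave room for
 * the SACK in the first packet. */
static size_t first_frag_len(size_t max_data, size_t msg_len, bool sack_timer_pending)
{
	size_t first = max_data;

	if (sack_timer_pending && msg_len > max_data && first > SACK_CHUNK_LEN)
		first -= SACK_CHUNK_LEN;
	return first < msg_len ? first : msg_len;
}
```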
     
  • We are now trying to bundle SACKs when we have outbound
    DATA to send. However, there are situations where this
    outbound DATA will not be sent (due to congestion or
    available window). In such cases it's ok to wait for the
    timer to expire. This patch refactors the sending code
    so that before attempting to bundle the SACK we check
    to see if the DATA will actually be transmitted.

    Based on earlier work by Doug Graham and
    Wei Yongjun.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • Since an application may specify the maximum SCTP fragment size
    that all data should be fragmented to, we need to fix how
    we do segmentation. Right now, if a user specifies a small
    fragment size, the segment size can go negative in the presence
    of AUTH or COOKIE_ECHO bundling.

    What we need to do is track the largest possible DATA chunk that
    can fit into the MTU. Then, if the specified fragment size is
    bigger than this maximum length, we'll shrink it down. Otherwise,
    we just use the smaller segment size without changing it further.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
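
    Below is a minimal sketch of the new rule. It assumes IPv4 headers and
    an AUTH chunk of auth_len bytes bundled with every DATA packet.
    data_frag_len() is a hypothetical name, and a user_frag of 0 is
    assumed to mean "not set":

```c
#include <stddef.h>

/* Assumed IPv4 packet: 20-byte IP header, 12-byte SCTP common header,
 * 16-byte DATA chunk header. The sketch assumes pmtu is well above this. */
#define DATA_OVERHEAD (20 + 12 + 16)

/* Hypothetical: the largest DATA payload that fits in one packet, minus
 * room for a bundled AUTH chunk (auth_len, 0 if unused). A larger user
 * fragment size is shrunk to it. A smaller one is used unchanged, so
 * nothing is subtracted from it and it cannot go negative. */
static size_t data_frag_len(size_t pmtu, size_t auth_len, size_t user_frag)
{
	size_t max_data = pmtu - DATA_OVERHEAD - auth_len;

	if (user_frag == 0 || user_frag > max_data)
		return max_data;
	return user_frag;
}
```

    For example, with a 28-byte AUTH chunk, a 1500-byte MTU yields 1424
    bytes, while a user fragment size of 100 stays at 100.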
     
  • If a socket has a lot of associations that are in the process of
    being closed/aborted, it is possible for a remote to establish
    new associations during the time period that the old ones are shutting
    down. If this was a result of a close() call, there will be no socket,
    which will cause a memory leak. We'll prevent this by setting the
    socket state to CLOSING and disallowing new associations in this state.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • This patch corrects the conditions under which a SACK will be piggybacked
    on a DATA packet. The previous condition was incorrect due to a
    misinterpretation of RFC 4960 and/or RFC 2960. Specifically, the
    following paragraph from section 6.2 had not been implemented correctly:

    Before an endpoint transmits a DATA chunk, if any received DATA
    chunks have not been acknowledged (e.g., due to delayed ack), the
    sender should create a SACK and bundle it with the outbound DATA
    chunk, as long as the size of the final SCTP packet does not exceed
    the current MTU. See Section 6.2.

    When about to send a DATA chunk, the code now checks to see if the SACK
    timer is running. If it is, we know we have a SACK to send to the
    peer, so we append the SACK (assuming available space in the packet)
    and turn off the timer. For a simple request-response scenario, this
    will result in the SACK being bundled with the response, meaning that
    the SACK is received quickly by the client, and also meaning that no
    separate SACK packet needs to be sent by the server to acknowledge the
    request. Prior to this patch, a separate SACK packet would have been
    sent by the server SCTP only after its delayed-ACK timer had expired
    (usually 200ms). This is wasteful of bandwidth, and can also have a
    major negative impact on performance due to the interaction of delayed ACKs
    with the Nagle algorithm.

    Signed-off-by: Doug Graham
    Signed-off-by: Vlad Yasevich

    Doug Graham
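
    The bundling condition can be sketched as follows. The structures and
    maybe_bundle_sack() are hypothetical, and the sketch assumes the
    caller knows the SACK and DATA chunk lengths before building the
    packet:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical association and packet state for this sketch. */
struct assoc {
	bool sack_timer_pending;	/* delayed-ACK timer is running */
};

struct packet {
	size_t size;	/* bytes already in the packet */
	size_t mtu;	/* path MTU */
	bool has_sack;
};

/* Before adding DATA: if received DATA is still unacknowledged (the SACK
 * timer is running) and the SACK fits alongside the DATA chunk, bundle
 * the SACK and stop the timer. */
static void maybe_bundle_sack(struct assoc *a, struct packet *p,
			      size_t sack_len, size_t data_len)
{
	if (!a->sack_timer_pending || p->has_sack)
		return;
	if (p->size + sack_len + data_len > p->mtu)
		return;	/* no room: leave the SACK to the timer */
	p->size += sack_len;
	p->has_sack = true;
	a->sack_timer_pending = false;	/* "turn off the timer" */
}
```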
     
  • When the sctp transport is marked down, we can release the
    cached route and force a new lookup when attempting to use
    this transport for anything. This way, if a better route
    or source address is available, we'll try to use it.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • Update the route and saddr entries for the non-active transports, as some
    of the added addresses can be used as better source addresses, or there
    may be a better route.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     
  • This patch fixes the code to check the unrecognized ASCONF parameter
    before accessing it.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     
  • The return value of sctp_process_asconf_ack() may be
    overwritten while processing parameters with no error.
    This patch fixes the problem.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     

13 Aug, 2009

1 commit


10 Aug, 2009

1 commit

  • Commit 1748376b6626acf59c24e9592ac67b3fe2a0e026,
    net: Use a percpu_counter for sockets_allocated

    added percpu_counter function calls to the sctp_proc_init code path, but
    forgot to add them to sctp_proc_exit(). This resulted in the following
    Oops when performing this test:
    # modprobe sctp
    # rmmod -f sctp
    # modprobe sctp

    [ 573.862512] BUG: unable to handle kernel paging request at f8214a24
    [ 573.862518] IP: [] __percpu_counter_init+0x3f/0x70
    [ 573.862530] *pde = 37010067 *pte = 00000000
    [ 573.862534] Oops: 0002 [#1] SMP
    [ 573.862537] last sysfs file: /sys/module/libcrc32c/initstate
    [ 573.862540] Modules linked in: sctp(+) crc32c libcrc32c binfmt_misc bridge
    stp bnep lp snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep
    snd_pcm_oss snd_mixer_oss arc4 joydev snd_pcm ecb pcmcia snd_seq_dummy
    snd_seq_oss iwlagn iwlcore snd_seq_midi snd_rawmidi snd_seq_midi_event
    yenta_socket rsrc_nonstatic thinkpad_acpi snd_seq snd_timer snd_seq_device
    mac80211 psmouse sdhci_pci sdhci nvidia(P) ppdev video snd soundcore serio_raw
    pcspkr iTCO_wdt iTCO_vendor_support led_class ricoh_mmc pcmcia_core intel_agp
    nvram agpgart usbhid parport_pc parport output snd_page_alloc cfg80211 btusb
    ohci1394 ieee1394 e1000e [last unloaded: sctp]
    [ 573.862589]
    [ 573.862593] Pid: 5373, comm: modprobe Tainted: P R (2.6.31-rc3 #6)
    7663B15
    [ 573.862596] EIP: 0060:[] EFLAGS: 00010286 CPU: 1
    [ 573.862599] EIP is at __percpu_counter_init+0x3f/0x70
    [ 573.862602] EAX: f8214a20 EBX: f80faa14 ECX: c48c0000 EDX: f80faa20
    [ 573.862604] ESI: f80a7000 EDI: 00000000 EBP: f69d5ef0 ESP: f69d5eec
    [ 573.862606] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    [ 573.862610] Process modprobe (pid: 5373, ti=f69d4000 task=c2130c70
    task.ti=f69d4000)
    [ 573.862612] Stack:
    [ 573.862613] 00000000 f69d5f18 f80a70a8 f80fa9fc 00000000 fffffffc f69d5f30
    c018e2d4
    [ 573.862619] 00000000 f80a7000 00000000 f69d5f88 c010112b 00000000
    c07029c0 fffffffb
    [ 573.862626] 00000000 f69d5f38 c018f83f f69d5f54 c0557cad f80fa860
    00000001 c07010c0
    [ 573.862634] Call Trace:
    [ 573.862644] [] ? sctp_init+0xa8/0x7d4 [sctp]
    [ 573.862650] [] ? marker_update_probe_range+0x184/0x260
    [ 573.862659] [] ? sctp_init+0x0/0x7d4 [sctp]
    [ 573.862662] [] ? do_one_initcall+0x2b/0x160
    [ 573.862666] [] ? tracepoint_module_notify+0x2f/0x40
    [ 573.862671] [] ? notifier_call_chain+0x2d/0x70
    [ 573.862678] [] ? __blocking_notifier_call_chain+0x4d/0x60
    [ 573.862682] [] ? sys_init_module+0xb1/0x1f0
    [ 573.862686] [] ? sysenter_do_call+0x12/0x28
    [ 573.862688] Code: 89 48 08 b8 04 00 00 00 e8 df aa ec ff ba f4 ff ff ff 85
    c0 89 43 14 74 31 b8 b0 18 71 c0 e8 19 b9 24 00 a1 c4 18 71 c0 8d 53 0c 50
    04 89 43 0c b8 b0 18 71 c0 c7 43 10 c4 18 71 c0 89 15 c4
    [ 573.862725] EIP: [] __percpu_counter_init+0x3f/0x70 SS:ESP
    0068:f69d5eec
    [ 573.862730] CR2: 00000000f8214a24
    [ 573.862734] ---[ end trace 39c4e0b55e7cf54d ]---

    Signed-off-by: Rafael Laufer
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Rafael Laufer
     

06 Aug, 2009

1 commit


07 Jul, 2009

1 commit

  • Commit 'net: Move rx skb_orphan call to where needed' broke the sctp
    protocol with a warning at inet_sock_destruct(). Actually, sctp can do
    this right with the sctp_sock_rfree_frag() and sctp_skb_set_owner_r_frag()
    pair.

    sctp_sock_rfree_frag(skb);
    sctp_skb_set_owner_r_frag(skb, newsk);

    This patch does not revert commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f;
    instead, it removes the sctp_sock_rfree_frag() function.

    ------------[ cut here ]------------
    WARNING: at net/ipv4/af_inet.c:151 inet_sock_destruct+0xe0/0x142()
    Modules linked in: sctp ipv6 dm_mirror dm_region_hash dm_log dm_multipath
    scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
    Pid: 1808, comm: sctp_test Not tainted 2.6.31-rc2 #40
    Call Trace:
    [] warn_slowpath_common+0x6a/0x81
    [] ? inet_sock_destruct+0xe0/0x142
    [] warn_slowpath_null+0x12/0x15
    [] inet_sock_destruct+0xe0/0x142
    [] __sk_free+0x19/0xcc
    [] sk_free+0x18/0x1a
    [] sctp_close+0x192/0x1a1 [sctp]
    [] inet_release+0x47/0x4d
    [] sock_release+0x19/0x5e
    [] sock_close+0x21/0x25
    [] __fput+0xde/0x189
    [] fput+0x18/0x1a
    [] filp_close+0x56/0x60
    [] put_files_struct+0x5d/0xa1
    [] exit_files+0x39/0x3d
    [] do_exit+0x1a5/0x5dd
    [] ? d_kill+0x35/0x3b
    [] ? dequeue_signal+0xa6/0x115
    [] do_group_exit+0x63/0x8a
    [] get_signal_to_deliver+0x2e1/0x2f9
    [] do_notify_resume+0x7c/0x6b5
    [] ? autoremove_wake_function+0x0/0x34
    [] ? __d_free+0x3d/0x40
    [] ? d_free+0x2a/0x3c
    [] ? vfs_write+0x103/0x117
    [] ? sys_socketcall+0x178/0x182
    [] work_notifysig+0x13/0x19
    ---[ end trace 9db92c463e789fba ]---

    Signed-off-by: Wei Yongjun
    Acked-by: Herbert Xu
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     

30 Jun, 2009

1 commit

  • Commit 'net: skb->dst accessors' (adf30907d63893e4208dfe3f5c88ae12bc2f25d5)
    broke the sctp protocol stack: sctp packets could never be sent out after
    Eric Dumazet's patch, which had a typo in the sctp code.

    Signed-off-by: Wei Yongjun
    Acked-by: Eric Dumazet
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     

23 Jun, 2009

1 commit


18 Jun, 2009

1 commit

  • commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
    (net: No more expensive sock_hold()/sock_put() on each tx)
    changed initial sk_wmem_alloc value.

    We need to take into account this offset when reporting
    sk_wmem_alloc to user, in PROC_FS files or various
    ioctls (SIOCOUTQ/TIOCOUTQ)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Jun, 2009

1 commit


09 Jun, 2009

1 commit


03 Jun, 2009

2 commits

  • The prior implementation of the new sctp_connectx() call that returns
    an association ID did not work correctly on a non-blocking socket.
    This is because we could not return both an EINPROGRESS error and
    an association ID. This is a new implementation that supports this.

    Originally from Ivan Skytte Jørgensen

    Vlad Yasevich
     
  • RFC 5061 Section 5.1 ASCONF Chunk Procedures said:

    B4) Re-transmit the ASCONF Chunk last sent and if possible choose an
    alternate destination address (please refer to [RFC4960],
    Section 6.4.1). An endpoint MUST NOT add new parameters to this
    chunk; it MUST be the same (including its Sequence Number) as
    the last ASCONF sent. An endpoint MAY, however, bundle an
    additional ASCONF with new ASCONF parameters with the next
    Sequence Number. For details, see Section 5.5.

    This patch chooses an alternate destination address when
    retransmitting the ASCONF chunk, and cleans up some duplicated code.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun