24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this sctp implementation is free software you can redistribute it
    and or modify it under the terms of the gnu general public license
    as published by the free software foundation either version 2 or at
    your option any later version this sctp implementation is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with gnu cc see the file copying if not see
    http www gnu org licenses

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 42 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Richard Fontana
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190523091649.683323110@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

16 Apr, 2019

1 commit

  • sk_forward_alloc's updating is also done on rx path, but to be consistent
    we change to use sk_mem_charge() in sctp_skb_set_owner_r().

    In sctp_eat_data(), it's not enough to check sctp_memory_pressure only,
    which doesn't work for mem_cgroup_sockets_enabled, so we change to use
    sk_under_memory_pressure().

    When it's under memory pressure, sk_mem_reclaim() and sk_rmem_schedule()
    should be called on both the RENEGE and CHUNK DELIVERY paths to exit the
    memory pressure status as soon as possible.

    Note that sk_rmem_schedule() is using datalen to make things easy there.

    Reported-by: Matteo Croce
    Tested-by: Matteo Croce
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

11 May, 2018

1 commit

  • In Commit 1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too"),
    it held the chunk in sctp_ulpevent_make_rcvmsg to access it safely later
    in recvmsg. However, it also added sctp_chunk_put in fail_mark err path,
    which is only triggered before holding the chunk.

    syzbot reported a use-after-free crash on this err path, where
    sctp_chunk_put shouldn't be called.

    This patch simply removes this call.

    Fixes: 1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too")
    Reported-by: syzbot+141d898c5f24489db4aa@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     

12 Dec, 2017

3 commits

  • abort_pd is added as a member of sctp_stream_interleave, used to abort
    partial delivery for data or idata, called in sctp_cmd_assoc_failed.

    Since stream interleave allows partial delivery on each stream
    at the same time, sctp_intl_abort_pd for idata would be very different
    from the old function sctp_ulpq_abort_pd for data.

    Note that sctp_ulpevent_make_pdapi will support per stream in this
    patch by adding pdapi_stream and pdapi_seq in sctp_pdapi_event, as
    described in section 6.1.7 of RFC6458.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • ulpevent_data is added as a member of sctp_stream_interleave, used to
    do most of the processing in ulpq, including converting data or idata
    chunks to events, reassembling them in the reasm queue, putting them in
    the lobby queue in the right order, and delivering them up to the user
    sk rx queue.

    This procedure is described in section 2.2.3 of RFC8260.

    It adds most functions for idata here to do processing similar to
    the old functions for data. But since the details are very different
    between them, the old functions cannot be reused for idata.

    event->ssn and event->ppid settings are moved to ulpevent_data from
    sctp_ulpevent_make_rcvmsg, so that sctp_ulpevent_make_rcvmsg could
    work for both data and idata.

    Note that mid is added in sctp_ulpevent for idata, so __packed has to
    be used for defining sctp_ulpevent, or it would exceed the skb cb
    that saves a sctp_ulpevent variable for ulp layer processing.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • assign_number is added as a member of sctp_stream_interleave, used
    to assign ssn for data or mid (message id) for idata, called in
    sctp_packet_append_data. sctp_chunk_assign_ssn is left as it is,
    and sctp_chunk_assign_mid is added for sctp_stream_interleave_1.

    This procedure is described in section 2.2.2 of RFC8260.

    All sizeof(struct sctp_data_chunk) in tx path is replaced with
    sctp_datachk_len, to make it right for idata as well. And also
    adjust sctp_chunk_is_data for SCTP_CID_I_DATA.

    After this patch, idata can be built and sent in tx path.

    Note that if sp strm_interleave is set, it has to wait_connect in
    sctp_sendmsg, as asoc intl_enable needs to be known after the 4-way
    handshake to decide whether it should use data or idata later. data
    and idata can't be mixed when sending in one asoc.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     

29 Oct, 2017

1 commit

  • These warnings were found by running 'make C=2 M=net/sctp/'.

    They were introduced by not being aware of endianness when coding the
    stream reconf patches.

    Since commit c0d8bab6ae51 ("sctp: add get and set sockopt for
    reconf_enable") enabled stream reconf feature for users, the
    Fixes tag below would use it.

    Fixes: c0d8bab6ae51 ("sctp: add get and set sockopt for reconf_enable")
    Reported-by: Eric Dumazet
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

04 Aug, 2017

1 commit

  • This patch is to remove the typedef sctp_errhdr_t, and replace
    with struct sctp_errhdr in the places where it's using this
    typedef.

    It is also to use sizeof(variable) instead of sizeof(type).
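
    As an illustration of the sizeof(variable) style mentioned above, here is
    a minimal sketch. The struct errhdr_example type and errhdr_clear() helper
    are hypothetical stand-ins written for this note, not the kernel's
    struct sctp_errhdr or any kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an error header; not the kernel definition. */
struct errhdr_example {
	uint16_t cause;   /* would be network byte order on the wire */
	uint16_t length;  /* would be network byte order on the wire */
};

/* Clear a header and report how many bytes were cleared. */
static size_t errhdr_clear(struct errhdr_example *hdr)
{
	assert(hdr != NULL);
	/* sizeof(*hdr) follows the variable, so it stays right if the type changes. */
	memset(hdr, 0, sizeof(*hdr));
	return sizeof(*hdr);
}
```

    The benefit is purely in maintenance: if hdr later points at a different
    type, the size expression does not need to be touched.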

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

02 Jul, 2017

1 commit

  • This patch is to remove the typedef sctp_chunkhdr_t, and replace
    with struct sctp_chunkhdr in the places where it's using this
    typedef.

    It also fixes some indents and uses sizeof(variable) instead
    of sizeof(type), especially in sctp_new.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

16 Jun, 2017

2 commits

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions return void * and remove all the casts across
    the tree, adding a (u8 *) cast only where the unsigned char pointer
    was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

    Note that the last part there converts from push(...)[0] to the
    more idiomatic *(u8 *)push(...).
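
    The effect of the void * return type on callers can be sketched with a
    toy buffer. Everything below (struct toy_buf, toy_push() and friends) is
    a hypothetical user-space model made up for this note; it only mimics the
    calling pattern and is not the sk_buff API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a packet buffer: 'data' moves toward 'storage' on each push. */
struct toy_buf {
	uint8_t storage[64];
	uint8_t *data;   /* current start of the packet data */
};

struct toy_hdr {
	uint8_t type;
	uint8_t flags;
};

static void toy_init(struct toy_buf *b)
{
	b->data = b->storage + sizeof(b->storage);  /* empty: all headroom */
}

/* Prepend len bytes; returning void * lets callers assign without a cast. */
static void *toy_push(struct toy_buf *b, size_t len)
{
	assert(len <= (size_t)(b->data - b->storage));
	b->data -= len;
	return b->data;
}

static void toy_build(struct toy_buf *b)
{
	struct toy_hdr *h;

	*(uint8_t *)toy_push(b, 1) = 0x7f;  /* single byte: explicit u8 cast */
	h = toy_push(b, sizeof(*h));        /* struct header: no cast needed */
	h->type = 1;
	h->flags = 0;
}
```

    The two statements in toy_build() correspond to the two shapes the spatch
    rules produce: a dereference through (u8 *) and a plain assignment.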

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions (skb_put, __skb_put and pskb_put) return void *
    and remove all the casts across the tree, adding a (u8 *) cast only
    where the unsigned char pointer was used directly, all done with the
    following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    which actually doesn't cover pskb_put since there are only three
    users overall.

    A handful of stragglers were converted manually, notably a macro in
    drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
    instances in net/bluetooth/hci_sock.c. In the former file, I also
    had to fix one whitespace problem spatch introduced.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

13 Mar, 2017

2 commits


20 Feb, 2017

1 commit


22 Sep, 2016

1 commit

  • Rename the WORD_TRUNC and WORD_ROUND macros to something more meaningful
    these days, especially because they work on packet headers or lengths,
    which are tied not to any CPU arch but to the protocol itself.

    So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4.
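
    For readers unfamiliar with these helpers, the sketch below shows 4-byte
    truncation and padding as plain macros. The definitions are written from
    memory of the usual mask idiom and should be treated as an assumption
    rather than a copy of the kernel header; sctp_pad_bytes() is a
    hypothetical helper added only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a length down / up to a multiple of 4 bytes. */
#define SCTP_TRUNC4(s)	((s) & ~3)
#define SCTP_PAD4(s)	(((s) + 3) & ~3)

/* Hypothetical helper: padding bytes needed after a chunk of 'len' bytes. */
static size_t sctp_pad_bytes(size_t len)
{
	assert(len <= SIZE_MAX - 3);  /* keep the +3 from wrapping around */
	return SCTP_PAD4(len) - len;
}
```

    For a 13-byte length, padding rounds up to 16 and truncation rounds down
    to 12, so three padding bytes would follow the chunk.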

    Reported-by: David Laight
    Reported-by: David Miller
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

09 Aug, 2016

1 commit

  • Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
    it's available") used event->chunk->head_skb to get the head_skb in
    sctp_ulpevent_set_owner().

    But at that moment, the event->chunk was NULL, as it cloned the skb
    in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
    work.

    This patch is to move the event->chunk initialization before calling
    sctp_ulpevent_receive_data() so that it uses event->chunk when it's
    valid.

    Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's available")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     

26 Jul, 2016

1 commit

  • The head skb for GSO packets won't travel through the inner depths of
    SCTP stack as it doesn't contain any chunks on it. That means skb->sk
    doesn't get set and then when sctp_recvmsg() calls
    sctp_inet6_skb_msgname() on the head_skb it panics, as this last needs
    to check flags at the socket (sp->v4mapped).

    The fix is to initialize skb->sk for the head skb once we are able to do
    it. That is, when the first chunk is processed.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

14 Jul, 2016

2 commits

  • SCTP will try to access original IP headers on sctp_recvmsg in order to
    copy the addresses used. There are also other places that do similar access
    to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO
    support") they aren't always there because they are only present in the
    header skb.

    SCTP handles the queueing of incoming data by cloning the incoming skb
    and limiting to only the relevant payload. This clone has its cb updated
    to something different and it's then queued on socket rx queue. Thus we
    need to fix this in two moments.

    For rx path, not related to socket queue yet, this patch uses a
    partially copied sctp_input_cb to such GSO frags. This restores the
    ability to access the headers for this part of the code.

    Regarding the socket rx queue, it removes the iif member from sctp_event
    and also adds a chunk pointer to it.

    With these changes we're always able to reach the headers again.

    The biggest change here is that now the sctp_chunk struct and the
    original skb are only freed after the application consumed the buffer.
    Note however that the original payload was already like this due to the
    skb cloning.

    For iif, SCTP's IPv4 code doesn't use it, so no change is necessary.
    IPv6 now can fetch it directly from original's IPv6 CB as the original
    skb is still accessible.

    In the future we probably can simplify sctp_v*_skb_iif() stuff, as
    sctp_v4_skb_iif() was called but its return value was not used, and now
    it's not even called, but such cleanup is out of scope for this change.

    Fixes: 90017accff61 ("sctp: Add GSO support")
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • The next patch needs 8 bytes in there. sctp_ulpevent has a hole due to
    bad alignment; msg_flags occupies 4 bytes while it actually needs only 2,
    so we shrink it. The iif member (4 bytes) can be easily fetched from
    another place once the next patch is there, so we remove it, thus
    creating space for 8 bytes.
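
    How shrinking one field and dropping another can free 8 bytes is easier
    to see with concrete layouts. The two structs below are illustrative
    approximations invented for this note (member order and types are
    assumptions, not the kernel's definitions), and ev_hole_after_flags() is
    a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct assoc;   /* opaque placeholders for the example */
struct chunk;

/* Before: hole after 'flags', 4-byte msg_flags, 4-byte iif. */
struct ev_before {
	struct assoc *asoc;
	uint16_t stream;
	uint16_t ssn;
	uint16_t flags;
	uint32_t ppid;
	uint32_t tsn;
	uint32_t cumtsn;
	int32_t  msg_flags;
	int32_t  iif;
	uint32_t rmem_len;
};

/* After: msg_flags shrunk to 16 bits, iif gone, room for one more pointer. */
struct ev_after {
	struct assoc *asoc;
	struct chunk *chunk;
	uint32_t rmem_len;
	uint32_t ppid;
	uint32_t tsn;
	uint32_t cumtsn;
	uint16_t stream;
	uint16_t ssn;
	uint16_t flags;
	uint16_t msg_flags;
};

/* Size of the alignment hole between 'flags' and 'ppid' in the old layout. */
static size_t ev_hole_after_flags(void)
{
	size_t end_of_flags = offsetof(struct ev_before, flags) + sizeof(uint16_t);

	assert(offsetof(struct ev_before, ppid) >= end_of_flags);
	return offsetof(struct ev_before, ppid) - end_of_flags;
}
```

    On common ABIs the hole is 2 bytes; together with the 2 bytes saved in
    msg_flags and the 4 bytes of iif that gives the 8 bytes the message
    mentions, so the second layout can hold an extra 64-bit pointer without
    growing.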

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

01 Aug, 2014

1 commit

  • The SCTP socket extensions API document describes the v4mapping option as
    follows:

    8.1.15. Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)

    This socket option is a Boolean flag which turns on or off the
    mapping of IPv4 addresses. If this option is turned on, then IPv4
    addresses will be mapped to V6 representation. If this option is
    turned off, then no mapping will be done of V4 addresses and a user
    will receive both PF_INET6 and PF_INET type addresses on the socket.
    See [RFC3542] for more details on mapped V6 addresses.

    This description isn't really in line with what the code does though.

    Introduce addr_to_user (renamed addr_v4map), which should be called
    before any sockaddr is passed back to user space. The new function
    places the sockaddr into the correct format depending on the
    SCTP_I_WANT_MAPPED_V4_ADDR option.

    Audit all places that touched v4mapped and either sanely construct
    a v4 or v6 address then call addr_to_user, or drop the
    unnecessary v4mapped check entirely.

    Audit all places that call addr_to_user and verify they are on a syscall
    return path.

    Add a custom getname that formats the address properly.

    Several bugs are addressed:
    - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
    addresses to user space
    - The addr_len returned from recvmsg was not correct when
    returning AF_INET on a v6 socket
    - flowlabel and scope_id were not zeroed when promoting
    a v4 to v6
    - Some syscalls like bind and connect behaved differently
    depending on v4mapped

    Tested bind, getpeername, getsockname, connect, and recvmsg for proper
    behaviour in v4mapped = 1 and 0 cases.
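
    The v4-to-v6 promotion described above can be sketched in user-space C.
    v4_to_v4mapped() is a hypothetical helper written for this note, not the
    kernel's addr_to_user; it only shows the address format and the zeroing
    of the flow label and scope id.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Promote an IPv4 sockaddr to an IPv4-mapped IPv6 sockaddr (::ffff:a.b.c.d). */
static void v4_to_v4mapped(const struct sockaddr_in *v4, struct sockaddr_in6 *v6)
{
	assert(v4->sin_family == AF_INET);

	/* Zero everything first so sin6_flowinfo and sin6_scope_id end up 0. */
	memset(v6, 0, sizeof(*v6));
	v6->sin6_family = AF_INET6;
	v6->sin6_port = v4->sin_port;   /* already in network byte order */
	v6->sin6_addr.s6_addr[10] = 0xff;
	v6->sin6_addr.s6_addr[11] = 0xff;
	memcpy(&v6->sin6_addr.s6_addr[12], &v4->sin_addr, sizeof(v4->sin_addr));
}
```

    Clearing the whole structure before filling it is what keeps stale bytes
    out of the flow label and scope id fields.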

    Signed-off-by: Neil Horman
    Tested-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: David S. Miller

    Jason Gunthorpe
     

17 Jul, 2014

2 commits

  • This patch implements section 5.3.6. of RFC6458, that is, support
    for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which
    is placed into ancillary data cmsghdr structure for each recvmsg()
    call, if this information is already available when delivering the
    current message.

    This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
    level by setting an int value with 1/0 for SCTP_RECVNXTINFO in
    user space applications as per RFC6458, section 8.1.30.

    The sctp_nxtinfo structure is defined as per RFC as below ...

    struct sctp_nxtinfo {
        uint16_t nxt_sid;
        uint16_t nxt_flags;
        uint32_t nxt_ppid;
        uint32_t nxt_length;
        sctp_assoc_t nxt_assoc_id;
    };

    ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
    SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo.
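
    On the receiving side, an application would look for this item while
    walking the ancillary data returned by recvmsg(). The sketch below uses
    only the standard CMSG_* macros; struct ex_sctp_nxtinfo, EX_SCTP_NXTINFO
    (with an assumed value of 4) and find_nxtinfo() are local stand-ins
    defined for this note. Real programs should take the structure and
    constant from the system's SCTP headers.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Local mirrors for illustration only; the value 4 is an assumption. */
#define EX_SCTP_NXTINFO 4
struct ex_sctp_nxtinfo {
	uint16_t nxt_sid;
	uint16_t nxt_flags;
	uint32_t nxt_ppid;
	uint32_t nxt_length;
	int32_t  nxt_assoc_id;
};

/* Copy the next-receive-info cmsg, if present, into *out. Returns 1 if found. */
static int find_nxtinfo(struct msghdr *msg, struct ex_sctp_nxtinfo *out)
{
	struct cmsghdr *c;

	assert(msg != NULL && out != NULL);
	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		if (c->cmsg_level == IPPROTO_SCTP &&
		    c->cmsg_type == EX_SCTP_NXTINFO &&
		    c->cmsg_len >= CMSG_LEN(sizeof(*out))) {
			/* memcpy avoids alignment assumptions about CMSG_DATA(). */
			memcpy(out, CMSG_DATA(c), sizeof(*out));
			return 1;
		}
	}
	return 0;
}
```

    A return value of 0 is the normal case when no further message is queued
    yet, since the item is only attached when that information is available.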

    Joint work with Daniel Borkmann.

    Signed-off-by: Geir Ola Vaagland
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Geir Ola Vaagland
     
  • This patch implements section 5.3.5. of RFC6458, that is, support
    for 'SCTP Receive Information Structure' (SCTP_RCVINFO) which is
    placed into ancillary data cmsghdr structure for each recvmsg()
    call.

    This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
    level by setting an int value with 1/0 for SCTP_RECVRCVINFO in user
    space applications as per RFC6458, section 8.1.29.

    The sctp_rcvinfo structure is defined as per RFC as below ...

    struct sctp_rcvinfo {
        uint16_t rcv_sid;
        uint16_t rcv_ssn;
        uint16_t rcv_flags;

        uint32_t rcv_ppid;
        uint32_t rcv_tsn;
        uint32_t rcv_cumtsn;
        uint32_t rcv_context;
        sctp_assoc_t rcv_assoc_id;
    };

    ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
    SCTP_RCVINFO, while cmsg_data[] contains struct sctp_rcvinfo.
    An sctp_rcvinfo item always corresponds to the data in msg_iov.

    Joint work with Daniel Borkmann.

    Signed-off-by: Geir Ola Vaagland
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Geir Ola Vaagland
     

15 Jul, 2014

1 commit

  • While working on some other SCTP code, I noticed that some
    structures shared with user space are leaking uninitialized
    stack or heap memory. In particular, struct sctp_sndrcvinfo
    has a 2-byte hole between .sinfo_flags and .sinfo_ppid that
    remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
    putting this into cmsg. Struct sctp_remote_error also
    contains a 2-byte hole that we don't fill but place into a skb
    through skb_copy_expand() via sctp_ulpevent_make_remote_error().

    Both structures are defined by the IETF in RFC6458:

    * Section 5.3.2. SCTP Header Information Structure:

    The sctp_sndrcvinfo structure is defined below:

    struct sctp_sndrcvinfo {
        uint16_t sinfo_stream;
        uint16_t sinfo_ssn;
        uint16_t sinfo_flags;

        uint32_t sinfo_ppid;
        uint32_t sinfo_context;
        uint32_t sinfo_timetolive;
        uint32_t sinfo_tsn;
        uint32_t sinfo_cumtsn;
        sctp_assoc_t sinfo_assoc_id;
    };

    * 6.1.3. SCTP_REMOTE_ERROR:

    A remote peer may send an Operation Error message to its peer.
    This message indicates a variety of error conditions on an
    association. The entire ERROR chunk as it appears on the wire
    is included in an SCTP_REMOTE_ERROR event. Please refer to the
    SCTP specification [RFC4960] and any extensions for a list of
    possible error formats. An SCTP error notification has the
    following format:

    struct sctp_remote_error {
        uint16_t sre_type;
        uint16_t sre_flags;
        uint32_t sre_length;
        uint16_t sre_error;

        sctp_assoc_t sre_assoc_id;
        uint8_t sre_data[];
    };

    Fix this by setting both to 0 before filling them out. We also
    have other structures shared between user and kernel space in
    SCTP that contain holes (e.g. struct sctp_paddrthlds), but we
    copy that buffer over from user space first and thus don't need
    to care about it in those cases.
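
    The zero-before-fill pattern can be sketched in user space as follows.
    struct ex_sndrcvinfo is a local mirror of the RFC layout and
    fill_sndrcvinfo() is a hypothetical helper; neither is kernel code, and
    the position of the hole assumes a typical ABI with 4-byte alignment for
    uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the RFC 6458 layout, for illustration only. */
struct ex_sndrcvinfo {
	uint16_t sinfo_stream;
	uint16_t sinfo_ssn;
	uint16_t sinfo_flags;
	/* compilers typically insert a 2-byte hole here */
	uint32_t sinfo_ppid;
	uint32_t sinfo_context;
	uint32_t sinfo_timetolive;
	uint32_t sinfo_tsn;
	uint32_t sinfo_cumtsn;
	int32_t  sinfo_assoc_id;
};

/* Fill the structure; the memset() clears the padding bytes as well. */
static void fill_sndrcvinfo(struct ex_sndrcvinfo *si, uint16_t stream, uint32_t ppid)
{
	assert(si != NULL);
	memset(si, 0, sizeof(*si));
	si->sinfo_stream = stream;
	si->sinfo_ppid = ppid;
}
```

    Without the memset(), the bytes in the hole would keep whatever was on
    the stack or heap before, and copying the whole structure out would
    expose them.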

    While at it, we can also remove lengthy comments copied from
    the draft, instead, we update the comment with the correct RFC
    number where one can look it up.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Apr, 2014

1 commit

  • This reverts commit ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management
    to reflect real state of the receiver's buffer") as it introduced a
    serious performance regression on SCTP over IPv4 and IPv6, though not
    as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.

    Current state:

    [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
    iperf version 3.0.1 (10 January 2014)
    Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
    Time: Fri, 11 Apr 2014 17:56:21 GMT
    Connecting to host 192.168.241.3, port 5201
    Cookie: Lab200slot2.1397238981.812898.548918
    [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
    Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
    [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
    [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
    [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
    [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
    [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
    [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
    [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
    [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
    [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
    [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
    [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
    (etc)

    [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
    iperf version 3.0.1 (10 January 2014)
    Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
    Time: Fri, 11 Apr 2014 19:08:41 GMT
    Connecting to host 2001:db8:0:f101::1, port 5201
    Cookie: Lab200slot2.1397243321.714295.2b3f7c
    [ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
    Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec
    [ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec
    [ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec
    [ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec
    [ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec
    [ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec
    [ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec
    [ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec
    [ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec
    [ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec
    [ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec
    [ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec
    [ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec
    [ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec
    (etc)

    After patch:

    [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
    iperf version 3.0.1 (10 January 2014)
    Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
    Time: Mon, 14 Apr 2014 16:40:48 GMT
    Connecting to host 192.168.240.3, port 5201
    Cookie: Lab200slot2.1397493648.413274.65e131
    [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
    Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec
    [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec
    [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec
    [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec
    [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec
    [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec
    [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec
    [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec

    With the revert applied, SCTP performance is back to normal on latest
    upstream for both IPv4 and IPv6 and has the same throughput as the 3.4.2
    test kernel; throughput is steady and interval reports are smooth again.

    Fixes: ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
    Reported-by: Peter Butler
    Reported-by: Dongsheng Song
    Reported-by: Fengguang Wu
    Tested-by: Peter Butler
    Signed-off-by: Daniel Borkmann
    Cc: Matija Glavinic Pecotic
    Cc: Alexander Sverdlin
    Cc: Vlad Yasevich
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

17 Feb, 2014

1 commit

  • The implementation of the (a)rwnd calculation might lead to severe performance
    issues and to associations stalling completely. These problems are described and
    a solution is proposed which improves lksctp's robustness in congestion state.

    1) Sudden drop of a_rwnd and incomplete window recovery afterwards

    The data accounted in sctp_assoc_rwnd_decrease covers only the payload size
    (sctp data); the size of the sk_buff, which is charged against the receiver
    buffer, is not accounted in rwnd. Theoretically, this should not be a problem,
    as the actual size of the buffer is double the amount requested on the socket
    (SO_RCVBUF). The problem is that this scales badly for data smaller than
    sizeof(struct sk_buff). E.g. in 4G (LTE) networks, the link interfacing the
    radio side will carry a large portion of traffic of this size (less than 100B).

    An example of sudden drop and incomplete window recovery is given below. Node B
    exhibits problematic behavior. Node A initiates association and B is configured
    to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp
    message in 4G (LTE) network). On B data is left in buffer by not reading socket
    in userspace.

    Let's examine when we will hit the pressure state and declare rwnd to be 0 for
    the scenario with the above stated parameters (rwnd == 10000, chunk size == 43,
    each chunk is sent in a separate sctp packet).

    Logic is implemented in sctp_assoc_rwnd_decrease:

    socket_buffer (see below) is the maximum size which can be held in the socket
    buffer (sk_rcvbuf). currently_alloced is the amount of data currently allocated
    (rx_count).

    A simple expression is given for which it will be examined after how many
    packets for above stated parameters we enter pressure state:

    We start by condition which has to be met in order to enter pressure state:

    socket_buffer < currently_alloced;

    currently_alloced is represented as size of sctp packets received so far and not
    yet delivered to userspace. x is the number of chunks/packets (since there is no
    bundling, and each chunk is delivered in separate packet, we can observe each
    chunk also as sctp packet, and what is important here, having its own sk_buff):

    socket_buffer < x*each_sctp_packet;

    each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is
    twice the amount of initially requested size of socket buffer, which is in case
    of sctp, twice the a_rwnd requested:

    2*rwnd < x*(payload+sizeof(struct sk_buff));

    sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000
    and each payload size is 43

    20000 < x(43+190);

    x > 20000/233;

    x ~> 84;
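
    The same arithmetic can be written as a small function. This is a sketch
    of the inequality above only, not of the kernel's accounting code;
    packets_until_pressure() and its parameter names are hypothetical. With
    the numbers in the text the bound is 20000/233, about 85.8, so the
    function returns 86, in the same range as the roughly 84 packets quoted
    here and seen in the trace.

```c
#include <assert.h>

/*
 * Smallest packet count x with 2*rwnd < x*(payload + skb_overhead),
 * i.e. the first packet at which the pressure condition holds.
 */
static unsigned int packets_until_pressure(unsigned int rwnd,
					   unsigned int payload,
					   unsigned int skb_overhead)
{
	unsigned int per_packet = payload + skb_overhead;

	assert(per_packet > 0);
	/* floor(2*rwnd / per_packet) packets still fit; one more tips it over. */
	return (2 * rwnd) / per_packet + 1;
}
```

    Because the per-packet overhead dominates for small payloads, the
    threshold barely moves as the payload shrinks, which is the scaling
    problem the message describes.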

    After ~84 messages, the pressure state is entered and a 0 rwnd is advertised,
    while only 84*43B ~= 3612B of sctp data has been received. This is why an
    external observer notices a sudden drop from 6474 to 0, as shown in this example:

    IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017]
    IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839]
    IP A.34340 > B.12345: sctp (1) [COOKIE ECHO]
    IP B.12345 > A.34340: sctp (1) [COOKIE ACK]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0]

    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18]

    --> Sudden drop

    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

    At this point, rwnd_press stores current rwnd value so it can be later restored
    in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start
    slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This
    condition is not met since rwnd, after it hit 0, must first reach rwnd_press by
    adding amount which is read from userspace. Let us observe values in above
    example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the
    amount of actual sctp data currently waiting to be delivered to userspace
    is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed
    only for sctp data, which is ~3500. Condition is never met, and when userspace
    reads all data, rwnd stays on 3569.

    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]

    --> At this point userspace read everything, rwnd recovered only to 3569

    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]

    Reproduction is straightforward: it is enough for the sender to send packets of
    size less than sizeof(struct sk_buff) and for the receiver to keep them in its
    buffers.

    2) Minute size window for associations sharing the same socket buffer

    In case multiple associations share the same socket, and same socket buffer
    (sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one
    of the associations can permanently drop rwnd of other association(s).

    Situation will be typically observed as one association suddenly having rwnd
    dropped to size of last packet received and never recovering beyond that point.
    Different scenarios will lead to it, but all have in common that one of the
    associations (let it be association from 1)) nearly depleted socket buffer, and
    the other association blames socket buffer just for the amount enough to start
    the pressure. This association will enter pressure state, set rwnd_press and
    announce 0 rwnd.
    When data is read by userspace, similar situation as in 1) will occur, rwnd will
    increase just for the size read by userspace but rwnd_press will be high enough
    so that association doesn't have enough credit to reach rwnd_press and restore
    to previous state. This case is special case of 1), being worse as there is, in
    the worst case, only one packet in buffer for which size rwnd will be increased.
    Consequence is association which has very low maximum rwnd ('minute size', in
    our case down to 43B - size of packet which caused pressure) and as such
    unusable.

    Scenario happened in the field and labs frequently after congestion state (link
    breaks, different probabilities of packet drop, packet reordering) and with
    scenario 1) preceding. Here is given a deterministic scenario for reproduction:

    From node A establish two associations on the same socket, with rcvbuf_policy
    being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1
    repeat scenario from 1), that is, bring it down to 0 and restore up. Observe
    scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered',
    bring it down close to 0, as in just one more packet would close it. This has as
    a consequence that association number 2 is able to receive (at least) one more
    packet which will bring it in pressure state. E.g. if association 2 had rwnd of
    10000, packet received was 43, and we enter at this point into pressure,
    rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will
    increase for 43, but conditions to restore rwnd to original state, just as in
    1), will never be satisfied.
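
    The arithmetic of this example can be traced with a small model. The sketch
    below is a simplified reconstruction of the bookkeeping described above, not
    the kernel code itself: struct toy_assoc and the toy_* helpers are hypothetical
    names, and the pathmtu-sized recovery step is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the per-association window state. */
struct toy_assoc {
    unsigned int rwnd;        /* window advertised to the peer */
    unsigned int rwnd_press;  /* window withheld while under pressure */
    unsigned int pathmtu;     /* assumed step size for the slow recovery */
};

/* Account for len received bytes; 'pressure' means the shared buffer is nearly full. */
static void toy_rwnd_decrease(struct toy_assoc *a, unsigned int len, bool pressure)
{
    a->rwnd = (a->rwnd >= len) ? a->rwnd - len : 0;
    if (pressure) {
        /* Park the whole remaining window and announce 0. */
        a->rwnd_press += a->rwnd;
        a->rwnd = 0;
    }
}

/* Account for len bytes read by userspace. */
static void toy_rwnd_increase(struct toy_assoc *a, unsigned int len)
{
    a->rwnd += len;
    /* Recovery starts only once rwnd has climbed back up to rwnd_press. */
    if (a->rwnd_press && a->rwnd >= a->rwnd_press) {
        unsigned int step = a->pathmtu < a->rwnd_press ? a->pathmtu : a->rwnd_press;

        a->rwnd += step;
        a->rwnd_press -= step;
    }
}

/* The numbers from the text: rwnd 10000, one 43-byte packet under pressure. */
static void toy_pressure_example(void)
{
    struct toy_assoc a = { .rwnd = 10000, .rwnd_press = 0, .pathmtu = 1500 };

    toy_rwnd_decrease(&a, 43, true);
    assert(a.rwnd == 0 && a.rwnd_press == 9957);
    toy_rwnd_increase(&a, 43);                      /* userspace reads the packet */
    assert(a.rwnd == 43 && a.rwnd_press == 9957);   /* stuck: 43 never reaches 9957 */
}
```

    With only one 43-byte packet in the buffer, no amount of reading can lift rwnd
    above 43 in this model, which matches the trace that follows.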

    --> Association 1, between A.y and B.12345

    IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569]
    IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613]
    IP A.55915 > B.12345: sctp (1) [COOKIE ECHO]
    IP B.12345 > A.55915: sctp (1) [COOKIE ACK]

    --> Association 2, between A.z and B.12346

    IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173]
    IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240]
    IP A.55915 > B.12346: sctp (1) [COOKIE ECHO]
    IP B.12346 > A.55915: sctp (1) [COOKIE ACK]

    --> Deplete socket buffer by sending messages of size 43B over association 1

    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]

    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0]
    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0]

    --> Sudden drop on 1

    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

    --> Here userspace read, rwnd 'recovered' to 3698, now deplete again using
    association 1 so there is room in the buffer for only one more packet

    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0]
    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

    --> Socket buffer is almost depleted, but there is space for one more packet,
    send it over association 2, size 43B

    IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18]
    IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

    --> Immediate drop

    IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

    --> Read everything from the socket, both associations recover up to the maximum
    rwnd they are capable of reaching, note that association 1 recovered up to 3698,
    and association 2 recovered only to 43

    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0]
    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0]
    IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18]
    IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

    A careful reader might wonder why it is necessary to reproduce 1) prior to
    reproducing 2). It is simply easier to observe when to send the packet over
    association 2 which will push the association into the pressure state.

    Proposed solution:

    Both problems share the same root cause, and that is improper scaling of the
    socket buffer with rwnd. A solution in which sizeof(sk_buff) is taken into
    account while calculating rwnd is not possible because there is no linear
    relationship between the amount of data charged in an increase/decrease and the
    IP packet in which the payload arrived. Even if such a solution were followed,
    the complexity of the code would increase. Given the nature of the current rwnd
    handling, the slow increase (in sctp_assoc_rwnd_increase) of rwnd after the
    pressure state is entered is rational, but it gives the sender a false
    representation of the current buffer space. Furthermore, it implements an
    additional congestion control mechanism which is defined by the implementation,
    and not by the standard.

    The proposed solution simplifies the whole algorithm, keeping in mind the
    definition from the RFC:

    o Receiver Window (rwnd): This gives the sender an indication of the space
    available in the receiver's inbound buffer.

    Core of the proposed solution is given with these lines:

    sctp_assoc_rwnd_update:
        if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
            asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
        else
            asoc->rwnd = 0;

    We advertise to the sender (half of) the actual space we have. 'Half' is in
    parentheses because it depends on whether you regard the size of the socket
    buffer as the SO_RCVBUF value or twice that amount, i.e. as the size visible
    from userspace or the one visible from kernelspace.
    In this way the sender is given a good approximation of our buffer space,
    regardless of the buffer policy: we always advertise what we have. The proposed
    solution fixes the described problems and removes the need for the rwnd
    restoration algorithm. Finally, as the proposed solution is a simplification,
    some lines of code, along with some bytes in struct sctp_association, are saved.
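
    As a worked illustration of the calculation, the sketch below isolates it in a
    standalone function. toy_rwnd_update and its plain int parameters are
    hypothetical stand-ins for the association and socket fields used in the
    patch; the buffer sizes are made-up values.

```c
#include <assert.h>

/* Hypothetical stand-in for the proposed update: advertise half of the free
 * receive buffer, or 0 once the buffer is fully used or over-committed. */
static unsigned int toy_rwnd_update(int sk_rcvbuf, int rx_count)
{
    int free_space = sk_rcvbuf - rx_count;

    return free_space > 0 ? (unsigned int)free_space >> 1 : 0;
}

static void toy_rwnd_update_example(void)
{
    assert(toy_rwnd_update(20000, 0) == 10000);   /* empty buffer: half of it */
    assert(toy_rwnd_update(20000, 19914) == 43);  /* 86 bytes free: advertise 43 */
    assert(toy_rwnd_update(20000, 25000) == 0);   /* over-committed: window closed */
}
```

    Because the result depends only on the current buffer occupancy, the window
    reopens as soon as userspace drains the buffer, with no separate recovery state.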

    Version 2 of the patch addressed comments from Vlad. Name of the function is set
    to be more descriptive, and two parts of code are changed, in one removing the
    superfluous call to sctp_assoc_rwnd_update since call would not result in update
    of rwnd, and the other being reordering of the code in a way that call to
    sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2
    in a way that existing function is not reordered/copied in line, but it is
    correctly called. Thanks Vlad for suggesting.

    Signed-off-by: Matija Glavinic Pecotic
    Reviewed-by: Alexander Sverdlin
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Matija Glavinic Pecotic
     

07 Dec, 2013

1 commit

  • Several files refer to an old address for the Free Software Foundation
    in the file header comment. Resolve by replacing the address with
    the URL so that we do not have to keep
    updating the header comments anytime the address changes.

    CC: Vlad Yasevich
    CC: Neil Horman
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jeff Kirsher
     

10 Aug, 2013

1 commit

  • With the restructuring of the lksctp.org site, we only allow bug
    reports through the SCTP mailing list linux-sctp@vger.kernel.org,
    not via SF, as SF is only used for web hosting and nothing more.
    While at it, also remove the obvious statement that bugs will be
    fixed and incorporated into the kernel.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

25 Jul, 2013

1 commit

  • The SCTP mailing list address to send patches or questions
    to is linux-sctp@vger.kernel.org and not
    lksctp-developers@lists.sourceforge.net anymore. Therefore,
    update all occurrences.

    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

18 Jun, 2013

1 commit

  • SCTP_STATIC is just another define for the static keyword. Its use
    is inconsistent in the SCTP code anyway and it was introduced in the
    initial implementation of SCTP in 2.5. We have a regression suite in
    lksctp-tools, but this is for user space only, so no one makes use of
    this macro anymore. The kernel test suite for 2.5 is incompatible with
    the current SCTP code anyway.

    So simply remove it, to be more consistent with the rest of the kernel
    code.

    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

01 Aug, 2012

1 commit

  • This patch series is based on top of "Swap-over-NBD without deadlocking
    v15" as it depends on the same reservation of PF_MEMALLOC reserves logic.

    When a user or administrator requires swap for their application, they
    create a swap partition or file, format it with mkswap and activate it
    with swapon. In diskless systems this is not an option so if swap is
    required then swapping over the network is considered. The two likely
    scenarios are when blade servers are used as part of a cluster where the
    form factor or maintenance costs do not allow the use of disks and thin
    clients.

    The Linux Terminal Server Project recommends the use of the Network Block
    Device (NBD) for swap but this is not always an option. There is no
    guarantee that the network attached storage (NAS) device is running Linux
    or supports NBD. However, it is likely that it supports NFS so there are
    users that want support for swapping over NFS despite any performance
    concern. Some distributions currently carry patches that support swapping
    over NFS but it would be preferable to support it in the mainline kernel.

    Patch 1 avoids a stream-specific deadlock that potentially affects TCP.

    Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
    reserves.

    Patch 3 adds three helpers for filesystems to handle swap cache pages.
    For example, page_file_mapping() returns page->mapping for
    file-backed pages and the address_space of the underlying
    swap file for swap cache pages.

    Patch 4 adds two address_space_operations to allow a filesystem
    to pin all metadata relevant to a swapfile in memory. Upon
    successful activation, the swapfile is marked SWP_FILE and
    the address space operation ->direct_IO is used for writing
    and ->readpage for reading in swap pages.

    Patch 5 notes that patch 3 is bolting
    filesystem-specific-swapfile-support onto the side and that
    the default handlers have different information to what
    is available to the filesystem. This patch refactors the
    code so that there are generic handlers for each of the new
    address_space operations.

    Patch 6 adds an API to allow a vector of kernel addresses to be
    translated to struct pages and pinned for IO.

    Patch 7 adds support for using highmem pages for swap by kmapping
    the pages before calling the direct_IO handler.

    Patch 8 updates NFS to use the helpers from patch 3 where necessary.

    Patch 9 avoids setting PG_private on PG_swapcache pages within NFS.

    Patch 10 implements the new swapfile-related address_space operations
    for NFS and teaches the direct IO handler how to manage
    kernel addresses.

    Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
    where appropriate.

    Patch 12 fixes a NULL pointer dereference that occurs when using
    swap-over-NFS.

    With the patches applied, it is possible to mount a swapfile that is on an
    NFS filesystem. Swap performance is not great, with a swap stress test
    taking roughly twice as long to complete as when the swap device is
    backed by NBD.

    This patch: netvm: prevent a stream-specific deadlock

    It could happen that all !SOCK_MEMALLOC sockets have buffered so much data
    that we're over the global rmem limit. This will prevent SOCK_MEMALLOC
    buffers from receiving data, which will prevent userspace from running,
    which is needed to reduce the buffered data.

    Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit. Once
    this change is applied, it is important that sockets that set
    SOCK_MEMALLOC do not clear the flag until the socket is being torn down.
    If this happens, a warning is generated and the tokens reclaimed to avoid
    accounting errors until the bug is fixed.
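
    The exemption can be pictured with a toy admission check. This is only a
    sketch of the idea under stated assumptions: struct toy_sock, toy_rmem_admit
    and the byte counts are hypothetical and do not mirror the kernel's actual
    accounting functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket model; 'memalloc' plays the role of SOCK_MEMALLOC. */
struct toy_sock {
    bool memalloc;
};

/* Decide whether 'size' more bytes may be queued for receive, given the
 * bytes already buffered globally and the global rmem limit. */
static bool toy_rmem_admit(const struct toy_sock *sk, long allocated,
                           long size, long rmem_limit)
{
    if (sk->memalloc)
        return true;    /* swap traffic must get through to free memory */
    return allocated + size <= rmem_limit;
}

static void toy_rmem_example(void)
{
    struct toy_sock normal = { .memalloc = false };
    struct toy_sock swap = { .memalloc = true };

    assert(toy_rmem_admit(&normal, 500, 100, 1000));    /* under the limit */
    assert(!toy_rmem_admit(&normal, 1000, 100, 1000));  /* limit reached */
    assert(toy_rmem_admit(&swap, 1000, 100, 1000));     /* exempt from the limit */
}
```

    In this model the ordinary socket is refused once the limit is reached while
    the flagged socket keeps receiving, which is what breaks the deadlock.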

    [davem@davemloft.net: Warning about clearing SOCK_MEMALLOC]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: David S. Miller
    Acked-by: Rik van Riel
    Cc: Trond Myklebust
    Cc: Neil Brown
    Cc: Christoph Hellwig
    Cc: Mike Christie
    Cc: Eric B Munson
    Cc: Sebastian Andrzej Siewior
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

01 Jul, 2012

1 commit

  • It was noticed recently that when we send data on a transport, it's possible that
    we might bundle a sack that arrived on a different transport. While this isn't
    a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
    2960:

    An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
    etc.) to the same destination transport address from which it
    received the DATA or control chunk to which it is replying. This
    rule should also be followed if the endpoint is bundling DATA chunks
    together with the reply chunk.

    This patch seeks to correct that. It restricts the bundling of sack operations
    to only those transports which have moved the ctsn of the association forward
    since the last sack. By doing this we guarantee that we only bundle outbound
    sacks on a transport that has received a chunk since the last sack. This brings
    us into stricter compliance with the RFC.

    Vlad had initially suggested that we strictly allow only sack bundling on the
    transport that last moved the ctsn forward. While this makes sense, I was
    concerned that doing so prevented us from bundling in the case where we had
    received chunks that moved the ctsn on multiple transports. In those cases, the
    RFC allows us to select any of the transports having received chunks to bundle
    the sack on. So I've modified the approach to allow for that, by adding a state
    variable to each transport that tracks whether it has moved the ctsn since the
    last sack. This, I think, keeps our behavior (and performance) close enough to
    our current profile that we can do this without a sysctl knob to
    enable/disable it.
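
    The per-transport state variable can be sketched as follows. This is a
    minimal model assuming a plain boolean per transport; struct toy_transport and
    the toy_* helpers are hypothetical names, and the patch itself may represent
    the state differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport model holding only the new state variable. */
struct toy_transport {
    bool moved_ctsn;    /* has this transport moved the ctsn since the last sack? */
};

/* A chunk that advanced the cumulative TSN arrived on transport t. */
static void toy_ctsn_moved(struct toy_transport *t)
{
    t->moved_ctsn = true;
}

/* May a pending sack be bundled with data going out on transport t? */
static bool toy_can_bundle_sack(const struct toy_transport *t)
{
    return t->moved_ctsn;
}

/* A sack was sent: every transport starts a new period. */
static void toy_sack_sent(struct toy_transport *ts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ts[i].moved_ctsn = false;
}

static void toy_bundle_example(void)
{
    struct toy_transport ts[3] = { { false }, { false }, { false } };

    toy_ctsn_moved(&ts[0]);
    toy_ctsn_moved(&ts[2]);
    assert(toy_can_bundle_sack(&ts[0]));    /* received a chunk: may carry the sack */
    assert(!toy_can_bundle_sack(&ts[1]));   /* idle transport: must not */
    assert(toy_can_bundle_sack(&ts[2]));    /* several transports may qualify */
    toy_sack_sent(ts, 3);
    assert(!toy_can_bundle_sack(&ts[0]) && !toy_can_bundle_sack(&ts[2]));
}
```

    Allowing any flagged transport, rather than only the last one to move the
    ctsn, is what keeps bundling possible when chunks arrive on multiple paths.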

    Signed-off-by: Neil Horman
    CC: Vlad Yasevich
    CC: David S. Miller
    CC: linux-sctp@vger.kernel.org
    Reported-by: Michele Baldessari
    Reported-by: sorin serban
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Neil Horman
     

09 Jul, 2011

1 commit

  • Trigger user ABORT if application closes a socket which has data
    queued on the socket receive queue or chunks waiting on the
    reassembly or ordering queue as this would imply data being lost
    which defeats the point of a graceful shutdown.

    This behavior is already practiced in TCP.

    We do not check the input queue because that would mean parsing
    all chunks on it to look for unacknowledged data, which seems too
    much of an effort. Control chunks or duplicated chunks may also
    be in the input queue and should not stop a graceful
    shutdown.
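
    The decision described above can be sketched as a small predicate. The queue
    counters in struct toy_queues and the toy_close function are hypothetical
    names used for illustration only; they are not the kernel's data structures.

```c
#include <assert.h>

enum toy_close_action { TOY_GRACEFUL_SHUTDOWN, TOY_USER_ABORT };

/* Hypothetical per-socket queue lengths at the time close() is called. */
struct toy_queues {
    unsigned int receive_queue;  /* data queued on the socket but not yet read */
    unsigned int reassembly;     /* chunks waiting for reassembly */
    unsigned int ordering;       /* chunks waiting for in-order delivery */
    unsigned int input;          /* unparsed chunks: deliberately not consulted */
};

/* Abort when closing would silently discard data the peer already sent. */
static enum toy_close_action toy_close(const struct toy_queues *q)
{
    if (q->receive_queue || q->reassembly || q->ordering)
        return TOY_USER_ABORT;
    return TOY_GRACEFUL_SHUTDOWN;
}

static void toy_close_example(void)
{
    struct toy_queues idle = { 0, 0, 0, 0 };
    struct toy_queues unread = { 2, 0, 0, 0 };
    struct toy_queues only_input = { 0, 0, 0, 5 };

    assert(toy_close(&idle) == TOY_GRACEFUL_SHUTDOWN);
    assert(toy_close(&unread) == TOY_USER_ABORT);
    assert(toy_close(&only_input) == TOY_GRACEFUL_SHUTDOWN);  /* input queue ignored */
}
```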

    Signed-off-by: Thomas Graf
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Thomas Graf
     

21 May, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     

27 Apr, 2011

1 commit


22 Apr, 2011

2 commits

  • This patch implements the event notification SCTP_SENDER_DRY_EVENT.
    SCTP Socket API Extensions:

    6.1.9. SCTP_SENDER_DRY_EVENT

    When the SCTP stack has no more user data to send or retransmit, this
    notification is given to the user. Also, at the time when a user app
    subscribes to this event, if there is no data to be sent or
    retransmit, the stack will immediately send up this notification.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • This patch changes the auth event type name to SCTP_AUTHENTICATION_EVENT,
    which is based on API extension compliance.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     

31 Mar, 2011

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

09 Jun, 2009

1 commit


09 Oct, 2008

1 commit

  • The TSN map currently in use is 4K large and is stuck inside
    the sctp_association structure, making memory references REALLY
    expensive. What we really need is at most 4K worth of bits,
    so the biggest map we would have is 512 bytes. Also, the
    map is only really useful when we have gaps to store and
    report. As such, starting with a minimal map of, say, 32 TSNs (bits)
    should be enough for normal low-loss operations. We can grow
    the map by some multiple of 32 along with some extra room any
    time we receive a TSN which would put us outside of the map
    boundary. As we close gaps, we can shift the map to rebase
    it on the latest TSN we've seen. This saves 4088 bytes per
    association just in the map alone, along with savings from the now
    unnecessary structure members.
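
    The growing and rebasing scheme can be sketched with a small bitmap. This is
    an illustrative model, not the kernel implementation: struct toy_tsn_map, the
    toy_map_* helpers, the exact growth rule and the one-bit-at-a-time shift are
    all assumptions chosen for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAP_STEP 32u    /* start at, and grow by multiples of, 32 TSNs */
#define TOY_MAP_MAX  4096u  /* never more than 4K bits (512 bytes) */

struct toy_tsn_map {
    uint32_t  base;   /* lowest TSN not yet received in sequence */
    uint32_t  nbits;  /* current capacity in bits */
    uint32_t *words;  /* bit i stands for TSN base + i */
};

static int toy_map_init(struct toy_tsn_map *m, uint32_t initial_tsn)
{
    m->base = initial_tsn;
    m->nbits = TOY_MAP_STEP;
    m->words = calloc(m->nbits / 32, sizeof(*m->words));
    return m->words ? 0 : -1;
}

/* Enlarge the map so that bit 'gap' fits, rounding up to the next step. */
static int toy_map_grow(struct toy_tsn_map *m, uint32_t gap)
{
    uint32_t want = (gap / TOY_MAP_STEP + 1) * TOY_MAP_STEP;
    uint32_t *w = realloc(m->words, want / 32 * sizeof(*w));

    if (!w)
        return -1;
    memset(w + m->nbits / 32, 0, (want - m->nbits) / 32 * sizeof(*w));
    m->words = w;
    m->nbits = want;
    return 0;
}

/* Drop the bit for 'base' and rebase the map on the next TSN. */
static void toy_map_shift_one(struct toy_tsn_map *m)
{
    uint32_t n = m->nbits / 32;

    for (uint32_t i = 0; i < n; i++) {
        m->words[i] >>= 1;
        if (i + 1 < n && (m->words[i + 1] & 1u))
            m->words[i] |= UINT32_C(1) << 31;
    }
    m->base++;
}

/* Record a received TSN; returns -1 for TSNs outside the trackable range. */
static int toy_map_mark(struct toy_tsn_map *m, uint32_t tsn)
{
    uint32_t gap = tsn - m->base;   /* wraps to a huge value for TSNs below base */

    if (gap >= TOY_MAP_MAX)
        return -1;
    if (gap >= m->nbits && toy_map_grow(m, gap) < 0)
        return -1;
    m->words[gap / 32] |= UINT32_C(1) << (gap % 32);
    while (m->words[0] & 1u)        /* gap at the front closed: rebase */
        toy_map_shift_one(m);
    return 0;
}

static void toy_map_example(void)
{
    struct toy_tsn_map m;

    assert(toy_map_init(&m, 100) == 0);
    assert(toy_map_mark(&m, 100) == 0 && m.base == 101);  /* in-order TSN */
    assert(toy_map_mark(&m, 150) == 0 && m.nbits == 64);  /* gap of 49 forces growth */
    assert(toy_map_mark(&m, 101) == 0 && m.base == 102);  /* 102 is still missing */
    free(m.words);
}
```

    In low-loss operation the map in this model stays at its initial 32 bits, since
    every in-order TSN immediately rebases it.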

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich