19 Apr, 2014

1 commit

  • Currently, it is possible to create an SCTP socket, then switch
    auth_enable via sysctl setting to 1 and crash the system on connect:

    Oops[#1]:
    CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
    task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
    [...]
    Call Trace:
    sctp_auth_asoc_set_default_hmac+0x68/0x80
    sctp_process_init+0x5e0/0x8a4
    sctp_sf_do_5_1B_init+0x234/0x34c
    sctp_do_sm+0xb4/0x1e8
    sctp_endpoint_bh_rcv+0x1c4/0x214
    sctp_rcv+0x588/0x630
    sctp6_rcv+0x10/0x24
    ip6_input+0x2c0/0x440
    __netif_receive_skb_core+0x4a8/0x564
    process_backlog+0xb4/0x18c
    net_rx_action+0x12c/0x210
    __do_softirq+0x17c/0x2ac
    irq_exit+0x54/0xb0
    ret_from_irq+0x0/0x4
    rm7k_wait_irqoff+0x24/0x48
    cpu_startup_entry+0xc0/0x148
    start_kernel+0x37c/0x398
    Code: dd0900b8 000330f8 0126302d 50c0fff1 0047182a a48306a0
    03e00008 00000000
    ---[ end trace b530b0551467f2fd ]---
    Kernel panic - not syncing: Fatal exception in interrupt

    What happens while auth_enable=0 in that case is, that
    ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
    when endpoint is being created.

    After that point, if an admin switches over to auth_enable=1,
    the machine can crash due to NULL pointer dereference during
    reception of an INIT chunk. When we enter sctp_process_init()
    via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
    the INIT verification succeeds and while we walk and process
    all INIT params via sctp_process_param() we find that
    net->sctp.auth_enable is set, therefore do not fall through,
    but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
    dereference what we have set to NULL during endpoint
    initialization phase.

    The fix is to make auth_enable immutable by caching its value
    during endpoint initialization, so that its original value is
    being carried along until destruction. The bug seems to originate
    from the very first days.
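
The caching idea can be sketched in userspace C. This is only an illustration under stated assumptions: `sysctl_auth_enable_model`, `struct ep_model`, `ep_model_init()` and `ep_model_process_init()` are hypothetical names, not the kernel's structures or functions. The endpoint records the setting once at creation, and later INIT handling consults only that copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the net.sctp.auth_enable sysctl. */
bool sysctl_auth_enable_model = false;

/* Hypothetical endpoint model, not the kernel's struct sctp_endpoint. */
struct ep_model {
    bool auth_enable;       /* value cached when the endpoint is created */
    const char *auth_hmacs; /* stays NULL if auth was off at creation */
};

/* Cache the sysctl once; set up HMAC state only if auth is enabled. */
void ep_model_init(struct ep_model *ep)
{
    assert(ep != NULL);
    ep->auth_enable = sysctl_auth_enable_model;
    ep->auth_hmacs = ep->auth_enable ? "placeholder-hmac-table" : NULL;
}

/*
 * Models INIT processing. Returns true if the HMAC table was consulted.
 * The decision uses the cached flag, never the live sysctl value.
 */
bool ep_model_process_init(const struct ep_model *ep)
{
    assert(ep != NULL);
    if (!ep->auth_enable)
        return false;               /* AUTH parameters are skipped */
    assert(ep->auth_hmacs != NULL); /* holds because init set both together */
    return true;
}
```

In this model, flipping `sysctl_auth_enable_model` after `ep_model_init()` has no effect on an existing endpoint, so the NULL table and the authentication path can never be combined.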

    Fix in joint work with Daniel Borkmann.

    Reported-by: Joshua Kinard
    Signed-off-by: Vlad Yasevich
    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Tested-by: Joshua Kinard
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

16 Apr, 2014

1 commit

  • ip_queue_xmit() assumes the skb it has to transmit is attached to an
    inet socket. Commit 31c70d5956fc ("l2tp: keep original skb ownership")
    changed l2tp to not change skb ownership and thus broke this assumption.

    One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
    so that we do not assume skb->sk points to the socket used by l2tp
    tunnel.

    Fixes: 31c70d5956fc ("l2tp: keep original skb ownership")
    Reported-by: Zhan Jianyu
    Tested-by: Zhan Jianyu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Apr, 2014

1 commit

  • This reverts commit ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management
    to reflect real state of the receiver's buffer") as it introduced a
    serious performance regression on SCTP over IPv4 and IPv6, though not
    as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.

    Current state:

    [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
    iperf version 3.0.1 (10 January 2014)
    Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
    Time: Fri, 11 Apr 2014 17:56:21 GMT
    Connecting to host 192.168.241.3, port 5201
    Cookie: Lab200slot2.1397238981.812898.548918
    [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
    Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
    [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
    [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
    [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
    [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
    [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
    [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
    [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
    [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
    [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
    [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
    [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
    [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
    (etc)

    [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
    iperf version 3.0.1 (10 January 2014)
    Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
    Time: Fri, 11 Apr 2014 19:08:41 GMT
    Connecting to host 2001:db8:0:f101::1, port 5201
    Cookie: Lab200slot2.1397243321.714295.2b3f7c
    [ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
    Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec
    [ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec
    [ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec
    [ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec
    [ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec
    [ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec
    [ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec
    [ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec
    [ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec
    [ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec
    [ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec
    [ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec
    [ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec
    [ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec
    (etc)

    After patch:

    [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
    iperf version 3.0.1 (10 January 2014)
    Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
    Time: Mon, 14 Apr 2014 16:40:48 GMT
    Connecting to host 192.168.240.3, port 5201
    Cookie: Lab200slot2.1397493648.413274.65e131
    [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
    Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec
    [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec
    [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec
    [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec
    [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec
    [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec
    [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec
    [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec

    With the revert applied, SCTP performance is back to normal on
    latest upstream for both IPv4 and IPv6 and has the same throughput
    as the 3.4.2 test kernel; it is steady, and interval reports are smooth again.

    Fixes: ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
    Reported-by: Peter Butler
    Reported-by: Dongsheng Song
    Reported-by: Fengguang Wu
    Tested-by: Peter Butler
    Signed-off-by: Daniel Borkmann
    Cc: Matija Glavinic Pecotic
    Cc: Alexander Sverdlin
    Cc: Vlad Yasevich
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access potentially
    touches freed memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no implementation of this callback actually uses
    the length argument. And since nobody actually cared about its
    value, lots of call sites pass in arbitrary values such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.
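
A userspace sketch of the resulting calling convention, under stated assumptions: `struct buf_model`, `struct sock_model` and the helper functions are hypothetical and only mimic the shape of the pattern. The point is that the producer touches nothing but the socket once the buffer is queued, because the callback no longer takes a length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer and socket models. */
struct buf_model {
    struct buf_model *next;
    size_t len;
};

struct sock_model {
    struct buf_model *head, *tail;
    int wakeups;
    void (*data_ready)(struct sock_model *sk); /* no length argument */
};

/* Example callback: just counts notifications. */
void count_wakeup(struct sock_model *sk)
{
    sk->wakeups++;
}

/* Append a buffer; from here on the consumer owns it. */
void queue_tail_model(struct sock_model *sk, struct buf_model *b)
{
    b->next = NULL;
    if (sk->tail)
        sk->tail->next = b;
    else
        sk->head = b;
    sk->tail = b;
}

/* Consumer side: pop the oldest buffer, free it, return its length. */
size_t dequeue_and_free_model(struct sock_model *sk)
{
    struct buf_model *b = sk->head;
    size_t len;

    assert(b != NULL);
    sk->head = b->next;
    if (sk->head == NULL)
        sk->tail = NULL;
    len = b->len;
    free(b);
    return len;
}

/* Producer side: after queueing, only sk is touched, never the buffer. */
void deliver_model(struct sock_model *sk, size_t len)
{
    struct buf_model *b = malloc(sizeof(*b));

    assert(b != NULL);
    b->len = len;
    queue_tail_model(sk, b);
    sk->data_ready(sk);
}
```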

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Apr, 2014

1 commit

  • In function sctp_wake_up_waiters(), we need to involve a test
    if the association is declared dead. If so, we don't have any
    reference to a possible sibling association anymore and need
    to invoke sctp_write_space() instead, and normally walk the
    socket's associations and notify them of new wmem space. The
    reason for special casing is that otherwise, we could run
    into the following issue when a sctp_primitive_SEND() call
    from sctp_sendmsg() fails, and tries to flush an association's
    outq, i.e. in the following way:

    sctp_association_free()
    `-> list_del(&asoc->asocs)
        asoc->base.dead = true
        sctp_outq_free(&asoc->outqueue)
        `-> __sctp_outq_teardown()
            `-> sctp_chunk_free()
                `-> consume_skb()
                    `-> sctp_wfree()
                        `-> sctp_wake_up_waiters() if ep->sndbuf_policy=0

    Therefore, only walk the list in an 'optimized' way if we find
    that the current association is still active. We could also use
    list_del_init() in addition when we call sctp_association_free(),
    but as Vlad suggests, we want to trap such bugs and thus leave
    it poisoned as is.
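
The decision described above can be sketched as a small userspace model. The names `struct asoc_model`, `enum wake_plan_model` and `choose_wake_plan()` are hypothetical, and the order of the two checks is an assumption of this sketch rather than something the text specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the wake-up decision. */
enum wake_plan_model {
    WAKE_SELF_ONLY,        /* per-association accounting */
    WAKE_VIA_SOCKET_WALK,  /* plain walk from the socket's list head */
    WAKE_FROM_NEIGHBOUR    /* 'optimized' walk starting at our sibling */
};

/* Hypothetical association model. */
struct asoc_model {
    bool dead;             /* set once the association is being freed */
    int sndbuf_policy;     /* 0: per socket, nonzero: per association */
};

/* Pick how waiters are woken for this association. */
enum wake_plan_model choose_wake_plan(const struct asoc_model *asoc)
{
    assert(asoc != NULL);
    if (asoc->sndbuf_policy)
        return WAKE_SELF_ONLY;
    if (asoc->dead)
        return WAKE_VIA_SOCKET_WALK; /* own list links are unusable */
    return WAKE_FROM_NEIGHBOUR;
}
```

A dead association never starts a walk from its own list node in this model, which is the property the fix relies on.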

    Why is it safe to resolve the issue by testing for asoc->base.dead?
    Parallel calls to sctp_sendmsg() are protected under socket lock,
    that is lock_sock()/release_sock(). Only within that path under
    lock held, we're setting skb/chunk owner via sctp_set_owner_w().
    Eventually, chunks are freed directly by an association still
    under that lock. So when traversing association list on destruction
    time from sctp_wake_up_waiters() via sctp_wfree(), a different
    CPU can't be running sctp_wfree() while another one calls
    sctp_association_free() as both happens under the same lock.
    Therefore, this can also not race with setting/testing against
    asoc->base.dead as we are guaranteed for this to happen in order,
    under lock. Further, Vlad says: the times we check asoc->base.dead
    is when we've cached an association pointer for later processing.
    In between cache and processing, the association may have been
    freed and is simply still around due to reference counts. We check
    asoc->base.dead under a lock, so it should always be safe to check
    and not race against sctp_association_free(). Stress-testing seems
    fine now, too.

    Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

09 Apr, 2014

1 commit

  • SCTP charges chunks for wmem accounting via skb->truesize in
    sctp_set_owner_w(), and sctp_wfree() respectively as the
    reverse operation. If a sender runs out of wmem, it needs to
    wait via sctp_wait_for_sndbuf(), and gets woken up by a call
    to __sctp_write_space() mostly via sctp_wfree().

    __sctp_write_space() is being called per association. Although
    we assign sk->sk_write_space() to sctp_write_space(), which
    is then being done per socket, it is only used if send space
    is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
    is set and therefore not invoked in sock_wfree().

    Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf
    again when transmitting packet") fixed an issue where in case
    sctp_packet_transmit() manages to queue up more than sndbuf
    bytes, sctp_wait_for_sndbuf() will never be woken up again
    unless it is interrupted by a signal. However, a still
    remaining issue is that if net.sctp.sndbuf_policy=0, that is
    accounting per socket, and one-to-many sockets are in use,
    the reclaimed write space from sctp_wfree() is 'unfairly'
    handed back on the server to the association that is the lucky
    one to be woken up again via __sctp_write_space(), while
    the remaining associations are never woken up again
    (unless by a signal).

    The effect disappears with net.sctp.sndbuf_policy=1, that
    is wmem accounting per association, as it guarantees a fair
    share of wmem among associations.

    Therefore, if we have reclaimed memory in case of per socket
    accounting, wake all related associations to a socket in a
    fair manner, that is, traverse the socket association list
    starting from the current neighbour of the association and
    issue a __sctp_write_space() to everyone until we end up
    waking ourselves. This guarantees that no association is
    preferred over another and even if more associations are
    taken into the one-to-many session, all receivers will get
    messages from the server and are not stalled forever on
    high load. This setting still leaves the advantage of per
    socket accounting intact, as an association can still use
    up global limits if unused by others.
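
The traversal order can be illustrated with a ring of indices. This is a hedged sketch, not the kernel's list walk: `fair_wake_order()` is a hypothetical helper that models the socket's association list as positions 0..n-1 and records who would be woken in which order.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill `order` (capacity n) with the wake-up order for association
 * `self` in a ring of n associations: start at the neighbour, wrap
 * around, and finish with `self`. Returns the number of entries.
 */
size_t fair_wake_order(size_t n, size_t self, size_t *order)
{
    size_t count = 0;
    size_t i;

    assert(n > 0 && self < n && order != NULL);
    i = self;
    do {
        i = (i + 1) % n;   /* move to the next association in the ring */
        order[count++] = i;
    } while (i != self);   /* stop once we have woken ourselves */
    return count;
}
```

For four associations and `self` equal to 1, the order is 2, 3, 0, 1: every sibling gets a turn before the association that freed the space.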

    Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.")
    Signed-off-by: Daniel Borkmann
    Cc: Thomas Graf
    Cc: Neil Horman
    Cc: Vlad Yasevich
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Mar, 2014

1 commit


14 Mar, 2014

1 commit

  • This is basically just to let Coverity et al shut up. Remove an
    unneeded NULL check in sctp_assoc_update_retran_path().

    It is safe to remove it, because in sctp_assoc_update_retran_path()
    we iterate over the list of transports, our own transport which is
    asoc->peer.retran_path included. In the iteration, we skip the
    list head element and transports in state SCTP_UNCONFIRMED.

    Such transports came from peer addresses received in INIT/INIT-ACK
    address parameters. They are not yet confirmed by a heartbeat and
    not available for data transfers.

    We know however that in the list of transports, even if it contains
    such elements, it at least contains our asoc->peer.retran_path as
    well, so even if next to that element, we only encounter
    SCTP_UNCONFIRMED transports, we are always going to fall back to
    asoc->peer.retran_path through sctp_trans_elect_best(), as that is
    for sure not SCTP_UNCONFIRMED as per fbdf501c9374 ("sctp: Do no
    select unconfirmed transports for retransmissions").

    Whenever we call sctp_trans_elect_best() it will give us a non-NULL
    element back, and therefore when we break out of the loop, we are
    guaranteed to have a non-NULL transport pointer, and can remove
    the NULL check.

    Reported-by: Dan Carpenter
    Reported-by: Dave Jones
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

06 Mar, 2014

2 commits

  • While working on ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to
    verify if we/peer is AUTH capable"), we noticed that there's a skb
    memory leakage in the error path.

    Running the same reproducer as in ec0223ec48a9 and by unconditionally
    jumping to the error label (to simulate an error condition) in
    sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about
    the unfreed chunk->auth_chunk skb clone:

    Unreferenced object 0xffff8800b8f3a000 (size 256):
    comm "softirq", pid 0, jiffies 4294769856 (age 110.757s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00 ..u^..X.........
    backtrace:
    kmemleak_alloc+0x4e/0xb0
    kmem_cache_alloc+0xc8/0x210
    skb_clone+0x49/0xb0
    sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp]
    sctp_inq_push+0x4c/0x70 [sctp]
    sctp_rcv+0x82e/0x9a0 [sctp]
    ip_local_deliver_finish+0xa8/0x210
    nf_reinject+0xbf/0x180
    nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue]
    nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink]
    netlink_rcv_skb+0xa9/0xc0
    nfnetlink_rcv+0x23f/0x408 [nfnetlink]
    netlink_unicast+0x168/0x250
    netlink_sendmsg+0x2e1/0x3f0
    sock_sendmsg+0x8b/0xc0
    ___sys_sendmsg+0x369/0x380

    What happens is that commit bbd0d59809f9 clones the skb containing
    the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case
    that an endpoint requires COOKIE-ECHO chunks to be authenticated:

    ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
    [...]

    [...] chunk->auth_chunk, we could hit the "goto nomem_init" path from
    an error condition and thus leave the cloned skb around w/o
    freeing it.

    The fix is to centrally free such clones in sctp_chunk_destroy()
    handler that is invoked from sctp_chunk_free() after all refs have
    dropped; and also move both kfree_skb(chunk->auth_chunk) there,
    so that chunk->auth_chunk is either NULL (since sctp_chunkify()
    allocs new chunks through kmem_cache_zalloc()) or non-NULL with
    a valid skb pointer. chunk->skb and chunk->auth_chunk are the
    only skbs in the sctp_chunk structure that need to be handled.

    While at it, we should use consume_skb() for both. It is the same
    as dev_kfree_skb() but more appropriately named as we are not
    a device but a protocol. Also, this effectively replaces
    kfree_skb() with consume_skb() at both invocations. The functions
    are the same, except that kfree_skb() assumes that the frame is
    being dropped after a failure (e.g. for tools like drop monitor);
    usage of consume_skb() seems more appropriate in function
    sctp_chunk_destroy() though.
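
A minimal userspace sketch of freeing both buffers in one place once the last reference is dropped. `struct chunk_model`, `chunk_model_new()`, `chunk_model_put()`, `buf_release()` and the `freed_bufs` counter are hypothetical and exist only for illustration; `calloc()` plays the role of the zeroing allocation, so `auth_chunk` is NULL unless a clone was attached.

```c
#include <assert.h>
#include <stdlib.h>

int freed_bufs; /* counts released buffers, for illustration only */

/* Hypothetical chunk model with its two owned buffers. */
struct chunk_model {
    int refcnt;
    void *skb;
    void *auth_chunk; /* NULL unless a clone was attached */
};

/* Zeroed allocation: auth_chunk starts out as NULL. */
struct chunk_model *chunk_model_new(void)
{
    struct chunk_model *c = calloc(1, sizeof(*c));

    assert(c != NULL);
    c->refcnt = 1;
    c->skb = malloc(16);
    assert(c->skb != NULL);
    return c;
}

/* Release one buffer; NULL is accepted and ignored. */
void buf_release(void *buf)
{
    if (buf != NULL) {
        free(buf);
        freed_bufs++;
    }
}

/* Drop a reference; the last put frees both buffers centrally. */
void chunk_model_put(struct chunk_model *c)
{
    assert(c != NULL && c->refcnt > 0);
    if (--c->refcnt > 0)
        return;
    buf_release(c->skb);
    buf_release(c->auth_chunk);
    free(c);
}
```

Because the release happens only in the final put, an early exit on an error path cannot leave the attached clone behind as long as the chunk itself is eventually released.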

    Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Cc: Neil Horman
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Conflicts:
    drivers/net/wireless/ath/ath9k/recv.c
    drivers/net/wireless/mwifiex/pcie.c
    net/ipv6/sit.c

    The SIT driver conflict consists of a bug fix being done by hand
    in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
    was created (netdev_alloc_pcpu_stats()) which takes care of this.

    The two wireless conflicts were overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Mar, 2014

1 commit

  • RFC4895 introduced AUTH chunks for SCTP; during the SCTP
    handshake RANDOM; CHUNKS; HMAC-ALGO are negotiated (CHUNKS
    being optional though):

    ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
    [...]

    [...] peer meta data (peer_random, peer_hmacs, peer_chunks) in case
    sysctl -w net.sctp.auth_enable=1 is set. If in INIT's
    SCTP_PARAM_SUPPORTED_EXT parameter SCTP_CID_AUTH is set,
    peer_random != NULL and peer_hmacs != NULL the peer is to be
    assumed asoc->peer.auth_capable=1, in any other case
    asoc->peer.auth_capable=0.

    Now, if in sctp_sf_do_5_1D_ce() chunk->auth_chunk is
    available, we set up a fake auth chunk and pass that on to
    sctp_sf_authenticate(), which at latest in
    sctp_auth_calculate_hmac() reliably dereferences a NULL pointer
    at position 0..0008 when setting up the crypto key in
    crypto_hash_setkey() by using asoc->asoc_shared_key that is
    NULL as condition key_id == asoc->active_key_id is true if
    the AUTH chunk was injected correctly from remote. This
    happens no matter what net.sctp.auth_enable sysctl says.

    The fix is to check for net->sctp.auth_enable and for
    asoc->peer.auth_capable before doing any operations like
    sctp_sf_authenticate() as no key is activated in
    sctp_auth_asoc_init_active_key() for each case.
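
The added check can be sketched as a pure decision function. `struct ce_ctx_model`, `enum ce_verdict_model` and `cookie_echo_verdict()` are hypothetical names for this illustration; the real handler does considerably more work around this decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs to the COOKIE-ECHO decision. */
struct ce_ctx_model {
    bool has_auth_chunk;    /* an AUTH chunk preceded the COOKIE-ECHO */
    bool auth_enable;       /* local auth setting */
    bool peer_auth_capable; /* peer negotiated RANDOM and HMAC-ALGO */
};

enum ce_verdict_model { CE_PROCEED, CE_AUTHENTICATE, CE_DISCARD };

/* Authenticate only when both sides are set up for it. */
enum ce_verdict_model cookie_echo_verdict(const struct ce_ctx_model *ctx)
{
    assert(ctx != NULL);
    if (!ctx->has_auth_chunk)
        return CE_PROCEED;
    if (!ctx->auth_enable || !ctx->peer_auth_capable)
        return CE_DISCARD;      /* no key was activated, so drop silently */
    return CE_AUTHENTICATE;
}
```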

    Now as RFC4895 section 6.3 states that if the used HMAC-ALGO
    passed from the INIT chunk was not used in the AUTH chunk, we
    SHOULD send an error; however in this case it would be better
    to just silently discard such a maliciously prepared handshake
    as we didn't even receive a parameter at all. Also, as our
    endpoint has no shared key configured, section 6.3 says that
    we MUST silently discard, which we are doing from now onwards.

    Before calling sctp_sf_pdiscard(), we need not only to free
    the association, but also the chunk->auth_chunk skb, as
    commit bbd0d59809f9 created a skb clone in that case.

    I have tested this locally by using netfilter's nfqueue and
    re-injecting packets into the local stack after maliciously
    modifying the INIT chunk (removing RANDOM; HMAC-ALGO param)
    and the SCTP packet containing the COOKIE_ECHO (injecting
    AUTH chunk before COOKIE_ECHO). Fixed with this patch applied.

    Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Cc: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

22 Feb, 2014

1 commit

  • Problem statement: 1) both paths (primary path1 and alternate
    path2) are up after the association has been established i.e.,
    HB packets are normally exchanged, 2) path2 gets inactive after
    path_max_retrans * max_rto timed out (i.e. path2 is down completely),
    3) now, if a transmission times out on the only surviving/active
    path1 (any ~1sec network service impact could cause this like
    a channel bonding failover), then the retransmitted packets are
    sent over the inactive path2; this happens with partial failover
    and without it.

    Besides not being optimal in the above scenario, a small failure
    or timeout in the only existing path has the potential to cause
    long delays in the retransmission (depending on RTO_MAX) until
    the still active path is reselected. Further, when the T3-timeout
    occurs, we have active_path == retran_path, and even though the
    timeout occurred on the initial transmission of data, not a
    retransmit, we end up updating retransmit path.

    RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under
    6.4.1. "Failover from an Inactive Destination Address" the
    following:

    Some of the transport addresses of a multi-homed SCTP endpoint
    may become inactive due to either the occurrence of certain
    error conditions (see Section 8.2) or adjustments from the
    SCTP user.

    When there is outbound data to send and the primary path
    becomes inactive (e.g., due to failures), or where the SCTP
    user explicitly requests to send data to an inactive
    destination transport address, before reporting an error to
    its ULP, the SCTP endpoint should try to send the data to an
    alternate __active__ destination transport address if one
    exists.

    When retransmitting data that timed out, if the endpoint is
    multihomed, it should consider each source-destination address
    pair in its retransmission selection policy. When retransmitting
    timed-out data, the endpoint should attempt to pick the most
    divergent source-destination pair from the original
    source-destination pair to which the packet was transmitted.

    Note: Rules for picking the most divergent source-destination
    pair are an implementation decision and are not specified
    within this document.

    So, we should first reconsider to take the current active
    retransmission transport if we cannot find an alternative
    active one. If all of that fails, we can still round robin
    through unknown, partial failover, and inactive ones in the
    hope to find something still suitable.

    Commit 4141ddc02a92 ("sctp: retran_path update bug fix") broke
    that behaviour by selecting the next inactive transport when
    no other active transport was found besides the current assoc's
    peer.retran_path. Before commit 4141ddc02a92, we would have
    traversed through the list until we reach our peer.retran_path
    again, and in case that is still in state SCTP_ACTIVE, we would
    take it and return. Only if that is not the case either, we
    take the next inactive transport.

    Besides all that, another issue is that transports in state
    SCTP_UNKNOWN could be preferred over transports in state
    SCTP_ACTIVE in case a SCTP_ACTIVE transport appears after
    SCTP_UNKNOWN in the transport list yielding a weaker transport
    state to be used in retransmission.

    This patch mostly reverts 4141ddc02a92, but also rewrites
    this function to introduce more clarity and strictness into
    the code. A strict priority of transport states is enforced
    in this patch, hence selection is active > unknown > partial
    failover > inactive.
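
The strict priority can be sketched as follows. This is a simplified userspace model, not the rewritten kernel function: `enum ts_model`, `ts_rank()` and `elect_retran_path()` are hypothetical, transports are array slots instead of list entries, and ties are resolved in favour of the transport met first when walking from the current one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport states for this sketch. */
enum ts_model { TS_ACTIVE, TS_UNKNOWN, TS_PF, TS_INACTIVE, TS_UNCONFIRMED };

/* Higher rank wins: active > unknown > partial failover > inactive. */
int ts_rank(enum ts_model s)
{
    switch (s) {
    case TS_ACTIVE:  return 3;
    case TS_UNKNOWN: return 2;
    case TS_PF:      return 1;
    default:         return 0;
    }
}

/*
 * Walk the ring starting at the neighbour of `cur`, skip unconfirmed
 * transports, and consider `cur` itself last. Returns the chosen index.
 */
size_t elect_retran_path(const enum ts_model *t, size_t n, size_t cur)
{
    size_t best = cur;
    bool have_candidate = false;
    size_t k;

    assert(t != NULL && n > 0 && cur < n);
    for (k = 1; k <= n; k++) {
        size_t i = (cur + k) % n;

        if (t[i] == TS_UNCONFIRMED)
            continue;
        if (!have_candidate || ts_rank(t[i]) > ts_rank(t[best])) {
            best = i;
            have_candidate = true;
        }
    }
    return best;
}
```

With an active current path and only an inactive alternative, the model returns the current path again, which is the behaviour the problem statement asks for.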

    Fixes: 4141ddc02a92 ("sctp: retran_path update bug fix")
    Signed-off-by: Daniel Borkmann
    Cc: Gui Jianfeng
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

21 Feb, 2014

1 commit

  • In the current implementation it is possible to reach the PF state from unconfirmed.
    sctp-failover-02 can be read to mean that the PF state is meant to be reached
    only from the active state; after all, this is when entering the PF state makes sense.
    Here are a few quotes from sctp-failover-02, but regardless of these, the same
    understanding can be reached from the whole of section 5:

    Section 5.1, quickfailover guide:
    "The PF state is an intermediate state between Active and Failed states."

    "Each time the T3-rtx timer expires on an active or idle
    destination, the error counter of that destination address will
    be incremented. When the value in the error counter exceeds
    PFMR, the endpoint should mark the destination transport address as PF."
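
The restriction can be sketched as a tiny state update. `struct path_model`, `enum path_state_model` and `path_model_strike()` are hypothetical names; the sketch covers only the entry into PF and deliberately leaves out the later transition to the failed state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical path states for this sketch. */
enum path_state_model { PATH_UNCONFIRMED, PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

struct path_model {
    enum path_state_model state;
    int error_count;
};

/*
 * Account one T3-rtx expiry. The path becomes PF only if it is
 * currently active and its error counter now exceeds PFMR.
 */
void path_model_strike(struct path_model *p, int pfmr)
{
    assert(p != NULL && pfmr >= 0);
    p->error_count++;
    if (p->state == PATH_ACTIVE && p->error_count > pfmr)
        p->state = PATH_PF;
}
```

An unconfirmed path keeps its state in this model no matter how many strikes it collects, so it can still be confirmed later.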

    There are several concrete reasons for such an interpretation. To start, RFC4960
    does not take the quickfailover algorithm into account. Therefore, quickfailover
    must comply with RFC4960. The point where this compliance can be questioned is the
    following behavior:
    When PF is entered, the association's overall error counter is incremented for each
    missed HB. This contradicts RFC4960, as an address in the unconfirmed state is
    subject to probing, and while it is probed, it should not increment the
    association's overall error counter. As a consequence, we might end up dropping
    the association due to a path failure on an unconfirmed address if the
    configuration is wrong in the following way:
    Association.Max.Retrans == Path.Max.Retrans.

    Another reason is that entering PF from unconfirmed will cause the loss of the
    address-confirmed event once (if) the address is confirmed. This is fine from the
    failover guide's point of view, but it is not consistent with the behavior preceding
    the failover implementation or with the recommendation from RFC4960:

    5.4. Path Verification
    Whenever a path is confirmed, an indication MAY be given to the upper
    layer.

    Signed-off-by: Matija Glavinic Pecotic
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Matija Glavinic Pecotic
     

19 Feb, 2014

2 commits

  • Conflicts:
    drivers/net/bonding/bond_3ad.h
    drivers/net/bonding/bond_main.c

    Two minor conflicts in bonding, both of which were overlapping
    changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit
    emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of
    'struct sctp_getaddrs_old' which includes a struct sockaddr pointer,
    sizeof(param) check will always fail in kernel as the structure in
    64bit kernel space is 4bytes larger than for user binaries compiled
    in 32bit mode. Thus, applications making use of sctp_connectx() won't
    be able to run under such circumstances.

    Introduce a compat interface in the kernel to deal with such
    situations by using a 'struct compat_sctp_getaddrs_old' structure
    where user data is copied into it, and then successively transformed
    into a 'struct sctp_getaddrs_old' structure with the help of
    compat_ptr(). That fixes sctp_connectx() abi without any changes
    needed in user space, and lets the SCTP test suite pass when compiled
    in 32bit and run on 64bit kernels.
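
The size mismatch and the widening step can be sketched with fixed-width types. The struct and function names below are hypothetical models: the pointer member is represented as an integer of the width each side would see, and the cast only approximates what compat_ptr() does on most architectures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as a 64-bit kernel sees it: 4 + 4 + 8 bytes. */
struct getaddrs_old_k64_model {
    int32_t assoc_id;
    int32_t addr_num;
    uint64_t addrs;   /* models a 64-bit user pointer */
};

/* Layout as a 32-bit binary passes it: 4 + 4 + 4 bytes. */
struct getaddrs_old_u32_model {
    int32_t assoc_id;
    int32_t addr_num;
    uint32_t addrs;   /* models a 32-bit user pointer */
};

/* Copy field by field and widen the pointer value. */
void widen_getaddrs_model(const struct getaddrs_old_u32_model *in,
                          struct getaddrs_old_k64_model *out)
{
    assert(in != NULL && out != NULL);
    out->assoc_id = in->assoc_id;
    out->addr_num = in->addr_num;
    out->addrs = (uint64_t)in->addrs; /* zero-extend the 32-bit value */
}
```

The two models differ by exactly four bytes, which is why a plain size comparison against the native structure rejects the 32-bit caller.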

    Fixes: f9c67811ebc0 ("sctp: Fix regression introduced by new sctp_connectx api")
    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

17 Feb, 2014

1 commit

  • The implementation of the (a)rwnd calculation might lead to severe performance issues
    and associations completely stalling. These problems are described, and a solution
    is proposed that improves lksctp's robustness in congestion state.

    1) Sudden drop of a_rwnd and incomplete window recovery afterwards

    Data accounted in sctp_assoc_rwnd_decrease takes only the payload size (sctp data),
    but the size of the sk_buff, which is charged against the receiver buffer, is not
    accounted in rwnd. Theoretically, this should not be a problem, as the actual size of
    the buffer is double the amount requested on the socket (SO_RCVBUF). The problem here
    is that this scales badly for data which is smaller than sizeof(struct sk_buff).
    E.g. in 4G (LTE) networks, the link interfacing the radio side will carry a large
    portion of traffic of this size (less than 100B).

    An example of sudden drop and incomplete window recovery is given below. Node B
    exhibits problematic behavior. Node A initiates association and B is configured
    to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp
    message in 4G (LTE) network). On B data is left in buffer by not reading socket
    in userspace.

    Let's examine when we will hit the pressure state and declare rwnd to be 0 for the
    scenario with the above stated parameters (rwnd == 10000, chunk size == 43, each
    chunk is sent in a separate sctp packet).

    Logic is implemented in sctp_assoc_rwnd_decrease:

    socket_buffer (see below) is the maximum size which can be held in the socket buffer
    (sk_rcvbuf). currently_alloced is the amount of data currently allocated (rx_count).

    A simple expression is given for which it will be examined after how many
    packets for above stated parameters we enter pressure state:

    We start by condition which has to be met in order to enter pressure state:

    socket_buffer < currently_alloced;

    currently_alloced is represented as size of sctp packets received so far and not
    yet delivered to userspace. x is the number of chunks/packets (since there is no
    bundling, and each chunk is delivered in separate packet, we can observe each
    chunk also as sctp packet, and what is important here, having its own sk_buff):

    socket_buffer < x*each_sctp_packet;

    each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is
    twice the amount of initially requested size of socket buffer, which is in case
    of sctp, twice the a_rwnd requested:

    2*rwnd < x*(payload+sizeof(struct sk_buff));

    sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000
    and each payload size is 43

    20000 < x(43+190);

    x > 20000/233;

    x ~> 84;
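
The inequality can be evaluated directly. `packets_until_pressure()` is a hypothetical helper written for this illustration; it assumes, as the derivation does, one chunk per packet and a socket buffer of twice the requested rwnd.

```c
#include <assert.h>

/*
 * Smallest number of packets x for which
 *   x * (payload + skb_overhead) > 2 * rwnd
 * holds, i.e. the first packet that pushes the socket into pressure.
 */
unsigned long packets_until_pressure(unsigned long rwnd,
                                     unsigned long payload,
                                     unsigned long skb_overhead)
{
    unsigned long per_packet = payload + skb_overhead;

    assert(per_packet > 0);
    return (2 * rwnd) / per_packet + 1;
}
```

With rwnd 10000, payload 43 and overhead 190, the exact result is 86, since 85 * 233 = 19805 and 86 * 233 = 20038. That is in the same range as the approximate figure of 84 given in the text; the point is that pressure is reached after only a few thousand bytes of payload.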

    After ~84 messages, pressure state is entered and 0 rwnd is advertised while
    received 84*43B ~= 3612B sctp data. This is why external observer notices sudden
    drop from 6474 to 0, as it will be now shown in example:

    IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017]
    IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839]
    IP A.34340 > B.12345: sctp (1) [COOKIE ECHO]
    IP B.12345 > A.34340: sctp (1) [COOKIE ACK]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0]

    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18]

    --> Sudden drop

    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

    At this point, rwnd_press stores the current rwnd value so that it can later be
    restored in sctp_assoc_rwnd_increase. This, however, doesn't happen, as the
    condition to start slowly increasing rwnd until rwnd_press is returned to rwnd
    is never met. The condition is not met since rwnd, after it hit 0, must first
    reach rwnd_press by adding the amount which is read from userspace. Let us
    observe the values in the above example. The initial a_rwnd is 10000, pressure
    was hit when rwnd was ~6500, and the amount of actual sctp data waiting to be
    delivered to userspace is ~3500. When userspace starts to read,
    sctp_assoc_rwnd_increase is credited only with the sctp data, which is ~3500.
    The condition is never met, and when userspace has read all the data, rwnd stays
    at 3569.

    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0]
    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]

    --> At this point userspace read everything, rwnd recovered only to 3569

    IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18]
    IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]
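    The stall can be reproduced with a small userspace model. The sketch below is
    loosely modelled on the pre-patch decrease/increase logic as described in this
    message; it is not the kernel source. struct rwnd_model and all function names
    are hypothetical, the overshoot bookkeeping of the real code is left out, and
    the per-packet buffer charge is assumed to be payload + 190 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the rwnd fields of an association. */
struct rwnd_model {
    uint32_t rwnd;        /* currently advertised window */
    uint32_t rwnd_press;  /* window withheld when pressure was hit */
    uint32_t rx_count;    /* bytes charged to the socket buffer */
    uint32_t rcvbuf;      /* socket buffer limit (sk_rcvbuf) */
};

/* Receive path: `len` payload bytes that cost `truesize` buffer bytes. */
static void model_rwnd_decrease(struct rwnd_model *m, uint32_t len,
                                uint32_t truesize)
{
    m->rx_count += truesize;
    if (m->rwnd < len) {
        m->rwnd = 0;  /* simplification: overshoot is not tracked here */
        return;
    }
    m->rwnd -= len;
    if (m->rx_count >= m->rcvbuf) {  /* pressure: advertise 0 */
        m->rwnd_press += m->rwnd;
        m->rwnd = 0;
    }
}

/* Read path: userspace consumed one chunk. */
static void model_rwnd_increase(struct rwnd_model *m, uint32_t len,
                                uint32_t truesize, uint32_t pmtu)
{
    assert(m->rx_count >= truesize);
    m->rx_count -= truesize;
    m->rwnd += len;
    /* Restoration only starts once rwnd has climbed back to rwnd_press. */
    if (m->rwnd_press && m->rwnd >= m->rwnd_press) {
        uint32_t change = pmtu < m->rwnd_press ? pmtu : m->rwnd_press;

        m->rwnd += change;
        m->rwnd_press -= change;
    }
}

/* Fill until rwnd 0 is advertised, then read everything; returns final rwnd. */
static uint32_t model_fill_then_drain(uint32_t rwnd, uint32_t payload,
                                      uint32_t skb_overhead)
{
    struct rwnd_model m = { .rwnd = rwnd, .rcvbuf = 2 * rwnd };
    uint32_t truesize = payload + skb_overhead;
    uint32_t queued = 0;

    assert(payload > 0);
    while (m.rwnd > 0) {
        model_rwnd_decrease(&m, payload, truesize);
        queued++;
    }
    for (; queued > 0; queued--)
        model_rwnd_increase(&m, payload, truesize, 1500);
    return m.rwnd;
}
```

    With rwnd 10000, payload 43 and overhead 190, the model ends at 3698 instead
    of 10000, while a 1000-byte payload recovers fully. The exact figure seen on
    the wire depends on the real per-packet buffer charge, which this sketch only
    approximates.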

    Reproduction is straightforward: it is enough for the sender to send packets of
    size less than sizeof(struct sk_buff) and for the receiver to keep them in its
    buffers.

    2) Minute size window for associations sharing the same socket buffer

    In case multiple associations share the same socket and the same socket buffer
    (sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one
    of the associations can permanently drop the rwnd of the other association(s).

    The situation is typically observed as one association suddenly having its rwnd
    dropped to the size of the last packet received and never recovering beyond
    that point. Different scenarios lead to it, but all have in common that one of
    the associations (let it be the association from 1)) has nearly depleted the
    socket buffer, and the other association charges the socket buffer with just
    enough to start the pressure. This association will enter the pressure state,
    set rwnd_press and announce an rwnd of 0.
    When data is read by userspace, a situation similar to 1) occurs: rwnd
    increases only by the size read by userspace, but rwnd_press is so high that
    the association doesn't have enough credit to reach rwnd_press and restore the
    previous state. This is a special case of 1), and a worse one, as in the worst
    case there is only one packet in the buffer by whose size rwnd will be
    increased. The consequence is an association which has a very low maximum rwnd
    ('minute size', in our case down to 43B, the size of the packet which caused
    the pressure) and as such is unusable.

    The scenario happened frequently in the field and in labs after a congestion
    state (link breaks, different probabilities of packet drop, packet reordering)
    and with scenario 1) preceding. Here is a deterministic scenario for
    reproduction:

    From node A establish two associations on the same socket, with rcvbuf_policy
    set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1
    repeat the scenario from 1), that is, bring it down to 0 and restore it. Observe
    scenario 1). Use a small payload size (here we use 43). Once rwnd is
    'recovered', bring it down close to 0, so that just one more packet would
    close it. As a consequence, association 2 is able to receive (at least) one
    more packet, which will bring it into the pressure state. E.g. if association 2
    had an rwnd of 10000, the packet received was 43, and we enter pressure at this
    point, rwnd_press will be 9957. Once the payload is delivered to userspace,
    rwnd will increase by 43, but the conditions to restore rwnd to its original
    state, just as in 1), will never be satisfied.

    --> Association 1, between A.y and B.12345

    IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569]
    IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613]
    IP A.55915 > B.12345: sctp (1) [COOKIE ECHO]
    IP B.12345 > A.55915: sctp (1) [COOKIE ACK]

    --> Association 2, between A.z and B.12346

    IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173]
    IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240]
    IP A.55915 > B.12346: sctp (1) [COOKIE ECHO]
    IP B.12346 > A.55915: sctp (1) [COOKIE ACK]

    --> Deplete socket buffer by sending messages of size 43B over association 1

    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]

    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0]
    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0]

    --> Sudden drop on 1

    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

    --> Here userspace read, rwnd 'recovered' to 3698, now deplete again using
    association 1 so there is room in the buffer for only one more packet

    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0]
    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

    --> Socket buffer is almost depleted, but there is space for one more packet,
    send it over association 2, size 43B

    IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18]
    IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

    --> Immediate drop

    IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

    --> Read everything from the socket, both associations recover up to the maximum
    rwnd they are capable of reaching, note that association 1 recovered up to 3698,
    and association 2 recovered only to 43

    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0]
    IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18]
    IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0]
    IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18]
    IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

    A careful reader might wonder why it is necessary to reproduce 1) prior to
    reproducing 2). It is simply easier to observe when to send the packet over
    association 2 which will push the association into the pressure state.
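    The deterministic reproduction can also be walked through in a userspace
    model. As before, this is a sketch under stated assumptions rather than kernel
    code: the structs, function names and constants (43-byte payload, 190-byte
    per-packet overhead, 20000-byte shared buffer, 1500-byte path MTU) are
    illustrative, and the real overshoot bookkeeping is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define PAYLOAD  43u
#define TRUESIZE (43u + 190u)  /* assumed per-packet buffer charge */
#define PMTU     1500u         /* assumed path MTU */

/* Socket buffer shared by all associations (sctp.rcvbuf_policy == 0). */
struct shared_buf {
    uint32_t rx_count;  /* bytes currently charged to the socket */
    uint32_t rcvbuf;    /* limit, as in sk_rcvbuf */
};

/* Hypothetical, reduced per-association state. */
struct assoc_model {
    uint32_t rwnd;
    uint32_t rwnd_press;
    struct shared_buf *sk;
};

static void assoc_receive(struct assoc_model *a)
{
    a->sk->rx_count += TRUESIZE;
    if (a->rwnd < PAYLOAD) {
        a->rwnd = 0;
        return;
    }
    a->rwnd -= PAYLOAD;
    if (a->sk->rx_count >= a->sk->rcvbuf) {  /* shared buffer is full */
        a->rwnd_press += a->rwnd;
        a->rwnd = 0;
    }
}

static void assoc_read(struct assoc_model *a)
{
    assert(a->sk->rx_count >= TRUESIZE);
    a->sk->rx_count -= TRUESIZE;
    a->rwnd += PAYLOAD;
    if (a->rwnd_press && a->rwnd >= a->rwnd_press) {
        uint32_t change = PMTU < a->rwnd_press ? PMTU : a->rwnd_press;

        a->rwnd += change;
        a->rwnd_press -= change;
    }
}

static void run_shared_buffer_scenario(uint32_t *rwnd1, uint32_t *rwnd2)
{
    struct shared_buf sk = { .rx_count = 0, .rcvbuf = 20000 };
    struct assoc_model a1 = { .rwnd = 10000, .rwnd_press = 0, .sk = &sk };
    struct assoc_model a2 = { .rwnd = 10000, .rwnd_press = 0, .sk = &sk };
    uint32_t queued1 = 0;

    /* Scenario 1) on association 1: fill until 0 is advertised, read all. */
    while (a1.rwnd > 0) {
        assoc_receive(&a1);
        queued1++;
    }
    for (; queued1 > 0; queued1--)
        assoc_read(&a1);

    /* Refill over association 1 until the next packet would hit the limit. */
    while (sk.rx_count + TRUESIZE < sk.rcvbuf && a1.rwnd >= PAYLOAD) {
        assoc_receive(&a1);
        queued1++;
    }

    /* One packet over association 2 now trips the shared limit. */
    assoc_receive(&a2);

    /* Userspace reads everything from the socket. */
    for (; queued1 > 0; queued1--)
        assoc_read(&a1);
    assoc_read(&a2);

    *rwnd1 = a1.rwnd;
    *rwnd2 = a2.rwnd;
}
```

    Under these assumptions association 1 ends at 3698 and association 2 at 43,
    the same final values as in the trace above.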

    Proposed solution:

    Both problems share the same root cause, and that is improper scaling of the
    socket buffer with rwnd. A solution in which sizeof(sk_buff) is taken into
    account while calculating rwnd is not possible due to the fact that there is no
    linear relationship between the amount of data accounted in increase/decrease
    and the IP packet in which the payload arrived. Even if such a solution were
    followed, the complexity of the code would increase. Given the nature of the
    current rwnd handling, the slow increase (in sctp_assoc_rwnd_increase) of rwnd
    after the pressure state is entered is rational, but it gives the sender a
    false representation of the current buffer space. Furthermore, it implements an
    additional congestion control mechanism which is defined by the implementation,
    and not by the standard.

    The proposed solution simplifies the whole algorithm, having in mind the
    definition from the RFC:

    o Receiver Window (rwnd): This gives the sender an indication of the space
    available in the receiver's inbound buffer.

    Core of the proposed solution is given with these lines:

    sctp_assoc_rwnd_update:
        if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
            asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
        else
            asoc->rwnd = 0;

    We advertise to the sender (half of) the actual space we have. 'Half' is in
    parentheses because it depends on whether you regard the size of the socket
    buffer as SO_RCVBUF, i.e. the size visible from userspace, or as twice that
    amount, i.e. the size visible from kernelspace.
    In this way the sender is given a good approximation of our buffer space,
    regardless of the buffer policy - we always advertise what we have. The proposed
    solution fixes the described problems and removes the need for the rwnd
    restoration algorithm. Finally, as the proposed solution is a simplification,
    some lines of code, along with some bytes in struct sctp_association, are saved.
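    To see why no restoration step is needed, the rule can be written as a pure
    function of the buffer state. This is a sketch only; the name rwnd_from_buffer
    is hypothetical, and the arguments stand in for sk_rcvbuf and rx_count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advertise half of the free socket buffer space, or 0 if none is left.
 * The result depends only on the current occupancy, not on any history.
 */
static uint32_t rwnd_from_buffer(int sk_rcvbuf, int rx_count)
{
    int free_space;

    assert(sk_rcvbuf >= 0 && rx_count >= 0);
    free_space = sk_rcvbuf - rx_count;
    return free_space > 0 ? (uint32_t)free_space >> 1 : 0;
}
```

    With sk_rcvbuf == 20000, an empty buffer yields 10000, a buffer charged with
    20038 bytes yields 0, and once userspace has read everything the value is
    10000 again.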

    Version 2 of the patch addressed comments from Vlad. Name of the function is set
    to be more descriptive, and two parts of code are changed, in one removing the
    superfluous call to sctp_assoc_rwnd_update since call would not result in update
    of rwnd, and the other being reordering of the code in a way that call to
    sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2
    in a way that existing function is not reordered/copied in line, but it is
    correctly called. Thanks Vlad for suggesting.

    Signed-off-by: Matija Glavinic Pecotic
    Reviewed-by: Alexander Sverdlin
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Matija Glavinic Pecotic
     

14 Feb, 2014

3 commits

  • One of my pet coding style peeves is the practice of
    adding an extra return; at the end of a function.
    Kill several instances of this in network code.

    I suppose some coccinelle wizardry could do this automatically.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • Here, when the net is init_net, we needn't kmemdup the ctl_table
    again. So add a check for net. This also saves some memory.

    Signed-off-by: Wang Weidong
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    wangweidong
     
  • Since commit 3c68198e75111a90 ("sctp: Make hmac algorithm selection for
    cookie generation dynamic"), the .data initialization has been missing.
    If net namespaces are not used, the problem, namely that parts of the
    sysctl configuration are not isolated, does not occur.

    In sctp_sysctl_net_register(), we register the sysctl for each
    net. In the for() loop we use 'table[i].data' as the loop condition, so
    when 'i' is the index of sctp_hmac_alg, the data is NULL and the loop
    breaks. So add the .data initialization.
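    The effect of a NULL .data entry in the middle of the table can be shown
    with a userspace sketch. struct demo_ctl_entry is a hypothetical, reduced
    stand-in for struct ctl_table, and the entry names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct ctl_table. */
struct demo_ctl_entry {
    const char *procname;
    void *data;
};

/* Number of entries a `for (i = 0; table[i].data; i++)` loop visits. */
static size_t entries_visited(const struct demo_ctl_entry *table)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; table[i].data; i++)
        ;
    return i;
}

/* Build a three-entry table whose middle entry may lack .data. */
static size_t demo_visited(int hmac_has_data)
{
    static int first_knob, hmac_alg, last_knob;
    const struct demo_ctl_entry table[] = {
        { "first_knob", &first_knob },
        { "hmac_alg", hmac_has_data ? &hmac_alg : NULL },
        { "last_knob", &last_knob },
        { NULL, NULL },  /* terminator */
    };

    return entries_visited(table);
}
```

    Without .data on the middle entry the loop visits one entry and never
    reaches the ones after it; with .data set it visits all three.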

    Acked-by: Neil Horman
    Signed-off-by: Wang Weidong
    Signed-off-by: David S. Miller

    wangweidong
     

07 Feb, 2014

1 commit

  • commit efe4208f47f907b86f528788da711e8ab9dea44d:
    'ipv6: make lookups simpler and faster' broke initialization of the local source
    address on accepted ipv6 sockets. Before the mentioned commit, the receive
    address was copied along with the contents of ipv6_pinfo in
    sctp_v6_create_accept_sk. Now that it has been moved, it has to be copied
    separately.

    This also fixes lksctp's ipv6 regression in the sense that test_getname_v6, TC5 -
    'getsockname on a connected server socket' now passes.

    Signed-off-by: Matija Glavinic Pecotic
    Signed-off-by: David S. Miller

    Matija Glavinic Pecotic
     

22 Jan, 2014

5 commits


17 Jan, 2014

1 commit


16 Jan, 2014

1 commit


15 Jan, 2014

2 commits


14 Jan, 2014

2 commits

  • Signed-off-by: Stephen Hemminger
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • This new ip_no_pmtu_disc mode only allows fragmentation-needed errors
    to be honored by protocols which do more stringent validation on the
    ICMP's packet payload. This knob is useful for people who e.g. want to
    run an unmodified DNS server in a namespace where they need to use pmtu
    for TCP connections (as they are used for zone transfers or fallback
    for requests) but don't want to use possibly spoofed UDP pmtu information.

    Currently the whitelisted protocols are TCP, SCTP and DCCP as they check
    if the returned packet is in the window or if the association is valid.

    Cc: Eric Dumazet
    Cc: David Miller
    Cc: John Heffner
    Suggested-by: Florian Weimer
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

07 Jan, 2014

1 commit


04 Jan, 2014

1 commit

  • Recently I updated the sctp socket option deprecation warnings to be both a bit
    more clear and ratelimited to prevent user processes from spamming the log file.
    Ben Hutchings suggested that I add the process name and pid to these warnings so
    that users can tell who is responsible for using the deprecated apis. This
    patch accomplishes that.

    Signed-off-by: Neil Horman
    CC: Vlad Yasevich
    CC: Ben Hutchings
    CC: "David S. Miller"
    CC: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Neil Horman
     

03 Jan, 2014

1 commit

  • The SCTP outqueue structure maintains the data chunks
    that are pending transmission, the list of chunks that
    are pending retransmission, and the length of data in
    flight. It also tries to keep the empty state so that
    it can perform the shutdown sequence or notify the user.

    The problem is that the empty state is inconsistently
    tracked. It is possible to completely drain the queue
    without sending anything when using PR-SCTP. In this
    case, the empty state will not be set correctly, as
    reported by Jamal Hadi Salim. This
    can cause an association to be permanently stuck in the
    SHUTDOWN_PENDING state.

    Additionally, SCTP is incredibly inefficient when setting
    the empty state. Even though all the data is available
    in the outqueue structure, we ignore it and walk a list
    of transports.

    In the end, we can completely remove the extra empty
    state and figure out if the queue is empty by looking
    at 3 things: the length of pending data, the length of
    in-flight data, and the existence of retransmit data. All
    of these are already in the structure.
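    The idea of deriving the empty state instead of caching it can be sketched
    as follows. struct demo_outq and its field names are hypothetical and much
    reduced; they only mirror the three quantities named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the SCTP outqueue. */
struct demo_outq {
    size_t pending_len;        /* bytes queued, not yet transmitted */
    size_t inflight_len;       /* bytes sent, not yet acknowledged */
    size_t retransmit_chunks;  /* chunks waiting for retransmission */
};

/* Empty is computed on demand, so it cannot go stale. */
static bool demo_outq_is_empty(const struct demo_outq *q)
{
    assert(q != NULL);
    return q->pending_len == 0 && q->inflight_len == 0 &&
           q->retransmit_chunks == 0;
}
```

    Because nothing is cached, a queue that is drained without anything being
    sent is reported as empty as soon as all three quantities reach zero.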

    Reported-by: Jamal Hadi Salim
    Signed-off-by: Vlad Yasevich
    Acked-by: Neil Horman
    Tested-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

01 Jan, 2014

2 commits

  • skb_dst_set uses dst. If dst is NULL this is not a problem in itself,
    but we then goto 'no_route' and free nskb, so doing the skb_dst_set is
    pointless. So move the skb_dst_set after the dst check.
    Remove the unnecessary initialization as well.

    v2: fix the subject line because it would confuse people,
    as pointed out by Daniel.

    Signed-off-by: Wang Weidong
    Signed-off-by: David S. Miller

    wangweidong
     
  • During a recent discussion regarding some sctp socket options, it was noted that
    we have several points at which we issue log warnings that can be flooded at an
    unbounded rate by any user. Fix this by converting all the pr_warns in the
    sctp_setsockopt path to be pr_warn_ratelimited.

    Note there are several debug level messages as well. I'm leaving those alone,
    as, if you turn on pr_debug, you likely want lots of verbosity.

    Signed-off-by: Neil Horman
    CC: Vlad Yasevich
    CC: David Miller
    CC: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Neil Horman
     

27 Dec, 2013

4 commits