19 May, 2018

1 commit

  • [ Upstream commit 59d8d4434f429b4fa8a346fd889058bda427a837 ]

    Currently, sctp delays authentication only for the normal cookie-echo
    chunk, by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). For a
    duplicated cookie-echo with auth, however, sctp_assoc_bh_rcv()
    authenticates it first against the old asoc, which always fails
    because the old asoc has different auth info.

    The duplicated cookie-echo chunk creates a new asoc with the auth info
    from this chunk, so authentication should also be done with the new
    asoc's auth info for all of the collision cases 'A', 'B' and 'D'.
    Otherwise, a duplicated cookie-echo chunk with auth can never pass
    authentication and create the new connection.

    This issue has existed since the very beginning. This fix makes
    sctp_assoc_bh_rcv() delay the authentication the same way
    sctp_endpoint_bh_rcv() does for the normal cookie-echo chunk.

    While at it, remove the unused params from sctp_sf_authenticate()
    and define sctp_auth_chunk_verify() for use in all the places that
    do the delayed authentication.

    v1->v2:
    fix the typo in changelog as Marcelo noticed.

    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
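The ordering change can be illustrated with a small userspace model. Everything in it is hypothetical: toy_asoc, toy_chunk and the XOR "MAC" stand in for the kernel's association, chunk and HMAC machinery (the XOR is not cryptographic), and only the order of operations, building the new association from the cookie before verifying, mirrors the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the kernel uses an HMAC over the chunk. */
struct toy_asoc  { uint32_t auth_key; };
struct toy_chunk { uint32_t payload; uint32_t mac; uint32_t cookie_key; };

/* Toy "MAC": NOT cryptographic, it only makes key mismatches visible. */
static uint32_t toy_mac(uint32_t payload, uint32_t key)
{
	return payload ^ key;
}

static bool toy_verify(const struct toy_asoc *asoc, const struct toy_chunk *c)
{
	return toy_mac(c->payload, asoc->auth_key) == c->mac;
}

/* Old order: authenticate the duplicate against the existing asoc. */
static bool dupcookie_verify_early(const struct toy_asoc *old_asoc,
				   const struct toy_chunk *c)
{
	return toy_verify(old_asoc, c);
}

/* New order: build the new asoc from the cookie, then authenticate. */
static bool dupcookie_verify_delayed(const struct toy_chunk *c)
{
	struct toy_asoc new_asoc = { .auth_key = c->cookie_key };

	return toy_verify(&new_asoc, c);
}
```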
     

07 Aug, 2017

3 commits

  • This patch removes the typedef sctp_subtype_t and replaces it
    with union sctp_subtype wherever the typedef is used.

    Note that it doesn't fix many indents, although it should, because
    removing sctp_disposition_t would mess them up again. It is better
    to fix them when sctp_disposition_t is removed in a later patch.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch removes the typedef sctp_transport_cmd_t and replaces it
    with enum sctp_transport_cmd wherever the typedef is used.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch removes the typedef sctp_scope_t and replaces it with
    enum sctp_scope wherever the typedef is used.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
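A minimal sketch of what the typedef removal means for callers, using enum sctp_scope as the example. The enumerator list is assumed to match the kernel's definition and may differ between versions; scope_is_local() is a hypothetical helper that only shows the new spelling in a function signature.

```c
#include <assert.h>

/* Callers previously named this type through the sctp_scope_t typedef;
 * after the patch they write "enum sctp_scope" directly. */
enum sctp_scope {
	SCTP_SCOPE_GLOBAL,
	SCTP_SCOPE_PRIVATE,
	SCTP_SCOPE_LINK,
	SCTP_SCOPE_LOOPBACK,
	SCTP_SCOPE_UNUSABLE,
};

/* Hypothetical helper: the parameter type is spelled out as an enum. */
static int scope_is_local(enum sctp_scope scope)
{
	return scope == SCTP_SCOPE_LINK || scope == SCTP_SCOPE_LOOPBACK;
}
```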
     

05 Jul, 2017

1 commit


02 Jul, 2017

1 commit

  • This patch removes the typedef sctp_paramhdr_t and replaces it with
    struct sctp_paramhdr wherever the typedef is used.

    It also fixes some indents and uses sizeof(variable) instead of
    sizeof(type).

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
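A short userspace sketch of the sizeof(variable) style. The struct mirrors struct sctp_paramhdr with uint16_t in place of the kernel's __be16 fields, and param_body_len() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct sctp_paramhdr (kernel fields are __be16). */
struct sctp_paramhdr {
	uint16_t type;
	uint16_t length;
};

/* Hypothetical helper: bytes of parameter body following the header. */
static size_t param_body_len(const struct sctp_paramhdr *param, size_t total)
{
	/* sizeof(*param) follows the variable's type automatically; with
	 * sizeof(struct sctp_paramhdr) the type name must be kept in sync
	 * by hand if the variable's type ever changes. */
	return total > sizeof(*param) ? total - sizeof(*param) : 0;
}
```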
     

21 Jun, 2017

1 commit

  • Errors must be handled when updating an asoc: a memory allocation
    failure in any of the functions called by sctp_assoc_update() would
    otherwise leave sctp working unexpectedly.

    This patch fixes it by aborting the asoc and reporting the error when
    any of these functions fails.

    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
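The error-propagation pattern can be sketched as follows. Every name here is hypothetical, and the two sub-steps stand in for whichever allocating functions sctp_assoc_update() actually calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical association with switches to simulate failures. */
struct toy_asoc { bool fail_streams; bool fail_keys; bool aborted; };

static int toy_update_streams(struct toy_asoc *a) { return a->fail_streams ? -ENOMEM : 0; }
static int toy_update_keys(struct toy_asoc *a)    { return a->fail_keys ? -ENOMEM : 0; }

/* Return the first error instead of silently ignoring it. */
static int toy_assoc_update(struct toy_asoc *a)
{
	int err;

	err = toy_update_streams(a);
	if (err)
		return err;
	return toy_update_keys(a);
}

/* Caller: abort the association and report the error on failure. */
static int toy_handle_update(struct toy_asoc *a)
{
	int err = toy_assoc_update(a);

	if (err)
		a->aborted = true;
	return err;
}
```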
     

11 Jun, 2017

1 commit


03 Jun, 2017

2 commits

  • Since the previous patch, sctp doesn't need to allocate memory for
    asoc->stream any more. sctp_stream_new and sctp_stream_init are both
    used to allocate memory for stream.in or stream.out, and their names
    are confusing.

    This patch merges them into sctp_stream_init and passes only the
    stream and streamcnt parameters into it, instead of the whole asoc.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • As Marcelo suggested, stream is a fixed-size member of asoc and does
    not grow with more streams. To avoid an allocation for it, this patch
    defines it as an object instead of a pointer and updates the places
    using it. It also creates sctp_stream_update(), called in
    sctp_assoc_update(), to migrate the stream info from one stream to
    another.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
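The combined effect of these two patches (an embedded stream object, an init function that takes only the stream and its counts, and an update helper that migrates stream info) can be modelled in userspace. The toy_ names are hypothetical and the layout is simplified; this sketch assumes toy_stream_init() is only called on an empty stream.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_stream_out { uint16_t ssn; };
struct toy_stream_in  { uint16_t ssn; };

struct toy_stream {
	struct toy_stream_out *out;
	struct toy_stream_in  *in;
	uint16_t outcnt, incnt;
};

struct toy_asoc {
	struct toy_stream stream;   /* embedded object: no separate allocation */
};

/* Merged init: takes the stream and the counts, not the whole asoc. */
static int toy_stream_init(struct toy_stream *s, uint16_t outcnt, uint16_t incnt)
{
	struct toy_stream_out *out = calloc(outcnt, sizeof(*out));
	struct toy_stream_in *in = calloc(incnt, sizeof(*in));

	if ((outcnt && !out) || (incnt && !in)) {
		free(out);
		free(in);
		return -ENOMEM;
	}
	s->out = out;
	s->in = in;
	s->outcnt = outcnt;
	s->incnt = incnt;
	return 0;
}

static void toy_stream_free(struct toy_stream *s)
{
	free(s->out);
	free(s->in);
	s->out = NULL;
	s->in = NULL;
	s->outcnt = s->incnt = 0;
}

/* Migrate stream info from src into s; s takes ownership of the arrays. */
static void toy_stream_update(struct toy_stream *s, struct toy_stream *src)
{
	toy_stream_free(s);
	*s = *src;
	src->out = NULL;
	src->in = NULL;
	src->outcnt = src->incnt = 0;
}
```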
     

25 May, 2017

1 commit

  • Since commit 3dbcc105d556 ("sctp: alloc stream info when initializing
    asoc"), stream and stream.out info are always allocated when creating
    an asoc.

    So it is not correct to check !asoc->stream before updating stream
    info when processing a dupcookie; it is better to check the asoc
    state instead.

    Fixes: 3dbcc105d556 ("sctp: alloc stream info when initializing asoc")
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Xin Long
     

05 Apr, 2017

1 commit

  • This patch almost reverts commit 02f3d4ce9e81 ("sctp: Adjust PMTU
    updates to accomodate route invalidation."). As t->asoc can't be NULL
    in sctp_transport_update_pmtu, the function can get sk from asoc, so
    there is no need to pass sk into it.

    It also removes some duplicated code from that function.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     

31 Mar, 2017

1 commit

  • When sending a msg before an asoc is established, sctp sends the INIT
    packet first and then enqueues the chunks.

    Before INIT_ACK is received, stream info is not yet allocated, but
    enqueuing chunks needs to access stream info, such as the out stream
    state and the out stream count.

    This patch fixes it by allocating the out stream info when initializing
    an asoc, and by allocating the in stream and re-allocating the out
    stream when processing INIT.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
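A simplified model of the out-stream handling: allocate at association creation using the locally requested count, then resize to the negotiated count when INIT is processed. The names are hypothetical, only the out direction is shown, and preserving existing per-stream state across the resize is a choice of this sketch rather than something the text specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct toy_stream_out { uint16_t ssn; uint8_t state; };
struct toy_out { struct toy_stream_out *out; uint16_t outcnt; };

/* At asoc creation: chunks can be enqueued before INIT_ACK arrives. */
static int toy_out_alloc(struct toy_out *o, uint16_t outcnt)
{
	o->out = calloc(outcnt, sizeof(*o->out));
	if (outcnt && !o->out)
		return -ENOMEM;
	o->outcnt = outcnt;
	return 0;
}

/* When processing INIT: re-allocate to the negotiated count. */
static int toy_out_resize(struct toy_out *o, uint16_t negotiated)
{
	struct toy_stream_out *n;
	uint16_t keep;

	if (negotiated == o->outcnt)
		return 0;
	n = calloc(negotiated, sizeof(*n));
	if (negotiated && !n)
		return -ENOMEM;
	keep = negotiated < o->outcnt ? negotiated : o->outcnt;
	if (keep)
		memcpy(n, o->out, keep * sizeof(*n));   /* keep existing state */
	free(o->out);
	o->out = n;
	o->outcnt = negotiated;
	return 0;
}
```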
     

23 Mar, 2017

1 commit


08 Feb, 2017

1 commit

  • Add a new transport flag to allow sockets to confirm the neighbour.
    When the same struct dst_entry can be used for many different
    neighbours, we cannot use it for pending confirmations.
    The flag is propagated from the transport to every packet.
    It is reset when the cached dst is reset.

    Reported-by: YueHaibing
    Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
    Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Julian Anastasov
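The flag's life cycle as described (set on the transport, copied into every packet, cleared together with the cached dst) can be modelled like this. All names are hypothetical; the kernel's own field and helper names are not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-transport confirmation flag. */
struct toy_transport { bool dst_cached; bool pending_confirm; };
struct toy_packet    { bool confirm_neigh; };

/* Forward progress was observed: request a neighbour confirmation. */
static void toy_confirm(struct toy_transport *t)
{
	t->pending_confirm = true;
}

/* Every packet built on this transport carries the flag. */
static void toy_build_packet(const struct toy_transport *t, struct toy_packet *p)
{
	p->confirm_neigh = t->pending_confirm;
}

/* Dropping the cached dst also clears the pending confirmation. */
static void toy_dst_reset(struct toy_transport *t)
{
	t->dst_cached = false;
	t->pending_confirm = false;
}
```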
     

19 Jan, 2017

4 commits

  • This patch adds the sockopt SCTP_ENABLE_STREAM_RESET to get/set
    strreset_enable, which indicates which reconf request types are
    supported, as described in RFC 6525 section 6.3.1.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch adds a reconf_enable field to each of asoc, ep and netns
    to indicate whether they support stream reset.

    When initializing, asoc reconf_enable gets its default value from ep
    reconf_enable, which by default comes from netns reconf_enable.

    It also adds reconf_capable to the asoc peer part to record whether
    the peer supports reconf. The value is set when processing the INIT
    chunk if its ext params include reconf chunk support, just as RFC 6525
    section 5.1.1 demands.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch adds a per-transport timer, based on the sctp timer framework,
    for stream reconf chunk retransmission. It starts after a reconf
    request chunk is sent and stops after the response chunk is received.

    If the timer expires, besides retransmitting the reconf request chunk,
    it also does the same things as the data RTO timer: it increases the
    appropriate error counts and performs threshold management, possibly
    destroying the asoc if the sctp retransmission thresholds are exceeded,
    just as section 5.1.1 describes.

    This patch also adds asoc strreset_chunk, which saves the reconf
    request chunk so that it can be retransmitted, and so that a response
    can be checked against the information inside it to confirm that the
    response really belongs to this request.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch adds asoc strreset_outseq and strreset_inseq to save the
    reconf request sequence numbers, initializing them when creating the
    assoc and processing INIT. It also defines the Incoming and Outgoing
    SSN Reset Request Parameters described in RFC 6525 sections 4.1 and
    4.2. As both can be in the same chunk, as RFC 6525 section 3.1-3
    describes, they are built in one function.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
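The timer-expiry handling described above can be modelled as follows. All names are hypothetical; request_outstanding stands in for asoc->strreset_chunk being set, and doubling the RTO up to a cap follows the generic RFC 4960 retransmission back-off rather than anything stated in this commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical association state used by the reconf retransmission timer. */
struct toy_asoc {
	unsigned int rto, rto_max;          /* milliseconds */
	unsigned int error_count, max_retrans;
	bool request_outstanding;           /* stands in for strreset_chunk */
	bool timer_running, destroyed;
	unsigned int retransmissions;
};

static void toy_send_request(struct toy_asoc *a)
{
	a->request_outstanding = true;      /* request saved for resending */
	a->timer_running = true;
}

/* Response matched the saved request: stop the timer, drop the request. */
static void toy_receive_response(struct toy_asoc *a)
{
	a->request_outstanding = false;
	a->timer_running = false;
}

static void toy_timer_expired(struct toy_asoc *a)
{
	if (!a->request_outstanding)
		return;
	/* Threshold management: give up once the limit is exceeded. */
	if (++a->error_count > a->max_retrans) {
		a->destroyed = true;
		a->timer_running = false;
		return;
	}
	/* Back off the RTO, capped at rto_max, and resend the request. */
	a->rto = a->rto * 2 > a->rto_max ? a->rto_max : a->rto * 2;
	a->retransmissions++;
	a->timer_running = true;
}
```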
     

07 Jan, 2017

1 commit

  • sctp stream reconf, described in RFC 6525, needs a structure to
    save per stream information in assoc, like stream state.

    In the future, sctp stream scheduler also needs it to save some
    stream scheduler params and queues.

    This patchset prepares the stream array in assoc for stream reconf.
    It defines sctp_stream, which contains the stream arrays, to replace
    ssnmap.

    Note that we use different structures for IN and OUT streams, as the
    members of the per-OUT-stream structure will diverge more and more
    from those of the per-IN-stream one.

    v1->v2:
    - put these patches into a smaller group.
    v2->v3:
    - define sctp_stream to contain stream arrays, and create stream.c
    to hold stream-related functions.
    - merge 3 patches into 1, as the new sctp_stream has the same name
    as before.

    Signed-off-by: Xin Long
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
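A sketch of why separate IN and OUT structures are convenient: OUT streams carry extra state (here a reconf state byte) that IN streams do not need. The layout and both helpers are hypothetical; the in-order check is a simplified stand-in for SCTP's ordered delivery, which also buffers out-of-order chunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* OUT streams can grow extra members without bloating IN streams. */
struct toy_stream_out { uint16_t ssn; uint8_t state; };
struct toy_stream_in  { uint16_t ssn; };

struct toy_stream {
	struct toy_stream_out *out;
	struct toy_stream_in  *in;
	uint16_t outcnt, incnt;
};

/* Assign the next outgoing SSN on stream sid (wraps at 2^16). */
static uint16_t toy_next_out_ssn(struct toy_stream *s, uint16_t sid)
{
	return s->out[sid].ssn++;
}

/* Deliver only the expected SSN on IN stream sid. */
static bool toy_in_deliver(struct toy_stream *s, uint16_t sid, uint16_t ssn)
{
	if (ssn != s->in[sid].ssn)
		return false;   /* out of order: would be held for reordering */
	s->in[sid].ssn++;
	return true;
}
```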
     

24 Dec, 2016

2 commits

  • Currently, if SCTP closes the receive window due to window pressure,
    mostly caused by excessive skb overhead in the payload/overhead ratio,
    SCTP closes the window abruptly while saving the delta in rwnd_press.
    It starts recovering rwnd as the chunks are consumed by the application,
    and rwnd_press is only recovered after rwnd reaches the value of
    rwnd_press, mostly to prevent silly window syndrome.

    Thing is, this is very inefficient with small data chunks, as with
    those rwnd never gets back to that value, and thus it never recovers
    from such pressure. This means that we do not issue window updates
    when recovering from a 0 window and rely on a sender retransmission
    to notice it.

    The fix here is to remove such threshold, as no value is good enough: it
    depends on the (avg) chunk sizes being used.

    Test with netperf -t SCTP_STREAM -- -m 1, and trigger 0 window by
    sending SIGSTOP to netserver, sleep 1.2, and SIGCONT.
    Rate limited to 845kbps, for visibility. Capture done at netserver side.

    Previously:
    01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [
    01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS
    01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
    01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
    01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS
    01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS
    01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
    02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
    (window update missed)
    04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
    04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g
    04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
    04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS
    04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g

    After:
    02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153]
    02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S
    02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
    02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
    02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S
    02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S
    02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
    03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
    03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508]
    03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017]
    03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526]
    03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035]
    03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544]
    03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053]
    03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562]
    (following data transmission triggered by window updates above)
    03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
    03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000]
    03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
    03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S
    03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894]
    03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S
    03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • It's possible that we receive a packet that is larger than the current
    window. If it's the first such packet, it increases rwnd_over. Then,
    if we receive another data chunk (especially as SCTP allows one data
    chunk in flight even during a 0 window), rwnd_over is overwritten
    instead of added to.

    In the long run, this could cause the window to grow bigger than its
    initial size, as rwnd_over would be charged only for the last received
    data chunk while the code tries to open the window for all packets
    that were received and had their value in rwnd_over overwritten. This,
    in turn, can worsen the payload/buffer ratio and cause rwnd_press to
    kick in more often.

    The fix is to sum it too, as is done for rwnd_press, so that if we
    receive 3 chunks after closing the window, we still have to release
    that same amount before re-opening it.

    Log snippet from sctp_test exhibiting the issue:
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221)
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221)
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221)
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
    [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221)

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
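A simplified model of the two fixes in this group: rwnd_over is summed rather than overwritten, and rwnd_press is released without the rwnd >= rwnd_press threshold. The struct and functions are hypothetical; the press flag stands in for the kernel's memory-pressure check, and releasing at most one PMTU of pressure per call is an assumption of this sketch.

```c
#include <assert.h>

/* Simplified, hypothetical receive-window bookkeeping. */
struct toy_rwnd {
	unsigned int rwnd;        /* advertised window */
	unsigned int rwnd_over;   /* bytes received beyond a closed window */
	unsigned int rwnd_press;  /* window withheld due to memory pressure */
	unsigned int pmtu;
};

/* len bytes received; press simulates pressure closing the window early. */
static void toy_rwnd_decrease(struct toy_rwnd *w, unsigned int len, int press)
{
	if (w->rwnd >= len) {
		w->rwnd -= len;
		if (press) {
			w->rwnd_press += w->rwnd;
			w->rwnd = 0;
		}
	} else {
		w->rwnd_over += len - w->rwnd;   /* summed, not overwritten */
		w->rwnd = 0;
	}
}

/* The application consumed len bytes. */
static void toy_rwnd_increase(struct toy_rwnd *w, unsigned int len)
{
	if (w->rwnd_over >= len) {
		w->rwnd_over -= len;
	} else {
		w->rwnd += len - w->rwnd_over;
		w->rwnd_over = 0;
	}
	/* No threshold: start releasing pressure right away. */
	if (w->rwnd_press) {
		unsigned int change = w->rwnd_press < w->pmtu ? w->rwnd_press : w->pmtu;

		w->rwnd += change;
		w->rwnd_press -= change;
	}
}
```

With these numbers, three 1-byte chunks received on a closed window leave rwnd_over at 3 instead of 1, and a single 1-byte read already starts draining rwnd_press, whereas the old threshold would have withheld it until rwnd reached 9900.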
     

17 Nov, 2016

1 commit

  • Currently, the sctp transport rhashtable uses hash(lport, dport, daddr)
    as the key to hash a node into a chain. If thousands of assocs on one
    host connect to one server with the same lport and different laddrs
    (although it's not a normal case), all the transports are hashed into
    the same chain.

    This may cause -EBUSY to be returned repeatedly when inserting a new
    node, as the chain is too long and sctp inserts a transport node in a
    loop, which could even hang the system.

    The new rhlist interface handles this case, where many nodes in one
    chain share the same key: it puts them into a list and then makes
    that list a single node of the chain.

    This patch replaces the rhashtable_ interface with the rhltable_
    interface. Since a chain will no longer become too long and insertion
    will not return -EBUSY with this fix, the reinsert loop is also removed.

    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
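The idea behind rhlist can be illustrated with a toy chained hash table in which nodes sharing a key are linked into one list whose head is the only entry in the chain. This is not the kernel's rhltable API; the names and the fixed bucket count are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: one link for equal keys, one for the bucket chain. */
struct toy_node {
	unsigned int key;
	struct toy_node *next_same_key;
	struct toy_node *next_in_chain;
};

#define TOY_BUCKETS 8

struct toy_table { struct toy_node *bucket[TOY_BUCKETS]; };

static void toy_insert(struct toy_table *t, struct toy_node *n)
{
	struct toy_node **pos = &t->bucket[n->key % TOY_BUCKETS];
	struct toy_node *head;

	for (head = *pos; head; head = head->next_in_chain) {
		if (head->key == n->key) {
			/* Same key: join that key's list; the chain does not grow. */
			n->next_same_key = head->next_same_key;
			head->next_same_key = n;
			n->next_in_chain = NULL;
			return;
		}
	}
	/* New key: becomes a new entry at the front of the chain. */
	n->next_same_key = NULL;
	n->next_in_chain = *pos;
	*pos = n;
}

static size_t toy_chain_len(const struct toy_table *t, unsigned int key)
{
	size_t len = 0;
	const struct toy_node *h;

	for (h = t->bucket[key % TOY_BUCKETS]; h; h = h->next_in_chain)
		len++;
	return len;
}
```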
     

22 Sep, 2016

1 commit

  • To something more meaningful these days, especially because these
    macros work on packet headers or lengths, which are not tied to any
    CPU arch but to the protocol itself.

    So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4.

    Reported-by: David Laight
    Reported-by: David Miller
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
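For reference, the renamed macros round a length up or down to a multiple of 4. The definitions below are assumed to match the kernel's; toy_param_padding() is a hypothetical helper showing typical use when laying out a parameter.

```c
#include <assert.h>

/* Round up / down to a multiple of 4 bytes. */
#define SCTP_PAD4(s)   (((s) + 3) & ~3)
#define SCTP_TRUNC4(s) ((s) & ~3)

/* Hypothetical helper: padding bytes needed after a parameter of len. */
static unsigned int toy_param_padding(unsigned int len)
{
	return SCTP_PAD4(len) - len;
}
```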
     

12 Jul, 2016

1 commit

  • According to section 4.5 of RFC 7496, prsctp_enable should be per asoc.
    We add prsctp_enable to both asoc and ep, and replace the places that
    used net.sctp->prsctp_enable with asoc->prsctp_enable.

    ep->prsctp_enable is initialized from net.sctp->prsctp_enable, and
    asoc->prsctp_enable is initialized from ep->prsctp_enable. Its value
    can also be modified through the sockopt SCTP_PR_SUPPORTED.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
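The inheritance chain described above (netns, then endpoint, then association) can be modelled as follows. All names are hypothetical, and the sketch assumes the socket option changes the endpoint value so that it affects only associations created afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical holders of the per-level prsctp_enable setting. */
struct toy_netns { bool prsctp_enable; };
struct toy_ep    { bool prsctp_enable; };
struct toy_asoc  { bool prsctp_enable; };

static void toy_ep_init(struct toy_ep *ep, const struct toy_netns *net)
{
	ep->prsctp_enable = net->prsctp_enable;     /* inherit from netns */
}

static void toy_asoc_init(struct toy_asoc *asoc, const struct toy_ep *ep)
{
	asoc->prsctp_enable = ep->prsctp_enable;    /* inherit from ep */
}

/* Stand-in for setting SCTP_PR_SUPPORTED on the endpoint. */
static void toy_set_pr_supported(struct toy_ep *ep, bool on)
{
	ep->prsctp_enable = on;
}
```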
     

21 Mar, 2016

1 commit

  • SCTP is a protocol that is aligned to a word (4 bytes). Thus using the
    bare MTU can sometimes yield values that are not aligned, as for
    loopback, whose MTU is 65536 but ipv4_mtu() limits it to 65535. This
    misalignment causes the last non-aligned bytes to never be used and
    can cause issues with congestion control.

    So it's better to just consider a lower MTU and keep the congestion
    control calculations saner, as they are based on the PMTU.

    The same applies to ICMP frag-needed messages, which this patch also
    fixes.

    One other effect of this is the inability to send an MTU-sized packet
    without queueing or fragmentation and without hitting Nagle, due to
    the check performed in sctp_packet_can_append_data():

        if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
                /* Enough data queued to fill a packet */
                return SCTP_XMIT_OK;

    With the above MTU example, if no other messages are queued, one cannot
    send data that exactly fills one packet (65532 bytes) without causing
    DATA chunk fragmentation or a delay.

    v2:
    - Added WORD_TRUNC macro

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
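The arithmetic from the text can be checked with a small sketch: 65535 truncated to a multiple of 4 gives 65532. WORD_TRUNC is defined here under the name the v2 note uses (it was later renamed SCTP_TRUNC4); toy_transport_pmtu() and toy_enough_to_fill() are hypothetical helpers, the latter only mirroring the shape of the quoted condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Truncate to a multiple of 4 (later renamed SCTP_TRUNC4). */
#define WORD_TRUNC(s) ((s) & ~3)

/* Hypothetical: PMTU as stored on the transport after the change. */
static unsigned int toy_transport_pmtu(unsigned int route_mtu)
{
	return WORD_TRUNC(route_mtu);
}

/* Same shape as the quoted "enough data queued to fill a packet" test. */
static bool toy_enough_to_fill(unsigned int chunk_len, unsigned int out_qlen,
			       unsigned int pathmtu, unsigned int overhead)
{
	return chunk_len + out_qlen >= pathmtu - overhead;
}
```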
     

14 Mar, 2016

2 commits

  • Currently sctp_sendmsg() triggers some calls that allocate memory
    with GFP_ATOMIC even when not necessary. In the case of
    sctp_packet_transmit, it allocates a linear skb that is used to
    construct the packet, and this may cause sends to fail with ENOMEM
    more often than anticipated, especially with big MTUs.

    This patch thus allows it to inherit gfp flags from upper calls so that
    it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
    similar. All others, like retransmits or flushes started from BH, are
    still allocated using GFP_ATOMIC.

    In netperf tests this didn't result in any performance drawbacks when
    memory is not too fragmented and made it trigger ENOMEM way less often.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     
  • Prior to this patch, if an assoc starts with two paths that have the
    same params other than last_time_heard, the paths are tried like this:

    1st cycle:
    try trans1, fail;
    then trans2 is selected (because its last_time_heard is after trans1's).

    2nd cycle:
    try trans2, fail;
    then trans2 is selected (because its last_time_heard is after trans1's).

    3rd cycle:
    try trans2, fail;
    then trans2 is selected (because its last_time_heard is after trans1's).

    ....

    trans1 will never have a chance to be selected, which is not what we
    expect. We should keep round-robining over all the paths if they were
    just added at the beginning.

    So, first, every transport's last_time_heard should be initialized to 0,
    so that they all have the same value at the beginning; only then do all
    the transports get an equal chance to be selected.

    Then, sctp_trans_elect_best() should return trans_next when
    *trans == *trans_next, so that we can try the next one if it fails, but
    currently it always returns trans. We fix this by swapping these two
    params when calling sctp_trans_elect_tie().

    Fixes: 4c47af4d5eb2 ('net: sctp: rework multihoming retransmission path selection to rfc4960')
    Signed-off-by: Xin Long
    Acked-by: Daniel Borkmann
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
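A toy model of the fixed selection: all transports start with last_time_heard = 0, the scan runs from the current path's successor around to the current path itself, and on a full tie the candidate scanned first wins, so equal paths are tried in round-robin order. The names are hypothetical and the scoring is reduced to error_count and last_time_heard; the kernel also considers transport state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transport: only the tie-break fields. */
struct toy_transport {
	int error_count;
	long last_time_heard;   /* initialised to 0 for every new transport */
};

/* true if a is strictly preferred over b */
static bool toy_better(const struct toy_transport *a, const struct toy_transport *b)
{
	if (a->error_count != b->error_count)
		return a->error_count < b->error_count;
	return a->last_time_heard > b->last_time_heard;
}

/* Scan cur's successors, ending with cur; ties keep the earlier candidate. */
static int toy_next_retran_path(const struct toy_transport *t, int n, int cur)
{
	int best = -1;

	for (int k = 1; k <= n; k++) {
		int i = (cur + k) % n;

		if (best < 0 || toy_better(&t[i], &t[best]))
			best = i;
	}
	return best;
}
```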
     

06 Jan, 2016

1 commit

  • Apply the lookup APIs to two functions, __sctp_endpoint_lookup_assoc
    and __sctp_lookup_association. They are invoked under the protection
    of the sock lock, so they are safe, but sctp_lookup_association needs
    to call rcu_read_lock() and check t->dead to protect it.

    Signed-off-by: Xin Long
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     

07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min", which can be
    referred to as the "atomic reserve". __GFP_HIGH users get access to the
    first lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT, leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites:

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
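The flag roles described above can be illustrated with made-up bit values. The compositions of TOY_GFP_KERNEL and TOY_GFP_ATOMIC follow our reading of the post-patch GFP_KERNEL and GFP_ATOMIC definitions, and toy_allow_blocking() mirrors the idea behind gfpflags_allow_blocking(); none of the numeric values are the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
#define TOY_BIT_HIGH            0x01u
#define TOY_BIT_ATOMIC          0x02u
#define TOY_BIT_DIRECT_RECLAIM  0x04u
#define TOY_BIT_KSWAPD_RECLAIM  0x08u
#define TOY_BIT_IO              0x10u
#define TOY_BIT_FS              0x20u

#define TOY_GFP_RECLAIM (TOY_BIT_DIRECT_RECLAIM | TOY_BIT_KSWAPD_RECLAIM)
#define TOY_GFP_KERNEL  (TOY_GFP_RECLAIM | TOY_BIT_IO | TOY_BIT_FS)
#define TOY_GFP_ATOMIC  (TOY_BIT_HIGH | TOY_BIT_ATOMIC | TOY_BIT_KSWAPD_RECLAIM)

/* Blocking is allowed only if the caller may enter direct reclaim. */
static bool toy_allow_blocking(unsigned int gfp)
{
	return (gfp & TOY_BIT_DIRECT_RECLAIM) != 0;
}
```

A mempool-style caller, as in the bio example above, would clear the direct-reclaim bit while keeping the kswapd bit, so it never blocks but still wakes kswapd.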
     

29 Sep, 2015

1 commit

  • Seemingly innocuous sctp_trans_state_to_prio_map[] array
    is way bigger than it looks, since
    "[SCTP_UNKNOWN] = 2" expands into "[0xffff] = 2" !

    This patch replaces it with switch() statement.

    Signed-off-by: Denys Vlasenko
    CC: Vlad Yasevich
    CC: Neil Horman
    CC: Marcelo Ricardo Leitner
    CC: linux-sctp@vger.kernel.org
    CC: netdev@vger.kernel.org
    CC: linux-kernel@vger.kernel.org
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Denys Vlasenko
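The size problem is easy to reproduce in userspace. The enum below is hypothetical except for the 0xffff value of the unknown state, which the text states; the designated initializer forces a 65536-byte array, while the switch() encodes the same mapping without a table.

```c
#include <assert.h>

/* Hypothetical state values; only UNKNOWN = 0xffff comes from the text. */
enum toy_trans_state {
	TOY_INACTIVE = 0,
	TOY_PF = 1,
	TOY_ACTIVE = 2,
	TOY_UNKNOWN = 0xffff,
};

/* Designated initializer: the array must extend to index 0xffff. */
static const unsigned char toy_prio_map[] = {
	[TOY_ACTIVE]   = 3,
	[TOY_UNKNOWN]  = 2,
	[TOY_PF]       = 1,
	[TOY_INACTIVE] = 0,
};

/* Same mapping as a switch(): no 64 KiB table. */
static unsigned char toy_trans_prio(enum toy_trans_state state)
{
	switch (state) {
	case TOY_ACTIVE:
		return 3;
	case TOY_UNKNOWN:
		return 2;
	case TOY_PF:
		return 1;
	default:
		return 0;
	}
}
```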
     

03 Feb, 2015

1 commit


27 Jan, 2015

1 commit

  • When hitting an INIT collision case during the 4WHS with AUTH enabled, as
    already described in detail in commit 1be9a950c646 ("net: sctp: inherit
    auth_capable on INIT collisions"), it can happen that we occasionally
    still remotely trigger the following panic on server side which seems to
    have been uncovered after the fix from commit 1be9a950c646 ...

    [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
    [ 533.913657] IP: [] __kmalloc+0x95/0x230
    [ 533.940559] PGD 5030f2067 PUD 0
    [ 533.957104] Oops: 0000 [#1] SMP
    [ 533.974283] Modules linked in: sctp mlx4_en [...]
    [ 534.939704] Call Trace:
    [ 534.951833] [] ? crypto_init_shash_ops+0x60/0xf0
    [ 534.984213] [] crypto_init_shash_ops+0x60/0xf0
    [ 535.015025] [] __crypto_alloc_tfm+0x6d/0x170
    [ 535.045661] [] crypto_alloc_base+0x4c/0xb0
    [ 535.074593] [] ? _raw_spin_lock_bh+0x12/0x50
    [ 535.105239] [] sctp_inet_listen+0x161/0x1e0 [sctp]
    [ 535.138606] [] SyS_listen+0x9d/0xb0
    [ 535.166848] [] system_call_fastpath+0x16/0x1b

    ... or depending on the application, for example this one:

    [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
    [ 1370.026506] IP: [] kmem_cache_alloc+0x75/0x1d0
    [ 1370.054568] PGD 633c94067 PUD 0
    [ 1370.070446] Oops: 0000 [#1] SMP
    [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
    [ 1370.963431] Call Trace:
    [ 1370.974632] [] ? SyS_epoll_ctl+0x53f/0x960
    [ 1371.000863] [] SyS_epoll_ctl+0x53f/0x960
    [ 1371.027154] [] ? anon_inode_getfile+0xd3/0x170
    [ 1371.054679] [] ? __alloc_fd+0xa7/0x130
    [ 1371.080183] [] system_call_fastpath+0x16/0x1b

    With slab debugging enabled, we can see that the poison has been overwritten:

    [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten
    [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
    [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
    [ 669.826424] __slab_alloc+0x4bf/0x566
    [ 669.826433] __kmalloc+0x280/0x310
    [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp]
    [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
    [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
    [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...]
    [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
    [ 669.826635] __slab_free+0x39/0x2a8
    [ 669.826643] kfree+0x1d6/0x230
    [ 669.826650] kzfree+0x31/0x40
    [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp]
    [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp]
    [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp]

    Since this only triggers in some collision-cases with AUTH, the problem at
    heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
    when having refcnt 1, once directly in sctp_assoc_update() and yet again
    from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
    the already kzfree'd memory, which is also consistent with the observation
    of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
    at a later point in time when poison is checked on new allocation).

    Reference counting of auth keys revisited:

    Shared keys for AUTH chunks are being stored in endpoints and associations
    in endpoint_shared_keys list. On endpoint creation, a null key is being
    added; on association creation, all endpoint shared keys are being cached
    and thus cloned over to the association. struct sctp_shared_key only holds
    a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
    keeps track of users internally through refcounting. Naturally, on assoc
    or endpoint destruction, sctp_shared_key are being destroyed directly and
    the reference on sctp_auth_bytes dropped.

    User space can add keys to either list via setsockopt(2) through struct
    sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
    adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
    with refcount 1 and in case of replacement drops the reference on the old
    sctp_auth_bytes. A key can be set active from user space through setsockopt()
    on the id via sctp_auth_set_active_key(), which iterates through either
    endpoint_shared_keys and in case of an assoc, invokes (one of various places)
    sctp_auth_asoc_init_active_key().

    sctp_auth_asoc_init_active_key() computes the actual secret from local's
    and peer's random, hmac and shared key parameters and returns a new key
    directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
    the reference if there was a previous one. The secret, on which we
    eventually double-drop the ref, comes from sctp_auth_asoc_set_secret() with
    an initial refcount of 1, which also stays unchanged eventually in
    sctp_assoc_update(). This key is later being used for crypto layer to
    set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().

    To close the loop: asoc->asoc_shared_key is freshly allocated secret
    material and independent of the sctp_shared_key management keeping track
    of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a76
    ("net: sctp: fix memory leak in auth key management") is independent of
    this bug here since it concerns a different layer (though same structures
    being used eventually). asoc->asoc_shared_key is reference dropped correctly
    on assoc destruction in sctp_association_free() and when active keys are
    being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
    of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
    to remove that sctp_auth_key_put() from there which fixes these panics.
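
    The double drop can be modelled with a small userspace sketch. It
    assumes, as a simplification, that the extra put in the update path
    runs before the secret is re-initialised, and it counts frees instead
    of performing them so the premature release is observable without
    undefined behaviour. All names are hypothetical; none of this is the
    kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the secret behind asoc->asoc_shared_key. */
struct secret {
	int refcnt;
	int puts_after_release; /* > 0 means a double drop (use-after-free in the kernel) */
};

struct key_asoc {
	struct secret *asoc_shared_key;
};

static void secret_put(struct secret *s)
{
	if (!s)
		return;
	if (s->refcnt == 0) {   /* already "freed": the kernel would touch freed memory */
		s->puts_after_release++;
		return;
	}
	s->refcnt--;            /* the kernel would free the secret when this reaches 0 */
}

/* Modelled on the described behaviour of sctp_auth_asoc_init_active_key():
 * drop the previous secret, install a fresh one with refcount 1. */
static void secret_init_active_key(struct key_asoc *a, struct secret *fresh)
{
	secret_put(a->asoc_shared_key);
	fresh->refcnt = 1;
	fresh->puts_after_release = 0;
	a->asoc_shared_key = fresh;
}

/* Update path; 'extra_put' models the sctp_auth_key_put() the fix removes. */
static void secret_assoc_update(struct key_asoc *a, struct secret *fresh,
				bool extra_put)
{
	if (extra_put)
		secret_put(a->asoc_shared_key);
	secret_init_active_key(a, fresh);
}
```

    With extra_put set, the old secret's only reference is dropped twice;
    without it, each secret is released exactly once.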

    Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing")
    Signed-off-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Oct, 2014

1 commit

  • When receiving, e.g., a semi-well-formed connection scan in the
    form of ...

    -------------- INIT[ASCONF; ASCONF_ACK] ------------->

    ... where the ASCONF_a chunk equals the ASCONF_b chunk (at least both
    serials need to be equal), we panic an SCTP server!

    The problem is that the ASCONF_ACK chunks we reply with to well-formed
    ASCONF chunks are cached per serial. Thus, when we receive the same
    ASCONF chunk twice (e.g. because an ASCONF_ACK was lost), we do not
    need to process it again on the server side (that was the idea, also
    proposed in the RFC). Instead, we know the reply was cached and we
    just resend the cached chunk. So far, so good.
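
    A minimal sketch of a per-serial reply cache, with hypothetical names
    and a fixed-size array chosen only for brevity. Note that a duplicate
    ASCONF hands back the very same cached ASCONF_ACK object, which is
    exactly what makes the requeue problem below possible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACK_CACHE_SIZE 8 /* hypothetical fixed capacity */

struct asconf_ack {
	uint32_t serial;
	bool in_use;
};

struct ack_cache {
	struct asconf_ack slots[ACK_CACHE_SIZE];
	int processed; /* how many ASCONFs were actually processed */
};

static struct asconf_ack *ack_cache_lookup(struct ack_cache *c, uint32_t serial)
{
	for (size_t i = 0; i < ACK_CACHE_SIZE; i++)
		if (c->slots[i].in_use && c->slots[i].serial == serial)
			return &c->slots[i];
	return NULL;
}

/* Returns the ASCONF_ACK to send for an incoming ASCONF with 'serial'. */
static struct asconf_ack *handle_asconf(struct ack_cache *c, uint32_t serial)
{
	struct asconf_ack *ack = ack_cache_lookup(c, serial);

	if (ack)
		return ack;          /* duplicate: resend the cached ACK, no reprocessing */

	c->processed++;          /* stand-in for applying the requested address changes */
	ack = &c->slots[serial % ACK_CACHE_SIZE]; /* simplistic slot choice */
	ack->serial = serial;
	ack->in_use = true;
	return ack;
}
```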

    Where things get nasty is in SCTP's side effect interpreter, that
    is, sctp_cmd_interpreter():

    While incoming ASCONF_a (chunk = event_arg) is being marked
    !end_of_packet and !singleton, and we have an association context,
    we do not flush the outqueue the first time after processing the
    ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
    queued up, although we set local_cork to 1. Commit 2e3216cd54b1
    changed the precedence, so that as long as we receive bundled
    incoming chunks, we try possible bundling on the outgoing queue as
    well. Before this commit, we would just flush the output queue.

    Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
    continue to process the identical ASCONF_b chunk from the packet. As
    we have cached the previous ASCONF_ACK, we find it, grab it and
    do another SCTP_CMD_REPLY command on it. So, effectively, we rip
    the chunk->list pointers and requeue the same ASCONF_ACK chunk
    another time. Since ASCONF_b is the last chunk of the packet, it's
    correctly marked with end_of_packet, so we enforce an uncork and
    thus a flush, which crashes the kernel.

    Fix it by testing if the ASCONF_ACK is currently pending and if
    that is the case, do not requeue it. When flushing the output
    queue we may relink the chunk for preparing an outgoing packet,
    but eventually unlink it when it's copied into the skb right
    before transmission.
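
    The pending test can be sketched with a kernel-style circular list,
    where an unlinked node points at itself, so "pending" simply means
    "linked into some queue". The names (list_node, reply_cached_ack(),
    ...) are hypothetical. Without the check, appending an already-linked
    node makes it point at itself and the queue walk never terminates,
    which is the kind of list corruption described above.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_append(struct list_node *head, struct list_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);              /* an unlinked chunk is no longer pending */
}

struct chunk {
	struct list_node list;     /* linked into the outqueue while pending */
};

/* Queue the cached ASCONF_ACK unless it already sits in the (corked)
 * outqueue. Returns true if it was queued. */
static bool reply_cached_ack(struct list_node *outq, struct chunk *ack)
{
	if (!list_is_empty(&ack->list))
		return false;          /* pending: requeueing would corrupt the list */
	list_append(outq, &ack->list);
	return true;
}

static int list_length(const struct list_node *head)
{
	int n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

    Unlinking the chunk when it is copied into the skb (list_unlink()
    here) makes it eligible for a later resend again.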

    Joint work with Vlad Yasevich.

    Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

23 Aug, 2014

2 commits


22 Aug, 2014

1 commit

  • Since the transport has always been in state SCTP_UNCONFIRMED, it
    was never active and has never been used, so it is unnecessary to
    bug the user with a notification.
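
    A minimal sketch of the decision, with a hypothetical subset of
    transport states and a hypothetical transport_down() helper; the
    actual kernel control path is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical subset of transport states. */
enum transport_state { T_UNCONFIRMED, T_ACTIVE, T_INACTIVE };

struct transport {
	enum transport_state state;
};

/* Mark a transport down; returns whether user space should get an
 * address-change notification. */
static bool transport_down(struct transport *t)
{
	if (t->state == T_UNCONFIRMED)
		return false;      /* never active: nothing changed from the user's view */
	t->state = T_INACTIVE;
	return true;
}
```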

    Reported-by: Deepak Khandelwal
    Suggested-by: Vlad Yasevich
    Suggested-by: Michael Tuexen
    Suggested-by: Daniel Borkmann
    Signed-off-by: Zhu Yanjun
    Acked-by: Vlad Yasevich
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    zhuyj
     

23 Jul, 2014

1 commit

  • Jason reported an oops caused by SCTP on his ARM machine with
    SCTP authentication enabled:

    Internal error: Oops: 17 [#1] ARM
    CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1
    task: c6eefa40 ti: c6f52000 task.ti: c6f52000
    PC is at sctp_auth_calculate_hmac+0xc4/0x10c
    LR is at sg_init_table+0x20/0x38
    pc : [] lr : [] psr: 40000013
    sp : c6f538e8 ip : 00000000 fp : c6f53924
    r10: c6f50d80 r9 : 00000000 r8 : 00010000
    r7 : 00000000 r6 : c7be4000 r5 : 00000000 r4 : c6f56254
    r3 : c00c8170 r2 : 00000001 r1 : 00000008 r0 : c6f1e660
    Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
    Control: 0005397f Table: 06f28000 DAC: 00000015
    Process sctp-test (pid: 104, stack limit = 0xc6f521c0)
    Stack: (0xc6f538e8 to 0xc6f54000)
    [...]
    Backtrace:
    [] (sctp_auth_calculate_hmac+0x0/0x10c) from [] (sctp_packet_transmit+0x33c/0x5c8)
    [] (sctp_packet_transmit+0x0/0x5c8) from [] (sctp_outq_flush+0x7fc/0x844)
    [] (sctp_outq_flush+0x0/0x844) from [] (sctp_outq_uncork+0x24/0x28)
    [] (sctp_outq_uncork+0x0/0x28) from [] (sctp_side_effects+0x1134/0x1220)
    [] (sctp_side_effects+0x0/0x1220) from [] (sctp_do_sm+0xac/0xd4)
    [] (sctp_do_sm+0x0/0xd4) from [] (sctp_assoc_bh_rcv+0x118/0x160)
    [] (sctp_assoc_bh_rcv+0x0/0x160) from [] (sctp_inq_push+0x6c/0x74)
    [] (sctp_inq_push+0x0/0x74) from [] (sctp_rcv+0x7d8/0x888)

    While we already had various kinds of bugs in that area, e.g.
    ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if
    we/peer is AUTH capable") and b14878ccb7fa ("net: sctp: cache
    auth_enable per endpoint"), this one is of a somewhat different
    kind.

    More background on why SCTP authentication is needed can be
    found in RFC 4895:

    SCTP uses 32-bit verification tags to protect itself against
    blind attackers. These values are not changed during the
    lifetime of an SCTP association.

    Looking at new SCTP extensions, there is the need to have a
    method of proving that an SCTP chunk(s) was really sent by
    the original peer that started the association and not by a
    malicious attacker.

    To cause this bug, we trigger an INIT collision between peers.
    A normal SCTP handshake where both sides intend to authenticate
    packets contains the RANDOM, CHUNKS and HMAC-ALGO parameters,
    which are negotiated between the peers:

    ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->



    ...

    Since such collisions can also happen with verification tags,
    RFC 4895 (AUTH) rather vaguely says in section 6.1:

    In case of INIT collision, the rules governing the handling
    of this Random Number follow the same pattern as those for
    the Verification Tag, as explained in Section 5.2.4 of
    RFC 2960 [5]. Therefore, each endpoint knows its own Random
    Number and the peer's Random Number after the association
    has been established.

    In RFC 2960, section 5.2.4, we eventually hit Action B:

    B) In this case, both sides may be attempting to start an
    association at about the same time but the peer endpoint
    started its INIT after responding to the local endpoint's
    INIT. Thus it may have picked a new Verification Tag not
    being aware of the previous Tag it had sent this endpoint.
    The endpoint should stay in or enter the ESTABLISHED
    state but it MUST update its peer's Verification Tag from
    the State Cookie, stop any init or cookie timers that may
    running and send a COOKIE ACK.

    In other words, the handling of the Random parameter is the
    same as behavior for the Verification Tag as described in
    Action B of section 5.2.4.

    Looking at the code, we exactly hit the sctp_sf_do_dupcook_b()
    case which triggers an SCTP_CMD_UPDATE_ASSOC command to the
    side effect interpreter, and in fact it properly copies over
    peer_{random, hmacs, chunks} parameters from the newly created
    association to update the existing one.

    Also, the old asoc_shared_key is released and, based on the new
    params, sctp_auth_asoc_init_active_key() is invoked to update it.
    However, the issue observed in this case is that the previous
    asoc->peer.auth_capable was 0 and has *not* been updated, so
    that instead of creating a new secret, we do an early return
    from sctp_auth_asoc_init_active_key(), leaving
    asoc->asoc_shared_key as NULL. Yet we now have to authenticate
    chunks from the updated chunk list (e.g. COOKIE-ACK).

    That in fact causes the server side when responding with ...

    active_key_id is still inherited from the
    endpoint, and the same as encoded into the chunk, it uses
    asoc->asoc_shared_key, which is still NULL, as an asoc_key
    and dereferences it in ...

    crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len)

    ... causing an oops. All this happens because sctp_make_cookie_ack()
    called with the *new* association has the peer.auth_capable=1
    and therefore marks the chunk with auth=1 after checking
    sctp_auth_send_cid(), but it is *actually* sent later on over
    the then *updated* association's transport that didn't initialize
    its shared key due to peer.auth_capable=0. Since control chunks
    in that case are not sent by the temporary association, which is
    scheduled for deletion, they are issued for xmit via
    SCTP_CMD_REPLY in the interpreter with the context of the
    *updated* association. peer.auth_capable was 0 in the updated
    association (which went from COOKIE_WAIT into ESTABLISHED state),
    since all previous processing that performed sctp_process_init()
    was done on temporary associations, which we eventually throw
    away each time.

    The correct fix is to update to the new peer.auth_capable
    value as well in the collision case via sctp_assoc_update(),
    so that in case the collision migrated from 0 -> 1,
    sctp_auth_asoc_init_active_key() can properly recalculate
    the secret. This therefore fixes the observed server panic.
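
    The effect of the missing field copy can be shown with a heavily
    reduced, hypothetical association model. The secret derivation is
    replaced by a plain pointer assignment, and the two helpers only
    mirror the early return and the collision update described above;
    they are not the kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced association: only the fields the text mentions. */
struct coll_asoc {
	bool peer_auth_capable;
	const char *peer_random;     /* stand-in for the negotiated AUTH params */
	const char *asoc_shared_key; /* NULL means no secret was computed */
};

/* Mirrors the described early return of the active-key initialisation. */
static void auth_init_active_key(struct coll_asoc *a)
{
	a->asoc_shared_key = NULL;            /* old secret released */
	if (!a->peer_auth_capable)
		return;                           /* early return: key stays NULL */
	a->asoc_shared_key = a->peer_random;  /* stand-in for the real derivation */
}

/* Collision update; 'copy_auth_capable' toggles the fix. */
static void auth_assoc_update(struct coll_asoc *old, const struct coll_asoc *new_asoc,
			      bool copy_auth_capable)
{
	old->peer_random = new_asoc->peer_random;
	if (copy_auth_capable)
		old->peer_auth_capable = new_asoc->peer_auth_capable;
	auth_init_active_key(old);
}
```

    Without the copy, the updated association keeps auth_capable at 0
    and ends up with a NULL key that a later HMAC calculation would
    dereference.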

    Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing")
    Reported-by: Jason Gunthorpe
    Signed-off-by: Daniel Borkmann
    Tested-by: Jason Gunthorpe
    Cc: Vlad Yasevich
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

13 Jun, 2014

1 commit

  • Consider the following scenario:
    For a TCP-style socket, while processing the COOKIE_ECHO chunk in
    sctp_sf_do_5_1D_ce(), after it has passed a series of sanity checks,
    a new association is created in sctp_unpack_cookie(). If some later
    processing fails, sctp_association_free() is called to free the
    previously allocated association, and it decrements sk_ack_backlog
    for this socket. Since the initial value of sk_ack_backlog is 0,
    the decrement wraps it around to 65535, and if we want to establish
    new associations afterwards on the same socket, an ABORT is
    triggered since SCTP deems the accept queue full.
    Fix this issue by only decrementing sk_ack_backlog for associations
    in the endpoint's list.
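
    The wrap-around and the fix can be sketched in userspace. The sketch
    assumes a 16-bit unsigned counter, which is what the 65535 value
    implies; the structures and the on_endpoint_list flag are
    hypothetical.

```c
#include <stdbool.h>

/* Hypothetical socket with a 16-bit backlog counter. */
struct sock_model {
	unsigned short ack_backlog;
};

struct assoc_model {
	bool on_endpoint_list;   /* false for an association that was never added */
};

static void assoc_free_buggy(struct sock_model *sk, struct assoc_model *a)
{
	(void)a;
	sk->ack_backlog--;       /* decrements even for a never-accounted assoc */
}

static void assoc_free_fixed(struct sock_model *sk, struct assoc_model *a)
{
	if (a->on_endpoint_list) {
		a->on_endpoint_list = false;
		sk->ack_backlog--;
	}
}
```

    Decrementing an unsigned short that holds 0 is well defined in C and
    yields 65535, so the buggy path silently makes the accept queue look
    full.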

    Fix-suggested-by: Neil Horman
    Signed-off-by: Xufeng Zhang
    Acked-by: Daniel Borkmann
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Xufeng Zhang
     

12 Jun, 2014

1 commit

  • This fixes the following sparse warning:

    net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types)
    net/sctp/associola.c:1556:29: expected bool [unsigned] [usertype] preload
    net/sctp/associola.c:1556:29: got restricted gfp_t
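
    The diff itself is not shown here. A common way to silence this class
    of warning is to turn the flag test into an explicit boolean with
    `!!`; the sketch below assumes that is the shape of the fix. It uses a
    plain typedef in place of the sparse __bitwise gfp_t and a
    hypothetical flag value.

```c
#include <stdbool.h>

/* Plain typedef standing in for gfp_t (the real one is a sparse __bitwise type). */
typedef unsigned int gfp_flags;

/* Hypothetical flag value, for illustration only. */
#define MODEL_GFP_WAIT 0x10u

/* Explicit test-and-normalise: the masked value is collapsed to 0 or 1, so
 * the initialiser is a plain bool instead of a restricted flag type. */
static bool may_preload(gfp_flags gfp)
{
	return !!(gfp & MODEL_GFP_WAIT);
}
```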

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann